Getting Started with Data Warehousing on AWS
Let’s say you want to get started building a data warehouse on AWS. You’re going to need to know the following:
- Do you need a data lake, a data warehouse, a data mart, or some combination?
- What is Redshift, the AWS data warehouse service?
- How to set up your account and provision the resources for your data warehouse.
- How to do the initial ETL (Extract, Transform, Load) from your operational data stores into your data warehouse.
- How to automate ongoing ETL into your data warehouse.
- How to handle ongoing maintenance of the data warehouse in terms of space usage, performance, and security.
- How to give your users access to the data through a BI presentation layer like Tableau, Looker, or some other BI system.
- How to access the data through query tools or an API.
For this article we’ll take a look at starting points to get familiar with the concepts and first implementations of a data warehouse.
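To make the ETL item in the list above a little more concrete, here is a minimal sketch of the extract, transform, load pattern. It assumes two in-memory SQLite databases standing in for an operational data store and a warehouse, so it runs without an AWS account. The table names (`orders`, `fact_orders`), the columns, and the `run_etl` helper are all hypothetical, and a real Redshift load would look different.

```python
import sqlite3


def run_etl(source, warehouse):
    """Copy rows from a hypothetical operational table into a warehouse table."""
    # Extract: read the raw rows from the operational store.
    rows = source.execute(
        "SELECT id, customer, amount_cents FROM orders"
    ).fetchall()

    # Transform: tidy up customer names and convert cents to dollars.
    cleaned = [
        (order_id, customer.strip().lower(), cents / 100.0)
        for order_id, customer, cents in rows
    ]

    # Load: write the cleaned rows into the warehouse table.
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount_usd REAL)"
    )
    warehouse.executemany(
        "INSERT OR REPLACE INTO fact_orders VALUES (?, ?, ?)", cleaned
    )
    warehouse.commit()
    return len(cleaned)


def demo():
    """Build a tiny source table, run the ETL, and return what was loaded."""
    source = sqlite3.connect(":memory:")
    source.execute(
        "CREATE TABLE orders (id INTEGER, customer TEXT, amount_cents INTEGER)"
    )
    source.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, " Alice ", 1250), (2, "BOB", 400)],
    )
    warehouse = sqlite3.connect(":memory:")
    loaded = run_etl(source, warehouse)
    rows = warehouse.execute(
        "SELECT order_id, customer, amount_usd FROM fact_orders ORDER BY order_id"
    ).fetchall()
    return loaded, rows


if __name__ == "__main__":
    print(demo())
```

Because the load step uses `INSERT OR REPLACE` keyed on `order_id`, running the same job twice does not create duplicate rows, which is roughly the property you want before automating ongoing ETL.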
Resources
AWS Data Warehouse Overview
AWS Data Warehouse - high-level overview of the tiers of a data warehouse, what a data warehouse is used for, and the differences between a data warehouse, an operational database, a data lake, and a data mart.
AWS Data Services Course
AWS Data Services - overview course on getting started with AWS data services. Available with a free trial or a LinkedIn Premium subscription.
AWS Data Warehouse Implementation Overview
Implementing a Data Warehouse on AWS - overview of AWS data warehousing on the AWS YouTube channel.
AWS Data Warehouse Walkthrough
Deploy a Data Warehouse on AWS - walkthrough to set up an Amazon Redshift cluster, load sample data, and set up SQL Workbench/J for data analysis.
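Loading data into Redshift from Amazon S3 is typically done with the SQL `COPY` command. The sketch below only builds the statement text, so it runs without a cluster. The `build_copy_statement` helper, the table name, the bucket path, and the IAM role ARN are all hypothetical placeholders, not values from the walkthrough.

```python
import re


def build_copy_statement(table, s3_uri, iam_role_arn, delimiter="|"):
    """Return a Redshift COPY statement that loads delimited files from S3."""
    # Only allow simple identifiers for the table name.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError(f"unexpected table name: {table!r}")
    if not s3_uri.startswith("s3://"):
        raise ValueError("s3_uri must start with s3://")
    # Reject quotes so the values cannot break out of the SQL string literals.
    for value in (s3_uri, iam_role_arn, delimiter):
        if "'" in value:
            raise ValueError("single quotes are not allowed in COPY arguments")
    return (
        f"COPY {table} "
        f"FROM '{s3_uri}' "
        f"IAM_ROLE '{iam_role_arn}' "
        f"DELIMITER '{delimiter}';"
    )


# Hypothetical values for illustration only.
statement = build_copy_statement(
    table="sales",
    s3_uri="s3://example-bucket/sample/sales.txt",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)
print(statement)
```

You would paste the resulting statement into a SQL client such as SQL Workbench/J once it is connected to your cluster, after swapping in your own table, bucket, and role.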
Next Steps
Choose an overview video or article and review it. Then get started with Deploy a Data Warehouse on AWS. Be sure to check back for updates as I work on answering the questions I posed at the top of the article.