Data Warehousing


Units: 6


Data warehouses are the central component of a modern data stack. They have solved the problem of analyzing massive amounts of structured & semi-structured data and are cost-effective, performant and easy to use. Data warehouses are the foundation for reporting, ad hoc analysis, business intelligence and machine learning, and enable collaboration among a diversity of users and stakeholders across organizations of all sizes.

This class will provide students with the conceptual background and hands on keyboard skills needed to utilize a data warehouse effectively. Throughout the course, students will work on an end-to-end development project, building a working data platform for New York City transit data. Using actual taxi, rideshare, bike share and weather data, students will answer real-world analytics questions, such as "How does location and time of day affect trip length?" and "How does weather affect transit preferences?". By the end, students will be empowered with the tools and techniques needed to take a real-world data project from problem statement to prototype to production.

Learning Outcomes

Learning objectives: Implement data ingest techniques (ETL) Write advanced SQL for data analytics, including geographic and time series Transform data using dbt Compare modern and classic strategies of data modeling Understand data warehouse architecture Maintain data quality Integrate a data warehouse in a broader data platform

Prerequisites Description

Prerequisites: 95–703 A: Database Management. Basic knowledge of programming (Python strongly recommended) and UNIX shell. Basic data analysis.