Data Mining

95-791

Units: 6

Description

Data mining is the science of discovering structure and making predictions in large, complex data sets. Nowadays, almost every organization collects data that it hopes to use to support better decision making. Learning from data can help us detect fraud, make accurate medical diagnoses, monitor the reliability of a system, perform market segmentation, improve the success of marketing campaigns, and much more.

This course serves as an introduction to Data Mining for students in Business and Data Analytics. Students will learn about many commonly used methods for predictive and descriptive analytics tasks. They will also learn to assess the methods' predictive and practical utility.

Learning Outcomes

By the end of this class, students will:

Be able to produce, comprehend and run Python code for commonly used data mining methods.
Understand the advantages and disadvantages of multiple data mining methods. This involves:
1. Generalizability
2. Bias-variance trade-off
3. Interpretability-flexibility trade-off
Be able to compare the utility of different methods through lab exercises, homework assignments, and a final project.
Understand the concepts behind feature engineering, and be able to put them into practice on different types of data.
Be able to choose one or more appropriate models for a dataset and evaluate their performance and reliability.
Be able to apply methods to real-world data.

Prerequisites Description

Prior to this course, students need to have taken:

90-819 (Intermediate Programming with Python) or 95-888 (Data Focused Python)
A statistics course such as 90-707 or 90-711