Applied Analytics: ML Pipeline

94-887

Units: 12

Description

Machine learning algorithms are transforming many fields, providing new analytic capabilities and new ways of visualizing data, and they are key drivers of decision making. But when and how are they useful? Knowing when and how to apply appropriate machine learning techniques requires an understanding of the machine learning pipeline, from data to machine learning algorithms to problem domain. This class teaches students how to work with messy data and provisional questions and turn them into actionable interpretations and insights.

The course will cover discovery, planning, analysis, and interpretation. Discovery involves understanding the data at hand, determining what is and is not answerable, and generating questions. Planning involves contrasting the application of the desired machine learning method on ideal, clean data with its application on the messy data at hand. Dealing with representation and missing data, and designing appropriate machine learning machinery, are all part of planning. Analysis involves applying the machine learning method and checking model performance and assumptions in a principled and responsible manner. Interpretation involves transforming algorithm outputs into meaningful and actionable characterizations of the results. Each part of the pipeline is interconnected, and students will learn to anticipate and address limitations by understanding the pipeline as a whole. Throughout the course we will focus on one vertical, health care, recognizing that the methods developed generalize to other domains. We will contrast advanced machine learning methods with the simpler methods used in health care analytics and describe the advantages and limitations of each.

This course will be a mixture of lectures and small group workshop activities culminating in a final project. There will be no final exam.

Learning Outcomes

- learn and adapt the mathematical formulations of machine learning methods for principled application

- understand the strengths and limitations of existing analytic strategies, including randomized controlled trials, observational studies, Cox proportional hazards models, and logistic regression

In addition, practicum goals include:

- performing end-to-end machine learning analysis, including data exploration, preparation, cleaning, prediction, validation, visualization, and interpretation

- building a working knowledge of the R data science pipeline (tidyverse and Shiny)

- learning to build interactive visualizations of machine learning analyses

- learning to write a conference-style white paper in LaTeX

Prerequisites Description

Students should have completed or be concurrently taking:
Data Mining,
Machine Learning for Problem Solving,
ML 17-401,
ML 17-601,
or the equivalent.

Previous exposure to R, Python, or another programming language is highly recommended.