Unstructured Data Analytics

95-865

Units: 6

Description: Companies, governments, and other organizations now collect massive amounts of data such as text, images, audio, and video. How do we turn this heterogeneous mess of data into actionable insights? A common problem is that we often do not know ahead of time what structure underlies the data, which is why such data are often called "unstructured". This course takes a practical, two-step approach to unstructured data analysis. We first examine how to identify possible structure present in the data via visualization and other exploratory methods. Once we have clues for what structure is present, we turn toward exploiting this structure to make predictions. Many examples are given for how these methods help solve real problems faced by organizations. Along the way, we encounter many of the most popular methods for analyzing unstructured data, from modern classics in manifold learning, clustering, and topic modeling to some of the latest developments in deep neural networks for analyzing text, images, and time series. There will be a fair amount of coding in Python on large datasets, using standard Python machine learning libraries such as scikit-learn and Keras, and we will work with Amazon Web Services (AWS) for cloud computing (including using GPUs). Note that students cannot receive credit for both 95-865 ("Unstructured Data Analytics") and 94-775 ("Unstructured Data Analytics for Policy"). More information is available at the course webpage: http://www.andrew.cmu.edu/user/georgech/95-865/

Learning Outcomes: By the end of the course, students are expected to have developed the following skills. Skills are assessed by the homework assignments and the final exam.
* Recall and discuss common methods of conducting exploratory and predictive analysis of unstructured data;
* Write Python code for exploratory and predictive data analysis that handles large datasets;
* Work with the Amazon Web Services (AWS) cloud computing platform; and
* Apply unstructured data analysis techniques discussed in class to solve problems faced by governments and companies.

Prerequisites: 95-888

Syllabus: 95-865_Unstructured_Data_Analytics_Syllabus_S19.pdf