Data Science Seminar
Hosted by Department of Mathematical Sciences
Much of the first wave of Data Science programs was built on a foundation of already existing computing-oriented classes; less effort was spent on how people from diverse backgrounds and disciplines approach data science. At Carnegie Mellon, the Department of Statistics & Data Science teaches thousands of students whose future degrees range from Pre-Med to Rhetoric to Chemistry to Business to Statistics & Machine Learning, and is well positioned to tackle this pedagogical challenge. In the last two years, we have begun developing ISLE (Interactive Statistics Learning Environment), an interactive platform that removes the cognitive load of computing and lets students explore Statistics & Data Science concepts in both structured and unstructured ways. The platform also supports student-driven inquiry and case studies. We track and model every click, word used, and decision made throughout the data analysis pipeline, from loading the data to the final written report. The platform is flexible enough to allow adaptation, providing different modes of data analysis instruction, active learning opportunities, and exercises for different subsets of the population. Students are also able to build their own case studies with little restriction or faculty intervention. The resulting data sets are invaluable in capturing behavioral data science information and generate interesting statistical methodological questions about how to model the learning processes using data of mixed modality (clicks, text, audio, video, etc.). We present some initial methodological work with an emphasis on developing variable selection methods when clustering circular data (text). In short, we are teaching Data Science while simultaneously learning how we do it.