Syllabus

==== Math 457 Introduction to Statistical Learning. Fall 2021. ====

Binghamton University

  * Instructor: Vladislav Kargin
  * Office: WH-136
  * Meeting time and location: MWF 8:00 - 9:30 am at OH-G102.
  * Office hours: MWF 9:45 - 10:30 (in person, office WH-136), Tue 4:00 PM - 5:00 PM (via Zoom, ID 949 5616 9870), or by appointment.

**This course is a 4-credit course, which means that in addition to the scheduled lectures/discussions, students are expected to do at least 9.5 hours of course-related work each week during the semester. This includes completing assigned readings, participating in lab sessions, studying for tests and examinations, preparing written assignments, completing internship or clinical placement requirements, and other tasks that must be completed to earn credit in the course.**

=== Prerequisite ===

  * Scientific programming in a language such as R, Matlab, or Python.
  * Linear regression and its inference.
  * Matrix algebra, preferably including orthogonality, eigenvalues and eigenvectors, and the singular value decomposition.

=== Description ===

This course is a survey of statistical learning methods. It covers major methods and concepts of both supervised and unsupervised learning. Topics include regression methods with sparsity or other regularizations; model selection; an introduction to classification, including discriminant analysis, logistic regression, support vector machines, and kernel methods; nonlinear methods; clustering; decision trees, random forests, boosting, and ensemble learning; and deep learning.

=== Learning Outcomes ===

Students will learn how and when to apply statistical learning techniques, their comparative strengths and weaknesses, and how to critically evaluate the performance of learning algorithms. Students completing this course should be able to
  * process and visualize different data types,
  * apply basic statistical learning methods to build predictive models or perform exploratory analysis,
  * understand the basic mechanisms underlying predictive models, and evaluate and interpret such models,
  * properly tune, select, and validate statistical learning models,
  * use analytical tools and software widely used in practice,
  * work both independently and in a team to solve problems, and
  * present and communicate findings effectively.
=== Recommended Texts ===

  * James, Witten, Hastie, and Tibshirani, 2021. "An Introduction to Statistical Learning with Applications in R," 2nd edition. The book home page is at "http://www-bcf.usc.edu/~gareth/ISL/index.html"; a pdf file can be downloaded from this page.
/*
  * Hastie, Trevor, Tibshirani, Robert, and Friedman, J. H., 2009. "The Elements of Statistical Learning: Data Mining, Inference, and Prediction." New York, NY: Springer.
  * Hastie, Trevor, Tibshirani, Robert, and Wainwright, Martin, 2015. "Statistical Learning with Sparsity: The Lasso and Generalizations." Chapman and Hall/CRC. A pdf file of the first edition can be downloaded from "https://web.stanford.edu/~hastie/StatLearnSparsity/".
  * Bühlmann, Peter, and van de Geer, Sara, 2011. "Statistics for High-Dimensional Data." Springer-Verlag Berlin Heidelberg.
  * Boyd, Stephen, and Vandenberghe, Lieven, 2004. "Convex Optimization." Cambridge University Press. The pdf file of the book can be downloaded for free.
Students will compete against each other in a Data Analysis Contest. The competition will begin on Tuesday, February 20, and can be completed in teams of 2–4 members. Grades will be based on a progress report and a final report (one per team), as well as on the contest results. Further details about the contest, along with specific grading criteria, will be given in a separate document and discussed in class.
*/

=== Tentative schedule ===

| Midterm | Nov 22 |
| Project Proposal | due Nov 24 |
| Preliminary report | due Dec 3 |
| Project presentations | December 6, 8, 10 |
| Final Report | due December 13 |

/* 30 people / 3 = 10 groups. Each group presentation is 20 minutes, so 4 groups fit in one class, and 3 classes are needed. */

/* Students are expected to write homework in LaTeX. For users with no experience with LaTeX, I suggest using the cloud LaTeX editing service at "https://www.overleaf.com/".

| Lecture | Week | Date | Module | Tentative Topic | Assigned | Due |
| 1 | 1 | Jan-21, Tuesday | I. Regression with Sparsity | Introduction | HW 0 | |
| 2 | 1 | Jan-23, Thursday | I. Regression with Sparsity | MSE & Least Squares | | |
| 3 | 2 | Jan-28, Tuesday | I. Regression with Sparsity | Ridge Regression | HW 1 | HW 0 |
| 4 | 2 | Jan-30, Thursday | I. Regression with Sparsity | Sparse Regression I | | |
| 5 | 3 | Feb-4, Tuesday | I. Regression with Sparsity | Sparse Regression II | | |
| 6 | 3 | Feb-6, Thursday | I. Regression with Sparsity | Graphical Models & Compressed Sensing | | |
| 7 | 4 | Feb-11, Tuesday | II. Pipeline for Statistical Learning | Model Selection and Assessment | HW 2 | HW 1 |
| 8 | 4 | Feb-13, Thursday | II. Pipeline for Statistical Learning | Model Validation | | |
| 9 | 5 | Feb-18, Tuesday | II. Pipeline for Statistical Learning | Case Studies & Logistic Regression | | |
| 10 | 5 | Feb-20, Thursday | III. Classification Methods | LR Computing & Other GLMs | | |
| 11 | 6 | Feb-25, Tuesday | III. Classification Methods | Sparse GLM & Bayes Classifier | HW 3 | HW 2 |
| 12 | 6 | Feb-27, Thursday | III. Classification Methods | LDA | | |
| 13 | 7 | Mar-3, Tuesday | III. Classification Methods | SVM I: linear SVM | | |
| NA | 7 | Mar-5, Thursday | | NO CLASS | | |
| 14 | 8 | Mar-10, Tuesday | III. Classification Methods | SVM II: dual solution, kernel SVM | | |
| 15 | 8 | Mar-12, Thursday | IV. Nonlinear Methods | Nonlinear I: RKHS, KRR, Kernel PCA | HW 4 | HW 3 |
| 16 | 9 | Mar-17, Tuesday | IV. Nonlinear Methods | Nonlinear II: Polynomial regression, smoothing, GAM, etc. | | |
| 17 | 9 | Mar-19, Thursday | IV. Nonlinear Methods | Neural Networks | | |
| 18 | 10 | Mar-24, Tuesday | V. Dimension Reduction | Dimension Reduction I: PCA-1 | | |
| 19 | 10 | Mar-26, Thursday | V. Dimension Reduction | Dimension Reduction I: PCA-2 | HW 5 | HW 4 |
| 20 | 11 | Mar-31, Tuesday | V. Dimension Reduction | Dimension Reduction II: Extensions, NMF | | |
| 21 | 11 | Apr-2, Thursday | V. Dimension Reduction | Dimension Reduction II: ICA, MDS | | |
| NA | 12 | Apr-7, Tuesday | | NO CLASS | | |
| NA | 12 | Apr-9, Thursday | | NO CLASS | | |
| 22 | 13 | Apr-14, Tuesday | VI. Clustering | Clustering I: K-means | | |
| 23 | 13 | Apr-16, Thursday | VI. Clustering | Clustering II: k-means, EM, HC | HW 6 | HW 5 |
| 24 | 14 | Apr-21, Tuesday | VI. Clustering | Clustering III: HC; Trees | | |
| 25 | 14 | Apr-23, Thursday | | Midterm Exam | | |
| 26 | 15 | Apr-28, Tuesday | VII. Ensemble Methods | Bagging; Random Forests | | |
| 27 | 15 | Apr-30, Thursday | VII. Ensemble Methods | Ensembles & Boosting | | |
| 28 | 16 | May-5, Tuesday | VII. Ensemble Methods | Gradient Boosting & XGBoost | | HW 6 |
*/