Lecture schedule
Student lectures are denoted "SL".
Guest lectures are denoted "GL".
| Week | Date | Description | Readings | Slides |
|---|---|---|---|---|
| Week 1 | Jan 6 | Introduction | The unreasonable effectiveness of data | pptx, pdf |
| | Jan 7 | Probability Review | | pptx, pdf |
| | Jan 8 | Naive Bayes and Hadoop | | pptx, pdf |
| Week 2 | Jan 13 | Stream-and-sort + MapReduce | A language model approach to keyphrase extraction | pptx, pdf |
| | Jan 14 | Streaming Data | A Probabilistic Analysis of the Rocchio Algorithm with TF-IDF for Text Categorization | pptx, pdf |
| | Jan 15 | Classification + Regression | | pptx, pdf |
| Week 3 | Jan 20 | SL: Michael Church | Style in the Long Tail: Discovering Unique Interests with Latent Variable Models in Large Scale Social E-commerce | gdoc |
| | Jan 21 | Assignment 1 Review | Classification on high octane: Naive Bayes | pptx, pdf |
| | Jan 22 | SL: Seyed N. Hashemi | Knowledge Vault: A Web-Scale Approach to Probabilistic Knowledge Fusion | |
| Week 4 | Jan 27 | Graphs (Part I) | Sampling from Large Graphs | pptx, pdf |
| | Jan 28 | Graphs (Part II) | Design Patterns for Efficient Graph Algorithms in MapReduce | pptx, pdf |
| | Jan 29 | Clustering | | pptx, pdf |
| Week 5 | Feb 3 | SL: Bita K. Zahrani | Distributed Approximate Spectral Clustering for Large-Scale Datasets | pptx |
| | Feb 4 | Spectral clustering | A Tutorial on Spectral Clustering | pptx, pdf |
| | Feb 5 | SL: William Richardson | Distributed PCA and k-Means Clustering | odp, pdf |
| Week 5 | Feb 10 | SL: Joey Ruberti | Influenza-Like Illness Surveillance on Twitter through Automated Learning of Naïve Language | |
| | Feb 11 | NO LECTURE TODAY | | |
| | Feb 12 | SL: Alekhya Chennupati | Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud | pptx |
| | Feb 13 | SPECIAL LECTURE: Vaa3D | 11am-12pm; Boyd 306 | |
| Week 6 | Feb 17 | Lecture Cancelled | | |
| | Feb 18 | Semi-supervised learning | | pptx, pdf |
| | Feb 19 | SL: Manish Ranjan | Large-scale machine learning at Twitter | |
| Week 7 | Feb 24 | Dimensionality Reduction | | pptx, pdf |
| | Feb 25 | Midterm review | practice exam | |
| | Feb 26 | MIDTERM EXAM | | |
| Week 8 | Mar 3 | Introduction to Spark | Distributing Matrix Computations with Spark MLlib | pptx, pdf |
| | Mar 4 | SL: Roi Ceren | Tracking Climate Change Opinions from Twitter Data | ppt |
| | Mar 5 | NO LECTURE TODAY | | |
| | | SPRING BREAK | | |
| Week 9 | Mar 17 | GL: Xiang Li | Dictionary Learning in Functional Brain Imaging | pptx |
| | Mar 18 | Latent Dirichlet Allocation (LDA) | Latent Dirichlet Allocation, 2003 | pptx, pdf |
| | Mar 19 | SL: Bahaaeddin M. Alaila | Discretized Streams: Fault-Tolerant Streaming Computation at Scale | pptx |
| Week 10 | Mar 24 | Factorbird | Factorbird - a Parameter Server Approach to Distributed Matrix Factorization | pptx, pdf |
| | Mar 25 | Randomized algorithms | | pptx, pdf |
| | Mar 26 | SL: Muthukumaran Chandrasekaran | Event Detection via Communication Pattern Analysis | pptx |
| Week 11 | Mar 31 | SL: Ankita Joshi | NIFTY: A System for Large Scale Information Flow Tracking and Clustering | |
| | Apr 1 | Recommendation and Advertising | | pptx, pdf |
| | Apr 2 | Stochastic SVD | Randomized methods for computing low-rank approximations of matrices | pptx, pdf |
| Week 12 | Apr 7 | GL: Budak Arpinar | Semantic Web | html |
| | Apr 8 | GL: Lakshmish Ramaswamy | | |
| | Apr 9 | SL: Zhaochong Liu | An algorithm for the principal component analysis of large data sets | pptx |
| Week 13 | Apr 14 | GL: Arvind Ramanathan | Oak Ridge Bio-surveillance Toolkit: Statistical Inference Tools for Public Health Dynamics | pptx |
| | Apr 15 | Decision trees | PLANET MapReduce decision trees | pptx, pdf |
| | Apr 16 | Other distributed frameworks | | pptx, pdf |
| Week 14 | Apr 21 | FINAL PROJECT PRESENTATIONS | See the project page for schedule of presenters | |
| | Apr 22 | | | |
| | Apr 23 | | | |