Lecture schedule

Student lectures are denoted as "SL".

Guest lectures are denoted as "GL".

  Date Description Readings Slides
Week 1 Jan 6 Introduction The unreasonable effectiveness of data pptx, pdf
Jan 7 Probability Review   pptx, pdf
Jan 8 Naive Bayes and Hadoop   pptx, pdf
Week 2 Jan 13 Stream-and-sort + MapReduce A language model approach to keyphrase extraction pptx, pdf
Jan 14 Streaming Data A Probabilistic Analysis of the Rocchio Algorithm with TF-IDF for Text Categorization pptx, pdf
Jan 15 Classification + Regression   pptx, pdf
Week 3 Jan 20 SL: Michael Church Style in the Long Tail: Discovering Unique Interests with Latent Variable Models in Large Scale Social E-commerce gdoc
Jan 21 Assignment 1 Review Classification on high octane: Naive Bayes pptx, pdf
Jan 22 SL: Seyed N. Hashemi Knowledge Vault: A Web-Scale Approach to Probabilistic Knowledge Fusion pdf
Week 4 Jan 27 Graphs (Part I) Sampling from Large Graphs pptx, pdf
Jan 28 Graphs (Part II) Design Patterns for Efficient Graph Algorithms in MapReduce pptx, pdf
Jan 29 Clustering   pptx, pdf
Week 5 Feb 3 SL: Bita K. Zahrani Distributed Approximate Spectral Clustering for Large-Scale Datasets pptx
Feb 4 Spectral clustering A Tutorial on Spectral Clustering pptx, pdf
Feb 5 SL: William Richardson Distributed PCA and k-Means Clustering odp, pdf
Week 5 Feb 10 SL: Joey Ruberti Influenza-Like Illness Surveillance on Twitter through Automated Learning of Naïve Language pdf
Feb 11 NO LECTURE TODAY
Feb 12 SL: Alekhya Chennupati Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud pptx
Feb 13 SPECIAL LECTURE: Vaa3D 11am-12pm; Boyd 306
Week 6 Feb 17 Lecture Cancelled
Feb 18 Semi-supervised learning   pptx, pdf
Feb 19 SL: Manish Ranjan Large-scale machine learning at Twitter pdf
Week 7 Feb 24 Dimensionality Reduction   pptx, pdf
Feb 25 Midterm review   practice exam
Feb 26 MIDTERM EXAM
Week 8 Mar 3 Introduction to Spark Distributing Matrix Computations with Spark MLlib pptx, pdf
Mar 4 SL: Roi Ceren Tracking Climate Change Opinions from Twitter Data ppt
Mar 5 NO LECTURE TODAY
SPRING BREAK
Week 9 Mar 17 GL: Xiang Li Dictionary Learning in Functional Brain Imaging pptx
Mar 18 Latent Dirichlet Allocation (LDA) Latent Dirichlet Allocation, 2003 pptx, pdf
Mar 19 SL: Bahaaeddin M. Alaila Discretized Streams: Fault-Tolerant Streaming Computation at Scale pptx
Week 10 Mar 24 Factorbird Factorbird - a Parameter Server Approach to Distributed Matrix Factorization pptx, pdf
Mar 25 Randomized algorithms   pptx, pdf
Mar 26 SL: Muthukumaran Chandrasekaran Event Detection via Communication Pattern Analysis pptx
Week 11 Mar 31 SL: Ankita Joshi NIFTY: A System for Large Scale Information Flow Tracking and Clustering pdf
Apr 1 Recommendation and Advertising   pptx, pdf
Apr 2 Stochastic SVD Randomized methods for computing low-rank approximations of matrices pptx, pdf
Week 12 Apr 7 GL: Budak Arpinar Semantic Web html
Apr 8 GL: Lakshmish Ramaswamy    
Apr 9 SL: Zhaochong Liu An algorithm for the principal component analysis of large data sets pptx
Week 13 Apr 14 GL: Arvind Ramanathan Oak Ridge Bio-surveillance Toolkit: Statistical Inference Tools for Public Health Dynamics pptx
Apr 15 Decision trees PLANET MapReduce decision trees pptx, pdf
Apr 16 Other distributed frameworks   pptx, pdf
Week 14 Apr 21 FINAL PROJECT PRESENTATIONS See the project page for schedule of presenters  
Apr 22  
Apr 23