Programming assignments
Assignment 4: LDA on Spark
- Link to assignment PDF
- Due: Friday, April 10 by 11:59:59pm (EST)
Assignment 3: SVD on Spark
- Link to assignment PDF
- Due: Thursday, March 19 by 11:59:59pm (EST)
Assignment 2: K-Means on Hadoop
- Link to assignment PDF
- Due: Monday, February 23 by 11:59:59pm (EST)
Assignment 1: Naive Bayes on Hadoop
- Link to assignment PDF
- Due: Monday, February 2 by 11:59:59pm (EST) (extended from Thursday, January 29)
Student presentations
Each student will deliver one presentation over the course of the semester (the sign-up sheet and schedule are here). The presentation should cover a recent publication related to big data, data mining, and/or scalable machine learning (here are some suggestions). It should include a slide deck and run 40-50 minutes, leaving time afterward for a discussion of the paper's merits and methods.
Because each presentation is followed by an open-ended discussion, I strongly encourage everyone to read the paper ahead of time.
The general framework for a presentation should be as follows:
- Introduction / Background of the paper (provide the context of the problem)
- Description of the method or framework proposed in the paper (why is it novel?)
- Detailed description of figures and tables in the paper
- Analysis of the results
- Conclusions of the study and their implications for the broader research field
- Your thoughts: Does the conclusion address what was proposed in the introduction? What are the strengths and weaknesses of the method? Do the authors seem aware of these factors, particularly the weaknesses? Is there anything the authors should have addressed but did not?
Use of Git and Bitbucket
This course places an emphasis on iterative software development. To that end, we will use Git for version control (available for all operating systems) and Bitbucket for assignment and project submission (free accounts are available).
Your assignments will be graded on the content of your Bitbucket repositories, so if you have trouble installing or configuring Git or Bitbucket, please ask for help. The same applies if you have never used a version control system before (e.g., CVS, Subversion, Mercurial, Perforce).