Main Text (ISL): An Introduction to Statistical Learning, 2013.
Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani
Advanced Reference (ESL): The Elements of Statistical Learning, 2nd Ed., 2009.
Trevor Hastie, Robert Tibshirani and Jerome Friedman
ScalaTion (DSS): Introduction to Data Science Using ScalaTion, 2018.
John A. Miller
R: Linear Regression Using R: An Introduction to Data Modeling, 2016.
David J. Lilja
Spark: Machine Learning Library (MLlib) Guide, 2018.
Apache Spark
Day       Plan        Room
Monday    2:30-3:20   517 Forestry Resources 4
Tuesday   2:00-3:15   453 Chemistry
Thursday  2:00-3:15   453 Chemistry
A rigorous overview of methods for text mining, image processing, and scientific computing. Core concepts in supervised and unsupervised analytics, dimensionality reduction, and data visualization will be explored in depth.
Exam I: closed notes and book; bring calculator; 1 page info sheet allowed.
15%  Exam I: topics = {see below}    9/25
5%   Quiz: topics = {see below}      10/11
20%  Exam II: topics = {see below}   11/15
25%  Final Exam                      12/11
30%  Programs (groups of 3) [software: Java SE 8, Scala 2.12.4, ScalaTion 1.5, R 3.5.x, Apache Spark 2.3.x, SBT 1.1.x]
5%   Homework/Tool Talks [presentation]
Quiz: closed notes and book; bring calculator; 2 page info sheet allowed.
Review Date: Oct 9, 2018
Exam Date: Oct 11, 2018
3 Questions:
(1) Simple Regression, DSS 4.4, ISL 3.1
(2) Multiple Regression, DSS 4.5, ISL 3.2
(3) Naive Bayes, DSS 5.6.
Exam II: closed notes and book; bring calculator; 2 page info sheet allowed.
Review Date: Nov 13, 2018
Exam Date: Nov 15, 2018
5 Questions:
(1) Bayesian Classifiers, DSS 5.1-5.7, ISL 2.2.3,
(2) Decision Trees, DSS 5.10, ISL 8.1,
(3) Logistic Regression, DSS 6.1-6.4, ISL 4.1-4.3,
(4) K-NN Classifier, DSS 6.7, ISL 2.2.3, 3.5,
(5) Perceptrons, DSS 9.2, ESL 11.
Final Exam: closed notes and book; bring calculator; 3 page info sheet allowed.
Review Date: Monday, Dec 3, 2018
Exam Date: Tuesday, Dec 11, 2018: 3:30 - 6:30 pm
6 Questions:
(1) Statistics and Machine Learning (one page essay)
(2) Multiple Regression, DSS 4.5, ISL 3.2
(3) Naive Bayes, DSS 5.6,
(4) K-NN Classifier, DSS 6.7, ISL 2.2.3, 3.5,
(5) Cross-Validation, DSS 4.5, 5.1, ISL 5.1,
(6) Perceptrons/Neural Networks, DSS 9.2-9.6, ESL 11.
No.  Text  Chapters/Sections  Questions                    Due
1.   DSS   Ch. 2.1.15         (1) 3; (2) 4; (3) 5; (4) 6   8/23
2.   DSS   Ch. 2.2.9          (5) 2; (6) 3                 8/30
3.   ISL   Ch. 2              (7) 2; (8) 9                 9/6
4.   ISL   Ch. 3              (9) 5; (10) 8                9/13
5.   ISL   Ch. 3              (11) 9; (12) 14              .
6.   ISL   Ch. 4              (13) 6; (14) 11              .
7.   ISL   Ch. 5              (15) 3; (16) 5               .
8.   ISL   Ch. 6              (17) 4; (18) 8               .
9.   ISL   Ch. 9              (19) 3; (20) 8               .
10.  ISL   Ch. 9              (21) 3; (22) 7               .
No.  Topic            Description                                                                   Talk  Due
1.   Scala            Scala, SBT, ScalaTion                                                         .     8/21
2.   R                R Language                                                                    .     8/28
3.   Spark            Apache Spark                                                                  .     9/4
4.   MLlib            Apache Spark MLlib                                                            .     9/11
5.   TensorFlow       TensorFlow Machine Learning Library                                           .     9/18
6.   Keras            The Python Deep Learning Library                                              .     10/4
7.   Kaggle           Kaggle is the place to do data science projects (source for final project)   .     .
8.   Weka             Waikato Environment for Knowledge Analysis (Weka)                             .     .
9.   Watson           Watson Analytics                                                              .     .
10.  Google Cloud AI  Google Cloud AI: Fast, large scale, and easy-to-use AI services.              .     .
Submit projects by emailing the TA a zip file containing all files, with the subject line "[3360] Group # Project #". One submission per group is sufficient. Turn in your fully commented source code files, an SBT build.sbt file and a ReadMe file. The ReadMe file must contain instructions for compiling and running the program as well as a detailed explanation of who coded which parts of the program. Ten percent of the grade is determined by how well the project is documented (all interfaces, classes, fields, constructors, methods and parameters must be documented). Another ten percent of the grade is determined by how well the effort is divided among the group members.
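As a rough illustration of the build.sbt file required with each submission, a minimal sketch might look like the following. The project name and version are placeholders, and the sketch assumes that the ScalaTion jar files are copied into the project's lib/ directory; the starter code may prescribe a different layout, in which case follow the starter code.

```scala
// build.sbt -- minimal sketch, not an official course template

name := "group1-project1"        // placeholder project name (hypothetical)

version := "0.1.0"               // placeholder version (hypothetical)

scalaVersion := "2.12.4"         // Scala version from the course software list

// Surface compiler warnings that often point to real problems
scalacOptions ++= Seq("-deprecation", "-feature", "-unchecked")

// Assumption: ScalaTion is supplied as jar files under lib/.
// sbt adds jars in lib/ to the classpath automatically (unmanaged
// dependencies), so no libraryDependencies entry is needed for them.
```

With a file like this at the project root, `sbt compile` and `sbt run` are the usual commands to build and run the program, so they are natural candidates for the compile and run instructions in the ReadMe file.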
No. Description / Starter Code (must be used) / Comment / Due

1. Regression Problem in ScalaTion, R and Spark MLlib
   Starter Code (must be used): TBA
   Comment: Multiple Linear Regression using the TBA dataset
   Due: 9/21

2. Classification Problem in ScalaTion and R
   Starter Code (must be used): TBA
   Comment: Compare Several Classification Algorithms
   Due: .

3. Neural Networks in ScalaTion and Keras
   Starter Code (must be used): TBA
   Comment: A Regression Problem and a Classification Problem.
   Due: .

4. Term Project: Data Science Application
   Starter Code (must be used): TBA
   Comment: A two-page proposal giving a detailed description of the application you propose to develop must be submitted with project 3. Project includes data collection, data analytics, interpretation and recommendations for a real-world project. May use approved tools. The term project including a demo will be presented during the last week of class. Worth twice the points of regular projects.
   Due: .
Points: One third for presentation and two thirds for submission. Submit presentation slides including the above items (as a PDF) and source code by midnight, November 30, 2018.