Textbook (ScalaTion): Introduction to Computational Data Science Using ScalaTion, 2020.
John A. Miller
Introduction to Data Science Using ScalaTion: Lesson Plans, 2020.
John A. Miller
Textbook (ISL): An Introduction to Statistical Learning, 2013.
Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani
Textbook (ESL): The Elements of Statistical Learning, 2nd Ed., 2009.
Trevor Hastie, Robert Tibshirani and Jerome Friedman
Textbook (MML): Mathematics for Machine Learning, 2020.
Marc Peter Deisenroth, A. Aldo Faisal and Cheng Soon Ong
R: Linear Regression Using R: An Introduction to Data Modeling, 2016.
David J. Lilja
Spark: Machine Learning Library (MLlib) Guide, 2018.
Apache Spark
Day        Time        Plan          Room
Tuesday    12:45-2:00  Lecture + TT  Miller Plant Science (1061) 2102
Wednesday  12:40-1:30  Lecture       Boyd (1023) 208
Thursday   12:45-2:00  Lecture + HW  Miller Plant Science (1061) 2102
An introduction to advanced analytics techniques in data science, including random forests, semi-supervised learning, spectral analytics, randomized algorithms, and just-in-time compilers. Distributed and out-of-core processing.
ScalaTion Chapters:
Part I Preliminaries
Part II Modeling
Part III Simulation
Part IV Appendices
ISL Chapters:
ESL Chapters:
Software:
20%     Exam I                             2/?
20%     Exam II                            4/?
25%     Final Exam                         5/?
30%     Programs (groups of 3-4 students)  see Software
5%      Homework/Tool Talks                [presentation]
Exam II
Review Date: Apr ?, 2023
Exam Date: Apr ?, 2023
5 Questions:
Final Exam
Exam Date: May ?, 2023
6 Questions:
See eLC.
See the updated list of TAs.
No.  Topic                       Description                                        Talk  Due
1.   R                           R Language                                         .     .
2.   Spark                       Apache Spark                                       .     .
3.   MLlib                       Apache Spark MLlib                                 .     .
4.   TensorFlow                  TensorFlow Machine Learning Library                .     .
5.   Keras                       The Python Deep Learning Library                   .     .
6.   Weka                        Waikato Environment for Knowledge Analysis (Weka)  .     .
7.   Watson, Cognos Analytics    Watson Analytics/Cognos Analytics                  .     .
8.   Parallel Matrix Operations  GPU/TPU/FPGA                                       .     .
See eLC for the three regular projects and more details on the term project.
Project #1 on Ch. 6, Project #2 on Ch. 10, Project #3 TBD.
Projects
TA:
Submit projects to the TA following their instructions (one submission per group).
Turn in your fully commented source code files, an SBT build.sbt file
and a ReadMe file.
The ReadMe file must contain instructions for compiling and running the program
as well as a detailed explanation of who coded what parts of the program.
Five percent of the grade is determined by how well the project is documented.
Another five percent of the grade is determined by how well the effort is divided
between the group members.
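The submission must include an SBT build.sbt file. A minimal sketch of one is shown below. It is only an illustration under stated assumptions: the project name and Scala version are hypothetical, and it assumes the ScalaTion jar has been copied into the project's lib/ directory, which sbt puts on the classpath as an unmanaged dependency by default. Match these settings to the ScalaTion release and the instructions the TA provides.

```scala
// build.sbt: minimal sketch of a project build definition.
// Assumptions (hypothetical, not taken from the course materials):
//   - the project name and version below are placeholders;
//   - the Scala version should match the one your ScalaTion release was built with;
//   - the ScalaTion jar sits in ./lib, which sbt adds to the classpath by default.
name         := "project1"        // hypothetical project name
version      := "0.1"
scalaVersion := "3.3.1"           // assumption: adjust to your ScalaTion release

// Report deprecated API usage and language-feature warnings at compile time.
scalacOptions ++= Seq("-deprecation", "-feature")
```

With this file in the project root, "sbt compile" builds the code and "sbt run" runs it (or "sbt runMain" followed by the fully qualified name of the main object when the project has several entry points). Commands like these are what the ReadMe's compile-and-run instructions would list.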
No. Description Starter Code (must be used) Comment Due
4.
Term Project: Data Science Application
.
A two-page proposal giving a detailed description of the application you propose to develop
must be submitted with Project 2.
Project includes data collection, data analytics, interpretation and recommendations
for a real-world project. May use ScalaTion, R, scikit-learn, Keras, Spark or other approved software.
The term project, including a 25-minute presentation and demo, will be presented during the last week of class.
Must address the ten points/questions listed below.
Worth twice the points of regular projects.
.
Ten Questions:
Optional Project:
Policies