Welcome to CSCI 3360 Data Science! Data science is a rapidly growing field that combines traditional statistics, machine learning, data mining, and programming. It has been attracting a great deal of attention from both academia and industry.
This course is designed as an introductory study of the theory and practice of data science. Data science is about learning from data to extract insight and knowledge. This course introduces the computational and statistical tools used in data analysis to answer questions from data. Specifically, we will investigate tools and methods for
Students are expected to have a working knowledge of Python 2.7 (or 3.5+). All programming assignments must be completed in Python unless otherwise specified. Some elementary knowledge of statistics, linear algebra, and probability theory is expected. Those fundamentals will be provided as they are needed.
The main textbook for this course is
“An Introduction to Statistical Learning with Applications in R”
by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani
Springer (PDF is available online!)
Here’s a heads-up: no single book can cover all the topics of data science, due to its interdisciplinary nature. Hence, we will not closely follow the structure of our main textbook. However, students are expected to read the relevant chapters of the book as the course proceeds.
Supplementary books
Read this section carefully, as it defines how your grade is determined. The following rules will be strictly enforced.
Evaluation will consist of five individual homework assignments, two exams (midterm and final), a team project, and pop quizzes. Each submitted item (for example, a homework, report, or presentation) will be graded without a curve.
Item | Portion | Description |
---|---|---|
Homework | 50% | 5 individual assignments involving problem solving, discussion, and Python programming |
Exams | 25% | Midterm (10%) and final (15%). The final exam is comprehensive. |
Team project | 20% | A semester-long data science project: interim progress presentation (5%); final report and presentation (15%) |
Quiz | 5% | Quizzes are comprehensive |
The grade is based on the total score, a weighted sum of all graded items. It is computed using the following equation:
\[ \begin{aligned} \text{total score} &= \frac{\sum_{i=1}^5 \text{HW}_i}{500} \times 50\% + \frac{\text{midterm} + \text{final}}{200} \times 25\% \\ &\qquad + \frac{\text{interim PR}}{100} \times 5\% + \frac{\text{final PR} + \text{report}}{200} \times 15\% \\ &\qquad + \frac{\sum_{i=1}^k \text{quiz}_i}{100k} \times 5\%\,. \end{aligned} \]
Note that the above equation assumes the maximum score for each graded item is 100, and this may differ from the actual assigned maximum.
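As a rough illustration, the equation above can be sketched in Python. This is only a minimal sketch, not an official grade calculator: the function name `total_score` and its parameter names are hypothetical, every item is assumed to be graded out of 100, and the midterm and final are pooled into a single 25% term exactly as the equation is written. Returning 0 for the quiz term when no quiz has been given is also an assumption, since the equation is undefined for k = 0.

```python
def total_score(homeworks, midterm, final, interim_pr, final_pr, report, quizzes):
    """Return the weighted total score in percent (0 to 100).

    Assumes every graded item has a maximum score of 100.
    """
    if len(homeworks) != 5:
        raise ValueError("expected exactly five homework scores")

    hw_part = sum(homeworks) / 500 * 50            # homework: 50%
    exam_part = (midterm + final) / 200 * 25       # exams pooled: 25%
    interim_part = interim_pr / 100 * 5            # interim presentation: 5%
    final_part = (final_pr + report) / 200 * 15    # final presentation + report: 15%

    # Assumption: with no quizzes (k = 0) the quiz term contributes nothing.
    k = len(quizzes)
    quiz_part = sum(quizzes) / (100 * k) * 5 if k else 0.0

    return hw_part + exam_part + interim_part + final_part + quiz_part
```

For example, `total_score([100] * 5, 100, 100, 100, 100, 100, [100, 100])` evaluates to the full 100 percent.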
Percentage | ≥95% | ≥90% | ≥85% | ≥80% | ≥75% | ≥70% | ≥65% | ≥60% | <60% |
---|---|---|---|---|---|---|---|---|---|
Grade | A | A- | B+ | B | B- | C+ | C | C- | D |
Note that the instructor may slightly adjust the percentage cutoffs to account for the score distribution.
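The mapping from total score to letter grade can be sketched as a simple lookup. This is a sketch under stated assumptions: each percentage in the table is read as a lower cutoff (a score at or above it earns that grade), the names `CUTOFFS` and `letter_grade` are hypothetical, and the cutoffs are passed as a parameter because the instructor may adjust them.

```python
# Assumed reading of the grade table: each percentage is a minimum cutoff.
CUTOFFS = [
    (95, "A"), (90, "A-"), (85, "B+"), (80, "B"),
    (75, "B-"), (70, "C+"), (65, "C"), (60, "C-"),
]


def letter_grade(total, cutoffs=CUTOFFS):
    """Return the letter grade for a total score given in percent."""
    # Cutoffs are listed from highest to lowest, so the first match wins.
    for minimum, letter in cutoffs:
        if total >= minimum:
            return letter
    return "D"  # anything below the lowest cutoff
```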
All assignments are expected to be completed and submitted to eLC by the due date. Normally, assignments are due by 11:59 pm on Fridays. Any assignment submitted after 12:01 am on the day following the due date will be considered late.
Late submissions are penalized by deducting 10% of the assignment's total marks for each day past the deadline.
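The late policy can be illustrated with a short Python sketch. Several details here are assumptions, not part of the policy as stated: a partial day is counted as a full day, days are counted from the 11:59 pm deadline, and the deduction is capped at the assignment's total marks. The names `GRACE` and `late_penalty` are hypothetical.

```python
import math
from datetime import datetime, timedelta

# Due at 11:59 pm; a submission counts as late only after 12:01 am.
GRACE = timedelta(minutes=2)


def late_penalty(due, submitted, max_marks=100.0):
    """Return the marks deducted for submitting at `submitted`."""
    overdue = submitted - due
    if overdue <= GRACE:
        return 0.0  # on time, or within the two-minute grace window

    # Assumption: every started day past the deadline counts as a full day.
    days_late = math.ceil(overdue / timedelta(days=1))

    # Assumption: the deduction never exceeds the assignment's total marks.
    return min(max_marks, 0.10 * max_marks * days_late)
```

Under these assumptions, a 100-point assignment submitted a few hours after the deadline loses 10 points, and one submitted a day and a half late loses 20.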
Every student enrolled in this course is expected to abide by UGA’s academic honesty policy and procedures. Please refer to UGA’s A CULTURE OF HONESTY. All documents linked there are part of this syllabus.
For every individual assignment, students are welcome to discuss the problems and share ideas at a high level. This means that you should not share anything concrete, such as write-ups or code fragments. The submitted item must be your own work. For example, you can discuss how to solve a homework problem and share an idea, but you have to write your own answer. An egregious violation of these academic honesty codes will result in an F for the course.
Week | Topic | Note |
---|---|---|
Jan. 4 | Course overview | HW0 OUT |
Jan. 9 | Warming up: review | |
Jan. 16 | Data collection: web scraping and pandas (pandas tutorial) | |
Jan. 23 | Data visualization: matplotlib, seaborn (matplotlib tutorial) | HW1 OUT |
Jan. 30 | Regression (ISLR Ch. 3.1-3.3) | |
Feb. 6 | Classification | HW2 OUT |
Feb. 13 | More on classification | |
Feb. 20 | Feature engineering | HW3 OUT |
Feb. 27 | Dimensionality reduction (ISLR Ch. 10.1-10.2) | Midterm |
Mar. 6 | Digression (BV Ch. 9.1-9.3) | HW4 OUT |
Mar. 20 | Unsupervised learning: data clustering (PRML Ch. 9) | |
Mar. 27 | Ensemble methods | |
Apr. 3 | Recommender system | HW5 OUT |
Apr. 10 | Sentiment analysis | |
Apr. 17 | Introduction to deep learning | |
Apr. 24 | Team project presentation | Final exam |