CSCI 4380/6380: Data Mining (Spring 2023)
Course Information
Instructor: Dr. Ninghao Liu
Course time and location:
TR: 3:55 pm - 5:10 pm, Miller Plant Sci 2102
W: 4:10 pm - 5:00 pm, Forest Resources-1 0304
Office hours: Thursday, 11:00 am - 11:59 am
Office: Boyd 616
TA: TBD
Course Description
The goal of this course is to develop a comprehensive understanding of the fundamental issues, techniques, applications, and future directions of data science and data mining. The course presents a rigorous overview of machine learning methods, dimension reduction, modeling methods for tabular data, text, and graphs, and industry applications including outlier detection and recommender systems.
Textbooks
Data mining is a highly interdisciplinary and fast-growing field, driven especially by recent advances in machine learning and deep learning. We will rely heavily on course slides in class.
For interested students, the optional (not required) textbooks for this course are:
“Data Mining: Concepts and Techniques, 3rd edition” by Jiawei Han, Micheline Kamber, Jian Pei.
“Learning From Data” by Yaser S. Abu-Mostafa.
“Introduction to Information Retrieval” by Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze.
Course Prerequisite (Important)
Students are expected to have a working knowledge of Python. All programming assignments must be completed in Python unless otherwise specified. Preliminary knowledge of calculus, statistics, and linear algebra is required.
Grading
Letter Grade  A  A-  B+  B  B-  C+  C  C-  D  F 
Range  [90, 100]  [87, 90)  [84, 87)  [80, 84)  [77, 80)  [74, 77)  [70, 74)  [67, 70)  [60, 67)  [0, 60) 
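
The half-open ranges in the table can be easy to misread at the boundaries (for example, a score of exactly 87 falls in [87, 90), not [84, 87)). The following is a minimal sketch of the lookup, not an official grade calculator. It assumes the ten ranges correspond, from highest to lowest, to the letters A, A-, B+, B, B-, C+, C, C-, D, F, and that scores are not rounded before the lookup. The names letter_grade, LOWER_BOUNDS, and LETTERS are hypothetical and used only for illustration.

```python
from bisect import bisect_right

# Lower bound of every band except the lowest, in ascending order (from the table).
LOWER_BOUNDS = [60, 67, 70, 74, 77, 80, 84, 87, 90]

# Assumed letters for the ten bands, lowest band first.
LETTERS = ["F", "D", "C-", "C", "C+", "B-", "B", "B+", "A-", "A"]


def letter_grade(score: float) -> str:
    """Map a numeric score in [0, 100] to a letter using the half-open ranges."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    # bisect_right counts how many lower bounds are <= score,
    # which is exactly the index of the band containing the score.
    return LETTERS[bisect_right(LOWER_BOUNDS, score)]


if __name__ == "__main__":
    for s in (59.9, 60, 86.99, 87, 90, 100):
        print(s, letter_grade(s))
```

Because each range includes its lower bound and excludes its upper bound, a score on a boundary always receives the higher of the two adjacent letters.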

Late Submission Policy: For homework assignments, 20% is deducted for each late day, for up to 48 hours (including weekends) after the deadline; submissions are not accepted after that. Late presentation materials and project reports are not accepted.
Exams: Exams are open-notes.
Academic Honesty
We will strictly follow UGA's Academic Honesty Policy. Dishonest behavior will not be tolerated and may result in failing the course. Please contact the instructor if you have any concerns regarding this issue.
Course Schedule (Tentative)
Week  Date  Topic  Notes 
1  01/10  Course Overview  
 01/11  Classification: kNN  
 01/12  Classification: Linear models  
2  01/17  Classification: Linear models  HW1 out 
 01/18  Classification: Multiclass Classification  
 01/19  Classification: Evaluation  
3  01/24  Tabular data mining  
 01/25  Tabular data mining  
 01/26  Text mining: Preliminaries  Form Teams for Project 
4  01/31  Text mining: Vector space model  
 02/01  Text mining: Vector space model  
 02/02  Graph mining: Preliminaries  
5  02/07  Graph mining  HW1 due, HW2 out 
 02/08  Graph mining  
 02/09  Machine learning: Overfitting and Regularization  
6  02/14  Machine learning: Overfitting and Regularization  
 02/15  Classification: Naive Bayes classifiers  
 02/16  Classification: Naive Bayes classifiers  
7  02/21  Classification: Decision Tree  
 02/22  Classification: Decision Tree  HW2 due, HW3 out 
 02/23  Clustering  
8  02/28  Clustering  
 03/01  Clustering evaluation  
 03/02  Midterm Exam  
9  03/07    Spring Break. No class. 
 03/08    Spring Break. No class. 
 03/09    Spring Break. No class. 
10  03/14  Applications: Outlier detection  
 03/15  Applications: Outlier detection  
 03/16  Applications: Recommender systems  
11  03/21  Applications: Recommender systems  
 03/22  Applications: Recommender systems evaluation  
 03/23  Text mining: Embedding  
12  03/28  Text mining: Embedding  
 03/29  Text mining: Attention Mechanism  HW3 due, HW4 out 
 03/30  Text mining: Attention Mechanism  
13  04/04  Graph mining: GNN  
 04/05  Graph mining: GNN  
 04/06  Graph mining: GNN  
14  04/11  Model interpretation  
 04/12  Model interpretation  
 04/13  Model robustness  
15  04/18  Model robustness  
 04/19  Model fairness  HW4 due 
 04/20  Model fairness  
16  04/25  Project presentation  
 04/26  Project presentation  
 04/27  Project presentation  
18  05/09  Final Exam  

