CSCI 4380/6380: Data Mining (Spring 2023)

Course Information

  • Instructor: Dr. Ninghao Liu

  • Course time and location:

    • TR: 3:55 pm - 5:10 pm, Miller Plant Sci 2102

    • W: 4:10 pm - 5:00 pm, Forest Resources-1 0304

  • Office hours: Thursday, 11:00 am - 11:59 am

  • Office: Boyd 616

  • TA: TBD

Course Description

The goal of this course is to develop a comprehensive understanding of the fundamental issues, techniques, applications, and future directions of data science and data mining. The course presents a rigorous overview of machine learning methods, dimension reduction, modeling methods for tabular data, text, and graphs, and industry applications including outlier detection and recommender systems.

Textbooks

Data mining is a highly interdisciplinary and fast-growing field, driven especially by recent advances in machine learning and deep learning. We will rely heavily on course slides in class.

For interested students, the following textbooks are recommended but not required:

“Data Mining: Concepts and Techniques, 3rd edition” by Jiawei Han, Micheline Kamber, Jian Pei.

“Learning From Data” by Yaser S. Abu-Mostafa.

“Introduction to Information Retrieval” by Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze.

Course Prerequisite (Important)

Students are expected to have a working knowledge of Python. All programming assignments must be completed in Python unless specified otherwise. Preliminary knowledge of calculus, statistics, and linear algebra is required.

Grading

Letter Grade  Range
A             [90, 100]
A-            [87, 90)
B+            [84, 87)
B             [80, 84)
B-            [77, 80)
C+            [74, 77)
C             [70, 74)
C-            [67, 70)
D             [60, 67)
F             [0, 60)
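
The grade ranges can be read as a simple lookup. The following is a minimal sketch, not an official grading tool: it assumes the ten grade labels run A, A-, B+, B, B-, C+, C, C-, D, F in the order of the ranges listed, and the function name `letter_grade` is hypothetical. How fractional scores are rounded is not stated in this syllabus, so the sketch applies the cutoffs to the raw score.

```python
from bisect import bisect_right

# Lower bounds of each grade band in ascending order, taken from the ranges above.
CUTOFFS = [60, 67, 70, 74, 77, 80, 84, 87, 90]
# Assumed labels, one per band, from lowest to highest.
LETTERS = ["F", "D", "C-", "C", "C+", "B-", "B", "B+", "A-", "A"]


def letter_grade(score: float) -> str:
    """Map a final score in [0, 100] to a letter grade (illustrative only)."""
    if not 0 <= score <= 100:
        raise ValueError("score must be in [0, 100]")
    # bisect_right counts how many cutoffs are <= score, which is the band index.
    return LETTERS[bisect_right(CUTOFFS, score)]


if __name__ == "__main__":
    for s in (95, 87, 86.9, 59.5):
        print(s, letter_grade(s))
```

Because each range is closed on the left and open on the right, a score exactly on a boundary (for example 87) falls into the higher band.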

Late Submission Policy: For homework assignments, 20% is deducted for each late day, for up to 48 hours (including weekends); after that, submissions are not accepted. Late presentation materials and project reports are not accepted.
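
As a rough illustration of the homework late policy, the sketch below computes an adjusted score. It rests on assumptions the syllabus does not spell out: any part of a 24-hour period counts as a full late day, and the 20% is taken from the score the submission earned. The function name `late_adjusted_score` is hypothetical; check with the instructor for the actual rule.

```python
import math
from typing import Optional


def late_adjusted_score(raw_score: float, hours_late: float) -> Optional[float]:
    """Return the homework score after the late deduction, or None if not accepted.

    Assumptions (not stated in the syllabus): partial days round up to a full
    late day, and the deduction is 20% of the earned score per late day.
    """
    if hours_late <= 0:
        return raw_score  # submitted on time, no deduction
    if hours_late > 48:
        return None  # past the 48-hour window, submission is not accepted
    late_days = math.ceil(hours_late / 24)  # 1 or 2 within the window
    return raw_score * (100 - 20 * late_days) / 100


if __name__ == "__main__":
    print(late_adjusted_score(80, 5))     # one late day
    print(late_adjusted_score(80, 30))    # two late days
    print(late_adjusted_score(80, 49))    # not accepted
```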

Exams: Exams are open-notes.

Academic Honesty

We will strictly follow UGA's Academic Honesty Policy. Dishonest behavior will not be tolerated and may result in failing the course. Please contact the instructor if you have any concerns regarding this issue.

Course Schedule (Tentative)

Week  Date   Topic                                             Notes
1     01/10  Course Overview
      01/11  Classification: kNN
      01/12  Classification: Linear models
2     01/17  Classification: Linear models                     HW1 out
      01/18  Classification: Multi-class Classification
      01/19  Classification: Evaluation
3     01/24  Tabular data mining
      01/25  Tabular data mining
      01/26  Text mining: Preliminaries                        Form Teams for Project
4     01/31  Text mining: Vector space model
      02/01  Text mining: Vector space model
      02/02  Graph mining: Preliminaries
5     02/07  Graph mining                                      HW1 due, HW2 out
      02/08  Graph mining
      02/09  Machine learning: Overfitting and Regularization
6     02/14  Machine learning: Overfitting and Regularization
      02/15  Classification: Naive Bayes classifiers
      02/16  Classification: Naive Bayes classifiers
7     02/21  Classification: Decision Tree
      02/22  Classification: Decision Tree                     HW2 due, HW3 out
      02/23  Clustering
8     02/28  Clustering
      03/01  Clustering evaluation
      03/02  Midterm Exam
9     03/07  -                                                 Spring Break. No class.
      03/08  -                                                 Spring Break. No class.
      03/09  -                                                 Spring Break. No class.
10    03/14  Applications: Outlier detection
      03/15  Applications: Outlier detection
      03/16  Applications: Recommender systems
11    03/21  Applications: Recommender systems
      03/22  Applications: Recommender systems evaluation
      03/23  Text mining: Embedding
12    03/28  Text mining: Embedding
      03/29  Text mining: Attention Mechanism                  HW3 due, HW4 out
      03/30  Text mining: Attention Mechanism
13    04/04  Graph mining: GNN
      04/05  Graph mining: GNN
      04/06  Graph mining: GNN
14    04/11  Model interpretation
      04/12  Model interpretation
      04/13  Model robustness
15    04/18  Model robustness
      04/19  Model fairness                                    HW4 due
      04/20  Model fairness
16    04/25  Project presentation
      04/26  Project presentation
      04/27  Project presentation
18    05/09  Final Exam