CSCI 4380/6380 Data Mining

Fall 2017: Mondays 3:35pm - 4:25pm & Tuesdays and Thursdays 3:30pm - 4:45pm, Boyd GSRC Room 208

Instructor: Prof. Khaled Rasheed
Telephone: (706)542-0881
Office Hours: Monday 4:30-6:00pm and Thursday 2:00-3:00pm or by email appointment
Office Location: Room 111, Boyd GSRC
Email: khaled@uga.edu

Teaching Assistant: Roxana Attar
Office Hours: Wednesday 1:30pm-4:00pm
Office Location: Room 537, Boyd GSRC
Email: roxana.attar@uga.edu

Objectives:

The course aims to provide students with a broad introduction to the field of Data Mining and related areas, and to teach them how to apply these methods to solve problems in complex domains. The course is appropriate both for students preparing for research in Data Mining and Machine Learning and for Bioinformatics, Science and Engineering students who want to apply Data Mining techniques to solve problems in their own fields of study.

Recommended Background:

CSCI 2720 Data Structures. Familiarity with basic computer algorithms and data structures. Familiarity with the Java programming language is recommended but not required.

Topics to be Covered:

  • Part I: Data Mining techniques, selected from: Association and Classification Rule Mining, Linear Models, Decision Trees and Random Forests, Neural Network approaches, Support Vector Machines, Bayesian Learning, Instance-based Learning, Pre-processing and Feature Selection, Performance Evaluation, Ensemble Learning, and Clustering.
  • Part II: Data Mining applications, selected from: Bioinformatics, Biomedical/Physical/Chemical modeling, medical diagnosis, text/web mining, pattern recognition, and/or other contemporary applications.

Expected Work:

Reading; assignments (including experiments run with the Weka package); a paper presentation; two midterms; and a term project (which may require programming or running existing packages) with a term paper. Unless otherwise announced by the instructor, all assignments and exams must be done entirely on your own.

Academic Honesty and Integrity:

All academic work must meet the standards contained in "A Culture of Honesty." Students are responsible for informing themselves about those standards before performing any academic work. The penalties for academic dishonesty are severe, and ignorance is not an acceptable defense.

Grading Policy:

  • Assignments: 20% (programs, homework, attendance, paper presentation)
  • Midterm Examinations: 40%
  • Term Project: 40% (includes term paper and presentation)

Students may work on their term projects in groups of up to FOUR students each. The above distribution is tentative and may change later; the instructor will announce any changes.

Assignment Submission Policy:

Assignments must be turned in by the assigned deadline. Late assignments will not be accepted. Rare exceptions may be made by the instructor only under extenuating circumstances and in accordance with university policies.

Course Home Page:

A variety of materials, including handouts, lecture notes and assignments, will be made available on the DM class home page at http://cobweb.cs.uga.edu/~khaled/DMcourse/. Announcements may be posted there between class meetings, and you are responsible for being aware of any information posted there.

Lecture Notes:

Copies of some of Dr. Rasheed's lecture notes will be available at the bottom of the class home page. Not all lectures will have electronic notes, however, so students should be prepared to take notes during any lecture.

Textbook in Bookstore:

  • "Data Mining: Practical Machine Learning Tools and Techniques (4th edition)", Ian Witten, Eibe Frank, Mark Hall and Christopher Pal. Morgan Kaufmann, 2016. (Required)
    ISBN-10: 0128042915 & ISBN-13: 978-0128042915

Web Resources:

  • The WEKA Machine Learning Project
  • University of California at Irvine ML Repository
  • David Aha's Machine Learning Resources

Announcements:

  • [11-15-2017] My office hour tomorrow will be from 1 to 2 pm instead of the regular time, because I have to attend a defense at 2 pm.
  • [10-24-2017] The due date for Homework 4 is now next Tuesday, 10-31-2017. Also, please read Chapter 8 in the textbook: it is related to the homework and has very useful information that we will not be able to cover in class. This is one of the reading assignments, and we shall discuss it briefly next Monday.
  • [10-10-2017] The first midterm exam will be this Thursday, 10-12-2017. It will cover all the topics discussed in the course through last Thursday (i.e. up to Chapter 5, page 166). It will be open notes, but the use of books, laptops or phones will not be allowed. You should bring a calculator to the exam; if you do not have a calculator, you may use your phone as a calculator. You should also bring your lecture notes and all handouts, and you may bring any additional notes, homework, etc.

Papers:

  • "Unsupervised feature selection for multi-cluster data" 2010. [Qinglin Dong][11/06] {download}
  • "Clustering by Passing Messages Between Data Points" 2007. [Joshwa Shannon][11/07] {download}
  • "Extending market basket analysis with graph mining techniques: A real case" 2014. [Zach Baker][11/07] {download}
  • "Text and Structural Data Mining of Influenza Mentions in Web and Social Media" 2010. [Amy Giuntini][11/09] {download}
  • "ImageNet classification with deep convolutional neural networks" 2017. [Zach Jones][11/09] {download}
  • "Authorship Verification for Short Messages using Stylometry" 2013. [Isela Diaz Martinez][11/09] {download}
  • "Feature Mining for Image Classification" 2014. [Hari Teja Tatavarti][11/13] {download}
  • "Emotional state classification from EEG data using machine learning approach" 2014. [Shulin Zhang][11/14] {download}
  • "Sparse Bayesian Learning for Identifying Imaging Biomarkers in AD Prediction" 2011. [Christian McDaniel][11/16] {download}
  • "Speech Recognition with Deep Recurrent Neural Networks" 2013. [Yuanming Shi][11/16] {download}
  • "Least squares support vector machines ensemble models for credit scoring" 2010. [?][?] {download}

Assignments:

  • Homework 1: Exercise 17.1 on pages 559-565 of the Weka exercises handout given in class today. You can also download all the exercises from HERE. [Due 9-7-2017 in class]
  • Homework 2
  • Homework 3
  • Homework 4: Exercise 17.4 on page 574 of the Weka exercises handout. [Due 10-31-2017 in class]
  • Homework 5
  • Homework 6

Lecture Notes:

  • Chapter 1
  • Chapter 2
  • Chapter 3
  • Weka Tutorial Slides by Roxana Attar
  • Chapter 4
  • Chapter 5
  • Chapter 7
  • Chapter 12

The course syllabus is a general plan for the course; deviations announced to the class by the instructor may be necessary.

Last modified: November 15, 2017.

Khaled Rasheed (khaled[at]uga.edu)