CSCI 4380/6380 Data Mining
Fall 2017: Mondays 3:35pm - 4:25pm & Tuesdays
and Thursdays 3:30pm - 4:45pm, Boyd GSRC Room 208
Instructor: Prof. Khaled Rasheed
Office Hours: Monday 4:30-6:00pm and Thursday 2:00-3:00pm or by email
Office Location: Room 111, Boyd GSRC
Teaching Assistant: Roxana Attar
Office Hours: Wednesday 1:30pm-4:00pm
Office Location: Room 537, Boyd GSRC
The course aims to provide students with a broad introduction to the
field of Data Mining and related areas and to teach students how to
apply these methods to solve problems in complex domains.
The course is appropriate both for students preparing for research in
Data Mining and Machine Learning and for Bioinformatics, Science and
Engineering students who want to apply Data Mining techniques to
solve problems in their fields of study.
Prerequisites:
CSCI 2720 Data Structures. Familiarity with basic computer algorithms
and data structures. Familiarity with the Java programming language is
recommended but not required.
Topics to be Covered:
Part I: Data Mining techniques: Selected from: Association and
Classification Rule Mining, Linear Models, Decision Trees and Random
Forests, Neural Network approaches, Support Vector Machines, Bayesian
Learning, Instance-based Learning, Pre-processing and Feature
Selection, Performance evaluation, Ensemble Learning and clustering.
Part II: Data Mining applications: Selected from: Bioinformatics,
Biomedical/Physical/Chemical modeling, medical diagnosis, text/web
mining, pattern recognition and/or other contemporary applications.
Reading; assignments (including running experiments using the Weka
package); a paper presentation; two midterms; and a term project (which
may require programming or running existing packages) with a term paper.
Unless otherwise announced by the instructor, all assignments and all
exams must be done entirely on your own.
Academic Honesty and Integrity:
All academic work must meet the standards contained in
"A Culture of Honesty." Students are responsible for informing
themselves about those standards before performing any academic
work. The penalties for academic dishonesty are severe and ignorance
is not an acceptable defense.
Assignments: 20% (programs, homework, attendance, paper presentation)
Midterm Examinations: 40%
Term Project: 40% (includes term paper and presentation)
Students may work on their term projects in groups of up to
FOUR students each. The above distribution is only
tentative and may change later. The instructor will announce any
Assignment Submission Policy
Assignments must be turned in by the assigned deadline. Late
assignments will not be accepted. Rare exceptions may be made by the
instructor only under extenuating circumstances and in accordance with
the university policies.
A variety of materials will be made available on the DM class home page,
http://cobweb.cs.uga.edu/~khaled/DMcourse/, including handouts,
lecture notes and assignments. Announcements may be posted between
class meetings. You are responsible for being aware of whatever
information is posted there.
Copies of some of Dr. Rasheed's lecture notes will be
available at the bottom of the class home page. Not all lectures
will have electronic notes, however, so students should be prepared
to take notes during lecture at any time.
Textbook in Bookstore
"Data Mining: Practical Machine Learning Tools and Techniques
(4th edition)", Ian Witten, Eibe Frank, Mark Hall and Christopher Pal. Morgan Kaufmann,
ISBN-10: 0128042915 & ISBN-13: 978-0128042915
The WEKA Machine Learning Project
University of California at Irvine ML Repository
David Aha's Machine Learning Resources
Homework 1: Exercise 17.1 on pages 559 - 565 of the Weka exercises
handout given in class today. You can also download all the exercises
from HERE. [Due 9-7-2017 in class]
Weka Tutorial Slides by Roxana Attar
The course syllabus is a general plan for the course;
deviations announced to the class by the instructor may be
Last modified: September 19, 2017.