# CSCI 3360 Data Science

## 1 Course information

Welcome to CSCI 3360 Data Science! Data science is a rapidly growing field that combines traditional statistics, machine learning, data mining, and programming. It has been attracting a great deal of attention from both academia and industry, and data scientist has been named the most promising job in the United States^{1}.

- Instructor : Jaewoo Lee
- Email : jaewoo.lee@uga.edu
- Office : BOYD 620
- Office hours : Wed. 12:10 pm - 1:10 pm, Thu. 3:00 pm - 4:00 pm
- TA : Yang Shi (Mon. 9:00 am - 10:00 am, Tue. 2:15 pm - 3:15 pm), BOYD 307

## 2 Course description

This course is an introduction to the theory and practice of data science. Data science is about *learning from data* to extract insight and knowledge. The course introduces the computational and statistical tools used in data analysis to answer questions from data. Specifically, we will study tools and methods for

- data collection, data munging, cleaning
- data exploration, hypothesis testing
- statistical modeling
- making inferences from data (regression, classification, and clustering)
- data visualization, and communication/interpretation of results.

### 2.1 Prerequisite

Students are expected to have a working knowledge of Python 2.7. All programming assignments must be completed in Python unless otherwise specified. Some elementary knowledge of statistics, linear algebra, and probability theory is expected but not **REQUIRED**; those fundamentals will be introduced as they are needed.

### 2.2 Learning objectives

- Using Python, collect data from the web and process the raw data into a form usable by data analysis algorithms.
- Summarize and visualize the data using statistical tools to quickly explore different aspects of complex data.
- Design a statistical experiment to test a hypothesis on data.
- Choose the most suitable statistical model for the given analysis task.
- Apply statistical and computational methods (e.g., machine learning) to make predictions based on data.
- Implement (or modify) an analysis algorithm using Python packages.
- Communicate with non-data science experts about analysis results, using effective statistics and visualizations.
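
To give a rough sense of what the first few objectives look like in practice, here is a minimal sketch that cleans a handful of raw records, summarizes them, and fits a straight line to make a prediction. It is only an illustration, not course material: the data are made up, the names `RAW_ROWS`, `clean`, and `fit_line` are hypothetical, and the hand-written least-squares fit stands in for the library routines one would normally use. It is written to run under both Python 2.7 and Python 3.

```python
from __future__ import division, print_function

# Hypothetical raw records (hours studied, exam score), invented for
# illustration. Real data would come from a file or a web source.
RAW_ROWS = [
    ("1", "52"),
    ("2", "55"),
    ("3", ""),       # missing score
    ("4", "61"),
    ("five", "70"),  # malformed hours
    ("6", "67"),
]


def clean(rows):
    """Keep only the rows whose two fields both parse as numbers."""
    cleaned = []
    for hours, score in rows:
        try:
            cleaned.append((float(hours), float(score)))
        except ValueError:
            continue  # drop rows with missing or malformed values
    return cleaned


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


points = clean(RAW_ROWS)
slope, intercept = fit_line(points)
predicted = slope * 5 + intercept  # predicted score after 5 hours of study

print("rows kept:", len(points))
print("mean score:", mean([score for _, score in points]))
print("predicted score for 5 hours:", predicted)
```

Dropping incomplete rows is the simplest possible cleaning policy; whether to drop, repair, or impute such rows depends on the data and the question being asked.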