# CSCI 8960 Privacy-Preserving Data Analysis

## 1 Course info.

These days almost every human activity is monitored and stored electronically. The benefits of analyzing such data are well known, but one obstacle stands in the way of the analysis: concerns about the privacy of the individuals in the data. In this course, we will study a mathematical framework and algorithmic techniques for privacy-preserving analysis, which make it possible to analyze sensitive data. In particular, we will focus on designing privacy-preserving algorithms for popular machine learning tasks such as regression, prediction, and recommendation.

• Instructor : Jaewoo Lee
• Email : jaewoo.lee@uga.edu
• Office : BOYD 620
• Office hours : Tue. 2:30 - 3:30 pm (or by appointment)

### 1.1 Prerequisite

Some elementary knowledge of statistics, linear algebra, and probability theory is expected but not required. These fundamentals will be introduced as they are needed. Familiarity with machine learning (CSCI 8950) or data mining (CSCI 4380/6380) will be helpful.

## 2 Course materials

Our main textbook is

• The Algorithmic Foundations of Differential Privacy by Cynthia Dwork and Aaron Roth.

Here are additional books that I recommend:

• The Elements of Statistical Learning by Trevor Hastie et al. (PDF available online)
• Pattern Recognition and Machine Learning by Christopher M. Bishop
• Convex Optimization by Boyd and Vandenberghe (PDF available online)
• Probability and Measure Theory by Robert B. Ash and Catherine A. Doleans-Dade

## 3 Evaluation criteria

| Item | Portion | Description |
|---|---|---|
| Homework | 30% | 3 individual assignments |
| Team project | 50% | Implementation of a data analysis program |
| Paper presentation | 15% | Paper reading and presentation |
| Extra | 5% | Class participation |

Each submitted item (for example, a homework, report, or presentation) will be graded on a 100-point scale, and the numeric score may then be curved to obtain a more reasonable grade distribution. In other words, your rank is a more important metric than the raw score on the graded item.

| Grade | Score range |
|---|---|
| A | [80, 100) |
| A- | [70, 80) |
| B+ | [50, 70) |

All students enrolled in this course are expected to abide by UGA's academic honesty policy and procedures. Please refer to UGA's A Culture of Honesty.

For every individual assignment, students are welcome to discuss the problems and share ideas at a high level, but the submitted item must be your own work. For example, you may discuss how to approach a homework problem, but you must write your own answer.

• Type your homework ($$\LaTeX$$ is recommended).
• Do not write (and submit) something you don't understand or can't explain.
• Do not provide your answers to others or make them available, whether or not they are enrolled in the course.
• If you cannot meet a submission deadline due to illness, first inform the instructor by email and attach a doctor's note.

## 5 Tentative Schedule

| Date | Topic | Note |
|---|---|---|
| | Part I. Fundamentals of Data Privacy | |
| 8/15 | Overview: privacy definitions; $$k$$-anonymity, $$\ell$$-diversity; Review: probability and distributions | |
| 8/22 | Differential privacy: definition; Useful properties: composition, post-processing | |
| 8/29 | Basic tools: Laplace mechanism; Gaussian mechanism | |
| 9/5 | Differentially private selection; NoisyMax, Exponential mechanism, One-sided exponential | HW 1 out |
| 9/12 | Local privacy model: randomized response | |
| | Part II. Privacy-Preserving Machine Learning | |
| 9/19 | Private Linear regression | |
| 9/26 | Empirical Risk Minimization | HW 2 out |
| 10/3 | Paper presentation: Group 1 | |
| 10/10 | Paper presentation: Group 2 | |
| 10/17 | Private Convex Empirical Risk Minimization | |