# CSCI 8960 Privacy-Preserving Data Analysis

## 1 Course info.

These days almost every human activity is monitored and stored electronically. The benefits of analyzing such data are well known, but one obstacle stands in the way of analysis: concerns about the privacy of individuals represented in the data. In this course, we will study a mathematical framework and algorithmic techniques for privacy-preserving analysis, which enable the analysis of sensitive data. In particular, our focus will be on designing privacy-preserving algorithms for popular machine learning tasks such as regression, prediction, and recommendation.

- Instructor: Jaewoo Lee
- Email: jaewoo.lee@uga.edu
- Office: BOYD 620
- Office hours: Tue. 2:30 - 3:30 pm (or by appointment)

### 1.1 Prerequisite

Some elementary knowledge of statistics, linear algebra, and probability theory is expected, but not **required**. Those fundamentals will be covered as they are needed. Familiarity with machine learning (CSCI 8950) or data mining (CSCI 4380/6380) will be helpful.

## 2 Course materials

Our main textbook is

- Algorithmic Foundations of Differential Privacy by Cynthia Dwork and Aaron Roth.

Here are additional books that I recommend:

## 3 Evaluation criteria

### 3.1 Grading proportion

| Portion | Weight | Description |
|---|---|---|
| Homework | 30% | 3 individual assignments |
| Team project | 50% | implementation of a data analysis program |
| Paper presentation | 15% | paper reading and presentation |
| Extra | 5% | class participation |

### 3.2 Grading scale^{1}

Each submitted item (for example, homework, report, or presentation) will be graded on a 100-point scale, and the numeric score may then be *curved* to obtain a more reasonable grade distribution. In other words, *rank* is a more important metric than the raw score on the graded item.

| Grade | Percentage |
|---|---|
| A | [80, 100] |
| A- | [70, 80) |
| B+ | [50, 70) |

## 4 Academic Honesty

All students enrolled in this course are expected to abide by UGA's academic honesty policy and procedures. Please refer to UGA's A CULTURE OF HONESTY.

For every individual assignment, students are welcome to discuss the problems and share ideas (at a high level), but **the submitted work must be your own**. For example, you may discuss how to solve a homework problem and share an idea, but you must write your own answer.

- Type your homework (\(\LaTeX\) is recommended).
- **Do not write (and submit) something you don't understand or can't explain.**
- Do not provide or make your answers available to others (whether or not they are enrolled in the course).
- If you cannot meet a submission deadline due to illness, first inform the instructor by email and attach a doctor's note.

## 5 Tentative Schedule

| Date | Topic | Note |
|---|---|---|
| | **Part I. Fundamentals of Data Privacy** | |
| 8/15 | Overview: privacy definitions | |
| | \(k\)-anonymity, \(l\)-diversity | |
| | Review: probability and distributions | |
| 8/22 | Differential privacy: definition | |
| | Useful properties: composition, post-processing | |
| 8/29 | Basic tools: Laplace mechanism | |
| | Gaussian mechanism | |
| 9/5 | Differentially private selection | HW 1 out |
| | NoisyMax, exponential mechanism, one-sided exponential | |
| 9/12 | Local privacy model: randomized response | |
| | **Part II. Privacy-Preserving Machine Learning** | |
| 9/19 | Private linear regression | |
| 9/26 | Empirical risk minimization | HW 2 out |
| 10/3 | Paper presentation: Group 1 | |
| 10/10 | Paper presentation: Group 2 | |
| 10/17 | Private convex empirical risk minimization | |
| | Post-processing with ADMM | |
| 10/24 | Stochastic gradient descent, advanced composition theorem | HW 3 out |
| 10/31 | PrivGene: private genetic algorithm | |
| | Bayesian posterior sampling and privacy | |
| | **Part III. Deep Learning with Differential Privacy** | |
| 11/7 | Multilayer perceptrons, DP-SGD | |
| 11/14 | Generative models: GANs and variational autoencoders | |
| 11/21 | Project presentation: Group 1 | |
| 11/28 | Project presentation: Group 2 | |

## Footnotes:

^{1} Subject to change.