CSCI 8265: Trustworthy Machine Learning (Fall 2024)

Course Information

  • Instructor: Dr. Ninghao Liu

  • Course time and location:

    • TR: 2:20pm - 3:35pm (Boyd 303)

    • W: 3:00pm - 3:50pm (Boyd 303)

  • Office hours: After each Tuesday class, or by email appointment

  • Office: Boyd 616

  • TA: N/A

Course Description

A research-oriented course that introduces technologies for building trustworthy machine learning systems. Topics include, but are not limited to, interpretable machine learning, machine learning security, and fairness in machine learning. Recent advances in large foundation models will also be introduced.

Textbooks

Research papers, conference tutorials, and online materials are more useful than textbooks!

For students who are interested, I recommend this online book on interpretable machine learning:

“Interpretable Machine Learning - A Guide for Making Black Box Models Explainable” by Christoph Molnar.

Course Prerequisite (Important!)

CSCI 4360/6360, CSCI 4380/6380, or other data mining/machine learning related courses.

Grading

Letter Grade   Range
A              [90, 100]
A-             [87, 90)
B+             [84, 87)
B              [80, 84)
B-             [77, 80)
C+             [74, 77)
C              [70, 74)
C-             [67, 70)
D              [60, 67)
F              [0, 60)

Late Submission Policy: For paper reviews, 20% is deducted for each late day (weekends included), up to 48 hours after the deadline; after 48 hours, submissions are not accepted. Late presentation materials and project reports are not accepted.

Exams: Preliminary exam only (sample and solution provided). No midterm or final exams.

Academic Honesty

We will strictly follow UGA’s Academic Honesty Policy. Dishonest behavior will not be tolerated and may result in failing the course. Please contact the instructor if you have any concerns regarding this policy.

Course Schedule (Tentative)

Week Date Topic Notes
1 08/14 Course Overview
08/15 Multi-layer perceptrons
2 08/20 Preliminary Exam
08/21 Multi-layer perceptrons
08/22 Student Introduction and Discussion
3 08/27 LIME
08/28 Gradient-based Explanation
08/29 Shapley Value
4 09/03 Convolutional Neural Networks
09/04 Convolutional Neural Networks
09/05 Guest presentation
5 09/10 Word embeddings and NLP
09/11 Word embeddings and NLP
09/12 Word embeddings and NLP
6 09/17 Large Language Models
09/18 Large Language Models
09/19 Paper presentation 1, 2, 3
7 09/24 Paper presentation 4, 5, 6
09/25 Graph models
09/26 Graph models
8 10/01 Paper presentation 7, 8, 9
10/02 Paper presentation 10, 11
10/03 Paper presentation 12, 13, 14
9 10/08 Proposal presentation
10/09 Proposal presentation
10/10 Adversarial attacks and defenses
10 10/15 Adversarial attacks and defenses
10/16 Backdoor attacks and defenses
10/17 Paper presentation 15, 16, 17
11 10/22 Paper presentation 18, 19, 20
10/23 Paper presentation 21, 22
10/24 ML Privacy
12 10/29 Paper presentation 23, 24, 25
10/30 ML Fairness
10/31 ML Fairness
13 11/05 Paper presentation 26, 27, 28
11/06 Paper presentation 29, 30
11/07 Paper presentation 31, 32, 33
14 11/12 Paper presentation 34, 35, 36
11/13 Paper presentation 37, 38
11/14 Paper presentation 39, 40, 41
15 11/19 Project presentation
11/20 Project presentation
11/21 Project presentation
16 11/26 Project presentation
11/27 - Thanksgiving break. No class.
11/28 - Thanksgiving break. No class.
17 12/03 No class.

Paper List (Tentative)

Model Interpretation

1. Learning deep features for discriminative localization. Zhou, Bolei, et al., 2016.

2. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps. Simonyan, Karen, et al., 2013.

3**. Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization. Selvaraju, Ramprasaath, et al., 2017.

4. Interpretable explanations of black boxes by meaningful perturbation. Fong, Ruth, et al., 2017.

5. Concept Bottleneck Models. Koh, Pang Wei, et al., 2020.

6. Network dissection: Quantifying interpretability of deep visual representations. Bau, David, et al., 2017.

7. Extraction of salient sentences from labelled documents. Denil, Misha, et al., 2015.

8**. Interpreting Word Embeddings with Eigenvector Analysis. Shin, Jamin, et al., 2018.

9**. On Attribution of Recurrent Neural Network Predictions via Additive Decomposition. Du, Mengnan, et al., 2019.

10**. Attention is not Explanation. Jain, Sarthak, et al., 2019.

11. Attention is not not explanation. Wiegreffe, Sarah, et al., 2019.

12. Is attention interpretable? Serrano, Sofia, et al., 2019.

13. Polyjuice: Generating Counterfactuals for Explaining, Evaluating, and Improving Models. Wu, Tongshuang, et al., 2021.

14. Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV). Kim, Been, et al., 2018.

15. Learning Credible Deep Neural Networks with Rationale Regularization. Du, Mengnan, et al., 2019.

16**. Knowledge Neurons in Pretrained Transformers. Dai, Damai, et al., 2021.

17. SelfIE: Self-Interpretation of Large Language Model Embeddings. 2024.

18. An Investigation of Neuron Activation as a Unified Lens to Explain Chain-of-Thought Eliciting Arithmetic Reasoning of LLMs. Rai, Daking, and Ziyu Yao, 2024.

19**. From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning. Wu, Xuansheng, et al., 2024.

20**. GNNExplainer: Generating Explanations for Graph Neural Networks. Ying, Rex, et al., 2019.

21. DEGREE: Decomposition Based Explanation for Graph Neural Networks. Feng, Qizhang, et al., 2022.

22. Parameterized explainer for graph neural network. Luo, Dongsheng, et al., 2020.

Model Security

23. Making an Invisibility Cloak: Real World Adversarial Attacks on Object Detectors. Wu, Zuxuan, et al., 2020.

24. Word-level Textual Adversarial Attacking as Combinatorial Optimization. Zhang, Yuan, et al., 2019.

25**. Semantically Equivalent Adversarial Rules for Debugging NLP Models. Ribeiro, Marco Tulio, et al., 2018.

26. Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks. Papernot, Nicolas, et al., 2016.

27. Jailbreaking Black Box Large Language Models in Twenty Queries. Chao, Patrick, et al., 2024.

28**. Jailbroken: How Does LLM Safety Training Fail? 2023.

29. Clean-Label Backdoor Attacks. Turner, Alexander, et al., 2020.

30**. Anti-Backdoor Learning: Training Clean Models on Poisoned Data. Li, Yige, et al., 2021.

31. Removing backdoor-based watermarks in neural networks with limited data. Liu, Xuankai, et al., 2020.

32. Stealing Machine Learning Models via Prediction APIs. Tramer, Florian, et al., 2016.

33. Information Leakage in Embedding Models. Song, Congzheng, et al., 2020.

Model Fairness

34**. Men also like shopping: Reducing gender bias amplification using corpus-level constraints. Zhao, Jieyu, et al., 2017.

35. Learning Gender-Neutral Word Embeddings. Zhao, Jieyu, et al., 2018.

36. FairFil: Contrastive Neural Debiasing Method for Pretrained Text Encoders. Cheng, Pengyu, et al., 2021.

37. This Land is {Your, My} Land: Evaluating Geopolitical Bias in Language Models through Territorial Disputes. 2024.

38. Confronting LLMs with Traditional ML: Rethinking the Fairness of Large Language Models in Tabular Classifications. 2024.

39**. CLIP the Bias: How Useful is Balancing Data in Multimodal Learning? 2024.

40**. Aligning as Debiasing: Causality-Aware Alignment via Reinforcement Learning with Interventional Feedback. 2024.

41. Safer-Instruct: Aligning Language Models with Automated Preference Data. 2024.