CSCI 8265: Trustworthy Machine Learning (Fall 2024)
Course Information
Instructor: Dr. Ninghao Liu
Course time and location:
TR: 2:20pm - 3:35pm (Boyd 303)
W: 3:00pm - 3:50pm (Boyd 303)
Office hours: After each Tuesday class, or by email appointment
Office: Boyd 616
TA: N/A
Course Description
A research-oriented course that introduces techniques for building trustworthy machine learning systems. Topics include, but are not limited to, interpretable machine learning, machine learning security, and fairness in machine learning. Recent advances in large foundation models will also be covered.
Textbooks
Research papers, conference tutorials, and online materials are more useful than textbooks!
For students who are interested, I recommend this online book for interpretable machine learning:
“Interpretable Machine Learning: A Guide for Making Black Box Models Explainable” by Christoph Molnar.
Course Prerequisite (Important!)
CSCI 4360/6360, CSCI 4380/6380, or other data mining/machine learning related courses.
Grading
| Letter Grade | A | A- | B+ | B | B- | C+ | C | C- | D | F |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Range | [90, 100] | [87, 90) | [84, 87) | [80, 84) | [77, 80) | [74, 77) | [70, 74) | [67, 70) | [60, 67) | [0, 60) |
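For reference, here is a minimal, purely illustrative sketch (not part of the official grading policy) of how a numeric score maps to the letter-grade ranges above; boundary scores fall into the higher bracket, e.g., a 90 is an A.

```python
# Illustrative only: map a numeric score in [0, 100] to the letter grades in the table above.
def letter_grade(score: float) -> str:
    cutoffs = [
        (90, "A"), (87, "A-"), (84, "B+"), (80, "B"), (77, "B-"),
        (74, "C+"), (70, "C"), (67, "C-"), (60, "D"),
    ]
    for lower_bound, grade in cutoffs:
        if score >= lower_bound:
            return grade
    return "F"  # anything below 60

print(letter_grade(90.0))   # A
print(letter_grade(86.5))   # B+
print(letter_grade(59.9))   # F
```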
Late Submission Policy: For paper reviews, 20% is deducted for each late day, up to 48 hours after the deadline (weekends included); after that, submissions are not accepted. Late presentation materials and project reports are not accepted.
Exams: Preliminary exam (sample and solution available). There is no midterm or final exam.
Academic Honesty
We will strictly follow UGA’s Academic Honesty Policy. Dishonest behavior will not be tolerated and may result in failing the course. Please contact the instructor if you have any concerns regarding this policy.
Course Schedule (Tentative)
| Week | Date | Topic | Notes |
| --- | --- | --- | --- |
| 1 | 08/14 | Course Overview | |
| | 08/15 | Multi-layer perceptrons | |
| 2 | 08/20 | Preliminary Exam | |
| | 08/21 | Multi-layer perceptrons | |
| | 08/22 | Student Introduction and Discussion | |
| 3 | 08/27 | LIME | |
| | 08/28 | Gradient-based Explanation | |
| | 08/29 | Shapley Value | |
| 4 | 09/03 | Convolutional Neural Networks | |
| | 09/04 | Convolutional Neural Networks | |
| | 09/05 | Guest presentation | |
| 5 | 09/10 | Word embeddings and NLP | |
| | 09/11 | Word embeddings and NLP | |
| | 09/12 | Word embeddings and NLP | |
| 6 | 09/17 | Large Language Models | |
| | 09/18 | Large Language Models | |
| | 09/19 | Paper presentation | 1, 2, 3 |
| 7 | 09/24 | Paper presentation | 4, 5, 6 |
| | 09/25 | Graph models | |
| | 09/26 | Graph models | |
| 8 | 10/01 | Paper presentation | 7, 8, 9 |
| | 10/02 | Paper presentation | 10, 11 |
| | 10/03 | Paper presentation | 12, 13, 14 |
| 9 | 10/08 | Proposal presentation | |
| | 10/09 | Proposal presentation | |
| | 10/10 | Adversarial attacks and defenses | |
| 10 | 10/15 | Adversarial attacks and defenses | |
| | 10/16 | Backdoor attacks and defenses | |
| | 10/17 | Paper presentation | 15, 16, 17 |
| 11 | 10/22 | Paper presentation | 18, 19, 20 |
| | 10/23 | Paper presentation | 21, 22 |
| | 10/24 | ML Privacy | |
| 12 | 10/29 | Paper presentation | 23, 24, 25 |
| | 10/30 | ML Fairness | |
| | 10/31 | ML Fairness | |
| 13 | 11/05 | Paper presentation | 26, 27, 28 |
| | 11/06 | Paper presentation | 29, 30 |
| | 11/07 | Paper presentation | 31, 32, 33 |
| 14 | 11/12 | Paper presentation | 34, 35, 36 |
| | 11/13 | Paper presentation | 37, 38 |
| | 11/14 | Paper presentation | 39, 40, 41 |
| 15 | 11/19 | Project presentation | |
| | 11/20 | Project presentation | |
| | 11/21 | Project presentation | |
| 16 | 11/26 | Project presentation | |
| | 11/27 | - | Thanksgiving. No class. |
| | 11/28 | - | Thanksgiving. No class. |
| 17 | 12/03 | - | No class. |
Paper List (Tentative)
Model Interpretation
1. Learning deep features for discriminative localization. Zhou, Bolei, et al., 2016.
2. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps. Simonyan, Karen, et al., 2013.
3**. Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization. Selvaraju, Ramprasaath, et al., 2017.
4. Interpretable explanations of black boxes by meaningful perturbation. Fong, Ruth, et al., 2017.
5. Concept Bottleneck Models. Koh, Pang Wei, et al., 2020.
6. Network dissection: Quantifying interpretability of deep visual representations. Bau, David, et al., 2017.
7. Extraction of salient sentences from labelled documents. Denil, Misha, et al., 2015.
8**. Interpreting Word Embeddings with Eigenvector Analysis. Shin, Jamin, et al., 2018.
9**. On Attribution of Recurrent Neural Network Predictions via Additive Decomposition. Du, Mengnan, et al., 2019.
10**. Attention is not Explanation. Jain, Sarthak, et al., 2019.
11. Attention is not not explanation. Wiegreffe, Sarah, et al., 2019.
12. Is attention interpretable? Serrano, Sofia, et al., 2019.
13. Polyjuice: Generating Counterfactuals for Explaining, Evaluating, and Improving Models. Wu, Tongshuang, et al., 2021.
14. Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV). Kim, Been, et al., 2018.
15. Learning Credible Deep Neural Networks with Rationale Regularization. Du, Mengnan, et al., 2019.
16**. Knowledge Neurons in Pretrained Transformers. Dai, Damai, et al., 2021.
17. SelfIE: Self-Interpretation of Large Language Model Embeddings. 2024.
18. An Investigation of Neuron Activation as a Unified Lens to Explain Chain-of-Thought Eliciting Arithmetic Reasoning of LLMs. Rai, Daking, and Ziyu Yao, 2024.
19**. From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning. Wu, Xuansheng, et al., 2024.
20**. GNNExplainer: Generating Explanations for Graph Neural Networks. Ying, Rex, et al., 2019.
21. DEGREE: Decomposition Based Explanation for Graph Neural Networks. Feng, Qizhang, et al., 2022.
22. Parameterized Explainer for Graph Neural Network. Luo, Dongsheng, et al., 2020.
Model Security
23. Making an Invisibility Cloak: Real World Adversarial Attacks on Object Detectors. Wu, Zuxuan, et al., 2020.
24. Word-level Textual Adversarial Attacking as Combinatorial Optimization. Zang, Yuan, et al., 2019.
25**. Semantically Equivalent Adversarial Rules for Debugging NLP Models. Ribeiro, Marco Tulio, et al., 2018.
26. Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks. Papernot, Nicolas, et al., 2016.
27. Jailbreaking Black Box Large Language Models in Twenty Queries. Chao, Patrick, et al., 2024.
28**. Jailbroken: How Does LLM Safety Training Fail? 2023.
29. Clean-Label Backdoor Attacks. Turner, Alexander, et al., 2020.
30**. Anti-Backdoor Learning: Training Clean Models on Poisoned Data. Li, Yige, et al., 2021.
31. Removing backdoor-based watermarks in neural networks with limited data. Liu, Xuankai, et al., 2020.
32. Stealing Machine Learning Models via Prediction APIs. Tramer, Florian, et al., 2016.
33. Information Leakage in Embedding Models. Song, Congzheng, et al., 2020.
Model Fairness
34**. Men also like shopping: Reducing gender bias amplification using corpus-level constraints. Zhao, Jieyu, et al., 2017.
35. Learning Gender-Neutral Word Embeddings. Zhao, Jieyu, et al., 2018.
36. FairFil: Contrastive Neural Debiasing Method for Pretrained Text Encoders. Cheng, Pengyu, et al., 2021.
37. This Land is {Your, My} Land: Evaluating Geopolitical Bias in Language Models through Territorial Disputes. 2024.
38. Confronting LLMs with Traditional ML: Rethinking the Fairness of Large Language Models in Tabular Classifications. 2024.
39**. CLIP the Bias: How Useful is Balancing Data in Multimodal Learning? 2024.
40**. Aligning as Debiasing: Causality-Aware Alignment via Reinforcement Learning with Interventional Feedback. 2024.
41. Safer-Instruct: Aligning Language Models with Automated Preference Data. 2024.