SPY Lab

21 repositories

non-adversarial-reproduction
Public
Official code for "Measuring Non-Adversarial Reproduction of Training Data in Large Language Models" (https://arxiv.org/abs/2411.10242)
Jupyter Notebook • 0 • 1 • 0 • 0 • Updated Nov 18, 2024
agentdojo
Public
A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents.
Jupyter Notebook • MIT License • 10 • 62 • 1 • 3 • Updated Nov 12, 2024
Blind-MIA
Public
Official code for "Blind Baselines Beat Membership Inference Attacks for Foundation Models"
Python • 0 • 0 • 0 • 0 • Updated Oct 8, 2024
unlearning-vs-safety
Public
Python • 3 • 12 • 1 • 0 • Updated Oct 6, 2024
vmi-retreat-workshop-2024
Public
Repository for the VMI Summer Retreat Workshop on Hacking AI Agents
Python • MIT License • 0 • 1 • 0 • 0 • Updated Sep 9, 2024
.github
Public
0 • 0 • 0 • 0 • Updated Jul 5, 2024
robust-style-mimicry
Public
Python • MIT License • 0 • 31 • 1 • 0 • Updated Jun 19, 2024
llm_lab
Public
Python • 0 • 0 • 0 • 0 • Updated Jun 17, 2024
rlhf_trojan_competition
Public
Finding trojans in aligned LLMs. Official repository for the competition hosted at SaTML 2024.
Python • Apache License 2.0 • 9 • 107 • 1 • 0 • Updated Jun 13, 2024
ctf-satml24-data-analysis
Public
Python • 0 • 0 • 0 • 0 • Updated Jun 13, 2024
misleading-privacy-evals
Public
Official code for "Evaluations of Machine Learning Privacy Defenses are Misleading" (https://arxiv.org/abs/2404.17399)
Jupyter Notebook • 2 • 7 • 0 • 0 • Updated Apr 29, 2024
data-decay
Public
Playing around with the CC3M data
Python • 0 • 0 • 0 • 0 • Updated Apr 29, 2024
rlhf-poisoning
Public
Code for the paper "Universal Jailbreak Backdoors from Poisoned Human Feedback"
Python • Apache License 2.0 • 8 • 41 • 4 • 0 • Updated Apr 24, 2024
realistic-adv-examples
Public
Code for the paper "Evading Black-box Classifiers Without Breaking Eggs" [SaTML 2024]
Python • MIT License • 0 • 19 • 0 • 0 • Updated Apr 15, 2024
lm_memorization_data
Public
Data for "Quantifying Memorization Across Neural Language Models"
Apache License 2.0 • 0 • 7 • 2 • 0 • Updated Mar 26, 2024
satml-llm-ctf
Public
Code used to run the platform for the LLM CTF colocated with SaTML 2024
Python • MIT License • 6 • 25 • 0 • 0 • Updated Mar 20, 2024
infoseclab_23
Public
Python • 0 • 1 • 0 • 0 • Updated Nov 14, 2023
superhuman-ai-consistency
Public
Python • MIT License • 2 • 28 • 0 • 0 • Updated Jun 19, 2023
privacy
Public
Library for training machine learning models with privacy for training data
Python • Apache License 2.0 • 453 • 0 • 0 • 0 • Updated Jun 13, 2023
diffusion_denoised_smoothing
Public
Certified robustness "for free" using off-the-shelf diffusion models and classifiers
Python • MIT License • 5 • 36 • 3 • 0 • Updated May 25, 2023
lm-extraction-benchmark-data
Public
Datasets for the SaTML 2023 competition on training data extraction
Apache License 2.0 • 0 • 5 • 1 • 0 • Updated Aug 24, 2022