    Repositories list

    • Official code for "Measuring Non-Adversarial Reproduction of Training Data in Large Language Models" (https://arxiv.org/abs/2411.10242)
      Jupyter Notebook
      0100 Updated Nov 18, 2024
    • agentdojo

      Public
      A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents.
      Jupyter Notebook
      MIT License
      106213 Updated Nov 12, 2024
    • Blind-MIA

      Public
      Official code for "Blind Baselines Beat Membership Inference Attacks for Foundation Models"
      Python
      0000 Updated Oct 8, 2024
    • Python
      31210 Updated Oct 6, 2024
    • Repository for the VMI Summer Retreat Workshop on Hacking AI Agents
      Python
      MIT License
      0100 Updated Sep 9, 2024
    • .github

      Public
      0000 Updated Jul 5, 2024
    • Python
      MIT License
      03110 Updated Jun 19, 2024
    • llm_lab

      Public
      Python
      0000 Updated Jun 17, 2024
    • Finding trojans in aligned LLMs. Official repository for the competition hosted at SaTML 2024.
      Python
      Apache License 2.0
      910710 Updated Jun 13, 2024
    • Python
      0000 Updated Jun 13, 2024
    • Official code for "Evaluations of Machine Learning Privacy Defenses are Misleading" (https://arxiv.org/abs/2404.17399)
      Jupyter Notebook
      2700 Updated Apr 29, 2024
    • Playing around with the CC3M data
      Python
      0000 Updated Apr 29, 2024
    • Code for the paper "Universal Jailbreak Backdoors from Poisoned Human Feedback"
      Python
      Apache License 2.0
      84140 Updated Apr 24, 2024
    • Code for the paper "Evading Black-box Classifiers Without Breaking Eggs" [SaTML 2024]
      Python
      MIT License
      01900 Updated Apr 15, 2024
    • Data for "Quantifying Memorization Across Neural Language Models"
      Apache License 2.0
      0720 Updated Mar 26, 2024
    • Code used to run the platform for the LLM CTF colocated with SaTML 2024
      Python
      MIT License
      62500 Updated Mar 20, 2024
    • Python
      0100 Updated Nov 14, 2023
    • Python
      MIT License
      22800 Updated Jun 19, 2023
    • privacy

      Public
      Library for training machine learning models with privacy for training data
      Python
      Apache License 2.0
      453000 Updated Jun 13, 2023
    • Certified robustness "for free" using off-the-shelf diffusion models and classifiers
      Python
      MIT License
      53630 Updated May 25, 2023
    • Datasets for the SaTML 2023 competition on training data extraction
      Apache License 2.0
      0510 Updated Aug 24, 2022