@ethz-spylab

SPY Lab

Secure and Private AI research at ETH Zürich

SPY Lab (ETH Zurich)

The Secure and Private AI (SPY) Lab conducts research on the security, privacy and trustworthiness of machine learning systems. We often approach these problems from an adversarial perspective, by designing attacks that probe the worst-case performance of a system to ultimately understand and improve its safety.

💡 Learn more about our work and read our publications on our website.

🖥️ Check the code for our projects in this repository.

Popular repositories

  1. rlhf_trojan_competition Public

    Finding trojans in aligned LLMs. Official repository for the competition hosted at SaTML 2024.

    Python, 107 stars, 9 forks

  2. agentdojo Public

    A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents.

    Jupyter Notebook, 61 stars, 10 forks

  3. rlhf-poisoning Public

    Code for the paper "Universal Jailbreak Backdoors from Poisoned Human Feedback"

    Python, 41 stars, 8 forks

  4. diffusion_denoised_smoothing Public

    Certified robustness "for free" using off-the-shelf diffusion models and classifiers

    Python, 36 stars, 5 forks

  5. robust-style-mimicry Public

    Python, 31 stars

  6. superhuman-ai-consistency Public

    Python, 28 stars, 2 forks

Repositories

Showing 10 of 21 repositories
  • non-adversarial-reproduction Public

    Official code for "Measuring Non-Adversarial Reproduction of Training Data in Large Language Models" (https://arxiv.org/abs/2411.10242)

    Jupyter Notebook, 1 star, 0 forks, 0 issues, 0 pull requests, updated Nov 18, 2024
  • agentdojo Public

    A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents.

    Jupyter Notebook, 61 stars, MIT license, 10 forks, 1 issue, 3 pull requests, updated Nov 12, 2024
  • Blind-MIA Public

    Official code for "Blind Baselines Beat Membership Inference Attacks for Foundation Models"

    Python, 0 stars, 0 forks, 0 issues, 0 pull requests, updated Oct 8, 2024
  • unlearning-vs-safety
    Python, 12 stars, 3 forks, 1 issue, 0 pull requests, updated Oct 6, 2024
  • vmi-retreat-workshop-2024 Public

    Repository for the VMI Summer Retreat Workshop on Hacking AI Agents

    Python, 1 star, MIT license, 0 forks, 0 issues, 0 pull requests, updated Sep 9, 2024
  • .github Public
    0 stars, 0 forks, 0 issues, 0 pull requests, updated Jul 5, 2024
  • robust-style-mimicry
    Python, 31 stars, MIT license, 0 forks, 1 issue, 0 pull requests, updated Jun 19, 2024
  • llm_lab Public
    Python, 0 stars, 0 forks, 0 issues, 0 pull requests, updated Jun 17, 2024
  • rlhf_trojan_competition Public

    Finding trojans in aligned LLMs. Official repository for the competition hosted at SaTML 2024.

    Python, 107 stars, Apache-2.0 license, 9 forks, 1 issue, 0 pull requests, updated Jun 13, 2024
  • ctf-satml24-data-analysis
    Python, 0 stars, 0 forks, 0 issues, 0 pull requests, updated Jun 13, 2024
