Previous sessions

This page shows a history of previous sessions in the reading group.

| Date | Topic | Room | Lead |
|------|-------|------|------|
| 20/03/23 | Introduction to word embeddings and language modelling (Slides) | David Blackwell | Fede Nanni |
| 03/04/23 | Deep Learning Basics (Slides) | David Blackwell | Phil Swatton, Jack Roberts |
| 17/04/23 | Sequence-to-sequence models part I: RNNs/LSTMs (Slides) | David Blackwell | Ryan Chan |
| 03/05/23 | Sequence-to-sequence models part II: Encoder-decoder models (Slides) | David Blackwell | Ryan Chan |
| 15/05/23 | Hands-on RNN/LSTM session (Materials) | David Blackwell | Nathan Simpson, Levan Bokeria, David Llewellyn-Jones |
| 31/05/23 | Reginald overview & Attention and self-attention networks (Notebook) | David Blackwell | Evelina Gabasova, Martin Stoffel |
| 26/06/23 | Attention (continued) (Slides) & Transformer Encoder and Decoders (Slides) | David Blackwell | Martin Stoffel, Ryan Chan |
| 10/07/23 | BERT: Masked Language modelling and Pre-training (Slides) | David Blackwell | Ryan Chan |
| 24/07/23 | GPT: Pretraining Decoders (Slides) | David Blackwell | Ryan Chan |
| 07/08/23 | Vision Transformers part I (Slides) | David Blackwell | Katie Awty-Carroll, Ryan Chan |
| 21/08/23 | Vision Transformers part II (Slides) | David Blackwell | Katie Awty-Carroll |
| 18/09/23 | LoRA (+ parameter efficient fine-tuning) part I (Slides) | David Blackwell | Jack Roberts |
| 25/09/23 | LoRA (+ parameter efficient fine-tuning) part II (Notebook) | Margaret Hamilton | Jack Roberts |
| 02/10/23 | Reinforcement Learning Human Feedback (RLHF) (Slides) | David Blackwell | Eseoghene Ben-Iwhiwhu |
| 16/10/23 | Prompt Engineering (Slides) | David Blackwell | Martin Stoffel |
| 30/10/23 | Knowledge retrieval (Slides) | David Blackwell | Praveen Selvaraj |
| 06/11/23 | Discussion: Current challenges and future directions in safety evaluations for generative AI (Slides) | David Blackwell | Jonathan Bright |
| 13/11/23 | Introduction to Diffusion models (Slides) | David Blackwell | Edmund Dable-Heath |
| 20/11/23 | Research at Turing: Transformers for coding/software engineering (Slides) | Mae Jemison | Anastasiia Grishina |
| 04/12/23 | Discussion: Best Practice for Responsible Foundation Models – What Should Developers Do and How You Can Help (Slides) | Ursula Franklin | Carolyn Ashurst |
| 11/12/23 | Stable Diffusion (Slides) | David Blackwell | Edmund Dable-Heath |
| 08/01/24 | Discussion: Benchmarking AI applications on GPUs (Slides) | David Blackwell | Tomas Lazauskas, David Llewellyn-Jones |
| 15/01/24 | Retentive Networks (Slides) | David Blackwell | Ed Gunn |
| 22/01/24 | Research at Turing: Spatial Graph Patterning of Filamentous Structures | David Blackwell | Kristina Ulicna |
| 29/01/24 | Vision Transformers Need Registers (Slides) | David Blackwell | Tom Davies |
| 05/02/24 | Discussion: Existential Risk of AI? (Slides) | David Blackwell | Levan Bokeria |
| 12/02/24 | Mechanistic interpretability (Slides) | David Blackwell | Praveen Selvaraj |
| 19/02/24 | Research at Turing: Longitudinal NLP (Slides) | David Blackwell | Jenny Chim, Talia Tseriotou |
| 26/02/24 | Research at Turing: Machine translation quality estimation (Slides) | David Blackwell | Radka Jersakova, Jo Knight |
| 04/03/24 | Discussion: Expanding participatory governance for LLMs: case studies from BigCode, Aya Initiative, and Collective Intelligence Project (Slides) | David Blackwell | Jennifer Ding |
| 11/03/24 | Research at Turing: Applying Vision Transformers in Neuroscience (Slides) | David Blackwell | Bryan Li |
| 18/03/24 | Research at Turing: Not even a Chinese Room: evaluating LLMs on code simulation | David Blackwell | Emanuele La Malfa |
| 08/04/24 | Paper overviews (Slides, Slides) | Ursula Franklin | Fede Nanni, Markus Hauru, Praveen Selvaraj |
| 15/04/24 | Research at Turing: Natural Logic-based Fact Verification with LLMs | David Blackwell | Marek Strong |
| 22/04/24 | Research at Turing: Learn how to learn and distil during learning - Using meta-learning and second order optimisation to prune the model | David Blackwell | Yilei Liang |
| 29/04/24 | Invited Talk: How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions (Slides) | David Blackwell | Lorenzo Pacchiardi |
| 13/05/24 | Overview of LLM Security (Slides) | David Blackwell | Ed Chapman, Burak Hasircioglu, Ezzeldin Zaki |
| 20/05/24 | KAN: Kolmogorov-Arnold Networks | Ursula Franklin | Andrew Duncan |
| 04/06/24 | Invited Talk: Are we ready for attacks on machine learning? | Enigma (2.30pm) | Nicholas Carlini |
| 01/07/24 | A perspective on the fundamentals of transformers (Slides) | Ursula Franklin | Ed Gunn |
| 08/07/24 | Invited Talk: Equally Safe Online? A participatory approach to tackling Gender-Based Violence (Slides) | David Blackwell | Gavin Abercrombie |
| 15/07/24 | Invited Talk: Open science projects for open-source models and transparent open datasets | Cipher | Christopher Klamm |
| 22/07/24 | Invited Talk: Designing a Value-driven GAI Framework for Social Good: Embedding Social Good Values into GAI Models (Slides) | Ursula Franklin | Victor OK Li, Jacqueline CK Lam and Jon Crowcroft |
| 05/08/24 | Invited Talk: The growth of parallelism in machine learning inference (Slides) | Ursula Franklin | Tim Harris (Microsoft) |
| 12/08/24 | Llama 3.1 Report Overview (Slides) | Ursula Franklin | Edwin Brown, Ryan Chan |
| 19/08/24 | Overview of Knowledge Graphs (Slides) | David Blackwell | Navdeep Kaur |
| 28/08/24 | Mixture of Experts (Slides) | Jack Good | Angus R Williams |
| 03/09/24 | Invited Talk: Sociotechnical Safety Evaluation of AI systems (Slides) | Enigma | Laura Weidinger |
| 09/09/24 | Invited Talk: On the Brittleness of Prompts in LLMs (Slides) | David Blackwell | Han Zhou |
| 23/09/24 | Mechanistic Interpretability I (Slides) | David Blackwell | Ryan Chan |
| 02/10/24 | Mechanistic Interpretability II (Slides) | Delilah | Ryan Chan |
| 07/10/24 | Invited Talk: Federating Large Language Models from Scratch (Slides) | David Blackwell | Lorenzo Sani |
| 14/10/24 | Invited Talk: Causal Estimation of Memorisation Profiles (Slides) | David Blackwell | Pietro Lesci |
| 28/10/24 | No Language Left Behind (NLLB) Technical Report Overview (Slides) | David Blackwell | Giulia Occhini, Ryan Chan |
| 04/11/24 | Invited Talk: Ethnographic Approaches to AI Evaluations | Ursula Franklin | Jonas Kgomo |
| 18/11/24 | Biological neural networks (Slides, Slides) | David Blackwell | Balázs Mészáros, Jess Yu |
| 20/11/24 | Mechanistic Interpretability III (Slides) | Delilah | Ryan Chan |
| 25/11/24 | Application of foundation models in time series tasks | David Blackwell | Gholamali Aminian |
| 02/12/24 | Can language models play the Wikipedia game? (Slides) | David Blackwell | Alex Hickey, Jo Knight |
| 02/12/24 | Diffusion models | Ada Lovelace | James Thornton |
| 03/12/24 | Mechanistic Interpretability | Enigma | Neel Nanda |
| 09/12/24 | Scaling laws of neural networks (Slides) | David Blackwell | Edmund Dable-Heath |
| 16/12/24 | Improving training with better learning rate and batch size: Linear scaling rule from random matrix theory (Slides) | David Blackwell | Chanju Park |

Material for sessions

20/03/23

Introduction to Word Embeddings and Language modelling

Main

Extra

03/04/23

Deep Learning Basics

Main

Extra

17/04/23

Sequence-to-sequence models part I: RNNs/LSTMs

Main

Extra

03/05/23

Sequence-to-sequence models part II: Encoder-decoder models

Main

Extra

31/05/23

Attention

Main

Extra

26/06/23

Transformer Encoder and Decoders

Main

Extra

10/07/23

BERT: Masked Language modelling and Pre-training

Main

Extra

24/07/23

GPT: Pretraining Decoders

Main

Extra

Note: the materials below are for other sessions or are not yet confirmed.

07/08/23

Vision Transformers part I

Main

Extra

21/08/23

Vision Transformers part II

Main

Extra

18/09/23

LoRA (+ parameter efficient fine-tuning)

Main

Extra

02/10/23

Reinforcement Learning Human Feedback (RLHF)

Main

Extra

Beyond RLHF

16/10/23

Prompt Engineering

Guides

Videos

Meta

30/10/23

Knowledge retrieval FMs

Main

Extra

06/11/23

Current challenges and future directions in safety evaluations for generative AI

Main

Extras

13/11/23

Introduction to Diffusion models

There are plenty of blog posts and top-level overviews of diffusion models which explain the main idea of 'running a noisy blurring process backwards from the noise'; however, for more technical reading (which, I will warn, is quite heavy on the maths), the main two papers are:

Both are about the sampling methods used in the process (notably without the inclusion of context that allows for text-to-image generation). For a general overview the following is fairly good:

And if you're curious (and want spoilers) about stable diffusion and latent diffusion models, this is the main paper.
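
To make the 'noisy blurring process run backwards' idea concrete, here is a minimal NumPy sketch of a DDPM-style forward noising step and the corresponding reverse (denoising) update. The linear noise schedule and the stand-in `predict_noise` function are illustrative assumptions, not the formulations from the papers linked above.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)            # assumed linear noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)               # cumulative product, "alpha bar"

def forward_noise(x0, t, rng):
    """Sample x_t from q(x_t | x_0): blur x_0 towards pure Gaussian noise."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps, eps

def predict_noise(xt, t):
    """Stand-in for the learned denoiser; a real model is trained to predict eps."""
    return np.zeros_like(xt)

def reverse_step(xt, t, rng):
    """One step of the reverse process, x_t -> x_{t-1}."""
    eps_hat = predict_noise(xt, t)
    mean = (xt - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
    if t == 0:
        return mean
    return mean + np.sqrt(betas[t]) * rng.standard_normal(xt.shape)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))              # a toy "image"
xt, _ = forward_noise(x0, T - 1, rng)         # almost pure noise
for t in reversed(range(T)):                  # run the blurring process backwards
    xt = reverse_step(xt, t, rng)
```

With a trained noise predictor in place of the zero stand-in, the reverse loop is what turns a sample of pure noise back into data.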

20/11/23

Transformers for coding/software engineering

Main

Extra

04/12/23

Guidance for Safe Foundation Model Deployment

Main

11/12/23

Stable Diffusion

Main

08/01/24

Benchmarking AI applications on GPUs

Main

Extra

15/01/24

Retentive Networks

Main

22/01/24

Spatial Graph Patterning of Filamentous Structures

Main

29/01/24

Vision Transformers Need Registers

Main

Extra

05/02/24

Existential Risk of AI?

Main

Extra

12/02/24

Mechanistic interpretability

Main

Supplementary

Extras

19/02/24

Longitudinal NLP

Main

Extra

26/02/24

Machine Translation Quality Estimation

04/03/24

Expanding participatory governance for LLMs: case studies from BigCode, Aya Initiative, and Collective Intelligence Project

Main

11/03/24

Applying Vision Transformers in Neuroscience

Main

18/03/24

Not even a Chinese Room: evaluating LLMs on code simulation

Main

08/04/24

Paper overviews

Main

15/04/24

Natural Logic-based Fact Verification with LLMs

22/04/24

Learn how to learn and distil during learning - Using meta-learning and second order optimisation to prune the model

Main

29/04/24

How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions

Main

13/05/24

Overview of LLM Security

Main

20/05/24

KAN: Kolmogorov-Arnold Networks

Main

04/06/24

Invited Talk: Are we ready for attacks on machine learning?

Abstract: It has now been a decade since the first adversarial examples were demonstrated on deep learning models. And yet, even still, we cannot robustly classify MNIST images better than LeNet-5 or ImageNet images better than AlexNet. But now, more than ever, we need robust machine learning models: robust not only to evasion attacks, but also to poisoning, stealing, and many other attacks. In this talk I survey the current progress we have made on adversarial machine learning. While we have made many significant advances in making attacks practical, we have made considerably less progress on defences. Making progress towards addressing these challenges will be of the highest importance in the coming years.
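
As a concrete illustration of the evasion attacks the abstract refers to, below is a minimal NumPy sketch of the fast gradient sign method (FGSM) applied to a toy logistic-regression classifier. The toy model, input, and epsilon are assumptions made purely for illustration; they are not from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(784)          # toy classifier weights (assumed)
b = 0.0
x = rng.random(784)                   # a toy flattened 28x28 "image" in [0, 1]
y = 1.0                               # assumed true label

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_grad_wrt_x(x, y):
    """Gradient of the binary cross-entropy loss with respect to the input."""
    p = sigmoid(w @ x + b)
    return (p - y) * w

# FGSM: one step of size epsilon in the sign of the input gradient,
# i.e. the direction that most increases the loss per-coordinate.
epsilon = 0.05
x_adv = np.clip(x + epsilon * np.sign(loss_grad_wrt_x(x, y)), 0.0, 1.0)

print("clean score:      ", sigmoid(w @ x + b))
print("adversarial score:", sigmoid(w @ x_adv + b))
```

Even this tiny perturbation, invisible at the per-pixel level, is enough to move the toy classifier's score dramatically, which is the core of the robustness problem the talk surveys.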

01/07/24

A perspective on the fundamentals of transformers

08/07/24

Equally Safe Online? A participatory approach to tackling Gender-Based Violence

We are in the midst of an 'epidemic of online abuse', which disproportionately affects women and minoritised groups. In recent years, technology companies and computer science researchers have made efforts to automate the identification of hate speech and other toxic or abusive language. However, existing resources are limited in a number of important ways, such as their lack of theoretical grounding and stakeholder input. The EPSRC-funded project Equally Safe Online aims to harness stakeholder expertise to co-design resources and methods to tackle online GBV. In this talk, I will discuss outcomes and ongoing work from the project, focusing on participatory design for NLP, perspectivist approaches to dataset creation, and generation of counterspeech against hateful language.

05/08/24

The growth of parallelism in machine learning inference

When I started working on machine learning inference four years ago a typical model would run on a handful of CPU cores. We needed to think about distributing work between threads, but the systems-level problems and abstractions were well understood. Fast forward to today and machine learning models are so large that even a "small" language model can have billions of parameters and run across a multi-GPU system. In this talk I am going to go on an end-to-end journey through the implementation of these models. We will see some of the different problems which emerge in parallelism and distributed computing, and some of the places where I think we are lacking good abstractions.
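
To illustrate one of the parallelism patterns such multi-GPU inference relies on, here is a minimal NumPy sketch of column-wise tensor parallelism for a single linear layer, with plain arrays standing in for per-GPU shards. The shapes and the two-"device" split are assumptions for illustration, not details from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, n_devices = 512, 1024, 2

x = rng.standard_normal((4, d_in))        # a small batch of activations
W = rng.standard_normal((d_in, d_out))    # full weight matrix of one linear layer

# Each "device" holds a slice of the output columns of W.
shards = np.split(W, n_devices, axis=1)

# Each device computes its partial result independently...
partials = [x @ W_shard for W_shard in shards]

# ...and an all-gather-style concatenation reassembles the full output.
y_parallel = np.concatenate(partials, axis=1)

# The sharded computation matches the single-device result.
assert np.allclose(y_parallel, x @ W)
```

The systems questions the talk raises are about what happens when the `shards` live on different GPUs: how to overlap the partial matmuls with the communication that reassembles them, and what abstractions make that manageable.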

12/08/24

Llama 3.1 Report Overview

19/08/24

Overview of Knowledge Graphs

28/08/24

Mixture of Experts

03/09/24

Sociotechnical Safety Evaluation of AI systems

Generative AI systems create risks which must be evaluated in order to be managed or mitigated. Current approaches to AI safety evaluation are primarily focused on assessing technical artefacts in isolation, and so may miss hazards that arise through human-AI-interaction or wide-scale deployment. In this talk, I introduce a sociotechnical approach to AI safety evaluation that aims to capture relevant complexity, to provide a more comprehensive safety assessment. In addition, different evaluation goals require matching evaluation methods. Reviewing the current landscape of AI safety evaluation, I point out strengths and key gaps that need to be addressed. I close by discussing trade-offs and open challenges at the frontier of AI safety evaluation research.

23/09/24

Mechanistic Interpretability I

02/10/24

Mechanistic Interpretability II

07/10/24

Federating Large Language Models from Scratch

Large language models (LLMs) offer unprecedented ML capabilities and continue to improve rapidly. As a result, various organizations are locked in a race to scale LLMs and explore their limits and weaknesses. We believe federated learning (FL) offers an untapped potential to dramatically increase the supply of data sources for these models. Early work has shown, for example, how LLM pre-training can tap into edge device data leveraging FL. Others have shown the impact of using federated optimizers in a poorly connected distributed infrastructure of stateful workers to train a centralized LLM.

We believe FL can reshape LLM practices and opportunities thanks to two of its most exciting features: relaxed synchronization requirements and privacy-by-design on users' data. The federated paradigm opens the doors of new interesting possibilities for the LLM community, like resource sharing, unbounded scaling on private data, democratization, and privacy. This talk contributes to the emerging field that blends the two worlds of FL and LLMs by presenting a fully federated approach for LLM pre-training from scratch. Our approach has shown to be viable at a scale of 3B parameters under a real working system.
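
For a concrete sense of the federated paradigm described above, here is a minimal NumPy sketch of FedAvg-style aggregation, where each client takes local gradient steps on its own data and the server averages the resulting weights. The toy linear-regression objective and synthetic client data are illustrative assumptions, not the system from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n_clients, dim, lr, local_steps, rounds = 4, 8, 0.1, 5, 20

true_w = rng.standard_normal(dim)
# Each client holds its own private data shard (synthetic here).
client_data = []
for _ in range(n_clients):
    X = rng.standard_normal((32, dim))
    y = X @ true_w + 0.01 * rng.standard_normal(32)
    client_data.append((X, y))

def local_update(w, X, y):
    """A few local SGD steps on one client's data."""
    for _ in range(local_steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

w_global = np.zeros(dim)
for _ in range(rounds):
    # Clients start from the current global model and train locally...
    local_weights = [local_update(w_global.copy(), X, y) for X, y in client_data]
    # ...and the server averages the returned weights (FedAvg).
    w_global = np.mean(local_weights, axis=0)

print("distance to true weights:", np.linalg.norm(w_global - true_w))
```

The relaxed synchronisation and privacy properties mentioned above come from the fact that only model weights, never the clients' raw data, cross the network.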

14/10/24

Natural Experiments in NLP and Where to Find Them

In training language models, training choices—such as the random seed for data ordering or the token vocabulary size—significantly influence model behaviour. Answering counterfactual questions like "How would the model perform if this instance were excluded from training?" is computationally expensive, as it requires re-training the model. Once these training configurations are set, they become fixed, creating a "natural experiment" where modifying the experimental conditions incurs high computational costs. Using econometric techniques to estimate causal effects from observational studies enables us to analyse the impact of these choices without requiring full experimental control or repeated model training. In this talk, I will present our paper, Causal Estimation of Memorisation Profiles (Best Paper Award at ACL 2024), which introduces a novel method based on the difference-in-differences technique from econometrics to estimate memorisation without requiring model re-training. I will also discuss preliminary results from ongoing work that applies the regression discontinuity design to estimate the causal effect of selecting a specific vocabulary size.
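
As a minimal illustration of the difference-in-differences idea mentioned above, the sketch below computes the classic two-group, two-period estimator on toy numbers: the change in a treated group minus the change in a control group, which removes the common trend. The groups, periods, and values are made-up assumptions, not data from the paper.

```python
# Mean outcome (e.g. a memorisation metric) before and after an intervention.
treated_before, treated_after = 0.40, 0.65   # group exposed to the intervention
control_before, control_after = 0.38, 0.48   # comparable group that was not

# Each group's raw change over time...
treated_change = treated_after - treated_before   # 0.25
control_change = control_after - control_before   # 0.10

# ...and the difference-in-differences estimate: treated change minus common trend.
did_estimate = treated_change - control_change    # 0.15
print("difference-in-differences estimate:", round(did_estimate, 3))
```

The appeal for the memorisation setting is that this estimate can be read off observational training runs, without re-training the model under counterfactual configurations.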

28/10/24

No Language Left Behind (NLLB)

18/11/24

Biological neural networks

Two talks:

  • Event-Based Learning of Synaptic Delays in Spiking Neural Networks
  • Information-theoretic Analysis of Brain Dynamics & Neural Network Models Informed by Information Theory

20/11/24


Mechanistic Interpretability III

02/12/24

Can language models play the Wikipedia game?

This project examines how language models can navigate Wikipedia, which tests their ability to link semantically similar topics in a practical way. We have run experiments with a large variety of sentence embedding and large language models for comparison. We have also seen how the performance varies when traversing Wikipedia in other languages and when navigating between scientific papers, which allows an assessment of the breadth of the models' abilities.
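
A minimal sketch of the kind of embedding-guided navigation the project describes: at each page, greedily follow the outgoing link whose embedding is most similar to the target page. The tiny link graph and the stand-in `embed` function are assumptions for illustration, not the project's actual data or models.

```python
import numpy as np

# A toy fragment of the Wikipedia link graph (illustrative only).
links = {
    "Alan Turing": ["Mathematics", "Computer science", "Enigma machine"],
    "Computer science": ["Algorithm", "Artificial intelligence"],
    "Artificial intelligence": ["Machine learning", "Neural network"],
    "Machine learning": ["Neural network", "Statistics"],
}

rng = np.random.default_rng(0)
_cache = {}

def embed(title):
    """Stand-in for a sentence-embedding model: a fixed random vector per title."""
    if title not in _cache:
        _cache[title] = rng.standard_normal(16)
    return _cache[title]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def play(start, target, max_hops=10):
    """Greedily hop towards the target page; returns the path taken."""
    path, page = [start], start
    for _ in range(max_hops):
        if page == target:
            break
        candidates = links.get(page, [])
        if not candidates:
            break
        # Follow the outgoing link whose embedding is closest to the target.
        page = max(candidates, key=lambda t: cosine(embed(t), embed(target)))
        path.append(page)
    return path

print(play("Alan Turing", "Neural network"))
```

Swapping the random `embed` stub for a real sentence-embedding model (or an LLM choosing among the candidate links) gives the comparison the project runs across models, languages, and corpora.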

09/12/24

Scaling laws of neural networks

Miscellaneous

Tokenizers and Huggingface tutorial

Main