Integrate TransformerEngine #1098

Needed for FP8 training, and it also adds some nice FP16/BF16 optimizations for Ampere and newer architectures that we can make use of regardless.

https://github.com/EleutherAI/TransformerEngine
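For reference (this sketch is not part of the original issue), the basic usage pattern of TransformerEngine's PyTorch API looks roughly like the upstream TE quickstart below: swap in a TE module and wrap the forward pass in `fp8_autocast`. Module and argument names come from TE's public API, but defaults and exact signatures vary between TE releases.

```python
import torch
import transformer_engine.pytorch as te

# te.Linear is a drop-in replacement for torch.nn.Linear; parameters stay in
# higher precision and the GEMM runs in FP8 inside the autocast region.
layer = te.Linear(768, 3072, bias=True)
inp = torch.randn(2048, 768, device="cuda")

# FP8 GEMMs need Hopper/Ada-class GPUs; on Ampere the same TE modules can be
# used with FP8 disabled to pick up the fused fp16/bf16 kernels.
with te.fp8_autocast(enabled=True):
    out = layer(inp)

# Backward runs outside the autocast context, as in the TE examples.
out.sum().backward()
```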
Comments
There is a fairly mature implementation at https://github.com/NVIDIA/Megatron-LM

As discussed on Discord - if you need some extra dev manpower, I'll happily take this one off your hands.

Thank you!

Hi, I am curious about the state of this effort and don't see a related branch. I read on Discord that FP8 was working but there were struggles with convergence. @Quentin-Anthony IIUC you spent some time on this as well - could you tell me more? :)

Hi - just commenting to say that I'm afraid I got distracted by other projects and didn't make any significant progress on this. Removing myself as assignee, as agreed with Quentin, so that I don't block anyone else from picking it up - something I should have done much sooner.

There are a few things to unpack here. I had a look at the difference between the GPT-NeoX Megatron code and upstream Megatron-LM, which has a mature implementation as @Quentin-Anthony said. Here's a draft PR with some thoughts on the diff: #1185 - let's discuss there :)
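On the convergence struggles mentioned above: in TransformerEngine, FP8 numerics are controlled by the scaling recipe passed to `fp8_autocast`. As a hedged sketch (not from this thread), these are the `DelayedScaling` knobs that are commonly adjusted when FP8 runs diverge; argument names are from TE's public API, and defaults differ across TE versions.

```python
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# HYBRID keeps activations/weights in E4M3 but gradients in E5M2, whose wider
# dynamic range is usually gentler on convergence than pure E4M3.
fp8_recipe = recipe.DelayedScaling(
    margin=0,                 # extra headroom (in powers of two) on the scaling factor
    fp8_format=recipe.Format.HYBRID,
    amax_history_len=1024,    # longer amax history -> smoother, more conservative scales
    amax_compute_algo="max",
)

# The recipe is then supplied to every FP8 region:
# with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
#     ...
```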