We aim to generalize the APIs and optimization techniques across computers with different architectures; that is, we also have to add GPU support and remove CPU-only dependencies. cl-waffe2 was originally designed with this in mind (easy to extend, easy to fuse multiple kernels): it is nothing but a set of tensor abstraction APIs, plus extras including the fastest autodiff in Common Lisp.
As of now, I'm working on implementing a deep learning compiler for multiple targets, including AVX, Neon, NVIDIA, AMD, and more! (It also extends the easy-to-extend design.)

https://github.com/hikettei/AbstractTensor.lisp
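As a rough illustration of the multi-target idea, here is a minimal sketch of dispatching per-backend code generation through CLOS generic functions. All names here (`backend`, `render-add`, the backend classes) are hypothetical and are not the actual API of cl-waffe2 or AbstractTensor.lisp.

```lisp
;; Hypothetical sketch: per-target code generation via CLOS dispatch.
;; None of these names come from cl-waffe2 or AbstractTensor.lisp.

(defclass backend () ())
(defclass cpu-backend  (backend) ())
(defclass cuda-backend (backend) ())

(defgeneric render-add (backend x y)
  (:documentation "Emit target-specific source for elementwise addition."))

(defmethod render-add ((b cpu-backend) x y)
  ;; A plain C loop for the CPU target.
  (format nil "for (int i = 0; i < n; i++) out[i] = ~a[i] + ~a[i];" x y))

(defmethod render-add ((b cuda-backend) x y)
  ;; One thread per element for the CUDA target.
  (format nil "int i = blockIdx.x * blockDim.x + threadIdx.x; out[i] = ~a[i] + ~a[i];" x y))

;; (render-add (make-instance 'cpu-backend) "a" "b")
;; => "for (int i = 0; i < n; i++) out[i] = a[i] + b[i];"
```

Adding a new target would then amount to defining a new backend class and its methods, which matches the "easy to extend" goal above.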
The approach is similar to tinygrad; even a beautiful tinygrad port to Common Lisp may be worthwhile.
This may involve breaking changes and is part of my future work (that's why I have created a new issue), but I believe this modification will make it possible to get an Int8-quantized Llama3 model running on Common Lisp with the smallest possible dependencies. This could be one of the reasons to use Common Lisp, because it is impossible to reproduce this in Python or other language communities.
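For reference, the Int8 scheme would presumably look like standard symmetric per-tensor quantization. Below is a minimal sketch of that math in Common Lisp; the function names are illustrative and not part of cl-waffe2.

```lisp
;; Minimal sketch of symmetric per-tensor Int8 quantization:
;; scale = max|x| / 127, q = clamp(round(x / scale), -128, 127).
;; Illustrative only; not cl-waffe2's quantization API.

(defun quantize-int8 (xs)
  "Quantize a list of floats; returns (values int8s scale)."
  (let* ((absmax (reduce #'max (mapcar #'abs xs) :initial-value 0.0))
         (scale  (if (zerop absmax) 1.0 (/ absmax 127.0))))
    (values (mapcar (lambda (x) (max -128 (min 127 (round x scale)))) xs)
            scale)))

(defun dequantize-int8 (qs scale)
  "Recover approximate floats from Int8 values and the stored scale."
  (mapcar (lambda (q) (* q scale)) qs))

;; (multiple-value-bind (qs s) (quantize-int8 '(0.5 -1.0 0.25))
;;   (dequantize-int8 qs s))
;; => approximately (0.5 -1.0 0.25)
```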
Workload to implement Llama3
- A (nearly) complete tinygrad port to Common Lisp
- Fast Conv2D kernel implementation (and Winograd); a naive reference kernel is sketched after this list
- Support more fusion patterns
- GPU support (NVIDIA, Metal, and AMD are all tractable with our approach)
- Improve the data type interface, especially cast ops and quantization op support with the JIT
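As a correctness baseline for the Conv2D item, here is a naive single-channel kernel (valid padding, stride 1, the cross-correlation form standard in deep learning frameworks). It is purely illustrative and not cl-waffe2's kernel; an optimized version would use im2col or Winograd.

```lisp
;; Naive reference Conv2D (single channel, valid padding, stride 1).
;; Computes the cross-correlation form standard in deep learning.

(defun conv2d-naive (input kernel)
  "INPUT and KERNEL are 2D single-float arrays; returns the output array."
  (let* ((ih (array-dimension input 0))  (iw (array-dimension input 1))
         (kh (array-dimension kernel 0)) (kw (array-dimension kernel 1))
         (oh (1+ (- ih kh)))             (ow (1+ (- iw kw)))
         (out (make-array (list oh ow) :element-type 'single-float
                                       :initial-element 0.0)))
    (dotimes (y oh out)                ; returns OUT when the loops finish
      (dotimes (x ow)
        (dotimes (ky kh)
          (dotimes (kx kw)
            (incf (aref out y x)
                  (* (aref input (+ y ky) (+ x kx))
                     (aref kernel ky kx)))))))))
```

Winograd (e.g. F(2x2, 3x3)) would cut the multiplication count of the inner loops relative to this direct loop nest, which is why it is listed separately above.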