    Repositories list

    • vllm-fork

      Public
      A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      Apache License 2.0
      4.5k000 Updated Nov 6, 2024
    • Python
      Apache License 2.0
      7000 Updated Oct 16, 2024
    • The OwLite Examples repository offers illustrative example code to help users seamlessly compress PyTorch deep learning models and transform them into TensorRT engines.
      Python
      1900 Updated Sep 27, 2024
    • owlite

      Public
      OwLite is a low-code compression toolkit for AI models.
      Python
      GNU Affero General Public License v3.0
      33800 Updated Sep 27, 2024
    • C++
      Apache License 2.0
      1001 Updated Jul 23, 2024
    • .github

      Public
      0000 Updated Jul 22, 2024
    • Python
      Apache License 2.0
      0100 Updated Mar 13, 2024
    • QUICK

      Public
      QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference
      Python
      MIT License
      511250 Updated Mar 6, 2024