text2image:
- karlo text2image model
- DeepFloyd IF by StabilityAI, an open-source text-to-image model with photorealism and language understanding, code
- Kandinsky multilingual text2image latent diffusion model
- stable diffusion 1.5
- stable diffusion 2.0
- stable diffusion 2.1
- stable diffusion xl (SDXL) base 0.9 & refiner 0.9
- AnimateDiff Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning
- PixArt-alpha Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis, paper
- Latent Consistency Models LoRAs for high quality few step image generation
- OnnxStream runs Stable Diffusion XL 1.0 Base with 298MB of RAM
- StreamDiffusion A Pipeline-Level Solution for Real-Time Interactive Generation
- AnyText Code and Model for a diffusion pipeline covering a latent module and text embedding to generate and manipulate text in images
- InstantID Zero-shot Identity-Preserving Generation in Seconds, ComfyUI plugin
- PhotoMaker Rapid customization within seconds with no additional LoRA training, preserving ID with high fidelity and text controllability; can serve as an adapter for other models
- StableCascade successor to Stable Diffusion by Stability AI with smaller latent space, higher speeds and better quality
- IDM-VTON Virtual Try-on for clothes and fashion
- ConsistentID Portrait Generation with Multimodal Fine-Grained Identity Preservation
- Flux by Black Forest Labs, a team of former Stability AI staff: SOTA text-to-image models Flux and Flux schnell, 12B-parameter transformers capable of rendering text and following complex prompts, with Flux schnell released under the Apache 2.0 license
- Lumina-mGPT multimodal autoregressive LLMs capable of generating flexible and photorealistic images from text descriptions
text to 3d:
- OpenAI shap-E a text/image to 3D model
- shap-e local run text-to-3d locally
- stable-dreamfusion A PyTorch implementation of the text-to-3D model Dreamfusion using the Stable Diffusion text-to-2D model
image to 3d:
- Wonder3D A cross-domain diffusion model for 3D reconstruction from a single image
- DreamCraft3D Official implementation of DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior
- Spann3R is a transformer-based model for dense 3D reconstruction from images, with spatial memory to track and predict 3D structures and capable of real-time processing
image to text (OCR):
- pix2tex LaTeX OCR
other:
- facebookresearch/segment-anything image segmentation
- YOLOv8 SOTA object detection, segmentation, classification and tracking
- DINOv2 1B-parameter ViT model that generates robust all-purpose visual features, outperforming OpenCLIP on benchmarks at image and pixel levels
- segment-anything-fast A batched offline inference oriented version of segment-anything
- Final2x Image super-resolution through interpolation supporting multiple models like RealCUGAN, ESRGAN, Waifu2x, SRMD
- text-to-room generates textured 3D room meshes from text prompts using 2D text-to-image models
- DragGAN Interactive Point-based Manipulation on Generative Images, demo
- DragDiffusion Harnessing Diffusion Models for Interactive Point-based Image Editing
- HQTrack Tracking Anything in High Quality (HQTrack) is a framework for high performance video object tracking and segmentation
- CoTracker It is Better to Track Together. A fast transformer-based model that can track any point in a video
- ZeroNVS Zero-shot 360-degree view synthesis from single images
- x-stable-diffusion Real-time inference for Stable Diffusion - 0.88s latency
- Depth-Anything Better depth estimation including a ControlNet for ComfyUI and ONNX and TensorRT versions
- SUPIR Super Resolution and Image Restoration
- RMBG BRIA Background Removal model. hf demo space
- ComfyUI powerful and modular stable diffusion pipelines using a graph/nodes/flowchart based interface, runs SDXL 0.9, SD2.1, SD2.0, SD1.5
- ComfyUI-Manager installs missing custom nodes automatically
- SeargeSDXL Custom SDXL Node for easier SDXL usage and img2img workflow that utilizes base & refiner
- Sytan ComfyUI SDXL workflow with txt2img using base and refiner
- Automatic1111/stable-diffusion-webui well known UI for Stable Diffusion
- sd-webui-cloud-inference extension via omniinfer.io
- stable-diffusion-webui-forge platform on top of SDWebUI to make development easier, optimize resource management, and speed up inference
- SD.Next (vladmandic/automatic) Fork of automatic1111's original repo with seemingly more active development
- Fooocus Midjourney-like GUI for SDXL to focus on prompting and generating
- RuinedFooocus A Fooocus fork
- Fooocus-MRE A Fooocus fork
- stable-diffusion-xl-demo runs SDXL 0.9 in a basic interface
- imaginAIry a Stable Diffusion UI
- InvokeAI Alternative, polished stable diffusion UI with fewer features than automatic1111
- mlc-ai/web-stable-diffusion
- anapnoe/stable-diffusion-webui-ux Redesigned from automatic1111's UI, adding mobile and desktop layouts and UX improvements
- refacer One-Click Deepfake Multi-Face Swap Tool
- stable-diffusion.cpp CPU inference of Stable Diffusion in pure C/C++ with huge performance gains, supporting ggml, 16/32 bit float, 4/5/8 bit quantization, AVX/AVX2/AVX512, SD1.x, SD2.x, txt2img/img2img
- FaceFusion Next generation face swapper and enhancer
- OneFlow Backend for diffusers and ComfyUI
- StabilityMatrix is a portable package manager and UI for GUIs like Forge, SD.Next, ComfyUI and more, supporting multiple packages, offering built-in Git and Python dependencies, and features like syntax highlighting, workspace management, and model browsing
- OneDiff is a PyTorch-based acceleration library for diffusion models, offering out-of-the-box speedups, GPU optimization, and broad model and NVIDIA GPU support
- https://github.com/JoePenna/Dreambooth-Stable-Diffusion
- fast-stable-diffusion TheLastBen's Repo for SD, SDXL fine-tuning and DreamBooth on RunPod, Paperspace, Colab and others
- https://github.com/ShivamShrirao/diffusers/tree/main/examples/dreambooth
- https://github.com/cloneofsimo/lora
- OneTrainer all in one training for SD, SDXL and inpainting models supporting fine-tuning, LoRA, embeddings
- sd-scripts by kohya-ss
- LoRA Easy Training Scripts GUI for Kohya's Scripts
- Kohya_ss Windows-focused Gradio GUI for Kohya's Stable Diffusion trainers, experimental SDXL support, reddit thread
- Fine tuning concepts explained visually
- text2image-gui a Stable Diffusion GUI by NMKD
- sd-webui-EasyPhoto / easyphoto plugin for generating AI portraits that can be used to train digital doppelgangers with 5-10 photos and a quick LoRA fine tune, paper
- StableTuner Windows GUI for Finetuning / Dreambooth Stable Diffusion models (abandoned)
- SimpleTuner fine-tuning for StableDiffusion, PixArt, Flux with LoRA and full U-Net training, multi GPU support, DeepSpeed
- x-flux LoRA and ControlNet training scripts for Flux model by Black Forest Labs using DeepSpeed
- ai-toolkit Flux LoRA training on local and runpod
- Speed Is All You Need up to 50% speed increase for Latent Diffusion Models
- ORCa converts glossy objects into radiance-field cameras, enabling depth estimation and novel-view synthesis, project, code
- cocktail Mixing Multi-Modality Controls for Text-Conditional Image Generation, project, code
- SnapFusion Fast text-to-image diffusion on mobile phones in 2 seconds
- Objaverse-xl dataset of 10 million annotated high quality 3D objects, hf
- LightGlue Local Feature Matching at Light Speed, a lightweight feature matcher with high accuracy and blazing fast inference. It takes as input a set of keypoints and descriptors for each image and returns the indices of corresponding points
- ml-mgie Guiding Instruction-based Image Editing via Multimodal Large Language Models
- VAR GPT beats diffusion
- InstantStyle towards Style-Preserving in Text-to-Image Generation