Image Generation

Models

text2image:

karlo text2image model
DeepFloyd IF open-source text-to-image model by Stability AI with photorealism and language understanding, code
Kandinsky multilingual text2image latent diffusion model
stable diffusion 1.5
stable diffusion 2.0
stable diffusion 2.1
stable diffusion xl (SDXL) base 0.9 & refiner 0.9
AnimateDiff Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning
PixArt-alpha Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis, paper
Latent Consistency Models LoRAs for high quality few step image generation
OnnxStream Stable Diffusion XL 1.0 Base with 298MB of RAM
StreamDiffusion A Pipeline-Level Solution for Real-Time Interactive Generation
AnyText Code and Model for a diffusion pipeline covering a latent module and text embedding to generate and manipulate text in images
InstantID Zero-shot Identity-Preserving Generation in Seconds, ComfyUI plugin
PhotoMaker Rapid customization within seconds, with no additional LoRA training preserving ID with high fidelity and text controllability which can serve as an adapter for other models
StableCascade successor to Stable Diffusion by Stability AI with smaller latent space, higher speeds and better quality
IDM-VTON Virtual Try-on for clothes and fashion
ConsistentID Portrait Generation with Multimodal Fine-Grained Identity Preservation
Flux SOTA text-to-image models by Black Forest Labs, a company founded by ex-Stability AI staff: a 12B-parameter transformer capable of writing text and following complex prompts, with Flux schnell released under the Apache 2.0 license
Lumina-mGPT multimodal autoregressive LLMs capable of generating flexible and photorealistic images from text descriptions

text to 3d:

OpenAI shap-E a text/image to 3D model
shap-e local run text-to-3d locally
stable-dreamfusion A PyTorch implementation of the text-to-3D model Dreamfusion using the Stable Diffusion text-to-2D model

image to 3d:

Wonder3D A cross-domain diffusion model for 3D reconstruction from a single image
DreamCraft3D Official implementation of DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior
Spann3R is a transformer-based model for dense 3D reconstruction from images, with spatial memory to track and predict 3D structures and capable of real-time processing

image to text (OCR):

pix2tex LaTeX OCR

other:

facebookresearch/segment-anything image segmentation
- YOLOv8 SOTA object detection, segmentation, classification and tracking
- DINOv2 1B-parameter ViT model to generate robust all-purpose visual features that outperform OpenCLIP benchmarks at image and pixel levels
- segment-anything-fast A batched offline inference oriented version of segment-anything
Final2x Image super-resolution through interpolation supporting multiple models like RealCUGAN, ESRGAN, Waifu2x, SRMD
text-to-room generates textured 3D room meshes from a text prompt
DragGAN Interactive Point-based Manipulation on Generative Images, demo
DragDiffusion Harnessing Diffusion Models for Interactive Point-based Image Editing
HQTrack Tracking Anything in High Quality (HQTrack) is a framework for high performance video object tracking and segmentation
CoTracker It is Better to Track Together. A fast transformer-based model that can track any point in a video
ZeroNVS Zero-shot 360-degree view synthesis from single images
x-stable-diffusion Real-time inference for Stable Diffusion - 0.88s latency
Depth-Anything Better depth estimation including a ControlNet for ComfyUI and ONNX and TensorRT versions
SUPIR Super Resolution and Image Restoration
RMBG BRIA Background Removal model. hf demo space

Wrappers & GUIs

ComfyUI powerful and modular stable diffusion pipelines using a graph/nodes/flowchart based interface, runs SDXL 0.9, SD2.1, SD2.0, SD1.5
- ComfyUI-Manager installs missing custom nodes automatically
- SeargeSDXL Custom SDXL Node for easier SDXL usage and img2img workflow that utilizes base & refiner
- Sytan ComfyUI SDXL workflow with txt2img using base and refiner
Automatic1111/stable-diffusion-webui well known UI for Stable Diffusion
- sd-webui-cloud-inference extension via omniinfer.io
- stable-diffusion-webui-forge platform on top of SDWebUI to make development easier, optimize resource management, and speed up inference
SD.Next (vladmandic/automatic) Fork of automatic1111's web UI with seemingly more active development than the original repo
Fooocus Midjourney-like GUI for SDXL to focus on prompting and generating
- RuinedFooocus A Fooocus fork
- Fooocus-MRE A Fooocus fork
stable-diffusion-xl-demo runs SDXL 0.9 in a basic interface
imaginAIry a Stable Diffusion UI
InvokeAI Alternative, polished stable diffusion UI with fewer features than automatic1111
mlc-ai/web-stable-diffusion
anapnoe/stable-diffusion-webui-ux Redesigned from automatic1111's UI, adding mobile and desktop layouts and UX improvements
refacer One-Click Deepfake Multi-Face Swap Tool
stable-diffusion.cpp CPU inference of Stable Diffusion in pure C/C++ with huge performance gains, supporting ggml, 16/32 bit float, 4/5/8 bit quantization, AVX/AVX2/AVX512, SD1.x, SD2.x, txt2img/img2img
FaceFusion Next generation face swapper and enhancer
OneFlow Backend for diffusers and ComfyUI
StabilityMatrix is a portable package manager and UI for GUIs like Forge, SD.Next, ComfyUI and more, supporting multiple packages, offering built-in Git and Python dependencies, and features like syntax highlighting, workspace management, and model browsing
OneDiff is a PyTorch-based acceleration library for diffusion models, offering out-of-the-box speedups, GPU optimization, and broad model and NVIDIA GPU support

Fine Tuning

https://github.com/JoePenna/Dreambooth-Stable-Diffusion
fast-stable-diffusion TheLastBen's Repo for SD, SDXL fine-tuning and DreamBooth on RunPod, Paperspace, Colab and others
https://github.com/ShivamShrirao/diffusers/tree/main/examples/dreambooth
https://github.com/cloneofsimo/lora
OneTrainer all in one training for SD, SDXL and inpainting models supporting fine-tuning, LoRA, embeddings
sd-scripts by kohya-ss
- LoRA Easy Training Scripts GUI for Kohya's Scripts
- Kohya_ss Windows-focused Gradio GUI for Kohya's Stable Diffusion trainers, experimental sdxl support, reddit thread
Fine tuning concepts explained visually
text2image-gui a Stable Diffusion GUI by NMKD
sd-webui-EasyPhoto / easyphoto plugin for generating AI portraits that can be used to train digital doppelgangers with 5-10 photos and a quick LoRA fine tune, paper
StableTuner Windows GUI for Finetuning / Dreambooth Stable Diffusion models (abandoned)
SimpleTuner fine-tuning for StableDiffusion, PixArt, Flux with LoRA and full U-Net training, multi GPU support, DeepSpeed
x-flux LoRA and ControlNet training scripts for Flux model by Black Forest Labs using DeepSpeed
ai-toolkit Flux LoRA training on local and runpod

Research

Speed Is All You Need up to 50% speed increase for Latent Diffusion Models
ORCa converts glossy objects into radiance-field cameras, enabling depth estimation and novel-view synthesis, project, code
cocktail Mixing Multi-Modality Controls for Text-Conditional Image Generation, project, code
SnapFusion Fast text-to-image diffusion on mobile phones in 2 seconds
Objaverse-xl dataset of 10 million annotated high quality 3D objects, hf
LightGlue Local Feature Matching at Light Speed, a lightweight feature matcher with high accuracy and blazing fast inference. It takes as input a set of keypoints and descriptors for each image and returns the indices of corresponding points
ml-mgie Guiding Instruction-based Image Editing via Multimodal Large Language Models
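VAR Visual Autoregressive Modeling, a GPT-style next-scale prediction model whose authors report it beats diffusion models at image generation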
InstantStyle towards Style-Preserving in Text-to-Image Generation
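
The LightGlue entry above describes a matcher interface: descriptors for two images go in, indices of corresponding points come out. The sketch below illustrates only that input/output contract with a plain mutual-nearest-neighbour check. It is not LightGlue's method (which is a learned matcher), it ignores keypoint coordinates, and the function name `mutual_nearest_matches` and the toy descriptors are hypothetical.

```python
from math import dist


def mutual_nearest_matches(desc_a, desc_b):
    """Return (i, j) pairs where desc_a[i] and desc_b[j] are each other's nearest neighbour.

    Hypothetical helper for illustration only; descriptors are plain lists of floats.
    """
    if not desc_a or not desc_b:
        return []
    # Index of the closest descriptor in B for every descriptor in A.
    nearest_in_b = [
        min(range(len(desc_b)), key=lambda j: dist(a, desc_b[j])) for a in desc_a
    ]
    # Index of the closest descriptor in A for every descriptor in B.
    nearest_in_a = [
        min(range(len(desc_a)), key=lambda i: dist(b, desc_a[i])) for b in desc_b
    ]
    # Keep only the pairs on which both directions agree.
    return [(i, j) for i, j in enumerate(nearest_in_b) if nearest_in_a[j] == i]


# Toy 2-D descriptors: three points in image A, two in image B.
desc_a = [[0.0, 1.0], [1.0, 0.0], [0.7, 0.2]]
desc_b = [[0.9, 0.1], [0.1, 0.9]]
matches = mutual_nearest_matches(desc_a, desc_b)
print(matches)
```

The third descriptor in image A gets no match because its nearest neighbour in B is already closer to another point. Learned matchers aim to make this kind of accept/reject decision more robustly than a distance check can.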
