Image Generation

Models

text2image:

  • karlo text2image model
  • DeepFloyd IF by Stability AI, an open-source text-to-image model with photorealism and language understanding, code
  • Kandinsky multilingual text2image latent diffusion model
  • stable diffusion 1.5
  • stable diffusion 2.0
  • stable diffusion 2.1
  • stable diffusion xl (SDXL) base 0.9 & refiner 0.9
  • AnimateDiff Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning
  • PixArt-alpha Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis, paper
  • Latent Consistency Models LoRAs for high quality few step image generation
  • OnnxStream runs Stable Diffusion XL 1.0 Base with 298MB of RAM
  • StreamDiffusion A Pipeline-Level Solution for Real-Time Interactive Generation
  • AnyText Code and Model for a diffusion pipeline covering a latent module and text embedding to generate and manipulate text in images
  • InstantID Zero-shot Identity-Preserving Generation in Seconds, ComfyUI plugin
  • PhotoMaker Rapid customization within seconds with no additional LoRA training, preserving ID with high fidelity and text controllability; can serve as an adapter for other models
  • StableCascade successor to Stable Diffusion by Stability AI with smaller latent space, higher speeds and better quality
  • IDM-VTON Virtual Try-on for clothes and fashion
  • ConsistentID Portrait Generation with Multimodal Fine-Grained Identity Preservation
  • Flux by Black Forest Labs, a team of former Stability AI staff: SOTA text-to-image models Flux and Flux schnell, a 12B-parameter transformer capable of writing text and following complex prompts, with Flux schnell released under the Apache 2.0 license
  • Lumina-mGPT multimodal autoregressive LLMs capable of generating flexible and photorealistic images from text descriptions

text to 3d:

image to 3d:

  • Wonder3D A cross-domain diffusion model for 3D reconstruction from a single image
  • DreamCraft3D Official implementation of DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior
  • Spann3R is a transformer-based model for dense 3D reconstruction from images, with spatial memory to track and predict 3D structures and capable of real-time processing

image to text (OCR):

other:

  • facebookresearch/segment-anything image segmentation
    • YOLOv8 SOTA object detection, segmentation, classification and tracking
    • DINOv2 1B-parameter ViT model that generates robust all-purpose visual features, outperforming OpenCLIP on benchmarks at image and pixel levels
    • segment-anything-fast A batched offline inference oriented version of segment-anything
  • Final2x Image super-resolution through interpolation supporting multiple models like RealCUGAN, ESRGAN, Waifu2x, SRMD
  • text-to-room generates textured 3D room meshes from a text prompt
  • DragGAN Interactive Point-based Manipulation on Generative Images, demo
  • DragDiffusion Harnessing Diffusion Models for Interactive Point-based Image Editing
  • HQTrack Tracking Anything in High Quality (HQTrack) is a framework for high performance video object tracking and segmentation
  • CoTracker It is Better to Track Together. A fast transformer-based model that can track any point in a video
  • ZeroNVS Zero-shot 360-degree view synthesis from single images
  • x-stable-diffusion Real-time inference for Stable Diffusion - 0.88s latency
  • Depth-Anything Better depth estimation including a ControlNet for ComfyUI and ONNX and TensorRT versions
  • SUPIR Super Resolution and Image Restoration
  • RMBG BRIA Background Removal model. hf demo space

Wrappers & GUIs

  • ComfyUI powerful and modular stable diffusion pipelines using a graph/nodes/flowchart based interface, runs SDXL 0.9, SD2.1, SD2.0, SD1.5
  • Automatic1111/stable-diffusion-webui well known UI for Stable Diffusion
  • SD.Next vladmandic/automatic, a fork of automatic1111's original repo with seemingly more active development
  • Fooocus Midjourney-like GUI for SDXL that focuses on prompting and generating
  • stable-diffusion-xl-demo runs SDXL 0.9 in a basic interface
  • imaginAIry a Stable Diffusion UI
  • InvokeAI Alternative, polished stable diffusion UI with fewer features than automatic1111
  • mlc-ai/web-stable-diffusion
  • anapnoe/stable-diffusion-webui-ux Redesigned from automatic1111's UI, adding mobile and desktop layouts and UX improvements
  • refacer One-Click Deepfake Multi-Face Swap Tool
  • stable-diffusion.cpp CPU inference of Stable Diffusion in pure C/C++ with huge performance gains, supporting ggml, 16/32 bit float, 4/5/8 bit quantization, AVX/AVX2/AVX512, SD1.x, SD2.x, txt2img/img2img
  • FaceFusion Next generation face swapper and enhancer
  • OneFlow Backend for diffusers and ComfyUI
  • StabilityMatrix is a portable package manager and UI for GUIs like Forge, SD.Next, ComfyUI and more, supporting multiple packages, offering built-in Git and Python dependencies, and features like syntax highlighting, workspace management, and model browsing
  • OneDiff is a PyTorch-based acceleration library for diffusion models, offering out-of-the-box speedups, GPU optimization, and broad model and NVIDIA GPU support

Fine Tuning

Research

  • Speed Is All You Need up to 50% speed increase for Latent Diffusion Models
  • ORCa converts glossy objects into radiance-field cameras, enabling depth estimation and novel-view synthesis, project, code
  • cocktail Mixing Multi-Modality Controls for Text-Conditional Image Generation, project, code
  • SnapFusion Fast text-to-image diffusion on mobile phones in 2 seconds
  • Objaverse-xl dataset of 10 million annotated high quality 3D objects, hf
  • LightGlue Local Feature Matching at Light Speed, a lightweight feature matcher with high accuracy and blazing fast inference. It takes as input a set of keypoints and descriptors for each image and returns the indices of corresponding points
  • ml-mgie Guiding Instruction-based Image Editing via Multimodal Large Language Models
  • VAR Visual Autoregressive Modeling, a GPT-style autoregressive image generator reported to beat diffusion models
  • InstantStyle towards Style-Preserving in Text-to-Image Generation