[FEATURE REQUEST] Images as prompts? #379
-
There are AIs being developed, called "text to prompt" tools, that some people on Reddit are using to guess the prompts behind really good-looking images others have posted, so this is definitely possible. Not sure if hlky has the time to implement it, but it would be pretty awesome if this were used: https://github.com/pharmapsychotic/clip-interrogator
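For reference, the later pip-packaged version of clip-interrogator exposes a small API along these lines (a sketch; "sample.jpg" is a placeholder, and at the time of this thread the tool shipped as a Colab notebook):

```python
from PIL import Image
from clip_interrogator import Config, Interrogator

# Load the CLIP model clip-interrogator uses to reverse an image into a prompt.
ci = Interrogator(Config(clip_model_name="ViT-L-14/openai"))
image = Image.open("sample.jpg").convert("RGB")
print(ci.interrogate(image))  # prints a best-guess text prompt for the image
```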
-
I see this is the case, but I find it surprising to have both an image input and a text prompt. What would be expected from SD if the text prompt had no relation to the input image?
-
https://github.com/justinpinkney/stable-diffusion This dude was able to create a "variations from an input image" feature that, as far as I understand, feeds the image directly into the model. It was released today; might be worth checking out 😬
-
Disco Diffusion and DALL-E 2 (I believe) both allow you to specify images as prompts. This is separate and distinct from img2img, which still uses text as the prompt; it's more like an image search algorithm that uses CLIP to identify the features of a search image and then returns images with similar features. I've been playing around with swapping out CLIPTextModel for CLIPVisionModel and processing a sample image as a prompt, but so far I haven't had any luck; I keep hitting an error.
I've replaced FrozenCLIPEmbedder in ldm/modules/encoders/modules.py with an ugly proof-of-concept hack: I changed __init__ to load the CLIP vision model instead of the text model, and rewrote forward() to process an image instead of tokenized text.
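(The original snippets weren't preserved in this copy of the thread. A minimal sketch of the kind of swap being described might look like the following; the class name FrozenCLIPImageEmbedder is mine, and it assumes a recent transformers release with CLIPImageProcessor.)

```python
import torch
from transformers import CLIPImageProcessor, CLIPVisionModel

class FrozenCLIPImageEmbedder(torch.nn.Module):
    """Hypothetical stand-in for FrozenCLIPEmbedder that embeds an
    image with CLIP's vision tower instead of its text tower."""

    def __init__(self, version="openai/clip-vit-large-patch14", device="cuda"):
        super().__init__()
        self.processor = CLIPImageProcessor.from_pretrained(version)
        self.transformer = CLIPVisionModel.from_pretrained(version).to(device)
        self.device = device
        self.freeze()

    def freeze(self):
        self.transformer.eval()
        for param in self.parameters():
            param.requires_grad = False

    def forward(self, image):
        # `image` is a PIL image; the processor resizes and normalizes it.
        inputs = self.processor(images=image, return_tensors="pt").to(self.device)
        outputs = self.transformer(pixel_values=inputs.pixel_values)
        # For ViT-L/14 this is (batch, 257, 1024); the text encoder the
        # UNet was trained against yields (batch, 77, 768).
        return outputs.last_hidden_state

    def encode(self, image):
        return self(image)
```

If the hack looks roughly like that, the error is probably the shape mismatch flagged in the comments: ViT-L/14's vision hidden states are 1024-dimensional, while SD v1's UNet cross-attention was trained against the text tower's 768-dimensional states, so the raw vision features can't be swapped in without a learned projection and some fine-tuning (which, as far as I understand, is roughly what justinpinkney's image-variations model adds).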
Am I attempting something that's impossible for some reason, or is there a way to do this?