[FEATURE REQUEST] Images as prompts? #379

lendrick · 2022-08-31T15:48:59Z

lendrick
Aug 31, 2022

Disco Diffusion and DALL-E 2 (I believe) both allow you to specify images as prompts. This is separate and distinct from img2img, which still uses text as a prompt, and more like an image search algorithm that uses CLIP to identify the features of a search image and returns images with similar features. I've been playing around with swapping out CLIPTextModel for CLIPVisionModel and processing a sample image as a prompt, but thus far I haven't had any luck, because I keep running into this:

!!Runtime error (txt2img)!!
 mat1 and mat2 shapes cannot be multiplied (514x1024 and 768x320)

I've replaced FrozenCLIPEmbedder.forward() in ldm/modules/encoders/modules.py with this ugly proof of concept hack:

    def forward(self, text):
        batch_encoding = self.tokenizer(text,
                                        truncation=True,
                                        max_length=self.max_length,
                                        return_length=True,
                                        return_overflowing_tokens=False,
                                        padding="max_length",
                                        return_tensors="pt")
        url = "http://images.cocodataset.org/val2017/000000039769.jpg"
        image = Image.open(requests.get(url, stream=True).raw)
        inputs = self.processor(images=image,
                                return_tensors="pt").to(self.device)
        outputs = self.transformer(**inputs)
        z = outputs.last_hidden_state
        return z

and I changed init to this:

    def __init__(self,
                 version="openai/clip-vit-large-patch14",
                 device="cuda",
                 max_length=77):
        super().__init__()
        self.tokenizer = CLIPTokenizer.from_pretrained(version)
        self.processor = CLIPProcessor.from_pretrained(version)
        self.transformer = CLIPVisionModel.from_pretrained(version)
        self.device = device
        self.max_length = max_length
        self.freeze()

Am I attempting something that's impossible for some reason, or is there a way to do this?

RubenWB · 2022-09-01T08:10:38Z

RubenWB
Sep 1, 2022

There are AI's being developed called Text to prompt that some people on reddit are using to sort of guess what the prompts are of really good looking images people have posted, so definitely possible- Not sure if hlky has the time to implement, but would be pretty awesome if it was used

https://github.com/pharmapsychotic/clip-interrogator
here is the link to the github of the model I mentioned

2 replies

altryne Sep 1, 2022

Do you mean the CLIP interrogator ?

RubenWB Sep 1, 2022

yes 😅 just noticed the wrong link was given

jaytoday · 2022-09-01T21:53:41Z

jaytoday
Sep 1, 2022

This is separate and distinct from img2img, which still uses text as a prompt

I see this is the case but I find it surprising to have both an image input and a text prompt. What would be expected from SD if the text prompt had no relation to the input image?

0 replies

RubenWB · 2022-09-05T16:58:56Z

RubenWB
Sep 5, 2022

https://github.com/justinpinkney/stable-diffusion

This dude was able to create a Create variations from input image that feeds directly into the model as far as I understand that has been released today

Might be worth checking out 😬

1 reply

dmt-hub Sep 7, 2022

can this be implemented in the webui?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FEATURE REQUEST] Images as prompts? #379

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 3 comments 3 replies

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

[FEATURE REQUEST] Images as prompts? #379

lendrick Aug 31, 2022

Replies: 3 comments · 3 replies

RubenWB Sep 1, 2022

altryne Sep 1, 2022

RubenWB Sep 1, 2022

jaytoday Sep 1, 2022

RubenWB Sep 5, 2022

dmt-hub Sep 7, 2022

lendrick
Aug 31, 2022

Replies: 3 comments 3 replies

RubenWB
Sep 1, 2022

jaytoday
Sep 1, 2022

RubenWB
Sep 5, 2022