[Feature request] Exponential outer-image shrink? #86

Open
Lex-DRL opened this issue Jun 3, 2024 · 8 comments

Comments

Lex-DRL commented Jun 3, 2024

I originally posted the idea here: Stability-AI/stablediffusion#378
But I guess that, even without any training, this approach could be used within USDU.

The idea

At any given moment, USDU works with only a part of the image, which naturally loses some context; the higher the resolution we work at, the more of it is lost. Currently, the workaround is to add an extra border containing the nearby parts of the image, but that clearly has its limits. What if we took an extra step: in addition to the outside border, we also add the entire rest of the image, exponentially squashed to fit into a much smaller width (a second, "distorted" border)?

A picture is worth a thousand words, so let's look at an example.

Say, we have this image and we're currently working on the following segment:
[images: _image, Zones]

Blue is USDU's extra border, added for some context.
Red is everything that gets discarded, so SD has no idea what's there; the only thing it sees is this:
[image: RawCropZones]
From this part alone, SD has no idea about the huge mountains in the background or the sunrise behind them. It could easily be just a rainy day.

Now, what if, instead of discarding the red part altogether, we squashed it and added it as a second "contextual border" around the first one?
In this border, the first row of pixels is 1:1, but the farther out we go, the bigger the area of the source image covered by each pixel (in a geometric progression), losing some detail but still giving us a crude approximation of the context. This nonlinear distortion results in a lens-like look:
[images: DistortedCropZones, DistortedCrop]

Now SD can see that there is a sunrise and mountains, and can take those into account for the core part of the piece. Yes, the boundary is distorted, but I guess we can reduce artifacts by dynamically modifying the conditioning in the red area, adding "lens distortion" there (and only there).
In a nutshell, this approach follows the same principle used in realtime graphics for mip-mapping and, these days, in foveated rendering for VR: the farther out we go, the more we average, while still keeping some approximation of the "big picture", literally. By its very nature, this distortion is exponential, so we basically don't care how big the full image is. It could be millions of pixels wide and we'd still get just enough info about the surroundings of our single piece.
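
To make the progression concrete, here's a tiny numeric illustration (the ratio and border width are arbitrary, purely to show the growth):

q, n = 1.1, 8  # hypothetical ratio and border width, for illustration only
coverage = [q**i for i in range(n)]  # source width covered by each border row
print([round(c, 2) for c in coverage])  # [1.0, 1.1, 1.21, 1.33, 1.46, 1.61, 1.77, 1.95]
print(round(sum(coverage), 2))  # 11.44 source pixels squashed into 8 border rows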

Maybe it's worth implementing in USDU? This whole idea might be a dud, but my assumption is that this approach would significantly improve context awareness and therefore the final results. SD already has some understanding of image distortion deep inside it (it successfully renders refractions, after all), so it should benefit from seeing the rest of the image, even if it's distorted. Shouldn't it?

ssitu (Owner) commented Jun 25, 2024

Interesting idea; it's worth implementing and trying out. I'll give it a shot when I can, unless you or someone else wants to. I'm not sure how easy it will be to add with the way I've structured this repo.

Lex-DRL (Author) commented Jun 25, 2024

Unfortunately, I have zero experience with coding ComfyUI nodes and with ML programming in general, so I'm of no help there.

What I can help with is finding the squash coefficients.

The outer border is the sum of a geometric progression, where:

  • Each term (a1, a2, a3, ...) represents a row of pixels in the border; more specifically, the area (width) of the original image covered by that row.
  • The 1st row (i.e. the 1st term) is 1.
  • The number of terms is the width of the border.
  • The sum of the progression is the total width (or height) of the part of the image we're trying to squash.
  • q (a.k.a. r, the progression ratio) is unknown, but we know it's between 1 and 2.
  • If the part being squashed is smaller than 1.25× the border width, then I believe it's better not to apply any distortion there at all.

AFAIK, there's no analytic way to find the progression ratio given the number of terms, the 1st term and their sum. For the previews above, I found q via the bisection method, comparing the progression sum to the actual squashed width/height we're aiming for, until I got a q value with less than 1 px of deviation, using this function:

def squashed_width(q, n=64, first_pixel_width=1.0):
    # Sum of the geometric progression: total source width covered by a border of n rows,
    # where the innermost row covers first_pixel_width and each next row is q times wider.
    return first_pixel_width * (q**n - 1) / (q - 1)
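
A rough sketch of that bisection (not my actual script, just to illustrate the search; solve_q is a made-up name):

def solve_q(target_width, n=64, first_pixel_width=1.0, tolerance=1.0):
    # Bisect q in (1, 2) until a border of n rows covers target_width source pixels,
    # to within `tolerance` pixels.
    lo, hi = 1.0 + 1e-9, 2.0
    for _ in range(200):  # plenty of iterations for sub-pixel precision
        q = (lo + hi) / 2.0
        width = squashed_width(q, n, first_pixel_width)
        if abs(width - target_width) < tolerance:
            break
        if width < target_width:
            lo = q  # the border covers too little of the source, so raise q
        else:
            hi = q  # the border covers too much, so lower q
    return q

# e.g. solve_q(1200.0) gives the ratio for squashing a 1200 px wide strip into 64 border pixels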

Then, knowing this coefficient for all 4 sides of the border, I built distorted UVs and sampled the images above. But I did it in compositing software, where the distortion (sampling) itself was already implemented by a dedicated node.
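
To show what the squashing itself could look like in plain NumPy, here's a made-up sketch for one side of the border (squash_strip is a hypothetical helper; it reuses solve_q from the snippet above and assumes the strip is comfortably wider than the border, per the 1.25× note):

import numpy as np

def squash_strip(strip, border_px=64):
    # Squash a (height, src_width, channels) float strip into (height, border_px, channels):
    # border column i averages a slab of source columns whose width grows by a factor of q.
    # Column 0 is taken as the 1:1 row adjacent to the crop; flip the strip for the other side.
    src_width = strip.shape[1]
    q = solve_q(float(src_width), n=border_px)
    # Cumulative slab boundaries in source pixels: first_pixel_width * (q**i - 1) / (q - 1).
    edges = np.round((q ** np.arange(border_px + 1) - 1.0) / (q - 1.0)).astype(int)
    edges[-1] = src_width  # make sure the last slab reaches the far edge of the strip
    cols = []
    for i in range(border_px):
        lo = min(edges[i], src_width - 1)
        hi = max(edges[i + 1], lo + 1)  # every slab is at least one source column wide
        cols.append(strip[:, lo:hi].mean(axis=1))
    return np.stack(cols, axis=1)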

ssitu (Owner) commented Jun 26, 2024

Thanks for the info; I'll probably have questions when I get around to adding it. I also don't know whether the progression coefficient can be solved for analytically, but it seems like any root-finding method could be used to compute it.

Lex-DRL (Author) commented Aug 1, 2024

@ssitu
Sooo... by any chance, have you had an opportunity to try this?

ssitu (Owner) commented Oct 3, 2024

Sorry, I haven't gotten a chance to. No promises on when I'll work on this; maybe someone else might want to give it a shot.

Lex-DRL (Author) commented Oct 4, 2024

I'm well familiar with Python and image processing (even shader programming), but I'm not at all familiar with JS, writing client/server apps in Python, other WebDev-related stuff, or ComfyUI's custom node API and best practices.
I might try implementing such an "exponential cropper" node myself, but I don't even know where to start. Could you suggest an introductory tutorial on custom nodes, explaining ComfyUI's inner workings to someone coming from a different field of programming?

ssitu (Owner) commented Oct 4, 2024

I first started off by looking at the example in the ComfyUI repo: https://github.com/comfyanonymous/ComfyUI/blob/master/custom_nodes/example_node.py.example
If the ".example" extension is removed, the node should get automatically imported by ComfyUI when the server starts.

There shouldn't be a need to do anything with JS or anything frontend-related, unless you want to make a nice UI for cropping, which I remember some nodes doing; if you get that far, you can look at how those work. I would just try making a node that takes some numbers and an image and returns a transformation of the image.

Images in ComfyUI are represented as PyTorch tensors, which are pretty much numpy arrays and are easily converted back and forth. From there, you should be able to do any sort of Python image processing on the numpy representation, or even convert to PIL or any other image library's representation. Then a simple conversion back to a tensor for the return value should work.

There are probably tutorials out there by now, but I've never looked at any. I pretty much just looked at what others were doing in their custom node code. Building off of the example node or breaking down an existing custom node should get you started though.

Another approach is to write the exponential shrink function on its own, make sure it works on local images, and only then hook it up to the API, which avoids the hassle of testing through the frontend. Then all that's left is to paste/import it into the example node, convert the input from a tensor, and convert the function's output back to a tensor.
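
Off the top of my head, the node skeleton would be roughly something like this (untested, loosely based on the example node; the class name is a placeholder and the actual shrink logic is left as a comment):

import torch

class ExponentialOuterShrink:
    @classmethod
    def INPUT_TYPES(cls):
        return {"required": {
            "image": ("IMAGE",),
            "border_px": ("INT", {"default": 64, "min": 1, "max": 512}),
        }}

    RETURN_TYPES = ("IMAGE",)
    FUNCTION = "run"
    CATEGORY = "image/transform"

    def run(self, image, border_px):
        # ComfyUI images are float tensors shaped [batch, height, width, channels] in 0..1.
        arr = image.cpu().numpy()
        # ...call your exponential-shrink function on `arr` here (identity for now)...
        out = arr
        return (torch.from_numpy(out).float(),)

# Lets ComfyUI discover the node when the file is placed in custom_nodes.
NODE_CLASS_MAPPINGS = {"ExponentialOuterShrink": ExponentialOuterShrink}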

If you run into any trouble with the ComfyUI custom node API part, you can ask in the ComfyUI Matrix space or just let me know.

Lex-DRL (Author) commented Oct 4, 2024

Thanks a lot! I'll look into it the next time I get some spare time.
