[Feature request] Exponential outer-image shrink? #86
Comments
Interesting idea, it's worth implementing and trying out. I'll give it a shot when I can, unless you or someone else wants to. I'm not sure how easy it will be to add with the way I structured this repo.
Unfortunately, I have zero experience with coding ComfyUI nodes or with ML programming in general, so I'm of no help there. What I can help with is how to find the squash coefficients. The outer border is the sum of a geometric progression: the first pixel row is 1:1 with the source, and each following row covers q times more source pixels, so the total covered width is the standard geometric-series sum.
AFAIK, there's no analytic way to solve for the progression coefficient given the number of terms, the first term, and their sum. For the previews above, I found it numerically, using this helper for the total width covered by the squashed border:

```python
def squashed_width(q, n=64, first_pixel_width=1.0):
    # Sum of the geometric series: total source width covered by n border pixels
    return first_pixel_width * (q**n - 1) / (q - 1)
```

Then, knowing this coefficient for all 4 sides of the border, I built distorted UVs and sampled the images above. But I did it in compositing software, so the distortion (sampling) itself was already implemented by a dedicated node there.
Thanks for the info, I'll probably have questions when I get to adding it. I also don't know whether the progression coefficient can be solved for analytically, but it seems like any root-finding method could be used to compute it.
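For instance, here's a minimal sketch of finding the coefficient by simple bisection (the helper is repeated from above so the snippet runs on its own; the example numbers are illustrative, not anything from USDU):

```python
def squashed_width(q, n=64, first_pixel_width=1.0):
    # Total source width covered by n border pixels whose coverage
    # grows geometrically by a factor of q per pixel (geometric-series sum).
    return first_pixel_width * (q**n - 1.0) / (q - 1.0)

def find_q(target_width, n=64, first_pixel_width=1.0, tol=1e-9):
    # squashed_width is monotonically increasing in q for q > 1, so plain
    # bisection converges. Assumes target_width > n * first_pixel_width,
    # i.e. there is actually something to squash.
    lo, hi = 1.0 + 1e-12, 2.0
    while squashed_width(hi, n, first_pixel_width) < target_width:
        hi *= 2.0  # grow the bracket until it contains the root
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if squashed_width(mid, n, first_pixel_width) < target_width:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Example: squash 2000 remaining source pixels into a 64-pixel border.
q = find_q(2000.0, n=64)
```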
@ssitu
Sorry, haven't gotten a chance to. No promises for when I'll work on this; maybe someone else might want to give it a shot.
I'm quite familiar with Python and image processing (shader programming, even), but not at all familiar with JS, writing client/server apps in Python, any other web-dev-related stuff, or ComfyUI's custom node API and best practices.
I first started off by looking at the example in the ComfyUI repo: https://github.com/comfyanonymous/ComfyUI/blob/master/custom_nodes/example_node.py.example

There shouldn't be a need to do anything with JS or anything frontend-related, unless you want to make a nice UI for cropping; I remember some nodes doing that, so if you get that far you can look at how those work. I would just try making a node that takes some numbers and an image and returns a transformation of the image.

Images in ComfyUI are represented by PyTorch tensors, which are pretty much numpy arrays and are easily converted back and forth. From there, you should be able to use any sort of Python image processing on the numpy representation, or even convert to PIL or whatever other image library representations exist. Then a simple conversion back to a tensor for the return value should work.

There are probably tutorials out there by now, but I've never looked at any; I pretty much just looked at what others were doing in their custom node code. Building off the example node or breaking down an existing custom node should get you started. Another approach is to write the exponential shrink function by itself, make sure it works on local images, and only then hook it up to the API, which avoids the hassle of testing through the frontend. Then all there is to do is paste/import it into the example node, convert the input from a tensor, and convert the function's output back to a tensor.

If you run into any trouble with the ComfyUI custom node API part, you can ask in the ComfyUI Matrix space or just let me know.
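For reference, here's a rough sketch of what such a node could look like, following the general structure of the example node. The class name, the parameter, and the exponential_shrink placeholder are made up for illustration; as far as I understand, ComfyUI passes images as float tensors shaped [batch, height, width, channels]:

```python
import torch

def exponential_shrink(np_img, border_px):
    # Placeholder: the actual exponential squash of the surrounding image
    # would go here; for now it just passes the image through unchanged.
    return np_img

class ExponentialContextBorder:
    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "image": ("IMAGE",),
                "border_px": ("INT", {"default": 64, "min": 1, "max": 1024}),
            }
        }

    RETURN_TYPES = ("IMAGE",)
    FUNCTION = "apply"
    CATEGORY = "image/transform"

    def apply(self, image, border_px):
        # Image tensor -> numpy, run ordinary image processing, convert back.
        np_img = image.cpu().numpy()
        np_out = exponential_shrink(np_img, border_px)
        return (torch.from_numpy(np_out),)

NODE_CLASS_MAPPINGS = {"ExponentialContextBorder": ExponentialContextBorder}
NODE_DISPLAY_NAME_MAPPINGS = {"ExponentialContextBorder": "Exponential Context Border"}
```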
Thanks a lot! I'll look into it the next time I get some spare time.
I originally posted the idea here: Stability-AI/stablediffusion#378
But I guess that, even without any training, this approach could be used within USDU.
The idea
At each moment, USDU works with only a part of the image, which naturally loses some context; the higher the resolution we work at, the more is lost. Currently, the workaround is adding an extra border with nearby parts of the image, but it clearly has its limitations. What if we do an extra step: in addition to adding an outside border, we also add the entire rest of the image, exponentially squashed to fit into a much smaller width (a second, "distorted" border)?
A picture is worth a thousand words, so let's look at an example.
Say, we have this image and we're currently working on the following segment:
Blue is USDU's extra border added for some context.
And red is everything that gets discarded, so SD has no idea what's there; the only thing it sees is this:
From this part alone, SD has no idea of the huge mountains in the background or the sunrise behind them. This could easily be just a rainy day.
Now, what if, instead of discarding the red part altogether, we squashed it and added it as a second "contextual border" around the first one?
In this border, the first row of pixels is 1:1, but the farther out we go, the bigger the area of the source image covered by each pixel (in a geometric progression), losing some detail but still giving a crude approximation of the context. This nonlinear distortion results in a lens-like look:
Now SD can see that there is a sunrise and there are mountains, and it can take those into account for the core part of the piece. Yeah, the boundary is distorted, but I guess we can reduce artifacts by dynamically modifying the conditioning in the red area, adding "lens distortion" there (and only there).
In a nutshell, this approach follows the same principle used in real-time graphics for mip-mapping and, these days, for foveated rendering in VR: the farther away we go, the more we average, but we still keep some approximation of the big picture, literally. By its very nature this distortion is exponential, so we basically don't care how big the full image is. It could be millions of pixels wide and we'd still get just enough info about the surroundings of our single piece.
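To make the mapping concrete, here's a rough sketch (my own naming, not anything that exists in USDU) of how the source-image sample positions for one side of such a border could be generated: each successive border pixel covers q times more source pixels than the previous one, so the positions come from cumulative sums of a geometric series.

```python
import numpy as np

def border_sample_positions(remaining_width, border_px, q):
    # Offsets (in source pixels, measured from the crop edge) at which each
    # of the border_px squashed pixels samples the discarded region.
    # Pixel i covers q**i source pixels, so its centre sits at the cumulative
    # sum of the previous coverages plus half of its own.
    coverage = q ** np.arange(border_px)  # width covered by each border pixel
    edges = np.concatenate(([0.0], np.cumsum(coverage)))
    centers = 0.5 * (edges[:-1] + edges[1:])
    # Rescale so the outermost edge lands exactly on remaining_width,
    # absorbing any residual error in the numerically found q.
    return centers * (remaining_width / edges[-1])

# Example: 64 border pixels covering 2000 discarded source pixels,
# with q ~ 1.083 found roughly as in the earlier bisection sketch.
xs = border_sample_positions(2000.0, 64, 1.083)
```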
Maybe it's worth implementing in USDU? This whole idea might be a dud, but my assumption is that this approach would significantly improve context awareness and therefore the final results. SD already has some understanding of image distortion deep inside (it successfully renders refractions, after all), so it should benefit from seeing the rest of the image, even if it's distorted. Shouldn't it?