AdaptiveBramich takes an excessive amount of time #14
It seems like the vast majority of time in AdaptiveBramich is spent in […]
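A minimal profiling sketch to pin down the hot spot, assuming the public `ois` API (`ois.optimal_system` with `method='AdaptiveBramich'` and the `poly_degree` keyword used later in this thread):

```python
# Hedged sketch: profile an AdaptiveBramich run to see where time is spent.
# The random test arrays and the print count are arbitrary choices.
import cProfile
import pstats

import numpy as np
import ois

refimage = np.random.random((500, 500))
image = np.random.random((500, 500))

profiler = cProfile.Profile()
profiler.enable()
diff = ois.optimal_system(image, refimage,
                          method="AdaptiveBramich", poly_degree=2)[0]
profiler.disable()

# Show the ten entries with the largest cumulative time.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)
```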
I'm so sorry, this notification went right into the spam folder of my email, so I'm only reading this now. I'll take a look at it.
No worries! I have already implemented a CUDA version, and I am keeping everything compatible with a CPU-only compilation. If this feature is one that you think should be merged into the mainline, let me know how you would prefer optional CUDA usage to be integrated: an external Makefile which builds a shared object used by the Python module, or some other mechanism?
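One way the shared-object option could look, sketched with `ctypes`; the library name `libois_cuda.so` and the symbol `gpu_convolve2d_adaptive` are hypothetical placeholders, not part of `ois`:

```python
# Hypothetical wiring for optional CUDA support via an externally built
# shared object. The library name and symbol below are placeholders.
import ctypes

import numpy as np

try:
    _gpu = ctypes.CDLL("./libois_cuda.so")  # built by the optional Makefile
except OSError:
    _gpu = None                             # no CUDA build: keep the CPU path

def have_gpu_support():
    """True when the optional CUDA shared object was found at import time."""
    return _gpu is not None

if have_gpu_support():
    # Declare the (hypothetical) C signature so ctypes can check arguments.
    _gpu.gpu_convolve2d_adaptive.argtypes = [
        np.ctypeslib.ndpointer(dtype=np.float64, flags="C_CONTIGUOUS"),  # in
        np.ctypeslib.ndpointer(dtype=np.float64, flags="C_CONTIGUOUS"),  # out
        ctypes.c_int, ctypes.c_int,                                      # rows, cols
    ]
    _gpu.gpu_convolve2d_adaptive.restype = None
```

With this layout the Python module works unchanged when the Makefile was never run, and switches to the GPU path when the shared object is present.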
I have a solution for this implemented at https://github.com/varun-iyer/ois. It is already a significant improvement over CPU-only calculations, and I am still making optimizations in the kernel code.

Note that the results below were not rigorously produced; they were made by choosing some random science images and cutting them to a particular size before running them through AdaptiveBramich with poly_degree=2. CUDA-enabled calculations are in blue and standard calculations are in orange.

However, I have some concerns about portability. I am using some features of CUDA which rely on having a decent GPU architecture. If you think that these features would be important to incorporate, I can invest some time in making it a little more flexible, and see if I can get it to a pull-requestable place.
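A runtime guard for the architecture-dependent features could look like this, assuming `pycuda` is available; the 6.0 threshold is only an illustrative value, not the actual requirement:

```python
# Check the device's compute capability before enabling the CUDA path.
# The (6, 0) minimum is an example value, not the real requirement.
import pycuda.driver as cuda

cuda.init()
major, minor = cuda.Device(0).compute_capability()
if (major, minor) < (6, 0):
    print("Compute capability %d.%d: falling back to CPU" % (major, minor))
else:
    print("Compute capability %d.%d: CUDA path enabled" % (major, minor))
```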
Hi, that looks great, thanks for taking the time. I'm interested in incorporating your work. Do you think you can make a pull request to merge it into this repository? One of my plans was to make (aside from the Python module) a stand-alone C program as well, and compile it with the […]
I haven't made a PR because I have since discovered some issues with the code and haven't been able to resolve them quite yet. On larger images (including our science images, which are ~3000x2000), the GPU memory overflows and the program crashes. Science images tend to be pretty big, so I need to develop a way to process chunks of the image at a time, depending on the available GPU memory. Once I've figured out how to do this and tested it on some science images, I will make that PR.
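A rough sketch of the chunking idea, assuming tiles are differenced independently with the existing `ois` call and overlapped by half a kernel so the convolution stays valid at tile edges (tile size and kernel side here are illustrative):

```python
# Hedged sketch: difference a large image in ~1000x1000 tiles so each
# chunk fits in GPU memory. Tile/kernel sizes are arbitrary examples.
import numpy as np
import ois

def diff_in_tiles(image, refimage, tile=1000, kernel_side=11):
    pad = kernel_side // 2
    out = np.empty(image.shape, dtype=np.float64)
    for r0 in range(0, image.shape[0], tile):
        for c0 in range(0, image.shape[1], tile):
            r1 = min(r0 + tile, image.shape[0])
            c1 = min(c0 + tile, image.shape[1])
            # Grow the slice by the kernel half-width so edge pixels
            # of each tile still see full kernel support.
            rs = slice(max(r0 - pad, 0), min(r1 + pad, image.shape[0]))
            cs = slice(max(c0 - pad, 0), min(c1 + pad, image.shape[1]))
            d = ois.optimal_system(image[rs, cs], refimage[rs, cs],
                                   method="AdaptiveBramich",
                                   poly_degree=2)[0]
            out[r0:r1, c0:c1] = d[r0 - rs.start:r1 - rs.start,
                                  c0 - cs.start:c1 - cs.start]
    return out
```

The trade-off is that the kernel and background solutions become discontinuous across tile boundaries, which is the same compromise any gridding scheme makes.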
Maybe using the gridding option to split the image into sub-images would help here. This would effectively process 1000x1000 pixel sub-images each time.
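If I have the keyword right, the call would look something like this (the `gridshape` name is my recollection of the `ois` API, so check the docstring before relying on it; the array sizes match the ~3000x2000 science images mentioned above):

```python
# Hedged usage sketch: let ois grid the image internally so each
# sub-image is roughly 1000x1000. The gridshape keyword is assumed.
import numpy as np
import ois

image = np.random.random((2000, 3000))     # stand-in for a science image
refimage = np.random.random((2000, 3000))

diff = ois.optimal_system(image, refimage,
                          method="AdaptiveBramich", poly_degree=2,
                          gridshape=(2, 3))[0]   # 2x3 grid -> ~1000x1000 tiles
```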
Thanks for pointing this out! I think this would be an acceptable workaround for our group's applications, and some kind of gridding seems to be recommended in section 2.4 of the Bramich paper. Do you think I should attempt a programmatic solution to the original issue, or assume that applications with large images would grid them anyway? I haven't thoroughly investigated it, but I believe images well above 1000x1000 work on our GTX 1070 with 8 GB of GPU memory.
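A back-of-envelope estimate makes the 8 GB observation plausible, assuming the GPU code holds one design-matrix row of (kernel pixels x polynomial terms) doubles per image pixel; that layout is an assumption about the implementation, not a fact taken from it:

```python
# Rough memory model: rows*cols image pixels, each with a row of
# kernel_side**2 * n_poly float64 coefficients. Layout is an assumption.
def gpu_bytes(rows, cols, kernel_side=11, poly_degree=2):
    n_poly = (poly_degree + 1) * (poly_degree + 2) // 2  # 6 terms for degree 2
    unknowns = kernel_side ** 2 * n_poly                 # 121 * 6 = 726
    return rows * cols * unknowns * 8                    # 8 bytes per float64

print(gpu_bytes(3000, 2000) / 1e9)  # ~34.8 GB -> overflows an 8 GB card
print(gpu_bytes(1000, 1000) / 1e9)  # ~5.8 GB  -> fits, as observed
```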
Even for the same complexity as Bramich […]