marginalkde ~100x slower than a scatter plot #504

Open · BioTurboNick opened this issue Jun 24, 2022 · 5 comments

Comments

@BioTurboNick (Member)
I was hoping to use something like marginalkde to better display dense scatterplot data, but it takes 100x as long as a scatter plot to generate.

The slow part is entirely the pdf call evaluated for each x/y pair.

Barring performance improvements to pdf, is there a way to reduce the resolution so it can be calculated faster?
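
For context, a minimal sketch of the per-point evaluation being described, assuming the recipe evaluates the bivariate KDE's pdf at each data point (the actual recipe code may differ):

```julia
using KernelDensity

x = rand(1234)
y = x + rand(1234)

k = kde((x, y))   # bivariate KDE evaluated on a grid (256x256 by default)
# One pdf call per data point; each call is expensive, and there are
# length(x) of them, so this dominates the total runtime.
ps = [pdf(k, x[i], y[i]) for i in eachindex(x)]
```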

@BioTurboNick (Member Author) commented Jun 24, 2022

```julia
using StatsPlots, BenchmarkTools

x = rand(1234)
y = x + rand(1234)
@btime scatter(x, y)
#   245.500 μs (1237 allocations: 90.60 KiB)
```

[figure: the resulting scatter plot]

Looking at npoints (each with the resulting plot attached):

- 32x32: 226.594 ms (201146 allocations: 57.29 MiB)
- 64x64: 374.709 ms (204848 allocations: 158.35 MiB)
- 128x128: 828.861 ms (204848 allocations: 534.07 MiB)
- 256x256 (the default): 2.513 s (204848 allocations: 1.93 GiB)
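
For reference, the grid size can be controlled directly in KernelDensity.jl via its npoints keyword; whether marginalkde forwards it is the open question here (a sketch, not the recipe's current API):

```julia
using KernelDensity

# Build the bivariate KDE on a coarser 64x64 grid instead of the
# (256, 256) default; the rest of the pipeline is unchanged.
k = kde((x, y); npoints = (64, 64))
size(k.density)    # -> (64, 64)
```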

I think 64x64 strikes a good balance of speed and quality.

This could be exposed as a parameter giving the exponent of the power of two to use: 6 could be the default, while 7 or 8 (the current default) would be reasonable for higher quality. A sketch of the idea follows.
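
```julia
# Hypothetical parameter (not an existing StatsPlots keyword): take the
# exponent n and build a 2^n x 2^n grid, so n = 6 gives 64x64 and
# n = 8 gives the current 256x256 default.
gridsize(n::Integer = 6) = (2^n, 2^n)

# e.g. kde((x, y); npoints = gridsize(6))
```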

@BioTurboNick (Member Author) commented Jun 24, 2022

Then again, even 128x128 has some issues on real-world data - but it occurs to me this might be due to far outliers in the data leading to lower resolution in the denser parts.

[figure: marginalkde at 128x128 on real-world data]

@BioTurboNick (Member Author)

Trimming the upper and lower 1% of points helped. Maybe that can also be a parameter?
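
A sketch of the trimming described, using Statistics.quantile (assumed approach; the trim fraction is the value that could become the parameter):

```julia
using Statistics

# Drop points outside the central 98% in either coordinate.
trim = 0.01
xlo, xhi = quantile(x, [trim, 1 - trim])
ylo, yhi = quantile(y, [trim, 1 - trim])
keep = (xlo .<= x .<= xhi) .& (ylo .<= y .<= yhi)
xt, yt = x[keep], y[keep]
```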

@BioTurboNick (Member Author)

Waaait a second - the pdf function is only being used to select the contour levels, but the contour function can already do that internally. Is there any benefit to doing it this way? It's quite expensive for just that task.

Contour alone:
[figure: contour with internally selected levels]

Current implementation:
[figure: contour with pdf-derived levels]
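
A sketch of the two variants being compared, on the same data as above (assuming Plots' contour series; k.density is stored x-by-y, so it is transposed for plotting):

```julia
using Plots, KernelDensity, Statistics

k = kde((x, y))

# "Contour alone": let the contour series choose its own levels.
contour(k.x, k.y, k.density'; levels = 10)

# Current approach as described above: evaluate pdf at every data point,
# then use quantiles of those values as the contour levels.
ps = [pdf(k, x[i], y[i]) for i in eachindex(x)]
ls = quantile(ps, range(0, 1; length = 12)[2:end-1])
contour(k.x, k.y, k.density'; levels = ls)
```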

@BioTurboNick (Member Author)

"levels are evenly-spaced in the cumulative probability mass" is what the documentation says. Maybe that's importantly different from what GR does internally. Not sure what the pros and cons would be.
