marginalkde ~100x slower than a scatter plot #504

Open · BioTurboNick opened this issue Jun 24, 2022 · 5 comments

Comments

@BioTurboNick (Member)
I was hoping to use something like marginalkde to better display dense scatterplot data, but it takes 100x as long as a scatter plot to generate.

The slow part is entirely the pdf call evaluated for each x/y pair.

Barring performance improvements to pdf, is there a way to reduce the resolution so it can be calculated faster?
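
For context, a minimal sketch of the per-point evaluation being described, assuming the recipe evaluates the bivariate KDE's pdf at each data point (the actual recipe code may differ):

```julia
using KernelDensity

x = rand(1234)
y = x + rand(1234)

k = kde((x, y))   # bivariate KDE evaluated on a grid (256x256 by default)
# One pdf call per data point; each call is expensive, and there are
# length(x) of them, so this dominates the total runtime.
ps = [pdf(k, x[i], y[i]) for i in eachindex(x)]
```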

@BioTurboNick (Member Author) commented Jun 24, 2022

```julia
using StatsPlots, BenchmarkTools

x = rand(1234)
y = x + rand(1234)
@btime scatter(x, y)
#   245.500 μs (1237 allocations: 90.60 KiB)
```

[figure: the resulting scatter plot]

Looking at npoints (each with the resulting plot attached):

- 32x32: 226.594 ms (201146 allocations: 57.29 MiB)
- 64x64: 374.709 ms (204848 allocations: 158.35 MiB)
- 128x128: 828.861 ms (204848 allocations: 534.07 MiB)
- 256x256 (the default): 2.513 s (204848 allocations: 1.93 GiB)
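
For reference, the grid size can be controlled directly in KernelDensity.jl via its npoints keyword; whether marginalkde forwards it is the open question here (a sketch, not the recipe's current API):

```julia
using KernelDensity

# Build the bivariate KDE on a coarser 64x64 grid instead of the
# (256, 256) default; the rest of the pipeline is unchanged.
k = kde((x, y); npoints = (64, 64))
size(k.density)    # -> (64, 64)
```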

I think 64x64 strikes a good balance of speed and quality.

This could be exposed as a parameter giving the exponent of the power of two to use: 6 could be the default, while 7 or 8 (the current default) would be reasonable for higher quality. A sketch of the idea follows.
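
```julia
# Hypothetical parameter (not an existing StatsPlots keyword): take the
# exponent n and build a 2^n x 2^n grid, so n = 6 gives 64x64 and
# n = 8 gives the current 256x256 default.
gridsize(n::Integer = 6) = (2^n, 2^n)

# e.g. kde((x, y); npoints = gridsize(6))
```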

@BioTurboNick (Member Author) commented Jun 24, 2022

Then again, even 128x128 has some issues on real-world data - but it occurs to me this might be due to far outliers in the data leading to lower resolution in the denser parts.

[figure: marginalkde at 128x128 on real-world data]

@BioTurboNick (Member Author)

Trimming the upper and lower 1% of points helped. Maybe that can also be a parameter?
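
A sketch of the trimming described, using Statistics.quantile (assumed approach; the trim fraction is the value that could become the parameter):

```julia
using Statistics

# Drop points outside the central 98% in either coordinate.
trim = 0.01
xlo, xhi = quantile(x, [trim, 1 - trim])
ylo, yhi = quantile(y, [trim, 1 - trim])
keep = (xlo .<= x .<= xhi) .& (ylo .<= y .<= yhi)
xt, yt = x[keep], y[keep]
```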

@BioTurboNick (Member Author)

Waaait a second - the pdf function is only being used to select the contour levels, but the contour function can already do that internally. Is there any benefit to doing it this way? It's quite expensive for just that task.

Contour alone:
[figure: contour with internally selected levels]

Current implementation:
[figure: contour with pdf-derived levels]
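
A sketch of the two variants being compared, on the same data as above (assuming Plots' contour series; k.density is stored x-by-y, so it is transposed for plotting):

```julia
using Plots, KernelDensity, Statistics

k = kde((x, y))

# "Contour alone": let the contour series choose its own levels.
contour(k.x, k.y, k.density'; levels = 10)

# Current approach as described above: evaluate pdf at every data point,
# then use quantiles of those values as the contour levels.
ps = [pdf(k, x[i], y[i]) for i in eachindex(x)]
ls = quantile(ps, range(0, 1; length = 12)[2:end-1])
contour(k.x, k.y, k.density'; levels = ls)
```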

@BioTurboNick (Member Author)

"levels are evenly-spaced in the cumulative probability mass" is what the documentation says. Maybe that's importantly different from what GR does internally. Not sure what the pros and cons would be.
