Violin plots should not exist #10

Open
wants to merge 3 commits into
base: main
from

Conversation

fargerio

This PR recommends discouraging violin plots entirely, since the data is more readable and more usefully represented by either a histogram or a box plot. Here's a video explaining the drawbacks of violin plots: Violin plots should not exist.


This is quite common in the literature as well, but unfortunately, violin plots (or any sort of smoothed distribution curve) make no sense for small n.
Violin plots don't help your reader understand the data. The whole justification is that regular box plots may misrepresent multimodal data distributions, so you want to show the data. But violin plots don't have units and tick marks that allow people to actually read the data and compare the distributions. Also, the choice of probability density function for the smoothing is almost never explained, and may even be completely misleading with small datasets.
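
To make the small-n concern concrete, here is a minimal sketch using only the Python standard library. It assumes a Gaussian kernel, and the sample values and both bandwidths are made up for illustration; none of them come from this PR. It counts how many peaks the smoothed curve shows for the same six points under two different bandwidths.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

def count_modes(density, lo, hi, steps=400):
    """Count strict local maxima of the density on an evenly spaced grid."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    ys = [density(x) for x in xs]
    return sum(1 for i in range(1, steps) if ys[i - 1] < ys[i] > ys[i + 1])

# Hypothetical small sample (n = 6), invented for this illustration.
samples = [1.0, 1.2, 2.9, 3.1, 5.0, 5.3]

# Same data, two assumed bandwidths: only the smoothing choice differs.
narrow = count_modes(gaussian_kde(samples, 0.2), -1.0, 8.0)
wide = count_modes(gaussian_kde(samples, 1.5), -1.0, 8.0)
print("peaks with bandwidth 0.2:", narrow)
print("peaks with bandwidth 1.5:", wide)
```

With so few points, the number of apparent peaks depends on the smoothing parameter rather than on the data alone, which is why an unexplained smoothing choice can mislead.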

@egonelbre egonelbre Jan 18, 2024

Violin plots do have quartiles and medians etc. on them, all the same features as a box plot. See the paper https://www.stat.cmu.edu/~rnugent/PCMI2016/papers/ViolinPlots.pdf

People do sometimes remove those markers, which should be avoided.

Author

Yes, they can have quartiles and medians. I was referring to tick marks for the height of the distribution, such as you would find on a histogram. Without them you can't actually compare the different distributions a violin plot depicts: you'd have to manually cut them out and overlay them, and you would still be missing the units for one axis.
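
A rough sketch of the missing density axis, under stated assumptions: a Gaussian kernel with a made-up bandwidth, two invented samples, and each violin drawn to the same maximum half-width (a scaling that plotting tools often apply, assumed here rather than taken from any specific library). It compares the actual peak densities with the widths a reader would see.

```python
import math

def peak_density(samples, bandwidth, lo, hi, steps=500):
    """Return the largest Gaussian KDE value found on an evenly spaced grid."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    best = 0.0
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        d = norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )
        best = max(best, d)
    return best

# Hypothetical samples, invented for this illustration.
tight = [4.8, 4.9, 5.0, 5.1, 5.2]   # values clustered together
spread = [1.0, 3.0, 5.0, 7.0, 9.0]  # values far apart

peak_tight = peak_density(tight, 0.5, 0.0, 10.0)
peak_spread = peak_density(spread, 0.5, 0.0, 10.0)

# Assumed drawing rule: every violin is rescaled so its widest point
# equals the same half-width, which discards the density units.
max_half_width = 0.4
drawn_tight = max_half_width * (peak_tight / peak_tight)
drawn_spread = max_half_width * (peak_spread / peak_spread)

print("ratio of true peak densities:", peak_tight / peak_spread)
print("ratio of drawn widths:", drawn_tight / drawn_spread)
```

Under that scaling, both violins are equally wide at their widest point even though the underlying peak densities differ severalfold, so width alone cannot be compared across violins without a labelled density axis.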

Box plots also do not have that information, so we should avoid them as well?

2 participants