Fundamental problems in data mining, statistical analysis, and machine learning include:
- Do several distributions differ?
- Are random variables dependent?
- How can useful variables/features be picked out of high-dimensional data?
These problems can be tackled with Ball statistics, which enjoy the following advantages:
- applicable to most kinds of datasets (e.g., traditional tabular data, brain shape, functional connectome, wind direction, and so on);
- insensitive to outliers, distribution-free, and model-free;
- theoretically guaranteed and computationally efficient.
Install the Ball package from CRAN:
install.packages("Ball")
Comparison with selected R packages that handle datasets in metric spaces:
| | fastmit | energy | HHG | Ball |
|---|---|---|---|---|
| Test of equal distributions | ❌ | ✔️ | ✔️ | ✔️ |
| Test of independence | ✔️ | ✔️ | ✔️ | ✔️ |
| Test of joint independence | ❌ | ❌ | ❌ | ✔️ |
| Feature screening / Sure Independence Screening (SIS) | ❌ | ❌ | ❌ | ✔️ |
| Iterative feature screening / Iterative SIS | ❌ | ❌ | ❌ | ✔️ |
| Datasets in metric spaces | ✔️ | SNT | ✔️ | ✔️ |
| Robustness | ✔️ | ❌ | ✔️ | ✔️ |
| Parallel programming | ❌ | ❌ | ✔️ | ✔️ |
| Computational efficiency | 🏃🏃🏃 | 🏃🏃🏃 | 🏃🏃 | 🏃🏃🚶 |
SNT stands for strong negative type.
See the following documents for more details about the Ball package:
- GitHub page (short)
- vignette (moderate)
- JSS paper (detailed)
Install the Ball package from PyPI:
pip install Ball
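To make the "test of equal distributions" idea concrete, here is a minimal, from-scratch sketch of a two-sample Ball Divergence statistic with a permutation p-value for univariate data. It does not call the Ball package and is not the package's implementation: the function names `ball_divergence` and `permutation_pvalue` are hypothetical, the statistic is written under the assumption that it follows the sample form described in Pan et al. (2018), and the cubic-time loops are chosen for readability rather than speed.

```python
import random


def ball_divergence(x, y):
    """Sketch of a sample Ball Divergence for two univariate samples.

    Assumption: for every pair of points (u, v) taken from the same sample,
    compare the share of x and the share of y that fall inside the closed
    ball centred at u with radius |u - v|, then average the squared gaps.
    """

    def frac_in_ball(center, radius, sample):
        # Proportion of `sample` lying within `radius` of `center`.
        return sum(abs(z - center) <= radius for z in sample) / len(sample)

    def one_side(points):
        # Balls are built from pairs of points of one sample only.
        total = 0.0
        for u in points:
            for v in points:
                r = abs(u - v)
                gap = frac_in_ball(u, r, x) - frac_in_ball(u, r, y)
                total += gap ** 2
        return total / len(points) ** 2

    return one_side(x) + one_side(y)


def permutation_pvalue(x, y, num_permutations=99, seed=1):
    """Permutation p-value: reshuffle the pooled data and recompute."""
    rng = random.Random(seed)
    observed = ball_divergence(x, y)
    pooled = list(x) + list(y)
    exceed = 0
    for _ in range(num_permutations):
        rng.shuffle(pooled)
        permuted = ball_divergence(pooled[:len(x)], pooled[len(x):])
        if permuted >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (num_permutations + 1)


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.gauss(0, 1) for _ in range(20)]
    y = [rng.gauss(3, 1) for _ in range(20)]
    stat, p = permutation_pvalue(x, y)
    print("statistic:", stat, "p-value:", p)
```

Only absolute differences between observations enter the computation, so the same sketch would carry over to other metric spaces by swapping `abs(a - b)` for another distance function. For real analyses, use the package's own functions, which are documented in the resources listed above.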
If you use Ball or reference our vignettes in a presentation or publication, we would appreciate citations of our package.
Zhu J, Pan W, Zheng W, Wang X (2021). “Ball: An R Package for Detecting Distribution Difference and Association in Metric Spaces.” Journal of Statistical Software, 97(6), 1–31. doi: 10.18637/jss.v097.i06.
Here is the corresponding BibTeX entry:
@Article{,
title = {{Ball}: An {R} Package for Detecting Distribution Difference and Association in Metric Spaces},
author = {Jin Zhu and Wenliang Pan and Wei Zheng and Xueqin Wang},
journal = {Journal of Statistical Software},
year = {2021},
volume = {97},
number = {6},
pages = {1--31},
doi = {10.18637/jss.v097.i06},
}
- Pan, Wenliang; Tian, Yuan; Wang, Xueqin; Zhang, Heping. Ball Divergence: Nonparametric two sample test. Ann. Statist. 46 (2018), no. 3, 1109--1137. doi:10.1214/17-AOS1579. https://projecteuclid.org/euclid.aos/1525313077
- Wenliang Pan, Xueqin Wang, Weinan Xiao & Hongtu Zhu (2018) A Generic Sure Independence Screening Procedure, Journal of the American Statistical Association, DOI: 10.1080/01621459.2018.1462709
- Wenliang Pan, Xueqin Wang, Heping Zhang, Hongtu Zhu & Jin Zhu (2019) Ball Covariance: A Generic Measure of Dependence in Banach Space, Journal of the American Statistical Association, DOI: 10.1080/01621459.2018.1543600
- Zhu, J., Pan, W., Zheng, W., and Wang, X. (2018). Ball: An R package for detecting distribution difference and association in metric spaces. arXiv preprint arXiv:1811.03750. URL http://arxiv.org/abs/1811.03750.
Open an issue or send an email to Jin Zhu at [email protected]