divide by zero encountered to calculate p-value when r==1 #453
Comments
Try it with a dataset like:
This doesn't seem critical, since the user does get the warning. But addressing it would be straightforward, and it should probably be handled at the same time as some other correlation edge cases (e.g., when r ~ 0 in #435). Notably, the zero division only occurs with the percentage bend correlation method.
import pandas as pd
import pingouin as pg
df = pd.DataFrame(
    columns=["0", "X", "Y", "3"],
    data=[
        [ 0.281, -0.281, 0.281, 0.281],
        [-0.171, 0.171, -0.171, -0.171],
        [-0.517, 0.517, -0.517, -0.517],
        [ 0.667, -0.667, 0.667, 0.667],
        [ 0.835, -0.835, 0.835, 0.835],
        [ 0.887, -0.887, 0.887, 0.887],
        [-0.851, 0.851, -0.851, -0.851],
        [-0.315, 0.315, -0.315, -0.315],
        [-0.343, 0.343, -0.343, -0.343],
        [-0.017, 0.017, -0.017, -0.017],
        [-0.543, 0.543, -0.543, -0.543],
        [-0.749, 0.749, -0.749, -0.749],
        [ 0.643, -0.643, 0.643, 0.643],
        [-0.371, 0.371, -0.371, -0.371],
        [-0.443, 0.443, -0.443, -0.443],
        [ 0.949, -0.949, 0.949, 0.949],
        [-0.027, 0.027, -0.027, -0.027],
        [-0.489, 0.489, -0.489, -0.489],
        [ 0.845, -0.845, 0.845, 0.845],
        [ 0.999, -0.999, 0.999, 0.999],
    ],
)
x = df["X"].to_numpy()
y = df["Y"].to_numpy()
correlation_methods = [
    "pearson",
    "spearman",
    "kendall",
    "bicor",
    "percbend",
    "shepherd",
    "skipped",
]
print("Pingouin version:", pg.__version__)
print("=" * 40)
for method in correlation_methods:
    print("Correlation method:", method)
    correlation = pg.corr(x, y, method=method)
    rval = correlation.at[method, "r"]
    print("r value:", rval)
    print("-" * 40)
Running the above code gives me:
Same thing with pcorr():
And there is no percbend-ing:
The problem with the warning is that the output is undefined. Should the value be 0, 1 or -1?
In correlation.py, the function _correl_pvalue(r, nx, k=0) does not handle the case when r = 1.0. This results in:
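The mechanism can be illustrated outside Pingouin with a minimal sketch. It assumes the p-value comes from the usual t-statistic for a correlation coefficient, t = r * sqrt(dof / (1 - r**2)). That formula and the function names naive_pvalue and guarded_pvalue are assumptions made for illustration, not Pingouin's actual code. Returning p = 0 when |r| == 1 is only one possible convention (the limit of the formula); which value to return is still an open question in this thread.

```python
import numpy as np
from scipy.stats import t as t_dist


def naive_pvalue(r, n, k=0):
    """Two-sided p-value from the t-statistic of a correlation, with no guard.

    Hypothetical helper: mirrors the (r, nx, k) arguments mentioned in the
    issue, but is not Pingouin's implementation.
    """
    r = np.float64(r)  # NumPy scalar, so 1 - r**2 == 0 warns instead of raising
    dof = n - k - 2
    tval = r * np.sqrt(dof / (1 - r**2))  # divides by zero when |r| == 1
    return float(2 * t_dist.sf(np.abs(tval), dof))


def guarded_pvalue(r, n, k=0):
    """Same formula, but |r| == 1 is treated as a limiting case with p = 0.

    Assumption: p = 0 is chosen here as the limit of the formula; other
    conventions (e.g. returning NaN) would be equally easy to implement.
    """
    if np.isclose(abs(r), 1.0):
        return 0.0
    return naive_pvalue(r, n, k)


if __name__ == "__main__":
    # Emits a NumPy RuntimeWarning about division by zero.
    print("naive:", naive_pvalue(1.0, 20))
    # No warning: the edge case is handled before the division.
    print("guarded:", guarded_pvalue(1.0, 20))
```

With a guard of this kind, perfectly correlated inputs no longer trigger the warning, and ordinary inputs go through the same formula as before.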