Support ellipsis indexing #1696

nc1m · 2024-09-25T12:04:00Z

Please make sure these conditions are met

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of anndata.
(optional) I have confirmed this bug exists on the master branch of anndata.

Report

In 1.5.1 sklearn changed their _array_indexing method to use ellipsis indexing/slicing:
https://github.com/scikit-learn/scikit-learn/blob/1.5.1/sklearn/utils/_indexing.py#L33

Running train_test_split on AnnData now results in an IndexError.

Code:

import anndata
import session_info
from sklearn.model_selection import train_test_split
import scanpy as sc

session_info.show(html=False, dependencies=True)

adata = sc.datasets.ebi_expression_atlas("E-MTAB-4888")
adata_val, adata_test = train_test_split(adata, test_size=0.5, stratify=adata.obs['Sample Characteristic[cell type]'])

Traceback:

Traceback (most recent call last):
  File "/home/user/project/src/data/test.py", line 6, in <module>
    adata_val, adata_test = train_test_split(adata, test_size=0.5, stratify=adata.obs['Sample Characteristic[cell type]'])
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.conda/envs/env/lib/python3.12/site-packages/sklearn/utils/_param_validation.py", line 213, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.conda/envs/env/lib/python3.12/site-packages/sklearn/model_selection/_split.py", line 2805, in train_test_split
    return list(
           ^^^^^
  File "/home/user/.conda/envs/env/lib/python3.12/site-packages/sklearn/model_selection/_split.py", line 2807, in <genexpr>
    (_safe_indexing(a, train), _safe_indexing(a, test)) for a in arrays
     ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.conda/envs/env/lib/python3.12/site-packages/sklearn/utils/_indexing.py", line 267, in _safe_indexing
    return _array_indexing(X, indices, indices_dtype, axis=axis)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.conda/envs/env/lib/python3.12/site-packages/sklearn/utils/_indexing.py", line 33, in _array_indexing
    return array[key, ...] if axis == 0 else array[:, key]
           ~~~~~^^^^^^^^^^
  File "/home/user/.conda/envs/env/lib/python3.12/site-packages/anndata/_core/anndata.py", line 1021, in __getitem__
    oidx, vidx = self._normalize_indices(index)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.conda/envs/env/lib/python3.12/site-packages/anndata/_core/anndata.py", line 1002, in _normalize_indices
    return _normalize_indices(index, self.obs_names, self.var_names)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.conda/envs/env/lib/python3.12/site-packages/anndata/_core/index.py", line 39, in _normalize_indices
    ax1 = _normalize_index(ax1, names1)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.conda/envs/env/lib/python3.12/site-packages/anndata/_core/index.py", line 109, in _normalize_index
    raise IndexError(f"Unknown indexer {indexer!r} of type {type(indexer)}")
IndexError: Unknown indexer Ellipsis of type <class 'ellipsis'>

I'm not sure if this behavior is intended.

Versions

-----
anndata             0.10.9
scanpy              1.10.2
session_info        1.0.0
sklearn             1.5.1
-----
PIL                 10.3.0
cffi                1.17.1
colorama            0.4.6
cycler              0.12.1
cython_runtime      NA
dateutil            2.9.0
h5py                3.11.0
igraph              0.11.6
joblib              1.4.2
kiwisolver          1.4.7
legacy_api_wrap     NA
leidenalg           0.10.2
llvmlite            0.43.0
matplotlib          3.9.2
mpl_toolkits        NA
natsort             8.4.0
numba               0.60.0
numpy               1.26.4
packaging           24.1
pandas              2.2.2
pyparsing           3.1.4
pytz                2024.1
scipy               1.14.1
six                 1.16.0
texttable           1.7.0
threadpoolctl       3.5.0
torch               2.4.1
torchgen            NA
tqdm                4.66.5
typing_extensions   NA
yaml                6.0.2
-----
Python 3.12.3 | packaged by conda-forge | (main, Apr 15 2024, 18:38:13) [GCC 12.3.0]
Linux-6.1.0-25-amd64-x86_64-with-glibc2.36

ilan-gold · 2024-10-01T09:06:09Z

@nc1m I think this is then a feature because I don't think we ever supported Ellipsis indexing. But this should be simple enough.

flying-sheep · 2024-10-08T07:08:17Z

Ehrdata could benefit from allowing 3D data in .X too: theislab/ehrdata#24

Seems like we already support 3D data in .layers, so latest in the .X-.layers unification (#244) we’d need to do this.

nc1m added Bug 🐛 Triage 🩺 labels Sep 25, 2024

ilan-gold removed the Triage 🩺 label Oct 1, 2024

ilan-gold added enhancement and removed Bug 🐛 labels Oct 1, 2024

ilan-gold added this to the 0.10.10 milestone Oct 1, 2024

ilan-gold changed the title ~~Updating sklearn from 1.1.2 to 1.5.1 breaks train_test_split for AnnData because of ellipsis indexing~~ Support ellipsis indexing Oct 1, 2024

flying-sheep mentioned this issue Oct 10, 2024

Master issue: store time series data theislab/ehrdata#24

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support ellipsis indexing #1696

Support ellipsis indexing #1696

nc1m commented Sep 25, 2024

ilan-gold commented Oct 1, 2024

flying-sheep commented Oct 8, 2024 •

edited

Loading

Support ellipsis indexing #1696

Support ellipsis indexing #1696

Comments

nc1m commented Sep 25, 2024

Please make sure these conditions are met

Report

Versions

ilan-gold commented Oct 1, 2024

flying-sheep commented Oct 8, 2024 • edited Loading

flying-sheep commented Oct 8, 2024 •

edited

Loading