[BUG] All arrays must be of the same length #215

Open
Rohit-Satyam opened this issue Apr 7, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@Rohit-Satyam

Describe the bug
Running pysradb gse-to-srp for GSE198257 fails with ValueError: All arrays must be of the same length.
To Reproduce
Steps to reproduce the behavior:

## Installation: pip install git+https://github.com/saketkc/pysradb
pysradb gse-to-srp GSE198257

Traceback (most recent call last):
  File "/home/subudhak/miniconda3/bin/pysradb", line 8, in <module>
    sys.exit(parse_args())
             ^^^^^^^^^^^^
  File "/home/subudhak/miniconda3/lib/python3.11/site-packages/pysradb/cli.py", line 1206, in parse_args
    gse_to_srp(args.gse_ids, args.saveto, args.detailed, args.desc, args.expand)
  File "/home/subudhak/miniconda3/lib/python3.11/site-packages/pysradb/cli.py", line 232, in gse_to_srp
    df = sradb.gse_to_srp(
         ^^^^^^^^^^^^^^^^^
  File "/home/subudhak/miniconda3/lib/python3.11/site-packages/pysradb/sraweb.py", line 799, in gse_to_srp
    new_gse_df = pd.DataFrame(
                 ^^^^^^^^^^^^^
  File "/home/subudhak/miniconda3/lib/python3.11/site-packages/pandas/core/frame.py", line 767, in __init__
    mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/subudhak/miniconda3/lib/python3.11/site-packages/pandas/core/internals/construction.py", line 503, in dict_to_mgr
    return arrays_to_mgr(arrays, columns, index, dtype=dtype, typ=typ, consolidate=copy)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/subudhak/miniconda3/lib/python3.11/site-packages/pandas/core/internals/construction.py", line 114, in arrays_to_mgr
    index = _extract_index(arrays)
            ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/subudhak/miniconda3/lib/python3.11/site-packages/pandas/core/internals/construction.py", line 677, in _extract_index
    raise ValueError("All arrays must be of the same length")
ValueError: All arrays must be of the same length
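
For context, pandas raises this message whenever pd.DataFrame is given a dict whose list-like columns differ in length. A minimal sketch, independent of pysradb; the helper try_build and the column values are made up for illustration and are not taken from the failing call in sraweb.py:

```python
import pandas as pd


def try_build(columns):
    """Return the DataFrame, or the ValueError message if construction fails."""
    try:
        return pd.DataFrame(columns)
    except ValueError as exc:
        return str(exc)


# Hypothetical columns: two study aliases but only one study accession.
mismatched = {
    "study_alias": ["GSE_A", "GSE_B"],
    "study_accession": ["SRP_A"],
}

# Prints the ValueError message instead of a DataFrame.
print(try_build(mismatched))
```

So the traceback suggests that the dict built inside gse_to_srp ended up with columns of different lengths for this accession.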

Desktop (please complete the following information):

  • OS: Ubuntu 20.04
  • Python version: 3.11.8
@Rohit-Satyam Rohit-Satyam added the bug Something isn't working label Apr 7, 2024
@nick-youngblut

nick-youngblut commented Nov 5, 2024

I'm getting the same error for GSE279289. The sradb.gse_to_srp code assumes that fetch_gds_results returns a DataFrame for every accession, but for some accessions it returns None, which causes the error:

    def fetch_gds_results(self, gse, **kwargs):
        result = self.get_esummary_response("geo", gse)

        try:
            uids = result["uids"]
        except KeyError:
            print("No results found for {} | Obtained result: {}".format(gse, result))
            return None
        gse_records = []
        for uid in uids:
            record = result[uid]
            del record["uid"]
            if record["extrelations"]:
                extrelations = record["extrelations"]
                for extrelation in extrelations:
                    keys = list(extrelation.keys())
                    values = list(extrelation.values())
                    assert sorted(keys) == sorted(
                        ["relationtype", "targetobject", "targetftplink"]
                    )
                    assert len(values) == 3
                    record[extrelation["relationtype"]] = extrelation["targetobject"]
                del record["extrelations"]
                gse_records.append(record)
        if not len(gse_records):
            print("No results found for {}".format(gse))
            return None
        return pd.DataFrame(gse_records)

The correct type hint for the return is -> Optional[pd.DataFrame].
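
To make the None path concrete, here is a rough standalone sketch of the same flattening loop with the Optional annotation applied. It runs on plain dicts instead of an esummary response; flatten_records and the sample records are hypothetical, and unlike the quoted code it tolerates a missing "extrelations" key and skips the assertions:

```python
from typing import Optional

import pandas as pd


def flatten_records(records: list) -> Optional[pd.DataFrame]:
    """Mirror the flattening loop on plain dicts (no network access)."""
    flattened = []
    for record in records:
        record = dict(record)  # copy so the caller's data is not mutated
        extrelations = record.pop("extrelations", None)
        if not extrelations:
            # As in the quoted code, records without extrelations are skipped.
            continue
        for rel in extrelations:
            record[rel["relationtype"]] = rel["targetobject"]
        flattened.append(record)
    if not flattened:
        return None
    return pd.DataFrame(flattened)


# Hypothetical records, not real GEO output.
with_sra = {
    "accession": "GSE_EXAMPLE",
    "extrelations": [
        {"relationtype": "SRA", "targetobject": "SRP_EXAMPLE", "targetftplink": ""}
    ],
}
without_sra = {"accession": "GSE_EMPTY", "extrelations": []}

print(flatten_records([with_sra]))     # one-row DataFrame with an "SRA" column
print(flatten_records([without_sra]))  # None
```

A record with an empty extrelations list never reaches the output, so an accession whose records all lack relations yields None rather than an empty DataFrame.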

However, the possible None return is not accounted for:

    def gse_to_srp(self, gse, **kwargs):
        if isinstance(gse, str):
            gse = [gse]
        gse_df = self.fetch_gds_results(gse, **kwargs)
        gse_df = gse_df.rename(
            columns={"accession": "study_alias", "SRA": "study_accession"}
        )
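
One possible guard, sketched as a standalone function so it can be run without pysradb. The name rename_gse_columns and the choice to fall back to an empty DataFrame are assumptions for illustration, not the maintainers' fix; raising a descriptive error would be an equally reasonable choice:

```python
from typing import Optional

import pandas as pd


def rename_gse_columns(gse_df: Optional[pd.DataFrame]) -> pd.DataFrame:
    """Rename columns as gse_to_srp does, tolerating a None input."""
    if gse_df is None:
        # Assumed fallback: an empty frame with the expected column names.
        return pd.DataFrame(columns=["study_alias", "study_accession"])
    return gse_df.rename(
        columns={"accession": "study_alias", "SRA": "study_accession"}
    )


# Without the guard, None.rename(...) would raise AttributeError.
print(rename_gse_columns(None).empty)  # True
```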
