Data download is interrupted after a few minutes #195
Comments
I am now running the same script again (it worked previously), and a different error occurred this time.
Traceback (most recent call last):
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
Traceback (most recent call last):
My recommendation is to use an external tool for downloading for now: #201 (comment)
Sorry, I think my explanation was not clear. I'm trying to download only metadata.
Is this what you are running (seems okay at my end):
>>> instance = SraSearch(2, 1000000, strategy="miRNA-seq")
>>> df = instance.search()
4%|█▍ | 5400/130053 [03:13<1:19:26, 26.15it/s]
Yep, it starts running, but it spits out this error after some minutes...
Traceback (most recent call last):
I'm guessing something is not formatted properly on the SRA side (it happened to me when parsing something else from SRA in Python). They include a '\b' somewhere in the description fields, and Python tries to parse it as some kind of binary string... As a workaround, I'm trying to run the same query on GEO to see if they parse it differently. Thanks for your help!
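If the '\b' guess above is right, one possible workaround is to strip the control characters that XML 1.0 does not allow before handing the text to the parser. This is only a minimal sketch under that assumption: the helper name `strip_illegal_xml_chars` and the sample XML are made up for illustration, and it is not how pysradb itself handles the response.

```python
import re
import xml.etree.ElementTree as ET

# Control characters that XML 1.0 forbids (everything below 0x20
# except tab, newline and carriage return). '\b' is 0x08.
_ILLEGAL_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def strip_illegal_xml_chars(text):
    """Hypothetical helper: drop characters the XML parser would reject."""
    return _ILLEGAL_XML_CHARS.sub("", text)


# Made-up sample record with a stray backspace in a description field.
raw = "<EXPERIMENT><TITLE>miRNA\x08-seq run</TITLE></EXPERIMENT>"

clean = strip_illegal_xml_chars(raw)
root = ET.fromstring(clean)
print(root.find("TITLE").text)
```

Parsing `raw` directly with `ET.fromstring` fails with a `ParseError`, while the cleaned string parses. Applying this to a streamed response would need the text to be downloaded first, so treat it as a way to test the hypothesis rather than a fix.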
You could try with |
Thank you, I will try that as a last resort. The problem is that I'm interested in all SRPs, so I would then have to query sample by sample, since verbosity=1 only gives you experiment accessions.
Describe the bug
Not sure what's happening, but for the last few days I've been struggling to download data using pysradb. This used to work without problems a couple of weeks ago. Here is the error I get:
  File "/home/eap/anaconda/envs/pysradb/lib/python3.10/site-packages/urllib3/response.py", line 444, in _error_catcher
    yield
  File "/home/eap/anaconda/envs/pysradb/lib/python3.10/site-packages/urllib3/response.py", line 567, in read
    data = self._fp_read(amt) if not fp_closed else b""
  File "/home/eap/anaconda/envs/pysradb/lib/python3.10/site-packages/urllib3/response.py", line 533, in _fp_read
    return self._fp.read(amt) if amt is not None else self._fp.read()
  File "/home/eap/anaconda/envs/pysradb/lib/python3.10/http/client.py", line 460, in read
    return self._read_chunked(amt)
  File "/home/eap/anaconda/envs/pysradb/lib/python3.10/http/client.py", line 583, in _read_chunked
    chunk_left = self._get_chunk_left()
  File "/home/eap/anaconda/envs/pysradb/lib/python3.10/http/client.py", line 566, in _get_chunk_left
    chunk_left = self._read_next_chunk_size()
  File "/home/eap/anaconda/envs/pysradb/lib/python3.10/http/client.py", line 526, in _read_next_chunk_size
    line = self.fp.readline(_MAXLINE + 1)
  File "/home/eap/anaconda/envs/pysradb/lib/python3.10/socket.py", line 705, in readinto
    return self._sock.recv_into(b)
  File "/home/eap/anaconda/envs/pysradb/lib/python3.10/ssl.py", line 1274, in recv_into
    return self.read(nbytes, buffer)
  File "/home/eap/anaconda/envs/pysradb/lib/python3.10/ssl.py", line 1130, in read
    return self._sslobj.read(len, buffer)
TimeoutError: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/eap/miRexpress/updates/code/run_update.py", line 200, in <module>
    generate_raw_tsv("miRNA-seq", os.path.join(raw_folder, "miRNA-seq.tsv"))
  File "/home/eap/miRexpress/updates/code/run_update.py", line 36, in generate_raw_tsv
    instance.search()
  File "/home/eap/anaconda/envs/pysradb/lib/python3.10/site-packages/pysradb/search.py", line 793, in search
    self._format_response(r.raw)
  File "/home/eap/anaconda/envs/pysradb/lib/python3.10/site-packages/pysradb/search.py", line 861, in _format_response
    for event, elem in Et.iterparse(content):
  File "/home/eap/anaconda/envs/pysradb/lib/python3.10/xml/etree/ElementTree.py", line 1255, in iterator
    data = source.read(16 * 1024)
  File "/home/eap/anaconda/envs/pysradb/lib/python3.10/site-packages/urllib3/response.py", line 566, in read
    with self._error_catcher():
  File "/home/eap/anaconda/envs/pysradb/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/eap/anaconda/envs/pysradb/lib/python3.10/site-packages/urllib3/response.py", line 449, in _error_catcher
    raise ReadTimeoutError(self._pool, None, "Read timed out.")
It seems like it's getting disconnected after some minutes.
Is there a parameter I can change to make it retry or something similar? Are they blocking my IP? Is this a widespread recent issue?
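On the retry question, I don't know of a pysradb parameter for this, but the call can be wrapped from the outside. Below is a minimal sketch of a generic retry with exponential backoff; the `retry` helper and all of its parameters are hypothetical names introduced here, not part of pysradb or urllib3.

```python
import time


def retry(func, attempts=3, base_delay=1.0, exceptions=(Exception,), sleep=time.sleep):
    """Call func(); on a listed exception, wait and try again.

    The wait doubles after each failed attempt (base_delay, 2x, 4x, ...).
    The last failure is re-raised so the caller still sees the error.
    `sleep` is injectable so the helper can be tested without waiting.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

With the script above, this could be used as `retry(instance.search, attempts=5, base_delay=30.0)`, ideally narrowing `exceptions` to the timeout error shown in the traceback (`ReadTimeoutError` from `urllib3.exceptions`). Each retry presumably restarts the whole search from scratch, so this only helps if the disconnects are intermittent.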
To Reproduce
This really happens on any attempt now (randomly) after a few minutes. In this example I'm trying to download info about all miRNA-seq samples in SRA:
from pysradb.search import SraSearch

# library_type is defined elsewhere in the script
instance = SraSearch(2, 1000000, strategy="miRNA-seq")
print("Downloading samples for " + library_type)
instance.search()
Thanks a lot for writing this software and the support!!