Finalize recipe (+ test new rclone copy stage) #1
base: main
Conversation
Oh I did not set the …
Ok nice, this is working locally. Running a test on dataflow now.
Ok that works out fine. @ks905383 can you check out the test dataset? You can run this on the LEAP hub:

```python
import xarray as xr

xr.open_dataset(
    "gs://leap-scratch/data-library/feedstocks/chirps_feedstock/chirps-global-daily.zarr",
    engine="zarr",
    chunks={},
)
```
Nice, seems to work for 1981-1982 (and I see you've staged the recipe for the whole timeframe).
@ks905383 awesome. I would love to see what sort of processing you apply to the data, so we can prototype that. If you think that should rather be applied afterwards, we can get this finalized.
I have rebased this on leap-stc/leap-data-management-utils#60 and will see if the copy stage works (moving it to …).
Ayyyy this worked apparently. Need to run for dinner, but will test output tomorrow! Exciting. |
Was able to confirm that the transfer works. Let's change the target bucket/creds and then ingest the entire recipe over at leap-stc/leap-data-management-utils#60.
There is still a bit of a smell here since I have to define the target bucket+path in the recipe and the creds are hardcoded here. @norlandrhagen do you have opinions on how to reconcile this? |
This should be all set, but it seems the original server is down at the moment. cc @ks905383 |
For this bit?

`CopyRclone(target=catalog_store_urls["chirps-global-daily"].replace("https://nyu1.osn.mghpcc.org/","")) #FIXME`

Maybe we can separate the prefix from the bucket/path with some …
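For illustration, a minimal sketch of what that separation could look like using only the standard library (the `split_endpoint` helper and the example path are hypothetical; `CopyRclone` and `catalog_store_urls` are from the recipe):

```python
from urllib.parse import urlparse

def split_endpoint(url: str) -> tuple[str, str]:
    """Split a full object-store URL into (endpoint, bucket/path).

    Hypothetical helper, e.g.
    "https://nyu1.osn.mghpcc.org/some-bucket/store.zarr"
    -> ("https://nyu1.osn.mghpcc.org", "some-bucket/store.zarr")
    """
    parsed = urlparse(url)
    return f"{parsed.scheme}://{parsed.netloc}", parsed.path.lstrip("/")

# The recipe could then avoid hardcoding the prefix:
# _, target = split_endpoint(catalog_store_urls["chirps-global-daily"])
# ... | CopyRclone(target=target)
```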
@norlandrhagen 👍 I think we could possibly do this as a clean-up sprint where we reduce the amount of manually entered (and highly interdependent) naming entries for the user? Tracking this in leap-stc/LEAP_template_feedstock#61
What the hell is this error?
I guess we were hammering the storage at a rate that is not allowed? See https://cloud.google.com/storage/docs/gcs429. I thought this was the cloud! Hahaha.
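The usual mitigation for 429s is client-side exponential backoff with jitter. A minimal sketch (the `write_chunk` call is a hypothetical stand-in for whatever is hitting the bucket, and ideally you would catch the client's specific rate-limit exception rather than bare `Exception`):

```python
import random
import time

def with_backoff(fn, max_retries=8, base_delay=1.0):
    """Call fn, retrying with exponential backoff + jitter on failure."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:  # narrow this to the client's 429/rate-limit error
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))

# e.g. with_backoff(lambda: write_chunk(data))  # write_chunk is hypothetical
```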
This is pretty odd. We also set max-workers to 50, and it seems to have scaled beyond that?
Ughhh, it might be similar to this, where each write tries to create an empty 'directory'? But that runs counter to my understanding of object storage...
And I guess max-workers is useless when using Dataflow Prime.
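For reference, a sketch of how the worker cap is normally pinned via Beam's standard Dataflow options (how the recipe runner forwards these flags is an assumption here):

```python
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",
    max_num_workers=50,  # caps horizontal autoscaling (worker count)
)
# Dataflow Prime additionally does vertical autoscaling (resizing workers),
# which max_num_workers does not govern.
```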
I also hit a max-quota error the other week, so there may be some Google Cloud console settings to dial in.
Ughhh, it's scaling past the limit I set again... is this a bug in the runner?
I'll have to look into this more next week; I have to turn to some other things now.
This seems like a pretty simple recipe, but running it locally on the hub with …

Blows out the 128 GB of memory! The files are extremely compressed, but should still fit into memory (1 GB on disk, ~20 GB in memory × 2 files). This smells like the same issue we (@norlandrhagen) had with fsspec ballooning the memory for older recipes and virtualizarr generation? What is going on here...
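As a sanity check, the lazy vs. loaded footprint can be compared without pulling data into memory; a sketch against the same test store as above:

```python
import xarray as xr

# chunks={} keeps the variables as dask arrays, so nothing loads eagerly.
ds = xr.open_dataset(
    "gs://leap-scratch/data-library/feedstocks/chirps_feedstock/chirps-global-daily.zarr",
    engine="zarr",
    chunks={},
)

# Uncompressed in-memory size in GB. If this matches the ~20 GB estimate,
# an eager load alone should not exhaust 128 GB, pointing at the I/O layer.
print(ds.nbytes / 1e9)
```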