
S1 image download function #3

Open
ZachHoppinen opened this issue Jan 27, 2023 · 18 comments

@ZachHoppinen
Contributor

No description provided.

@ZachHoppinen
Contributor Author

Progress: created a download_s1_imgs() function that works in notebook-based testing. Wrote basic user-input error tests and tests of simple functionality.

TODO: Write more tests for other cases

@ZachHoppinen
Contributor Author

ZachHoppinen commented Feb 8, 2023

TODO: add relative orbit, satellite (A vs. B), and am vs. pm flight paths as dimensions to slice on.

Probably use this code as a starting point, from line 145 of https://github.com/SnowEx/spicy-snow/blob/main/spicy_snow/download/sentinel1.py:

img = img.assign_coords(time = pd.to_datetime(granule.split('_')[4]))

The function already has the granule name and should be able to get the satellite from it:
https://sentinels.copernicus.eu/web/sentinel/user-guides/sentinel-1-sar/naming-conventions

For the relative orbit from the granule name:
https://gis.stackexchange.com/questions/237116/sentinel-1-relative-orbit#:~:text=You%20actually%20can%20find%20the,%2D%2073%2C%20175)%20%2B%201

For am/pm or ascending/descending:
This is the trickiest one; we can't get it from the granule name because a given relative orbit can be either. Could we look at the acquisition times and convert them to local times? Search each granule with asf_search (~10 seconds per search)? Or refactor the code to carry this information over from the search results?
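
A rough sketch of pulling the satellite and relative orbit out of a granule name (the S1A relative-orbit formula is from the linked answer; the S1B offset of 27 is the commonly cited value and worth double-checking; the granule name is just an example):

import pandas as pd

granule = 'S1A_IW_GRDH_1SDV_20191231T135003_20191231T135028_030593_03813A_A5DB'  # example name
parts = granule.split('_')
platform = parts[0]                                    # 'S1A' or 'S1B'
acquisition_time = pd.to_datetime(parts[4])            # start time, as in the line above
absolute_orbit = int(parts[6])                         # 30593 for this example
offset = 73 if platform == 'S1A' else 27               # S1B offset of 27 is an assumption
relative_orbit = (absolute_orbit - offset) % 175 + 1   # 71 for this example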

@gbrencher
Collaborator

I just added flight direction, relative orbit, and platform (S1A or S1B) as coordinates to the S1 xarray dataset returned by the sentinel1.py download script.

A couple of notes: there are a few options for getting flight direction. I initially looked at getting it from the HyP3 metadata, but it's not included (maybe something to bug ASF about). Then I looked at getting it from the ASF metadata accompanying the original GRD scenes. This works, but it requires us to download and parse XML files for every scene. The way I ended up doing it is just using the ASF API to search each granule with the keyword 'Ascending', setting the flight direction to ascending if the search returns results, and descending otherwise. It's a little hacky, but it's quick and doesn't require us to download anything extra. I don't think it will break unless there's a problem with the actual XML metadata or the search is down, in which case the script won't work anyway.

I included the descriptors as coordinates attached to the time dimension. We can swap them to a dimension if need be. I checked the new descriptors against the scenes in the ASF GUI search; everything looks to be working so far.
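
For reference, attaching the descriptors as non-dimension coordinates along time looks roughly like this (not the exact sentinel1.py code; the per-scene lists are placeholder names):

# flight_dirs, rel_orbits, and platforms are per-scene lists built during download (placeholders)
ds = ds.assign_coords(
    flight_direction=('time', flight_dirs),
    relative_orbit=('time', rel_orbits),
    platform=('time', platforms))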

@ZachHoppinen
Contributor Author

@gbrencher

I am playing around with selecting just relative orbit 20 or just the ascending images, and it is taking a really long time because I can't use the xarray-native .sel and have to use .where instead.

# xarray native 
ds.sel(relative_orbit = 20)

# where function
ds.where(ds.relative_orbit == 20, drop = True) # 3 minutes 20 seconds

If I make an xarray index for this, it greatly speeds up the selection.

ds = ds.set_xindex('relative_orbit') # 0.0 seconds
ds.sel(relative_orbit = 20) # 15.2 seconds

Do you know if we can create the index in the download function right after making the coordinate? Or is there a way to do it all at once?

@ZachHoppinen
Contributor Author

Wow, just tried to add this to the sentinel1.py download script and it was ugly. Looks like xarray really doesn't want multiple indexes for the same dimension (time). Probably best to make these indexes right before we use them to subset in pre-processing and then remove them.
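
Something like this is what I have in mind for the pre-processing step, assuming a recent xarray that has set_xindex/drop_indexes:

ds = ds.set_xindex('relative_orbit')      # temporary index on the non-dimension coordinate
subset = ds.sel(relative_orbit=20)        # fast .sel now that the index exists
ds = ds.drop_indexes('relative_orbit')    # drop it again so time stays the only indexed coord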

@gbrencher
Collaborator

Hey Zach, I was struggling with this too; I kind of punted on it and just included the descriptors as non-dimension coordinates. A couple of things to try: you can use the native .sel to index the ds like this: ds.sel(time = ds.relative_orbit == 20). You can also swap relative_orbit to a dimension coordinate and then swap it back. I'm not sure whether these solutions are faster than creating and deleting an index, but I'd assume at least the .sel one is. Happy to test later this morning!
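
Roughly, the two options side by side (untested on my end):

ds.sel(time=(ds.relative_orbit == 20))                            # boolean selection along time
ds.swap_dims({'time': 'relative_orbit'}).sel(relative_orbit=20)   # swap in relative_orbit, select, then swap back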

@ZachHoppinen
Contributor Author

ZachHoppinen commented Feb 14, 2023

I didn't know about that ds.sel(time = ds.relative_orbit == 20) command. That is probably the way to go for now.

I am gonna ask Emma when I talk to her in a few days for her thoughts. Maybe she knows something slick.

%timeit ds.sel(time = ds.relative_orbit == 20)
# 3.65 s ± 1.36 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
ds1 = ds.set_xindex('relative_orbit')
%timeit ds1.sel(relative_orbit = 20)
# 2.54 s ± 127 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Doesn't seem to be a big difference, so probably best to just stick with that.

@gbrencher
Collaborator

Sounds good!

@e-marshall
Collaborator

Hey all! I still need to take a look at this more closely, but in my experience (and from asking similar questions of people with more xarray experience) it seems that folks often get similar things to work by breaking them into multiple steps rather than with a one-liner, so probably similar to what you're doing. I will dive into this more this afternoon though.

If it's still of interest/useful, I have some code for accessing metadata from the ASF metadata file and formatting it in an array object here.

@gbrencher
Collaborator

Hey Emma, thanks for this! The way you're using the acquisition time to get flight direction is definitely better than what I did; I'll update the script.

@e-marshall
Collaborator

Oh wait! Sorry, I just remembered that the acquisition time -> flight direction conversion in that notebook is, I think, specific to the overpass times of the region I was looking at (HMA) and might be somewhat hardcoded. The principle should still work for any location, but it might not be the best fit for a location-agnostic approach. I can try to look into whether there's a better way to access that metadata, but your approach might be better.

@gbrencher
Collaborator

Oh yeah, I see that now and remember why I didn't do it that way haha. Really feels like flight direction should be in the HyP3 metadata.

@e-marshall
Collaborator

e-marshall commented Feb 14, 2023

It really does!
On a semi-related note, has there been any discussion of formatting the data (both the downloaded S1 data and the processed snow depths) in the STAC specification? I'm still learning my way around STAC, so I'm not 100% sure it's the right direction here, but it seems like it could be really helpful. Among other things, I think it might help with the memory issues that may be encountered when reading large stacks of data, by letting us use pystac and stackstac tools.

I just formatted a dataset I'm beginning to work with as a STAC catalog (with the help of this great example), and so far it has made the file-read steps much smoother (I just did this this morning, so I haven't fully explored it yet). My understanding is that it creates a lightweight representation of the files, similar in spirit to a GDAL VRT, so that you can read in and clip to an AOI lazily, rather than needing to read a full scene in as an xarray object and trigger a dask compute step upon clipping. I think it could be a more streamlined way of keeping track of metadata as well.

This might not be the best place to bring this up but I thought I'd post here and we can migrate elsewhere as needed.

@ZachHoppinen
Contributor Author

ZachHoppinen commented Feb 14, 2023

That is a super nice tutorial on it. So the main advantage of the STAC specification is that each file gets a .json description, so we could open individual files instead of the whole xarray netCDF stack?

What would it look like to set up our downloads to match that specification, Emma?

I suspect we won't need that for this project since we are going to coarsen to 300 m or 1 km resolution, but if it isn't a ton of work to do at the start it might be something to keep in mind so we could run all of western NA someday.

We can talk about it with everyone at the next meeting.

@e-marshall
Collaborator

Sounds good! I still need to get up to speed on the full processing pipeline for this, so it might not be as necessary for the step of ingesting S1 data, but I was thinking it could be useful for users if the output data were STAC-formatted as well.
My understanding is that each file would be a STAC item, organized into a catalog that's represented by a catalog.json file; you could then query the catalog by time period and area of interest and read only the S1 data that satisfy those conditions as an xarray object (rather than the full ASF geotiff). This saves trying to clip a full-scene xarray object down to an AOI with rioxarray's clip(), which isn't 'lazy' and can cause memory blow-ups.
In terms of setting up downloads to match that specification, it took me a bit of playing around but wasn't too bad; I can share an example notebook soon.
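
As a very rough illustration of the lazy read (made-up path, bounds, EPSG, and resolution; depending on the item metadata you may not need to pass epsg/resolution at all):

import pystac
import stackstac

catalog = pystac.Catalog.from_file('catalog.json')            # hypothetical local catalog
items = [item.to_dict() for item in catalog.get_all_items()]
# lazy, dask-backed DataArray clipped to an example AOI at read time
stack = stackstac.stack(items, epsg=32611, resolution=30,
                        bounds_latlon=(-116.0, 42.5, -115.5, 43.0))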

@gbrencher
Collaborator

Just improved the granule metadata search per Joe Kennedy's recommendation. Now we can easily include any scene properties we want from the source granule metadata, and we no longer have to do any math to get the relative orbit. Most properties are probably irrelevant, but a couple may be of interest:
{'beamModeType': 'IW',
 'browse': ['https://datapool.asf.alaska.edu/BROWSE/SA/S1A_IW_GRDH_1SDV_20191231T135003_20191231T135028_030593_03813A_A5DB.jpg'],
 'bytes': 912582424,
 'centerLat': 42.654,
 'centerLon': -115.8438,
 'faradayRotation': None,
 'fileID': 'S1A_IW_GRDH_1SDV_20191231T135003_20191231T135028_030593_03813A_A5DB-GRD_HD',
 'flightDirection': 'DESCENDING',
 'groupID': 'S1A_IWDV_0449_0456_030593_071',
 'granuleType': 'SENTINEL_1A_FRAME',
 'insarStackId': None,
 'md5sum': 'cb1cd63bc4ec48ae1d479cbd88aef8ef',
 'offNadirAngle': None,
 'orbit': 30593,
 'pathNumber': 71,
 'platform': 'Sentinel-1A',
 'pointingAngle': None,
 'polarization': 'VV+VH',
 'processingDate': '2019-12-31T13:50:03.000Z',
 'processingLevel': 'GRD_HD',
 'sceneName': 'S1A_IW_GRDH_1SDV_20191231T135003_20191231T135028_030593_03813A_A5DB',
 'sensor': 'C-SAR',
 'startTime': '2019-12-31T13:50:03.000Z',
 'stopTime': '2019-12-31T13:50:28.000Z',
 'url': 'https://datapool.asf.alaska.edu/GRD_HD/SA/S1A_IW_GRDH_1SDV_20191231T135003_20191231T135028_030593_03813A_A5DB.zip',
 'fileName': 'S1A_IW_GRDH_1SDV_20191231T135003_20191231T135028_030593_03813A_A5DB.zip',
 'frameNumber': 450}
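
The lookup for a single scene is roughly along these lines (not necessarily the exact call in the script):

import asf_search as asf

results = asf.granule_search(['S1A_IW_GRDH_1SDV_20191231T135003_20191231T135028_030593_03813A_A5DB'])
props = results[0].properties
props['flightDirection'], props['pathNumber'], props['platform']   # 'DESCENDING', 71, 'Sentinel-1A'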

@ZachHoppinen
Contributor Author

Nice! Thanks Quinn.

I just added a function to the pre-processing PR #22 that checks whether two or more images in the dataset are within 10 minutes of each other and on the same relative orbit, and if so combines them to fill in the 0s or nans. I noticed we had some time slices that were just arbitrary granule splits and could actually be combined.

I don't know if you want to incorporate that into the download script or leave it as a pre-processing step. Both probably work.
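
The core of the combine step is roughly the following (not the exact PR code; assumes the two slices are known duplicates with numeric backscatter bands):

a = ds.isel(time=0, drop=True)   # drop scalar time coords so the two slices align on x/y only
b = ds.isel(time=1, drop=True)
combined = a.where(a != 0).combine_first(b.where(b != 0))   # zeros/nans in one filled from the other
combined = combined.expand_dims(time=[ds.time.values[0]])   # keep the earlier timestamp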

@gbrencher
Collaborator

Great! That makes sense; those duplicate images are probably consecutive, partially overlapping 'frames'. Where they chop those up is arbitrary (and changes geographically over time, which is not ideal; they'll be in the same location for NISAR iirc). Leaving that in the pre-processing is good with me!
