```python
# Import required libraries
import os
import math
import warnings
from datetime import timedelta

import earthaccess
import folium
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
from branca.element import Figure         # Figure wrapper used with folium maps
from shapely import geometry
from shapely.geometry.polygon import orient
from skimage import io                    # io.imread, used for browse image previews
```
2.1 Define Spatial Region of Interest
For this example, our spatial region of interest (ROI) will be the Carpinteria Salt Marsh. You can learn more about it here: https://ucnrs.org/reserves/carpinteria-salt-marsh-reserve/. If you want to create a geojson polygon for your own ROI, you can do so using this website: https://geojson.io/#map=2/20/0, or you can convert a shapefile to a geojson using some code in the Appendices.
In this example, we elect to search using a polygon rather than a standard bounding box because bounding boxes will have a larger spatial extent, capturing a lot of area we may not be interested in. This becomes more important for searches with larger ROIs than our example here. To search for intersections with a polygon using earthaccess, we need to format our ROI as a counter-clockwise list of coordinate pairs.
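For illustration, a minimal sketch of that format is shown below. The vertices here are invented placeholders, not the actual marsh boundary; we derive the real list from a geojson next.

```python
# Hypothetical example of the ROI format earthaccess expects:
# a counter-clockwise list of (longitude, latitude) pairs,
# closed by repeating the first vertex.
example_roi = [
    (-119.55, 34.39),  # SW corner
    (-119.52, 34.39),  # SE corner
    (-119.52, 34.42),  # NE corner
    (-119.55, 34.42),  # NW corner
    (-119.55, 34.39),  # close the ring
]
```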
Open the geojson file containing a landcover classification of Carpinteria Salt Marsh as a geodataframe, and check the coordinate reference system (CRS) of the data.
```python
polygon = gpd.read_file('../data/landcover.geojson')
polygon.crs
```

```
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
```
The CRS is EPSG:4326 (WGS84), which is also the CRS we want the data in when submitting our search.
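If a file of your own loads in a different CRS, a quick reprojection gets it into WGS84 before searching. This is a minimal sketch using geopandas' to_crs; it is not needed for the landcover file used here.

```python
# Reproject to WGS84 if (and only if) the file is in another CRS
if polygon.crs != "EPSG:4326":
    polygon = polygon.to_crs("EPSG:4326")
```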
Next, let's examine our polygon a bit more closely.
```python
polygon
```
|   | type | geometry |
|---|------|----------|
| 0 | channel | MULTIPOLYGON (((-119.54125 34.40462, -119.5412... |
| 1 | salt flat | MULTIPOLYGON (((-119.52907 34.39633, -119.5290... |
| 2 | upland | MULTIPOLYGON (((-119.54524 34.40555, -119.5452... |
| 3 | pan | MULTIPOLYGON (((-119.52924 34.39675, -119.5292... |
| 4 | marsh | MULTIPOLYGON (((-119.54162 34.40421, -119.5416... |
We can see this geodataframe consists of multiple classes, each containing a multipolygon within our study site. We need to create an exterior boundary polygon containing these, and make sure the vertices are in counter-clockwise order to submit them in our query. To do this, create a polygon consisting of all the geometries, then calculate the convex hull of the union. This will give us a simple exterior polygon around our full ROI. After that, use the orient function to place our coordinate pairs in counter-clockwise order.
```python
# Merge all Polygon geometries and create external boundary
roi_poly = polygon.unary_union.convex_hull
# Re-order vertices to counter-clockwise
roi_poly = orient(roi_poly, sign=1.0)
```
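As an optional sanity check, a one-line sketch relying on shapely's is_ccw property confirms the exterior ring is now counter-clockwise:

```python
# Should print True now that orient() has been applied
print(roi_poly.exterior.is_ccw)
```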
We can go ahead and visualize our region of interest and the original landcover polygon. First, add a function to help reformat bounding box coordinates to work with leaflet notation.
```python
# Function to convert a bounding box for use in leaflet notation

def convert_bounds(bbox, invert_y=False):
    """
    Helper method for changing bounding box representation to leaflet notation

    ``(lon1, lat1, lon2, lat2) -> ((lat1, lon1), (lat2, lon2))``
    """
    x1, y1, x2, y2 = bbox
    if invert_y:
        y1, y2 = y2, y1
    return ((y1, x1), (y2, x2))
```
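As a quick usage example (illustrative), shapely's .bounds already returns (lon1, lat1, lon2, lat2), so it can be passed straight to the helper:

```python
# Reformat the ROI bounding box for folium's fit_bounds
leaflet_bounds = convert_bounds(roi_poly.bounds)
print(leaflet_bounds)  # ((lat_min, lon_min), (lat_max, lon_max))
```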
Then create a figure using folium.
```python
fig = Figure(width="800px", height="400px")
map1 = folium.Map(tiles='https://mt1.google.com/vt/lyrs=y&x={x}&y={y}&z={z}', attr='Google')
fig.add_child(map1)

# Add Convex Hull Polygon
folium.GeoJson(roi_poly,
               name='convex hull',
               ).add_to(map1)

# Add landcover classification geodataframe
polygon.explore(
    "type",
    popup=True,
    categorical=True,
    cmap='Set3',
    style_kwds=dict(opacity=0.7, fillOpacity=0.4),
    name="Carpinteria Salt Marsh Landcover",
    m=map1
)

map1.add_child(folium.LayerControl())
map1.fit_bounds(bounds=convert_bounds(polygon.unary_union.bounds))
display(fig)
```
Above we can see our region of interest (ROI) and the landcover classification polygon that we opened. We can hover over different areas to see the land cover class.
Lastly, we need to convert our polygon to a list of coordinate pairs.
```python
# Set ROI as list of exterior polygon vertices as coordinate pairs
roi = list(roi_poly.exterior.coords)
```
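A quick check on the result (optional): each element should be a (longitude, latitude) tuple, and because the coordinates trace a closed ring, the first and last pairs should match.

```python
# The exterior ring is closed, so the first and last vertices match
print(roi[0], roi[-1])
print(f'{len(roi)} vertices')
```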
2.2 Define Collections of Interest
We need to specify which products we want to search for using their short-names. As mentioned above, we will conduct our search using the EMIT Level 2A Reflectance (EMITL2ARFL) and ECOSTRESS Level 2 Tiled Land Surface Temperature and Emissivity (ECO_L2T_LSTE).
Note: Here we use the Tiled ECOSTRESS LSTE Product. This will also work with the gridded LSTE and the swath; however, the swath product does not have a browse image for the visualization in section 4, and will require additional processing for subsequent analysis.
```python
# Data Collections for our search
collections = ['EMITL2ARFL', 'ECO_L2T_LSTE']
```
2.3 Define Date Range
For our date range, we'll look at data collected from January through August of 2023. The date_range can be specified as a pair of dates, start and end (up to, not including).
```python
# Define Date Range
date_range = ('2023-01-01','2023-09-01')
```
2.4 Searching
Submit a query using earthaccess.
```python
results = earthaccess.search_data(
    short_name=collections,
    polygon=roi,
    temporal=date_range,
    count=500
)
```

```
Granules found: 119
```

```python
len(results)
```

```
119
```
3. Organizing and Filtering Results
As we can see from above, the results object contains a list of objects with metadata and links. We can convert this to a more readable format, a dataframe. In addition, we can make it a geodataframe by taking the spatial metadata and creating a shapely polygon representing the spatial coverage, and further customize which information we want to use from other metadata fields.
First, we define some functions to help us create a shapely object for our geodataframe, and retrieve the specific browse image URLs that we want. By default the browse image selected by earthaccess is the first one in the list, but ECO_L2T_LSTE has several browse images and we want to make sure we retrieve the png file, which is a preview of the LSTE.
```python
# Function to create shapely polygon of spatial coverage
def get_shapely_object(result:earthaccess.results.DataGranule):
    # Get Geometry Keys
    geo = result['umm']['SpatialExtent']['HorizontalSpatialDomain']['Geometry']
    keys = geo.keys()

    if 'BoundingRectangles' in keys:
        bounding_rectangle = geo['BoundingRectangles'][0]
        # Create bbox tuple
        bbox_coords = (bounding_rectangle['WestBoundingCoordinate'],bounding_rectangle['SouthBoundingCoordinate'],
                       bounding_rectangle['EastBoundingCoordinate'],bounding_rectangle['NorthBoundingCoordinate'])
        # Create shapely geometry from bbox
        shape = geometry.box(*bbox_coords, ccw=True)
    elif 'GPolygons' in keys:
        points = geo['GPolygons'][0]['Boundary']['Points']
        # Create shapely geometry from polygons
        shape = geometry.Polygon([[p['Longitude'],p['Latitude']] for p in points])
    else:
        raise ValueError('Provided result does not contain bounding boxes/polygons or is incompatible.')
    return(shape)

# Retrieve png browse image if it exists or first jpg in list of urls
def get_png(result:earthaccess.results.DataGranule):
    https_links = [link for link in result.dataviz_links() if 'https' in link]
    if len(https_links) == 1:
        browse = https_links[0]
    elif len(https_links) == 0:
        browse = 'no browse image'
        warnings.warn(f"There is no browse imagery for {result['umm']['GranuleUR']}.")
    else:
        browse = [png for png in https_links if '.png' in png][0]
    return(browse)
```
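Before applying these helpers to every result, it can be worth sanity-checking them on a single granule. A minimal sketch (the printed values will depend on your search results):

```python
# Try the helpers on the first search result
print(get_shapely_object(results[0]))  # shapely polygon of the spatial coverage
print(get_png(results[0]))             # URL of the png browse image
```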
Now that we have our functions, we can create a dataframe, then calculate and add our shapely geometries to make a geodataframe. After that, add a column for our browse image URLs and print the number of granules in our results, so we can monitor the quantity we are working with as we winnow down to the data we want.
```python
# Create Dataframe of Results Metadata
results_df = pd.json_normalize(results)
# Create shapely polygons for result
geometries = [get_shapely_object(results[index]) for index in results_df.index.to_list()]
# Convert to GeoDataframe
gdf = gpd.GeoDataFrame(results_df, geometry=geometries, crs="EPSG:4326")
# Remove results df, no longer needed
del results_df
# Add browse imagery links
gdf['browse'] = [get_png(granule) for granule in results]
gdf['shortname'] = [result['umm']['CollectionReference']['ShortName'] for result in results]
# Preview GeoDataframe
print(f'{gdf.shape[0]} granules total')
```

```
119 granules total
```
Preview our geodataframe to get an idea of what it looks like.
```python
gdf.head()
```

|   | size | meta.concept-type | meta.concept-id | meta.revision-id | meta.native-id | meta.provider-id | meta.format | meta.revision-date | umm.TemporalExtent.RangeDateTime.BeginningDateTime | umm.TemporalExtent.RangeDateTime.EndingDateTime | ... | umm.Platforms | umm.MetadataSpecification.URL | umm.MetadataSpecification.Name | umm.MetadataSpecification.Version | umm.SpatialExtent.HorizontalSpatialDomain.Geometry.GPolygons | umm.PGEVersionClass.PGEName | umm.CloudCover | geometry | browse | shortname |
|---|------|-------------------|-----------------|------------------|----------------|------------------|-------------|--------------------|------|------|-----|------|------|------|------|------|------|------|----------|--------|-----------|
| 0 | 2.49626 | granule | G2581836170-LPCLOUD | 1 | ECOv002_L2T_LSTE_25460_016_11SKU_20230101T1552... | LPCLOUD | application/echo10+xml | 2023-01-07T13:31:07.584Z | 2023-01-01T15:52:48.650Z | 2023-01-01T15:53:40.620Z | ... | [{'ShortName': 'ISS', 'Instruments': [{'ShortN... | https://cdn.earthdata.nasa.gov/umm/granule/v1.6.5 | UMM-G | 1.6.5 | NaN | NaN | NaN | POLYGON ((-119.06582 34.21003, -119.06582 35.2... | https://data.lpdaac.earthdatacloud.nasa.gov/lp... | ECO_L2T_LSTE |
| 1 | 8.49543 | granule | G2586136993-LPCLOUD | 1 | ECOv002_L2T_LSTE_25486_006_11SKU_20230103T0744... | LPCLOUD | application/echo10+xml | 2023-01-10T22:21:57.631Z | 2023-01-03T07:44:22.480Z | 2023-01-03T07:45:14.450Z | ... | [{'ShortName': 'ISS', 'Instruments': [{'ShortN... | https://cdn.earthdata.nasa.gov/umm/granule/v1.6.5 | UMM-G | 1.6.5 | NaN | NaN | NaN | POLYGON ((-119.06582 34.21003, -119.06582 35.2... | https://data.lpdaac.earthdatacloud.nasa.gov/lp... | ECO_L2T_LSTE |
| 2 | 1.16068 | granule | G2591892077-LPCLOUD | 1 | ECOv002_L2T_LSTE_25547_005_11SKU_20230107T0607... | LPCLOUD | application/echo10+xml | 2023-01-18T19:13:09.017Z | 2023-01-07T06:07:21.560Z | 2023-01-07T06:08:13.530Z | ... | [{'ShortName': 'ISS', 'Instruments': [{'ShortN... | https://cdn.earthdata.nasa.gov/umm/granule/v1.6.5 | UMM-G | 1.6.5 | NaN | NaN | NaN | POLYGON ((-119.06582 34.21003, -119.06582 35.2... | https://data.lpdaac.earthdatacloud.nasa.gov/lp... | ECO_L2T_LSTE |
| 3 | 3.42404 | granule | G2592560828-LPCLOUD | 1 | ECOv002_L2T_LSTE_25608_003_11SKU_20230111T0430... | LPCLOUD | application/echo10+xml | 2023-01-19T11:23:08.273Z | 2023-01-11T04:30:26.510Z | 2023-01-11T04:31:18.480Z | ... | [{'ShortName': 'ISS', 'Instruments': [{'ShortN... | https://cdn.earthdata.nasa.gov/umm/granule/v1.6.5 | UMM-G | 1.6.5 | NaN | NaN | NaN | POLYGON ((-119.06582 34.21003, -119.06582 35.2... | https://data.lpdaac.earthdatacloud.nasa.gov/lp... | ECO_L2T_LSTE |
| 4 | 14.28010 | granule | G2593132604-LPCLOUD | 1 | ECOv002_L2T_LSTE_25628_015_11SKU_20230112T1150... | LPCLOUD | application/echo10+xml | 2023-01-20T05:37:00.651Z | 2023-01-12T11:50:28.830Z | 2023-01-12T11:51:20.800Z | ... | [{'ShortName': 'ISS', 'Instruments': [{'ShortN... | https://cdn.earthdata.nasa.gov/umm/granule/v1.6.5 | UMM-G | 1.6.5 | NaN | NaN | NaN | POLYGON ((-119.06582 34.21003, -119.06582 35.2... | https://data.lpdaac.earthdatacloud.nasa.gov/lp... | ECO_L2T_LSTE |

5 rows × 34 columns
There are a lot of columns with data that is not relevant to our goal, so we can drop those. To do that, list the names of the columns.
```python
# List Column Names
gdf.columns
```

```
Index(['size', 'meta.concept-type', 'meta.concept-id', 'meta.revision-id',
       'meta.native-id', 'meta.provider-id', 'meta.format',
       'meta.revision-date',
       'umm.TemporalExtent.RangeDateTime.BeginningDateTime',
       'umm.TemporalExtent.RangeDateTime.EndingDateTime',
       'umm.OrbitCalculatedSpatialDomains', 'umm.GranuleUR',
       'umm.AdditionalAttributes', 'umm.MeasuredParameters',
       'umm.SpatialExtent.HorizontalSpatialDomain.Geometry.BoundingRectangles',
       'umm.ProviderDates', 'umm.CollectionReference.ShortName',
       'umm.CollectionReference.Version', 'umm.PGEVersionClass.PGEVersion',
       'umm.RelatedUrls', 'umm.DataGranule.DayNightFlag',
       'umm.DataGranule.Identifiers', 'umm.DataGranule.ProductionDateTime',
       'umm.DataGranule.ArchiveAndDistributionInformation', 'umm.Platforms',
       'umm.MetadataSpecification.URL', 'umm.MetadataSpecification.Name',
       'umm.MetadataSpecification.Version',
       'umm.SpatialExtent.HorizontalSpatialDomain.Geometry.GPolygons',
       'umm.PGEVersionClass.PGEName', 'umm.CloudCover', 'geometry', 'browse',
       'shortname'],
      dtype='object')
```
Now create a list of columns to keep and use it to filter the dataframe.
```python
# Create a list of columns to keep
keep_cols = ['meta.concept-id','meta.native-id', 'umm.TemporalExtent.RangeDateTime.BeginningDateTime','umm.TemporalExtent.RangeDateTime.EndingDateTime','umm.CloudCover','umm.DataGranule.DayNightFlag','geometry','browse', 'shortname']
# Remove unneeded columns
gdf = gdf[gdf.columns.intersection(keep_cols)]
gdf.head()
```
|   | meta.concept-id | meta.native-id | umm.TemporalExtent.RangeDateTime.BeginningDateTime | umm.TemporalExtent.RangeDateTime.EndingDateTime | umm.DataGranule.DayNightFlag | umm.CloudCover | geometry | browse | shortname |
|---|-----------------|----------------|------|------|------|------|----------|--------|-----------|
| 0 | G2581836170-LPCLOUD | ECOv002_L2T_LSTE_25460_016_11SKU_20230101T1552... | 2023-01-01T15:52:48.650Z | 2023-01-01T15:53:40.620Z | Day | NaN | POLYGON ((-119.06582 34.21003, -119.06582 35.2... | https://data.lpdaac.earthdatacloud.nasa.gov/lp... | ECO_L2T_LSTE |
| 1 | G2586136993-LPCLOUD | ECOv002_L2T_LSTE_25486_006_11SKU_20230103T0744... | 2023-01-03T07:44:22.480Z | 2023-01-03T07:45:14.450Z | Night | NaN | POLYGON ((-119.06582 34.21003, -119.06582 35.2... | https://data.lpdaac.earthdatacloud.nasa.gov/lp... | ECO_L2T_LSTE |
| 2 | G2591892077-LPCLOUD | ECOv002_L2T_LSTE_25547_005_11SKU_20230107T0607... | 2023-01-07T06:07:21.560Z | 2023-01-07T06:08:13.530Z | Night | NaN | POLYGON ((-119.06582 34.21003, -119.06582 35.2... | https://data.lpdaac.earthdatacloud.nasa.gov/lp... | ECO_L2T_LSTE |
| 3 | G2592560828-LPCLOUD | ECOv002_L2T_LSTE_25608_003_11SKU_20230111T0430... | 2023-01-11T04:30:26.510Z | 2023-01-11T04:31:18.480Z | Night | NaN | POLYGON ((-119.06582 34.21003, -119.06582 35.2... | https://data.lpdaac.earthdatacloud.nasa.gov/lp... | ECO_L2T_LSTE |
| 4 | G2593132604-LPCLOUD | ECOv002_L2T_LSTE_25628_015_11SKU_20230112T1150... | 2023-01-12T11:50:28.830Z | 2023-01-12T11:51:20.800Z | Night | NaN | POLYGON ((-119.06582 34.21003, -119.06582 35.2... | https://data.lpdaac.earthdatacloud.nasa.gov/lp... | ECO_L2T_LSTE |
This is looking better, but we can make it more readable by renaming our columns.
```python
# Rename some Columns
gdf.rename(columns={'meta.concept-id':'concept_id','meta.native-id':'granule',
                    'umm.TemporalExtent.RangeDateTime.BeginningDateTime':'start_datetime',
                    'umm.TemporalExtent.RangeDateTime.EndingDateTime':'end_datetime',
                    'umm.CloudCover':'cloud_cover',
                    'umm.DataGranule.DayNightFlag':'day_night'}, inplace=True)
gdf.head()
```
|   | concept_id | granule | start_datetime | end_datetime | day_night | cloud_cover | geometry | browse | shortname |
|---|------------|---------|----------------|--------------|-----------|-------------|----------|--------|-----------|
| 0 | G2581836170-LPCLOUD | ECOv002_L2T_LSTE_25460_016_11SKU_20230101T1552... | 2023-01-01T15:52:48.650Z | 2023-01-01T15:53:40.620Z | Day | NaN | POLYGON ((-119.06582 34.21003, -119.06582 35.2... | https://data.lpdaac.earthdatacloud.nasa.gov/lp... | ECO_L2T_LSTE |
| 1 | G2586136993-LPCLOUD | ECOv002_L2T_LSTE_25486_006_11SKU_20230103T0744... | 2023-01-03T07:44:22.480Z | 2023-01-03T07:45:14.450Z | Night | NaN | POLYGON ((-119.06582 34.21003, -119.06582 35.2... | https://data.lpdaac.earthdatacloud.nasa.gov/lp... | ECO_L2T_LSTE |
| 2 | G2591892077-LPCLOUD | ECOv002_L2T_LSTE_25547_005_11SKU_20230107T0607... | 2023-01-07T06:07:21.560Z | 2023-01-07T06:08:13.530Z | Night | NaN | POLYGON ((-119.06582 34.21003, -119.06582 35.2... | https://data.lpdaac.earthdatacloud.nasa.gov/lp... | ECO_L2T_LSTE |
| 3 | G2592560828-LPCLOUD | ECOv002_L2T_LSTE_25608_003_11SKU_20230111T0430... | 2023-01-11T04:30:26.510Z | 2023-01-11T04:31:18.480Z | Night | NaN | POLYGON ((-119.06582 34.21003, -119.06582 35.2... | https://data.lpdaac.earthdatacloud.nasa.gov/lp... | ECO_L2T_LSTE |
| 4 | G2593132604-LPCLOUD | ECOv002_L2T_LSTE_25628_015_11SKU_20230112T1150... | 2023-01-12T11:50:28.830Z | 2023-01-12T11:51:20.800Z | Night | NaN | POLYGON ((-119.06582 34.21003, -119.06582 35.2... | https://data.lpdaac.earthdatacloud.nasa.gov/lp... | ECO_L2T_LSTE |
Note: If querying on-premises (not cloud) LP DAAC datasets, the meta.concept-id will not show as xxxxxx-LPCLOUD. For these datasets, the granule name can be retrieved from the umm.DataGranule.Identifiers column.
We could also filter using the day/night flag, but that step is unnecessary here because we later check that all ECOSTRESS results fall within an hour of the resulting EMIT granules.
```python
# gdf = gdf[gdf['day_night'].str.contains('Day')]
```
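If you want to see how the flag is distributed before deciding whether to filter, a one-line check using pandas' value_counts shows the counts per class:

```python
# Count granules by day/night flag
gdf['day_night'].value_counts()
```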
Our first step toward filtering the datasets will be to add a column containing a datetime object. You may have noticed that the date format is similar for ECOSTRESS and EMIT, but the ECOSTRESS data has additional fractional seconds. If using the recommended lpdaac_vitals Windows environment, you will need to pass the format='ISO8601' argument to the to_datetime function, as in the active line below; the plain variant is left commented out.
```python
# gdf['datetime_obj'] = pd.to_datetime(gdf['start_datetime'])
gdf['datetime_obj'] = pd.to_datetime(gdf['start_datetime'], format='ISO8601')
```
We can roughly visualize the quantity of results over time at our location using a histogram with 9 bins (January up to September).
```python
gdf.hist(column='datetime_obj', by='shortname', bins=9, color='green', edgecolor='black', linewidth=1, sharey=True)
```

```
array([<Axes: title={'center': 'ECO_L2T_LSTE'}>,
       <Axes: title={'center': 'EMITL2ARFL'}>], dtype=object)
```
Now we will separate the results into two dataframes, one for ECOSTRESS and one for EMIT, and print the number of results for each so we can monitor how many granules we're filtering.
```python
# Suppress Setting with Copy Warning - not applicable in this use case
pd.options.mode.chained_assignment = None  # default='warn'

# Split into two dataframes - ECO and EMIT
eco_gdf = gdf[gdf['granule'].str.contains('ECO')]
emit_gdf = gdf[gdf['granule'].str.contains('EMIT')]
print(f' ECOSTRESS Granules: {eco_gdf.shape[0]} \n EMIT Granules: {emit_gdf.shape[0]}')
```

```
 ECOSTRESS Granules: 111 
 EMIT Granules: 8
```

```python
emit_gdf.head()
```
|    | concept_id | granule | start_datetime | end_datetime | day_night | cloud_cover | geometry | browse | shortname | datetime_obj |
|----|------------|---------|----------------|--------------|-----------|-------------|----------|--------|-----------|--------------|
| 20 | G2631045418-LPCLOUD | EMIT_L2A_RFL_001_20230219T202951_2305013_003 | 2023-02-19T20:29:51Z | 2023-02-19T20:30:14Z | Day | 67.0 | POLYGON ((-120.04838 35.03646, -120.54523 34.4... | https://data.lpdaac.earthdatacloud.nasa.gov/lp... | EMITL2ARFL | 2023-02-19 20:29:51+00:00 |
| 23 | G2631458885-LPCLOUD | EMIT_L2A_RFL_001_20230223T185548_2305412_004 | 2023-02-23T18:55:48Z | 2023-02-23T18:56:00Z | Day | 96.0 | POLYGON ((-119.95147 35.08163, -120.44685 34.4... | https://data.lpdaac.earthdatacloud.nasa.gov/lp... | EMITL2ARFL | 2023-02-23 18:55:48+00:00 |
| 27 | G2631818047-LPCLOUD | EMIT_L2A_RFL_001_20230227T172140_2305811_006 | 2023-02-27T17:21:40Z | 2023-02-27T17:21:52Z | Day | 100.0 | POLYGON ((-119.41012 34.81982, -119.90931 34.1... | https://data.lpdaac.earthdatacloud.nasa.gov/lp... | EMITL2ARFL | 2023-02-27 17:21:40+00:00 |
| 47 | G2667368816-LPCLOUD | EMIT_L2A_RFL_001_20230405T190311_2309513_002 | 2023-04-05T19:03:11Z | 2023-04-05T19:03:23Z | Day | 71.0 | POLYGON ((-119.87804 34.90691, -120.65480 34.2... | https://data.lpdaac.earthdatacloud.nasa.gov/lp... | EMITL2ARFL | 2023-04-05 19:03:11+00:00 |
| 48 | G2667369196-LPCLOUD | EMIT_L2A_RFL_001_20230405T190323_2309513_003 | 2023-04-05T19:03:23Z | 2023-04-05T19:03:35Z | Day | 8.0 | POLYGON ((-119.22086 35.41499, -120.01088 34.8... | https://data.lpdaac.earthdatacloud.nasa.gov/lp... | EMITL2ARFL | 2023-04-05 19:03:23+00:00 |
We still haven't filtered to the locations where EMIT and ECOSTRESS have data at the same spatial location and time-frame. The EMIT acquisition mask has been added to ECOSTRESS, so in most cases if EMIT is collecting data, so will ECOSTRESS, but there are edge cases where this is not true. To do this we'll use two filters to catch the edge-cases, and provide an example that can be used with other datasets.
First, since EMIT has a smaller swath width, we can use a unary union of the EMIT spatial coverage in our geodataframe to filter out ECOSTRESS granules that do not overlap with it.
```python
# Subset ECOSTRESS Granules in Geodataframe by intersection with EMIT granules
## Create new column based on intersection with union of EMIT polygons.
eco_gdf['intersects'] = eco_gdf.intersects(emit_gdf.unary_union)
## Apply subsetting
eco_gdf = eco_gdf[eco_gdf['intersects'] == True]
print(f' ECOSTRESS Granules: {eco_gdf.shape[0]} \n EMIT Granules: {emit_gdf.shape[0]}')
```

```
 ECOSTRESS Granules: 111 
 EMIT Granules: 8
```
In this instance, our results aren't narrowed because our region of interest is smaller than a single EMIT scene. If the spatial ROI were very large, this would be much less likely to be the case.
Additionally, we want to make sure that data in our results are collected at the same time. For EMIT and ECOSTRESS, the EMIT acquisition mask has been added to the ECOSTRESS mask, meaning that if there is an EMIT scene, there should also be an ECOSTRESS scene acquired at the same time. In practice, however, the timestamps on the scenes can vary slightly. In order to capture this slight variability, we need to use a range instead of a single timestamp to capture concurrent data. To do this, we’ll ensure all ECOSTRESS granule start times fall within 10 minutes of any of the EMIT granules in our results, and vice-versa.
Write a function to evaluate whether these datetime objects fall within 10 minutes of one another using a timedelta.
```python
# Function to Filter timestamps that do not fall within a time_delta of timestamps from the other acquisition time
def concurrent_match(gdf_a:pd.DataFrame, gdf_b:pd.DataFrame, col_name:str, time_delta:timedelta):
    """
    Cross references dataframes containing a datetime object column and keeps rows in
    each that fall within the provided timedelta of the other. Acceptable time_delta examples:

    weeks=1
    days=1
    hours=1
    minutes=1
    seconds=1

    """
    # Match Timestamps from Dataframe A with Time-range of entries in Dataframe B
    # Create empty list
    a_list = []
    # Iterate results for product b based on index values
    for _n in gdf_b.index.to_list():
        # Find where product a falls within the window around each product b result
        a_matches = (gdf_a[col_name] > gdf_b[col_name][_n]-time_delta) & (gdf_a[col_name] < gdf_b[col_name][_n]+time_delta)
        # Append list with values
        a_list.append(a_matches)
    # Match Timestamps from Dataframe B with Time-range of entries in Dataframe A
    # Create empty list
    b_list = []
    # Iterate results for product a based on index values
    for _m in gdf_a.index.to_list():
        # Find where product b falls within the window around each product a result
        b_matches = (gdf_b[col_name] > gdf_a[col_name][_m]-time_delta) & (gdf_b[col_name] < gdf_a[col_name][_m]+time_delta)
        # Append list with values
        b_list.append(b_matches)
    # Filter Original Dataframes by summing list of bools, 0 = outside of all time-ranges
    a_filtered = gdf_a.loc[sum(a_list) > 0]
    b_filtered = gdf_b.loc[sum(b_list) > 0]
    return(a_filtered, b_filtered)
```
Now run our function.
```python
eco_gdf, emit_gdf = concurrent_match(eco_gdf, emit_gdf, col_name='datetime_obj', time_delta=timedelta(minutes=10))
print(f' ECOSTRESS Granules: {eco_gdf.shape[0]} \n EMIT Granules: {emit_gdf.shape[0]}')
```

```
 ECOSTRESS Granules: 5 
 EMIT Granules: 6
```
4. Visualizing Intersecting Coverage
Now that we have geodataframes containing some concurrent data, we can visualize them on a map using folium. It's often difficult to visualize a large time-series of scenes, so we've included an example in Appendix A1 on how to filter to a single day.
```python
# Plot Using Folium

# Create Figure and Select Background Tiles
fig = Figure(width="1100px", height="550px")
map1 = folium.Map(tiles='https://mt1.google.com/vt/lyrs=y&x={x}&y={y}&z={z}', attr='Google')
fig.add_child(map1)

# Plot ECOSTRESS Results - note we must drop the datetime_obj column for this to work
eco_gdf.drop(columns=['datetime_obj']).explore(
    "granule",
    categorical=True,
    tooltip=[
        "granule",
        "start_datetime",
        "cloud_cover",
    ],
    popup=True,
    style_kwds=dict(fillOpacity=0.1, width=2),
    name="ECOSTRESS",
    m=map1,
)

# Plot EMITL2ARFL Results - note we must drop the datetime_obj column for this to work
emit_gdf.drop(columns=['datetime_obj']).explore(
    "granule",
    categorical=True,
    tooltip=[
        "granule",
        "start_datetime",
        "cloud_cover",
    ],
    popup=True,
    style_kwds=dict(fillOpacity=0.1, width=2),
    name="EMIT",
    m=map1,
)

# ECOSTRESS Browse Images - Comment out to remove
for _n in eco_gdf.index.to_list():
    folium.raster_layers.ImageOverlay(
        image=eco_gdf['browse'][_n],
        name=eco_gdf['granule'][_n],
        bounds=[[eco_gdf.bounds['miny'][_n], eco_gdf.bounds['minx'][_n]], [eco_gdf.bounds['maxy'][_n], eco_gdf.bounds['maxx'][_n]]],
        interactive=False,
        cross_origin=False,
        opacity=0.75,
        zindex=1,
    ).add_to(map1)

# Plot Region of Interest
polygon.explore(
    popup=False,
    style_kwds=dict(fillOpacity=0.1, width=2),
    name="Region of Interest",
    m=map1
)

map1.fit_bounds(bounds=convert_bounds(gdf.unary_union.bounds))
map1.add_child(folium.LayerControl())
display(fig)
```
In the figure above, you can zoom in and out, click and drag to reposition the legend, and add or remove layers using the layer control in the top right. Notice that since we're using the tiled ECOSTRESS product, we have 2 overlapping tiles at our ROI. You can visualize the tiles by adding or removing the layers.
4.2 Preview Browse Images
Note: The black space is indicative of onboard cloud masking that occurs before data is downlinked from the ISS.
```python
cols = 3
rows = math.ceil(len(emit_gdf)/cols)
fig, ax = plt.subplots(rows, cols, figsize=(20,20))
ax = ax.flatten()

for _n, index in enumerate(emit_gdf.index.to_list()):
    img = io.imread(emit_gdf['browse'][index])
    ax[_n].imshow(img)
    ax[_n].set_title(f"Index: {index} - {emit_gdf['granule'][index]}")
    ax[_n].axis('off')
plt.tight_layout()
plt.show()
```
4.3 Further Filtering
We can see that some of these granules likely won't work because of the large amount of cloud cover, so we can filter them out. Make a list of the indexes to filter out.
```python
# Bad granule list
bad_granules = [27, 74, 87]
```
Filter out the bad granules.
```python
emit_gdf = emit_gdf[~emit_gdf.index.isin(bad_granules)]
```
Now that we've narrowed our EMIT results, we can again filter the ECOSTRESS granules based on their concurrency with the filtered EMIT granules.
```python
eco_gdf, emit_gdf = concurrent_match(eco_gdf, emit_gdf, col_name='datetime_obj', time_delta=timedelta(hours=1))
print(f' ECOSTRESS Granules: {eco_gdf.shape[0]} \n EMIT Granules: {emit_gdf.shape[0]}')
```

```
 ECOSTRESS Granules: 2 
 EMIT Granules: 3
```
Create a sorted list of the granule indexes we want to keep.
```python
keep_granules = eco_gdf.index.to_list() + emit_gdf.index.to_list()
keep_granules.sort()
```
Filter the results list.
```python
filtered_results = [result for i, result in enumerate(results) if i in keep_granules]
```
Now we can download all of the associated assets, or retrieve the URLs and further filter them to specifically what we want.
First, log into Earthdata using the login function from the earthaccess library. The persist=True argument will create a local .netrc file if it doesn't exist, or add your login info to an existing .netrc file. If no Earthdata Login credentials are found in the .netrc, you'll be prompted for them. As mentioned in section 1.2, this step is not necessary to conduct searches, but is needed to download or stream data.
```python
earthaccess.login(persist=True)
```

```
EARTHDATA_USERNAME and EARTHDATA_PASSWORD are not set in the current environment, try setting them or use a different strategy (netrc, interactive)
You're now authenticated with NASA Earthdata Login
Using token with expiration date: 12/24/2023
Using .netrc file for EDL

<earthaccess.auth.Auth at 0x248568a2740>
```
Now we can download all assets using the following cell.
```python
# # Download All Assets for Granules in Filtered Results
# earthaccess.download(filtered_results, '../data/')
```
Or we can create a list of URLs and use that to further refine which files we download.
```python
# Retrieve URLs for Assets
results_urls = [granule.data_links() for granule in filtered_results]
```
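Each element of results_urls is itself a list of asset URLs for one granule, so a quick look at the first entry (illustrative; output depends on your results) shows the file names we are about to filter:

```python
# Print the asset file names for the first granule
for url in results_urls[0]:
    print(url.split('/')[-1])
```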
Granules often have several assets associated with them. For example, ECO_L2T_LSTE has several assets:
- Water Mask (water)
- Cloud Mask (cloud)
- Quality (QC)
- Land Surface Temperature (LST)
- Land Surface Temperature Error (LST_err)
- Wide Band Emissivity (EmisWB)
- Height (height)

The results list we just generated contains URLs to all of these files. We can further filter our results list using string matching to remove unwanted assets.
Create a list of strings and enumerate through our results_urls list to filter out unwanted assets.
```python
filtered_asset_links = []
# Pick Desired Assets (leave _ on RFL to distinguish from RFLUNC, LST. to distinguish from LST_err)
desired_assets = ['RFL_', 'LST.']  # Add more or do individually for reflectance, reflectance uncertainty, or mask
# Step through each sublist (granule) and filter based on desired assets.
for n, granule in enumerate(results_urls):
    for url in granule:
        asset_name = url.split('/')[-1]
        if any(asset in asset_name for asset in desired_assets):
            filtered_asset_links.append(url)
filtered_asset_links
```

```
['https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/ECO_L2T_LSTE.002/ECOv002_L2T_LSTE_26223_012_11SKU_20230219T202943_0710_01/ECOv002_L2T_LSTE_26223_012_11SKU_20230219T202943_0710_01_LST.tif',
 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2ARFL.001/EMIT_L2A_RFL_001_20230219T202951_2305013_003/EMIT_L2A_RFL_001_20230219T202951_2305013_003.nc',
 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/ECO_L2T_LSTE.002/ECOv002_L2T_LSTE_26921_001_11SKU_20230405T190258_0710_01/ECOv002_L2T_LSTE_26921_001_11SKU_20230405T190258_0710_01_LST.tif',
 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2ARFL.001/EMIT_L2A_RFL_001_20230405T190311_2309513_002/EMIT_L2A_RFL_001_20230405T190311_2309513_002.nc',
 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2ARFL.001/EMIT_L2A_RFL_001_20230405T190323_2309513_003/EMIT_L2A_RFL_001_20230405T190323_2309513_003.nc']
```
Uncomment the cell below (select all, then ctrl+/) and download the data that we’ve filtered.
```python
# # Get requests https Session using Earthdata Login Info
# fs = earthaccess.get_requests_https_session()
# # Retrieve granule asset ID from URL (to maintain existing naming convention)
# for url in filtered_asset_links:
#     granule_asset_id = url.split('/')[-1]
#     # Define Local Filepath
#     fp = f'../data/{granule_asset_id}'
#     # Download the Granule Asset if it doesn't exist
#     if not os.path.isfile(fp):
#         with fs.get(url, stream=True) as src:
#             with open(fp, 'wb') as dst:
#                 for chunk in src.iter_content(chunk_size=64*1024*1024):
#                     dst.write(chunk)
```
Congratulations, now you have downloaded concurrent data from the ECOSTRESS and EMIT instruments on the ISS.
Appendices
A1. Further Limiting Search for Visualization Purposes
A large quantity of results may be difficult to understand when mapping with folium. We can create a subset that falls within a single day. First add another column of dates only, then find the unique dates.
```python
# eco_gdf['date'] = eco_gdf['start_datetime'].str.split('T').str[0]
# emit_gdf['date'] = emit_gdf['start_datetime'].str.split('T').str[0]
# emit_gdf['date'].unique()
```
Filter both sets of results using a single date.
```python
# single_day_eco = eco_gdf[eco_gdf['date'] == '2023-04-23']
# single_day_emit = emit_gdf[emit_gdf['date'] == '2023-04-23']
# print(f' ECOSTRESS Granules: {single_day_eco.shape[0]} \n EMIT Granules: {single_day_emit.shape[0]}')
```
A2. Convert Shapefile to GeoJSON
We can convert a shapefile to a geojson using the following cell. Note that we need to reorder the polygon external vertices so we can submit them as a list of points for our search.
```python
# # Use Sedgwick Reserve Shapefile
# # Open Shapefile
# polygon = gpd.read_file('../data/Sedgwick_Boundary/Sedgwick_Boundary.shp').to_crs("EPSG:4326")
# # Reorder vertices into Counter-clockwise order
# polygon.geometry[0] = orient(polygon.geometry[0], sign=1.0)
# # Save as a geojson (not necessary)
# polygon.to_file('../data/sedgwick_boundary_epsg4326.geojson', driver='GeoJSON')
```
To do that, list the names of colums.\n\n# List Column Names\ngdf.columns\n\nNow create a list of columns to keep and use it to filter the dataframe.\n\n# Create a list of columns to keep\nkeep_cols = ['meta.concept-id','meta.native-id', 'umm.TemporalExtent.RangeDateTime.BeginningDateTime','umm.TemporalExtent.RangeDateTime.EndingDateTime','umm.CloudCover','umm.DataGranule.DayNightFlag','geometry','browse', 'shortname']\n# Remove unneeded columns\ngdf = gdf[gdf.columns.intersection(keep_cols)]\ngdf.head()\n\nThis is looking better, but we can make it more readable by renaming our columns.\n\n# Rename some Columns\ngdf.rename(columns = {'meta.concept-id':'concept_id','meta.native-id':'granule',\n 'umm.TemporalExtent.RangeDateTime.BeginningDateTime':'start_datetime',\n 'umm.TemporalExtent.RangeDateTime.EndingDateTime':'end_datetime',\n 'umm.CloudCover':'cloud_cover',\n 'umm.DataGranule.DayNightFlag':'day_night'}, inplace=True)\ngdf.head()\n\n\nNote: If querying on-premises (not cloud) LP DAAC datasets, the meta.concept-id will not show as xxxxxx-LPCLOUD. For these datasets, the granule name can be retrieved from the umm.DataGranule.Identifiers column.\n\nWe can filter using the day/night flag as well, but this step will be unnecessary as we check to ensure all results from ECOSTRESS fall within an hour of resulting EMIT granules.\n\n# gdf = gdf[gdf['day_night'].str.contains('Day')]\n\nOur first step toward filtering the datasets will be to add a column with a datetime.\n\nYou may have noticed that the date format is similar for ECOSTRESS and EMIT, but the ECOSTRESS data has an additional fractional seconds. If using the recommended lpdaac_vitals Windows environment, you will need to pass the format='ISO8601'argument to the to_datetime function, like in the commented out line.\n\n\ngdf['datetime_obj'] = pd.to_datetime(gdf['start_datetime'])\n# gdf['datetime_obj'] = pd.to_datetime(gdf['start_datetime'], format='ISO8601')\n\nWe can roughly visualize the quantity of results by month at our location using a histogram with 8 bins (Jan up to Sept).\n\ngdf.hist(column='datetime_obj', by='shortname', bins=9, color='green', edgecolor='black', linewidth=1, sharey=True)\n\nNow we will separate the results into two dataframes, one for ECOTRESS and one for EMIT and print the number of results for each so we can monitor how many granules we’re filtering.\n\n# Suppress Setting with Copy Warning - not applicable in this use case\npd.options.mode.chained_assignment = None # default='warn'\n\n# Split into two dataframes - ECO and EMIT\neco_gdf = gdf[gdf['granule'].str.contains('ECO')]\nemit_gdf = gdf[gdf['granule'].str.contains('EMIT')]\nprint(f' ECOSTRESS Granules: {eco_gdf.shape[0]} \\n EMIT Granules: {emit_gdf.shape[0]}')\n\n\nemit_gdf.head()\n\nWe still haven’t filtered the locations where EMIT and ECOSTRESS have data at the same spatial location and time-frame. The EMIT acquisition mask has been added to ECOSTRESS, so in most cases if EMIT is collecting data, so will ECOSTRESS, but there are edge cases where this is not true. 
To do this we’ll use two filters to catch the edge-cases, and provide an example that can be used with other datasets.\nFirst, since EMIT has a smaller swath width, we can can use a unary union of the spatial coverage present in our geodataframe to filter out ecostress granules that do not overlap with it.\n\n# Subset ECOSTRESS Granules in Geodataframe by intersection with EMIT granules\n## Create new column based on intersection with union of EMIT polygons.\neco_gdf['intersects'] = eco_gdf.intersects(emit_gdf.unary_union)\n## Apply subsetting\neco_gdf = eco_gdf[eco_gdf['intersects'] == True]\nprint(f' ECOSTRESS Granules: {eco_gdf.shape[0]} \\n EMIT Granules: {emit_gdf.shape[0]}')\n\nIn this instance, our results aren’t narrowed because our region of interest is smaller than a single EMIT scene. If the spatial ROI was very large, this would be much more unlikely.\nAdditionally, we want to make sure that data in our results are collected at the same time. For EMIT and ECOSTRESS, the EMIT acquisition mask has been added to the ECOSTRESS mask, meaning that if there is an EMIT scene, there should also be an ECOSTRESS scene acquired at the same time. In practice, however, the timestamps on the scenes can vary slightly. In order to capture this slight variability, we need to use a range instead of a single timestamp to capture concurrent data. To do this, we’ll ensure all ECOSTRESS granule start times fall within 10 minutes of any of the EMIT granules in our results, and vice-versa.\nWrite a function to evaluate whether these datetime objects fall within 10 minutes of one another using the timedelta function.\n\n# Function to Filter timestamps that do not fall within a time_delta of timestamps from the other acquisition time\ndef concurrent_match(gdf_a:pd.DataFrame, gdf_b:pd.DataFrame, col_name:str, time_delta:timedelta):\n \"\"\"\n Cross references dataframes containing a datetime object column and keeps rows in \n each that fall within the provided timedelta of the other. Acceptable time_delta examples:\n \n months=1\n days=1\n hours=1\n minutes=1\n seconds=1 \n\n \"\"\"\n # Match Timestamps from Dataframe A with Time-range of entries in Dataframe B\n # Create empty list\n a_list = []\n # Iterate results for product a based on index values\n for _n in gdf_b.index.to_list():\n # Find where product b is within the window of each product a result\n a_matches = (gdf_a[col_name] > gdf_b[col_name][_n]-time_delta) & (gdf_a[col_name] < gdf_b[col_name][_n]+time_delta)\n # Append list with values\n a_list.append(a_matches)\n # Match Timestamps from Dataframe B with Time-range of entries in Dataframe A\n # Create empy list\n b_list =[]\n for _m in gdf_a.index.to_list():\n # Find where product a is within the window of each product b result\n b_matches = (gdf_b[col_name] > gdf_a[col_name][_m]-time_delta) & (gdf_b[col_name] < gdf_a[col_name][_m]+time_delta)\n # Append list with values\n b_list.append(b_matches)\n # Filter Original Dataframes by summing list of bools, 0 = outside of all time-ranges\n a_filtered = gdf_a.loc[sum(a_list) > 0]\n b_filtered = gdf_b.loc[sum(b_list) > 0]\n return(a_filtered, b_filtered)\n\nNow run our function.\n\n\neco_gdf, emit_gdf = concurrent_match(eco_gdf,emit_gdf, col_name='datetime_obj',time_delta=timedelta(minutes=10))\nprint(f' ECOSTRESS Granules: {eco_gdf.shape[0]} \\n EMIT Granules: {emit_gdf.shape[0]}')"
+ "text": "3. Organizing and Filtering Results\nAs we can see from above, the results object contains a list of objects with metadata and links. We can convert this to a more readable format, a dataframe. In addition, we can make it a geodataframe by taking the spatial metadata and creating a shapely polygon representing the spatial coverage, and further customize which information we want to use from other metadata fields.\nFirst, we define some functions to help us create a shapely object for our geodataframe, and retrieve the specific browse image URLs that we want. By default the browse image selected by earthaccess is the first one in the list, but the ECO_L2_LSTE has several browse images and we want to make sure we retrieve the png file, which is a preview of the LSTE.\n\n# Function to create shapely polygon of spatial coverage\ndef get_shapely_object(result:earthaccess.results.DataGranule):\n # Get Geometry Keys\n geo = result['umm']['SpatialExtent']['HorizontalSpatialDomain']['Geometry']\n keys = geo.keys()\n\n if 'BoundingRectangles' in keys:\n bounding_rectangle = geo['BoundingRectangles'][0]\n # Create bbox tuple\n bbox_coords = (bounding_rectangle['WestBoundingCoordinate'],bounding_rectangle['SouthBoundingCoordinate'],\n bounding_rectangle['EastBoundingCoordinate'],bounding_rectangle['NorthBoundingCoordinate'])\n # Create shapely geometry from bbox\n shape = geometry.box(*bbox_coords, ccw=True)\n elif 'GPolygons' in keys:\n points = geo['GPolygons'][0]['Boundary']['Points']\n # Create shapely geometry from polygons\n shape = geometry.Polygon([[p['Longitude'],p['Latitude']] for p in points])\n else:\n raise ValueError('Provided result does not contain bounding boxes/polygons or is incompatible.')\n return(shape)\n\n# Retrieve png browse image if it exists or first jpg in list of urls\ndef get_png(result:earthaccess.results.DataGranule):\n https_links = [link for link in result.dataviz_links() if 'https' in link]\n if len(https_links) == 1:\n browse = https_links[0]\n elif len(https_links) == 0:\n browse = 'no browse image'\n warnings.warn(f\"There is no browse imagery for {result['umm']['GranuleUR']}.\")\n else:\n browse = [png for png in https_links if '.png' in png][0]\n return(browse)\n\nNow that we have our functions we can create a dataframe, then calculate and add our shapely geometries to make a geodataframe. 
After that, add a column for our browse image URLs and print the number of granules in our results, so we can monitor the quantity we are working with as we winnow down to the data we want.\n\n# Create Dataframe of Results Metadata\nresults_df = pd.json_normalize(results)\n# Create shapely polygons for result\ngeometries = [get_shapely_object(results[index]) for index in results_df.index.to_list()]\n# Convert to GeoDataframe\ngdf = gpd.GeoDataFrame(results_df, geometry=geometries, crs=\"EPSG:4326\")\n# Remove results df, no longer needed\ndel results_df\n# Add browse imagery links\ngdf['browse'] = [get_png(granule) for granule in results]\ngdf['shortname'] = [result['umm']['CollectionReference']['ShortName'] for result in results]\n# Preview GeoDataframe\nprint(f'{gdf.shape[0]} granules total')\n\n119 granules total\n\n\nPreview our geodataframe to get an idea of what it looks like.\n\ngdf.head()\n\n\n\n\n\n\n\n\nsize\nmeta.concept-type\nmeta.concept-id\nmeta.revision-id\nmeta.native-id\nmeta.provider-id\nmeta.format\nmeta.revision-date\numm.TemporalExtent.RangeDateTime.BeginningDateTime\numm.TemporalExtent.RangeDateTime.EndingDateTime\n...\numm.Platforms\numm.MetadataSpecification.URL\numm.MetadataSpecification.Name\numm.MetadataSpecification.Version\numm.SpatialExtent.HorizontalSpatialDomain.Geometry.GPolygons\numm.PGEVersionClass.PGEName\numm.CloudCover\ngeometry\nbrowse\nshortname\n\n\n\n\n0\n2.49626\ngranule\nG2581836170-LPCLOUD\n1\nECOv002_L2T_LSTE_25460_016_11SKU_20230101T1552...\nLPCLOUD\napplication/echo10+xml\n2023-01-07T13:31:07.584Z\n2023-01-01T15:52:48.650Z\n2023-01-01T15:53:40.620Z\n...\n[{'ShortName': 'ISS', 'Instruments': [{'ShortN...\nhttps://cdn.earthdata.nasa.gov/umm/granule/v1.6.5\nUMM-G\n1.6.5\nNaN\nNaN\nNaN\nPOLYGON ((-119.06582 34.21003, -119.06582 35.2...\nhttps://data.lpdaac.earthdatacloud.nasa.gov/lp...\nECO_L2T_LSTE\n\n\n1\n8.49543\ngranule\nG2586136993-LPCLOUD\n1\nECOv002_L2T_LSTE_25486_006_11SKU_20230103T0744...\nLPCLOUD\napplication/echo10+xml\n2023-01-10T22:21:57.631Z\n2023-01-03T07:44:22.480Z\n2023-01-03T07:45:14.450Z\n...\n[{'ShortName': 'ISS', 'Instruments': [{'ShortN...\nhttps://cdn.earthdata.nasa.gov/umm/granule/v1.6.5\nUMM-G\n1.6.5\nNaN\nNaN\nNaN\nPOLYGON ((-119.06582 34.21003, -119.06582 35.2...\nhttps://data.lpdaac.earthdatacloud.nasa.gov/lp...\nECO_L2T_LSTE\n\n\n2\n1.16068\ngranule\nG2591892077-LPCLOUD\n1\nECOv002_L2T_LSTE_25547_005_11SKU_20230107T0607...\nLPCLOUD\napplication/echo10+xml\n2023-01-18T19:13:09.017Z\n2023-01-07T06:07:21.560Z\n2023-01-07T06:08:13.530Z\n...\n[{'ShortName': 'ISS', 'Instruments': [{'ShortN...\nhttps://cdn.earthdata.nasa.gov/umm/granule/v1.6.5\nUMM-G\n1.6.5\nNaN\nNaN\nNaN\nPOLYGON ((-119.06582 34.21003, -119.06582 35.2...\nhttps://data.lpdaac.earthdatacloud.nasa.gov/lp...\nECO_L2T_LSTE\n\n\n3\n3.42404\ngranule\nG2592560828-LPCLOUD\n1\nECOv002_L2T_LSTE_25608_003_11SKU_20230111T0430...\nLPCLOUD\napplication/echo10+xml\n2023-01-19T11:23:08.273Z\n2023-01-11T04:30:26.510Z\n2023-01-11T04:31:18.480Z\n...\n[{'ShortName': 'ISS', 'Instruments': [{'ShortN...\nhttps://cdn.earthdata.nasa.gov/umm/granule/v1.6.5\nUMM-G\n1.6.5\nNaN\nNaN\nNaN\nPOLYGON ((-119.06582 34.21003, -119.06582 35.2...\nhttps://data.lpdaac.earthdatacloud.nasa.gov/lp...\nECO_L2T_LSTE\n\n\n4\n14.28010\ngranule\nG2593132604-LPCLOUD\n1\nECOv002_L2T_LSTE_25628_015_11SKU_20230112T1150...\nLPCLOUD\napplication/echo10+xml\n2023-01-20T05:37:00.651Z\n2023-01-12T11:50:28.830Z\n2023-01-12T11:51:20.800Z\n...\n[{'ShortName': 'ISS', 'Instruments': 
[{'ShortN...\nhttps://cdn.earthdata.nasa.gov/umm/granule/v1.6.5\nUMM-G\n1.6.5\nNaN\nNaN\nNaN\nPOLYGON ((-119.06582 34.21003, -119.06582 35.2...\nhttps://data.lpdaac.earthdatacloud.nasa.gov/lp...\nECO_L2T_LSTE\n\n\n\n\n5 rows × 34 columns\n\n\n\nThere are a lot of columns with data that is not relevant to our goal, so we can drop those. To do that, list the names of columns.\n\n# List Column Names\ngdf.columns\n\nIndex(['size', 'meta.concept-type', 'meta.concept-id', 'meta.revision-id',\n 'meta.native-id', 'meta.provider-id', 'meta.format',\n 'meta.revision-date',\n 'umm.TemporalExtent.RangeDateTime.BeginningDateTime',\n 'umm.TemporalExtent.RangeDateTime.EndingDateTime',\n 'umm.OrbitCalculatedSpatialDomains', 'umm.GranuleUR',\n 'umm.AdditionalAttributes', 'umm.MeasuredParameters',\n 'umm.SpatialExtent.HorizontalSpatialDomain.Geometry.BoundingRectangles',\n 'umm.ProviderDates', 'umm.CollectionReference.ShortName',\n 'umm.CollectionReference.Version', 'umm.PGEVersionClass.PGEVersion',\n 'umm.RelatedUrls', 'umm.DataGranule.DayNightFlag',\n 'umm.DataGranule.Identifiers', 'umm.DataGranule.ProductionDateTime',\n 'umm.DataGranule.ArchiveAndDistributionInformation', 'umm.Platforms',\n 'umm.MetadataSpecification.URL', 'umm.MetadataSpecification.Name',\n 'umm.MetadataSpecification.Version',\n 'umm.SpatialExtent.HorizontalSpatialDomain.Geometry.GPolygons',\n 'umm.PGEVersionClass.PGEName', 'umm.CloudCover', 'geometry', 'browse',\n 'shortname'],\n dtype='object')\n\n\nNow create a list of columns to keep and use it to filter the dataframe.\n\n# Create a list of columns to keep\nkeep_cols = ['meta.concept-id','meta.native-id', 'umm.TemporalExtent.RangeDateTime.BeginningDateTime','umm.TemporalExtent.RangeDateTime.EndingDateTime','umm.CloudCover','umm.DataGranule.DayNightFlag','geometry','browse', 'shortname']\n# Remove unneeded columns\ngdf = gdf[gdf.columns.intersection(keep_cols)]\ngdf.head()\n\n\n\n\n\n\n\n\nmeta.concept-id\nmeta.native-id\numm.TemporalExtent.RangeDateTime.BeginningDateTime\numm.TemporalExtent.RangeDateTime.EndingDateTime\numm.DataGranule.DayNightFlag\numm.CloudCover\ngeometry\nbrowse\nshortname\n\n\n\n\n0\nG2581836170-LPCLOUD\nECOv002_L2T_LSTE_25460_016_11SKU_20230101T1552...\n2023-01-01T15:52:48.650Z\n2023-01-01T15:53:40.620Z\nDay\nNaN\nPOLYGON ((-119.06582 34.21003, -119.06582 35.2...\nhttps://data.lpdaac.earthdatacloud.nasa.gov/lp...\nECO_L2T_LSTE\n\n\n1\nG2586136993-LPCLOUD\nECOv002_L2T_LSTE_25486_006_11SKU_20230103T0744...\n2023-01-03T07:44:22.480Z\n2023-01-03T07:45:14.450Z\nNight\nNaN\nPOLYGON ((-119.06582 34.21003, -119.06582 35.2...\nhttps://data.lpdaac.earthdatacloud.nasa.gov/lp...\nECO_L2T_LSTE\n\n\n2\nG2591892077-LPCLOUD\nECOv002_L2T_LSTE_25547_005_11SKU_20230107T0607...\n2023-01-07T06:07:21.560Z\n2023-01-07T06:08:13.530Z\nNight\nNaN\nPOLYGON ((-119.06582 34.21003, -119.06582 35.2...\nhttps://data.lpdaac.earthdatacloud.nasa.gov/lp...\nECO_L2T_LSTE\n\n\n3\nG2592560828-LPCLOUD\nECOv002_L2T_LSTE_25608_003_11SKU_20230111T0430...\n2023-01-11T04:30:26.510Z\n2023-01-11T04:31:18.480Z\nNight\nNaN\nPOLYGON ((-119.06582 34.21003, -119.06582 35.2...\nhttps://data.lpdaac.earthdatacloud.nasa.gov/lp...\nECO_L2T_LSTE\n\n\n4\nG2593132604-LPCLOUD\nECOv002_L2T_LSTE_25628_015_11SKU_20230112T1150...\n2023-01-12T11:50:28.830Z\n2023-01-12T11:51:20.800Z\nNight\nNaN\nPOLYGON ((-119.06582 34.21003, -119.06582 35.2...\nhttps://data.lpdaac.earthdatacloud.nasa.gov/lp...\nECO_L2T_LSTE\n\n\n\n\n\n\n\nThis is looking better, but we can make it more readable by renaming our columns.\n\n# Rename some 
Columns\ngdf.rename(columns = {'meta.concept-id':'concept_id','meta.native-id':'granule',\n 'umm.TemporalExtent.RangeDateTime.BeginningDateTime':'start_datetime',\n 'umm.TemporalExtent.RangeDateTime.EndingDateTime':'end_datetime',\n 'umm.CloudCover':'cloud_cover',\n 'umm.DataGranule.DayNightFlag':'day_night'}, inplace=True)\ngdf.head()\n\n\n\n\n\n\n\n\nconcept_id\ngranule\nstart_datetime\nend_datetime\nday_night\ncloud_cover\ngeometry\nbrowse\nshortname\n\n\n\n\n0\nG2581836170-LPCLOUD\nECOv002_L2T_LSTE_25460_016_11SKU_20230101T1552...\n2023-01-01T15:52:48.650Z\n2023-01-01T15:53:40.620Z\nDay\nNaN\nPOLYGON ((-119.06582 34.21003, -119.06582 35.2...\nhttps://data.lpdaac.earthdatacloud.nasa.gov/lp...\nECO_L2T_LSTE\n\n\n1\nG2586136993-LPCLOUD\nECOv002_L2T_LSTE_25486_006_11SKU_20230103T0744...\n2023-01-03T07:44:22.480Z\n2023-01-03T07:45:14.450Z\nNight\nNaN\nPOLYGON ((-119.06582 34.21003, -119.06582 35.2...\nhttps://data.lpdaac.earthdatacloud.nasa.gov/lp...\nECO_L2T_LSTE\n\n\n2\nG2591892077-LPCLOUD\nECOv002_L2T_LSTE_25547_005_11SKU_20230107T0607...\n2023-01-07T06:07:21.560Z\n2023-01-07T06:08:13.530Z\nNight\nNaN\nPOLYGON ((-119.06582 34.21003, -119.06582 35.2...\nhttps://data.lpdaac.earthdatacloud.nasa.gov/lp...\nECO_L2T_LSTE\n\n\n3\nG2592560828-LPCLOUD\nECOv002_L2T_LSTE_25608_003_11SKU_20230111T0430...\n2023-01-11T04:30:26.510Z\n2023-01-11T04:31:18.480Z\nNight\nNaN\nPOLYGON ((-119.06582 34.21003, -119.06582 35.2...\nhttps://data.lpdaac.earthdatacloud.nasa.gov/lp...\nECO_L2T_LSTE\n\n\n4\nG2593132604-LPCLOUD\nECOv002_L2T_LSTE_25628_015_11SKU_20230112T1150...\n2023-01-12T11:50:28.830Z\n2023-01-12T11:51:20.800Z\nNight\nNaN\nPOLYGON ((-119.06582 34.21003, -119.06582 35.2...\nhttps://data.lpdaac.earthdatacloud.nasa.gov/lp...\nECO_L2T_LSTE\n\n\n\n\n\n\n\n\nNote: If querying on-premises (not cloud) LP DAAC datasets, the meta.concept-id will not show as xxxxxx-LPCLOUD. For these datasets, the granule name can be retrieved from the umm.DataGranule.Identifiers column.\n\nWe can filter using the day/night flag as well, but this step is unnecessary here, as we will check to ensure all results from ECOSTRESS fall within an hour of the resulting EMIT granules.\n\n# gdf = gdf[gdf['day_night'].str.contains('Day')]\n\nOur first step toward filtering the datasets will be to add a column with a datetime.\n\nYou may have noticed that the date format is similar for ECOSTRESS and EMIT, but the ECOSTRESS data has an additional fractional-seconds component. 
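For instance (values taken from the tables above), an EMIT start time looks like 2023-02-19T20:29:51Z, while an ECOSTRESS start time looks like 2023-01-01T15:52:48.650Z. A minimal sketch of parsing such a mixed-precision column, assuming pandas 2.0 or later:\n\n# Illustrative only: mixed-precision ISO 8601 strings parse cleanly with format='ISO8601'\n# pd.to_datetime(pd.Series(['2023-02-19T20:29:51Z', '2023-01-01T15:52:48.650Z']), format='ISO8601')\n\n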
If using the recommended lpdaac_vitals Windows environment, you will need to pass the format='ISO8601' argument to the to_datetime function, as shown in the uncommented line below.\n\n\n# gdf['datetime_obj'] = pd.to_datetime(gdf['start_datetime'])\ngdf['datetime_obj'] = pd.to_datetime(gdf['start_datetime'], format='ISO8601')\n\nWe can roughly visualize the quantity of results by month at our location using a histogram with 9 bins (Jan up to Sept).\n\ngdf.hist(column='datetime_obj', by='shortname', bins=9, color='green', edgecolor='black', linewidth=1, sharey=True)\n\narray([<Axes: title={'center': 'ECO_L2T_LSTE'}>,\n <Axes: title={'center': 'EMITL2ARFL'}>], dtype=object)\n\n\n\n\n\nNow we will separate the results into two dataframes, one for ECOSTRESS and one for EMIT, and print the number of results for each so we can monitor how many granules we’re filtering.\n\n# Suppress Setting with Copy Warning - not applicable in this use case\npd.options.mode.chained_assignment = None # default='warn'\n\n# Split into two dataframes - ECO and EMIT\neco_gdf = gdf[gdf['granule'].str.contains('ECO')]\nemit_gdf = gdf[gdf['granule'].str.contains('EMIT')]\nprint(f' ECOSTRESS Granules: {eco_gdf.shape[0]} \\n EMIT Granules: {emit_gdf.shape[0]}')\n\n ECOSTRESS Granules: 111 \n EMIT Granules: 8\n\n\n\nemit_gdf.head()\n\n\n\n\n\n\n\n\nconcept_id\ngranule\nstart_datetime\nend_datetime\nday_night\ncloud_cover\ngeometry\nbrowse\nshortname\ndatetime_obj\n\n\n\n\n20\nG2631045418-LPCLOUD\nEMIT_L2A_RFL_001_20230219T202951_2305013_003\n2023-02-19T20:29:51Z\n2023-02-19T20:30:14Z\nDay\n67.0\nPOLYGON ((-120.04838 35.03646, -120.54523 34.4...\nhttps://data.lpdaac.earthdatacloud.nasa.gov/lp...\nEMITL2ARFL\n2023-02-19 20:29:51+00:00\n\n\n23\nG2631458885-LPCLOUD\nEMIT_L2A_RFL_001_20230223T185548_2305412_004\n2023-02-23T18:55:48Z\n2023-02-23T18:56:00Z\nDay\n96.0\nPOLYGON ((-119.95147 35.08163, -120.44685 34.4...\nhttps://data.lpdaac.earthdatacloud.nasa.gov/lp...\nEMITL2ARFL\n2023-02-23 18:55:48+00:00\n\n\n27\nG2631818047-LPCLOUD\nEMIT_L2A_RFL_001_20230227T172140_2305811_006\n2023-02-27T17:21:40Z\n2023-02-27T17:21:52Z\nDay\n100.0\nPOLYGON ((-119.41012 34.81982, -119.90931 34.1...\nhttps://data.lpdaac.earthdatacloud.nasa.gov/lp...\nEMITL2ARFL\n2023-02-27 17:21:40+00:00\n\n\n47\nG2667368816-LPCLOUD\nEMIT_L2A_RFL_001_20230405T190311_2309513_002\n2023-04-05T19:03:11Z\n2023-04-05T19:03:23Z\nDay\n71.0\nPOLYGON ((-119.87804 34.90691, -120.65480 34.2...\nhttps://data.lpdaac.earthdatacloud.nasa.gov/lp...\nEMITL2ARFL\n2023-04-05 19:03:11+00:00\n\n\n48\nG2667369196-LPCLOUD\nEMIT_L2A_RFL_001_20230405T190323_2309513_003\n2023-04-05T19:03:23Z\n2023-04-05T19:03:35Z\nDay\n8.0\nPOLYGON ((-119.22086 35.41499, -120.01088 34.8...\nhttps://data.lpdaac.earthdatacloud.nasa.gov/lp...\nEMITL2ARFL\n2023-04-05 19:03:23+00:00\n\n\n\n\n\n\n\nWe still haven’t filtered the locations where EMIT and ECOSTRESS have data at the same spatial location and time-frame. The EMIT acquisition mask has been added to ECOSTRESS, so in most cases, if EMIT is collecting data, ECOSTRESS will be too, but there are edge cases where this is not true. 
To do this, we’ll use two filters to catch the edge cases, and provide an example that can be used with other datasets.\nFirst, since EMIT has a smaller swath width, we can use a unary union of the spatial coverage present in our geodataframe to filter out ECOSTRESS granules that do not overlap with it.\n\n# Subset ECOSTRESS Granules in Geodataframe by intersection with EMIT granules\n## Create new column based on intersection with union of EMIT polygons.\neco_gdf['intersects'] = eco_gdf.intersects(emit_gdf.unary_union)\n## Apply subsetting\neco_gdf = eco_gdf[eco_gdf['intersects'] == True]\nprint(f' ECOSTRESS Granules: {eco_gdf.shape[0]} \\n EMIT Granules: {emit_gdf.shape[0]}')\n\n ECOSTRESS Granules: 111 \n EMIT Granules: 8\n\n\nIn this instance, our results aren’t narrowed because our region of interest is smaller than a single EMIT scene. With a much larger spatial ROI, this filter would be more likely to remove ECOSTRESS granules that fall outside EMIT coverage.\nAdditionally, we want to make sure that data in our results are collected at the same time. For EMIT and ECOSTRESS, the EMIT acquisition mask has been added to the ECOSTRESS mask, meaning that if there is an EMIT scene, there should also be an ECOSTRESS scene acquired at the same time. In practice, however, the timestamps on the scenes can vary slightly. To account for this slight variability, we need to use a time range rather than a single timestamp to identify concurrent data. To do this, we’ll ensure all ECOSTRESS granule start times fall within 10 minutes of any of the EMIT granules in our results, and vice-versa.\nWrite a function to evaluate whether these datetime objects fall within 10 minutes of one another using a timedelta object.\n\n# Function to Filter timestamps that do not fall within a time_delta of timestamps from the other acquisition time\ndef concurrent_match(gdf_a:pd.DataFrame, gdf_b:pd.DataFrame, col_name:str, time_delta:timedelta):\n \"\"\"\n Cross references dataframes containing a datetime object column and keeps rows in \n each that fall within the provided timedelta of the other. Acceptable time_delta examples:\n \n weeks=1\n days=1\n hours=1\n minutes=1\n seconds=1 \n\n \"\"\"\n # Match Timestamps from Dataframe A with Time-range of entries in Dataframe B\n # Create empty list\n a_list = []\n # Iterate over product b index values\n for _n in gdf_b.index.to_list():\n # Find where product a is within the window of each product b result\n a_matches = (gdf_a[col_name] > gdf_b[col_name][_n]-time_delta) & (gdf_a[col_name] < gdf_b[col_name][_n]+time_delta)\n # Append list with values\n a_list.append(a_matches)\n # Match Timestamps from Dataframe B with Time-range of entries in Dataframe A\n # Create empty list\n b_list = []\n for _m in gdf_a.index.to_list():\n # Find where product b is within the window of each product a result\n b_matches = (gdf_b[col_name] > gdf_a[col_name][_m]-time_delta) & (gdf_b[col_name] < gdf_a[col_name][_m]+time_delta)\n # Append list with values\n b_list.append(b_matches)\n # Filter Original Dataframes by summing list of bools, 0 = outside of all time-ranges\n a_filtered = gdf_a.loc[sum(a_list) > 0]\n b_filtered = gdf_b.loc[sum(b_list) > 0]\n return(a_filtered, b_filtered)\n\nNow run our function.\n\n\neco_gdf, emit_gdf = concurrent_match(eco_gdf,emit_gdf, col_name='datetime_obj',time_delta=timedelta(minutes=10))\nprint(f' ECOSTRESS Granules: {eco_gdf.shape[0]} \\n EMIT Granules: {emit_gdf.shape[0]}')\n\n ECOSTRESS Granules: 5 \n EMIT Granules: 6"
},
{
"objectID": "python/01_Finding_Concurrent_Data.html#visualizing-intersecting-coverage",
"href": "python/01_Finding_Concurrent_Data.html#visualizing-intersecting-coverage",
"title": "01 Finding Concurrent ECOSTRESS and EMIT Data",
"section": "4. Visualizing Intersecting Coverage",
- "text": "4. Visualizing Intersecting Coverage\nNow that we have geodataframes containing some concurrent data, we can visualize them on a map using folium. It’s often difficult to visualize a large time-series of scenes, so we’ve included an example in Appendix A1 on how to filter to a single day.\n\n# Plot Using Folium\n\n# Create Figure and Select Background Tiles\nfig = Figure(width=\"1100px\", height=\"550px\")\nmap1 = folium.Map(tiles='https://mt1.google.com/vt/lyrs=y&x={x}&y={y}&z={z}', attr='Google')\nfig.add_child(map1)\n\n# Plot STAC ECOSTRESS Results - note we must drop the datetime_obj columns for this to work\neco_gdf.drop(columns=['datetime_obj']).explore(\n \"granule\",\n categorical=True,\n tooltip=[\n \"granule\",\n \"start_datetime\",\n \"cloud_cover\",\n ],\n popup=True,\n style_kwds=dict(fillOpacity=0.1, width=2),\n name=\"ECOSTRESS\",\n m=map1,\n)\n\n# Plot STAC EMITL2ARFL Results - note we must drop the datetime_obj columns for this to work\nemit_gdf.drop(columns=['datetime_obj']).explore(\n \"granule\",\n categorical=True,\n tooltip=[\n \"granule\",\n \"start_datetime\",\n \"cloud_cover\",\n ],\n popup=True,\n style_kwds=dict(fillOpacity=0.1, width=2),\n name=\"EMIT\",\n m=map1,\n)\n\n# ECOSTRESS Browse Images - Comment out to remove\nfor _n in eco_gdf.index.to_list():\n folium.raster_layers.ImageOverlay(\n image=eco_gdf['browse'][_n],\n name=eco_gdf['granule'][_n],\n bounds=[[eco_gdf.bounds['miny'][_n], eco_gdf.bounds['minx'][_n]], [eco_gdf.bounds['maxy'][_n], eco_gdf.bounds['maxx'][_n]]],\n interactive=False,\n cross_origin=False,\n opacity=0.75,\n zindex=1,\n ).add_to(map1)\n\n# Plot Region of Interest\npolygon.explore(\n popup=False,\n style_kwds=dict(fillOpacity=0.1, width=2),\n name=\"Region of Interest\",\n m=map1\n)\n\nmap1.fit_bounds(bounds=convert_bounds(gdf.unary_union.bounds))\nmap1.add_child(folium.LayerControl())\ndisplay(fig)\n\nIn the figure above, you can zoom in and out, click and drag to reposition the legend, and add or remove layers using the layer control in the top right. Notice that since we’re using the tiled ecostress product, we have 2 overlapping tiles at our ROI. You can visualize the tiles by adding or removing the layers.\n\n4.2 Previewing EMIT Browse Imagery\nThe EMIT browse imagery is not orthorectified, so it can’t be visualized on a plot like the ECOSTRESS browse imagery. To get an idea what scenes look like we can plot them in a grid using matplotlib.\n\nNote: The black space is indicative of onboard cloud masking that occurs before data is downlinked from the ISS.\n\n\ncols = 3\nrows = math.ceil(len(emit_gdf)/cols)\nfig, ax = plt.subplots(rows, cols, figsize=(20,20))\nax = ax.flatten()\n\nfor _n, index in enumerate(emit_gdf.index.to_list()):\n img = io.imread(emit_gdf['browse'][index])\n ax[_n].imshow(img)\n ax[_n].set_title(f\"Index: {index} - {emit_gdf['granule'][index]}\")\n ax[_n].axis('off')\nplt.tight_layout()\nplt.show\n\n\n\n4.3 Further Filtering\nWe can see that some of these granules likely won’t work because of the large amount of cloud cover, we can use a list of these to filter them out. 
Make a list of indexes to filter out.\n\n# Bad granule list\nbad_granules = [27,74,87]\n\nFilter out the bad granules.\n\nemit_gdf = emit_gdf[~emit_gdf.index.isin(bad_granules)]\n\nNow that we’ve narrowed our EMIT results we can again filter the ecostress granules based on their concurrency with our filtered EMIT granules.\n\neco_gdf, emit_gdf = concurrent_match(eco_gdf,emit_gdf, col_name='datetime_obj',time_delta=timedelta(hours=1))\nprint(f' ECOSTRESS Granules: {eco_gdf.shape[0]} \\n EMIT Granules: {emit_gdf.shape[0]}')"
+ "text": "4. Visualizing Intersecting Coverage\nNow that we have geodataframes containing some concurrent data, we can visualize them on a map using folium. It’s often difficult to visualize a large time-series of scenes, so we’ve included an example in Appendix A1 on how to filter to a single day.\n\n# Plot Using Folium\n\n# Create Figure and Select Background Tiles\nfig = Figure(width=\"1100px\", height=\"550px\")\nmap1 = folium.Map(tiles='https://mt1.google.com/vt/lyrs=y&x={x}&y={y}&z={z}', attr='Google')\nfig.add_child(map1)\n\n# Plot STAC ECOSTRESS Results - note we must drop the datetime_obj columns for this to work\neco_gdf.drop(columns=['datetime_obj']).explore(\n \"granule\",\n categorical=True,\n tooltip=[\n \"granule\",\n \"start_datetime\",\n \"cloud_cover\",\n ],\n popup=True,\n style_kwds=dict(fillOpacity=0.1, width=2),\n name=\"ECOSTRESS\",\n m=map1,\n)\n\n# Plot STAC EMITL2ARFL Results - note we must drop the datetime_obj columns for this to work\nemit_gdf.drop(columns=['datetime_obj']).explore(\n \"granule\",\n categorical=True,\n tooltip=[\n \"granule\",\n \"start_datetime\",\n \"cloud_cover\",\n ],\n popup=True,\n style_kwds=dict(fillOpacity=0.1, width=2),\n name=\"EMIT\",\n m=map1,\n)\n\n# ECOSTRESS Browse Images - Comment out to remove\nfor _n in eco_gdf.index.to_list():\n folium.raster_layers.ImageOverlay(\n image=eco_gdf['browse'][_n],\n name=eco_gdf['granule'][_n],\n bounds=[[eco_gdf.bounds['miny'][_n], eco_gdf.bounds['minx'][_n]], [eco_gdf.bounds['maxy'][_n], eco_gdf.bounds['maxx'][_n]]],\n interactive=False,\n cross_origin=False,\n opacity=0.75,\n zindex=1,\n ).add_to(map1)\n\n# Plot Region of Interest\npolygon.explore(\n popup=False,\n style_kwds=dict(fillOpacity=0.1, width=2),\n name=\"Region of Interest\",\n m=map1\n)\n\nmap1.fit_bounds(bounds=convert_bounds(gdf.unary_union.bounds))\nmap1.add_child(folium.LayerControl())\ndisplay(fig)\n\n\n\n\n\nIn the figure above, you can zoom in and out, click and drag to reposition the legend, and add or remove layers using the layer control in the top right. Notice that since we’re using the tiled ecostress product, we have 2 overlapping tiles at our ROI. You can visualize the tiles by adding or removing the layers.\n\n4.2 Previewing EMIT Browse Imagery\nThe EMIT browse imagery is not orthorectified, so it can’t be visualized on a plot like the ECOSTRESS browse imagery. To get an idea what scenes look like we can plot them in a grid using matplotlib.\n\nNote: The black space is indicative of onboard cloud masking that occurs before data is downlinked from the ISS.\n\n\ncols = 3\nrows = math.ceil(len(emit_gdf)/cols)\nfig, ax = plt.subplots(rows, cols, figsize=(20,20))\nax = ax.flatten()\n\nfor _n, index in enumerate(emit_gdf.index.to_list()):\n img = io.imread(emit_gdf['browse'][index])\n ax[_n].imshow(img)\n ax[_n].set_title(f\"Index: {index} - {emit_gdf['granule'][index]}\")\n ax[_n].axis('off')\nplt.tight_layout()\nplt.show\n\n<function matplotlib.pyplot.show(close=None, block=None)>\n\n\n\n\n\n\n\n4.3 Further Filtering\nWe can see that some of these granules likely won’t work because of the large amount of cloud cover, we can use a list of these to filter them out. 
Make a list of indexes to filter out.\n\n# Bad granule list\nbad_granules = [27,74,87]\n\nFilter out the bad granules.\n\nemit_gdf = emit_gdf[~emit_gdf.index.isin(bad_granules)]\n\nNow that we’ve narrowed our EMIT results, we can again filter the ECOSTRESS granules based on their concurrency with our filtered EMIT granules.\n\neco_gdf, emit_gdf = concurrent_match(eco_gdf,emit_gdf, col_name='datetime_obj',time_delta=timedelta(hours=1))\nprint(f' ECOSTRESS Granules: {eco_gdf.shape[0]} \\n EMIT Granules: {emit_gdf.shape[0]}')\n\n ECOSTRESS Granules: 2 \n EMIT Granules: 3"
},
{
"objectID": "python/01_Finding_Concurrent_Data.html#generating-a-list-of-urls-and-downloading-data",
"href": "python/01_Finding_Concurrent_Data.html#generating-a-list-of-urls-and-downloading-data",
"title": "01 Finding Concurrent ECOSTRESS and EMIT Data",
"section": "5. Generating a list of URLs and downloading data",
- "text": "5. Generating a list of URLs and downloading data\nCreating a list of results URLs will include all of these assets, so if we only want a subset we need an additional filter to keep the specific assets we want.\nIf you look back, you can see we kept the same indexing throughout the notebook. This enables us to simply subset the earthaccess results object to retrieve the results we want.\nCreate a list of index values to keep.\n\nkeep_granules = eco_gdf.index.to_list()+emit_gdf.index.to_list()\nkeep_granules.sort()\n\nFilter the results list.\n\nfiltered_results = [result for i, result in enumerate(results) if i in keep_granules]\n\nNow we can download all of the associated assets, or retrieve the URLS and further filter them to specifically what we want.\nFirst, log into Earthdata using the login function from the earthaccess library. The persist=True argument will create a local .netrc file if it doesn’t exist, or add your login info to an existing .netrc file. If no Earthdata Login credentials are found in the .netrc you’ll be prompted for them. As mentioned in section 1.2, this step is not necessary to conduct searches, but is needed to download or stream data.\n\nearthaccess.login(persist=True)\n\nNow we can download all assets using the following cell.\n\n# # Download All Assets for Granules in Filtered Results\n# earthaccess.download(filtered_results, '../data/')\n\nOr we can create a list of URLs and use that to further refine which files we download.\n\n# Retrieve URLS for Assets\nresults_urls = [granule.data_links() for granule in filtered_results]\n\nGranules often have several assets associated with them, for example, ECO_L2T_LSTE has several assets: - Water Mask (water) - Cloud Mask (cloud) - Quality (QC) - Land Surface Temperature (LST) - Land Surface Temperature Error (LST_err) - Wide Band Emissivity (EmisWB) - Height (height)\nThe results list we just generated contains URLs to all of these files. We can further filter our results list using string matching to remove unwanted assets.\nCreate a list of strings and enumerate through our results_url list to filter out unwanted assets.\n\nfiltered_asset_links = []\n# Pick Desired Assets (leave _ on RFL to distinguish from RFLUNC, LST. to distinguish from LST_err)\ndesired_assets = ['RFL_', 'LST.'] # Add more or do individually for reflectance, reflectance uncertainty, or mask\n# Step through each sublist (granule) and filter based on desired assets.\nfor n, granule in enumerate(results_urls):\n for url in granule: \n asset_name = url.split('/')[-1]\n if any(asset in asset_name for asset in desired_assets):\n filtered_asset_links.append(url)\nfiltered_asset_links\n\nUncomment the cell below (select all, then ctrl+/) and download the data that we’ve filtered.\n\n# # Get requests https Session using Earthdata Login Info\n# fs = earthaccess.get_requests_https_session()\n# # Retrieve granule asset ID from URL (to maintain existing naming convention)\n# for url in filtered_asset_links:\n# granule_asset_id = url.split('/')[-1]\n# # Define Local Filepath\n# fp = f'../data/{granule_asset_id}'\n# # Download the Granule Asset if it doesn't exist\n# if not os.path.isfile(fp):\n# with fs.get(url,stream=True) as src:\n# with open(fp,'wb') as dst:\n# for chunk in src.iter_content(chunk_size=64*1024*1024):\n# dst.write(chunk)\n\nCongratulations, now you you have downloaded concurrent data from the ECOSTRESS and EMIT instruments on the ISS."
+ "text": "5. Generating a list of URLs and downloading data\nCreating a list of results URLs will include all of these assets, so if we only want a subset we need an additional filter to keep the specific assets we want.\nIf you look back, you can see we kept the same indexing throughout the notebook. This enables us to simply subset the earthaccess results object to retrieve the results we want.\nCreate a list of index values to keep.\n\nkeep_granules = eco_gdf.index.to_list()+emit_gdf.index.to_list()\nkeep_granules.sort()\n\nFilter the results list.\n\nfiltered_results = [result for i, result in enumerate(results) if i in keep_granules]\n\nNow we can download all of the associated assets, or retrieve the URLS and further filter them to specifically what we want.\nFirst, log into Earthdata using the login function from the earthaccess library. The persist=True argument will create a local .netrc file if it doesn’t exist, or add your login info to an existing .netrc file. If no Earthdata Login credentials are found in the .netrc you’ll be prompted for them. As mentioned in section 1.2, this step is not necessary to conduct searches, but is needed to download or stream data.\n\nearthaccess.login(persist=True)\n\nEARTHDATA_USERNAME and EARTHDATA_PASSWORD are not set in the current environment, try setting them or use a different strategy (netrc, interactive)\nYou're now authenticated with NASA Earthdata Login\nUsing token with expiration date: 12/24/2023\nUsing .netrc file for EDL\n\n\n<earthaccess.auth.Auth at 0x248568a2740>\n\n\nNow we can download all assets using the following cell.\n\n# # Download All Assets for Granules in Filtered Results\n# earthaccess.download(filtered_results, '../data/')\n\nOr we can create a list of URLs and use that to further refine which files we download.\n\n# Retrieve URLS for Assets\nresults_urls = [granule.data_links() for granule in filtered_results]\n\nGranules often have several assets associated with them, for example, ECO_L2T_LSTE has several assets: - Water Mask (water) - Cloud Mask (cloud) - Quality (QC) - Land Surface Temperature (LST) - Land Surface Temperature Error (LST_err) - Wide Band Emissivity (EmisWB) - Height (height)\nThe results list we just generated contains URLs to all of these files. We can further filter our results list using string matching to remove unwanted assets.\nCreate a list of strings and enumerate through our results_url list to filter out unwanted assets.\n\nfiltered_asset_links = []\n# Pick Desired Assets (leave _ on RFL to distinguish from RFLUNC, LST. 
to distinguish from LST_err)\ndesired_assets = ['RFL_', 'LST.'] # Add more or do individually for reflectance, reflectance uncertainty, or mask\n# Step through each sublist (granule) and filter based on desired assets.\nfor n, granule in enumerate(results_urls):\n for url in granule: \n asset_name = url.split('/')[-1]\n if any(asset in asset_name for asset in desired_assets):\n filtered_asset_links.append(url)\nfiltered_asset_links\n\n['https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/ECO_L2T_LSTE.002/ECOv002_L2T_LSTE_26223_012_11SKU_20230219T202943_0710_01/ECOv002_L2T_LSTE_26223_012_11SKU_20230219T202943_0710_01_LST.tif',\n 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2ARFL.001/EMIT_L2A_RFL_001_20230219T202951_2305013_003/EMIT_L2A_RFL_001_20230219T202951_2305013_003.nc',\n 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/ECO_L2T_LSTE.002/ECOv002_L2T_LSTE_26921_001_11SKU_20230405T190258_0710_01/ECOv002_L2T_LSTE_26921_001_11SKU_20230405T190258_0710_01_LST.tif',\n 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2ARFL.001/EMIT_L2A_RFL_001_20230405T190311_2309513_002/EMIT_L2A_RFL_001_20230405T190311_2309513_002.nc',\n 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2ARFL.001/EMIT_L2A_RFL_001_20230405T190323_2309513_003/EMIT_L2A_RFL_001_20230405T190323_2309513_003.nc']\n\n\nUncomment the cell below (select all, then ctrl+/) and download the data that we’ve filtered.\n\n# # Get requests https Session using Earthdata Login Info\n# fs = earthaccess.get_requests_https_session()\n# # Retrieve granule asset ID from URL (to maintain existing naming convention)\n# for url in filtered_asset_links:\n# granule_asset_id = url.split('/')[-1]\n# # Define Local Filepath\n# fp = f'../data/{granule_asset_id}'\n# # Download the Granule Asset if it doesn't exist\n# if not os.path.isfile(fp):\n# with fs.get(url,stream=True) as src:\n# with open(fp,'wb') as dst:\n# for chunk in src.iter_content(chunk_size=64*1024*1024):\n# dst.write(chunk)\n\nCongratulations, now you have downloaded concurrent data from the ECOSTRESS and EMIT instruments on the ISS."
},
{
"objectID": "python/01_Finding_Concurrent_Data.html#contact-info",
diff --git a/setup/setup_instructions.html b/setup/setup_instructions.html
index 8ea4007..88c8106 100644
--- a/setup/setup_instructions.html
+++ b/setup/setup_instructions.html
@@ -491,7 +491,9 @@ Contact Info
-
+
diff --git a/sitemap.xml b/sitemap.xml
index 7d12a5e..787aec2 100644
--- a/sitemap.xml
+++ b/sitemap.xml
@@ -2,34 +2,34 @@
https://nasa.github.io/VITALS/setup/setup_instructions.html
- 2023-11-06T17:14:10.046Z
+ 2023-11-06T21:54:43.731Z
https://nasa.github.io/VITALS/python/ECOSTRESS-EMIT_Carpinteria_Workshop.html
- 2023-11-06T17:14:08.831Z
+ 2023-11-06T21:54:42.461Z
https://nasa.github.io/VITALS/index.html
- 2023-11-06T17:14:06.376Z
+ 2023-11-06T21:54:38.271Z
https://nasa.github.io/VITALS/CODE_OF_CONDUCT.html
- 2023-11-06T17:14:04.863Z
+ 2023-11-06T21:54:36.853Z
https://nasa.github.io/VITALS/CHANGE_LOG.html
- 2023-11-06T17:14:04.243Z
+ 2023-11-06T21:54:36.073Z
https://nasa.github.io/VITALS/CONTRIBUTING.html
- 2023-11-06T17:14:05.904Z
+ 2023-11-06T21:54:37.789Z
https://nasa.github.io/VITALS/python/01_Finding_Concurrent_Data.html
- 2023-11-06T17:14:08.120Z
+ 2023-11-06T21:54:41.684Z
https://nasa.github.io/VITALS/schedule.html
- 2023-11-06T17:14:09.298Z
+ 2023-11-06T21:54:42.975Z