Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Understanding concerns with CF encoding of CRS #20

Open
christophenoel opened this issue Apr 13, 2023 · 23 comments
Open

Understanding concerns with CF encoding of CRS #20

christophenoel opened this issue Apr 13, 2023 · 23 comments

Comments

@christophenoel
Copy link

christophenoel commented Apr 13, 2023

As it has been reported multiple times that the CF encoding of CRS (Coordinate Reference System) may pose issues, I propose starting a dedicated thread to gain a better understanding of these concerns.

Furthermore, I recently converted TerraSAR GeoTiff to Zarr, using rioxarray in conjunction with gdal and xarray, which resulted in the following CF grid mapping. I am somewhat surprised by the way GDAL encoded the CRS, which could be a result of the combination of gdal with xarray.

{
    "GeoTransform": "10.50014820907392 0.00012070250150375875 -2.609127035245679e-05 53.85791404256921 -1.5387410410647694e-05 -7.246156895736279e-05",
    "_ARRAY_DIMENSIONS": [],
    "crs_wkt": "GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\",SPHEROID[\"WGS 84\",6378137,298.257223563,AUTHORITY[\"EPSG\",\"7030\"]],AUTHORITY[\"EPSG\",\"6326\"]],PRIMEM[\"Greenwich\",0],UNIT[\"degree\"
,0.0174532925199433,AUTHORITY[\"EPSG\",\"9122\"]],AXIS[\"Latitude\",NORTH],AXIS[\"Longitude\",EAST],AUTHORITY[\"EPSG\",\"4326\"]]",
    "geographic_crs_name": "WGS 84",
    "grid_mapping_name": "latitude_longitude",
    "inverse_flattening": 298.257223563,
    "longitude_of_prime_meridian": 0.0,
    "prime_meridian_name": "Greenwich",
    "reference_ellipsoid_name": "WGS 84",
    "semi_major_axis": 6378137.0,
    "semi_minor_axis": 6356752.314245179,
    "spatial_ref": "GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\",SPHEROID[\"WGS 84\",6378137,298.257223563,AUTHORITY[\"EPSG\",\"7030\"]],AUTHORITY[\"EPSG\",\"6326\"]],PRIMEM[\"Greenwich\",0],UNIT[\"degr
ee\",0.0174532925199433,AUTHORITY[\"EPSG\",\"9122\"]],AXIS[\"Latitude\",NORTH],AXIS[\"Longitude\",EAST],AUTHORITY[\"EPSG\",\"4326\"]]"
}

EDIT NOTE: A key topic of this discussion concerns the need to make grid_mapping mandatory in order to exclude the possibility of having an implicit (undefined) CRS. Furthermore, grid_mapping includes several formats, and it would be wise to recommend one of them (WKT, Proj4j, ...).

@dblodgett-usgs
Copy link

What are you surprised by?

Most of what's in there is as would be expected per:
https://cfconventions.org/cf-conventions/cf-conventions.html#use-of-the-crs-well-known-text-format

This page provides a fairly complete set of mappings:
https://github.com/cf-convention/cf-conventions/wiki/Mapping-from-CF-Grid-Mapping-Attributes-to-CRS-WKT-Elements

"spatial_ref" does not appear in the CF convention but has been around in the community for a while. Someone implemented it quite a few years ago and it's been adopted in a number of places.

"_ARRAY_DIMENSIONS" is presumably being added by xarray?

@christophenoel
Copy link
Author

That's not my point. I'm just trying to understand why there is a concern (discussed many times during the meeting) with this CF way of encoding CRS while GDAL/Xarray seems to generate it without problem when converting a GeoTiff to Zarr.

Why do we want to change this ?

@dblodgett-usgs
Copy link

The rub is the need to implement a cross walk between the CRS-WKT / EPSG representation of CRS and the grid_mapping representation of CRS. The crs_wkt is optional and all the particular grid_mapping attributes are required.

CF-NetCDF requires that you include parameters using the attribute specification from cf that duplicate what most geospatial software supports via PROJ.

I think the premise here is that we would be saying that crs_wkt is required and that the other grid_mapping attributes are encouraged to allow compatibility with software that does not support crs_wkt?

@djhoese
Copy link

djhoese commented Aug 2, 2023

Jumping in on this issue after this spec repository was pointed out to me. I'm the creator of the very very new geoxarray package which basically tries to take all the non-GDAL specific CRS and dimension/coordinate information functionality from rioxarray but without the GDAL/rasterio dependency. It is also meant to be more generic than rioxarray.

Anyway, I wanted to chime in with some information from my understanding having working on a similar issue in geoxarray. First, my understanding is that spatial_ref is for GDAL compatibility. Second, rioxarray and geoxarray both use the crs_wkt as a shortcut/short-circuit when reading CRS information to avoid having to convert/parse all the CF grid mapping attributes that might also exist. Both libraries depend on pyproj to do the CRS/WKT -> CF grid_mapping conversion.

There is also the case of projections that aren't supported by CF. In those cases Pyproj (and therefore rioxarray and geoxarray) will only produce the crs_wkt/spatial_ref attributes with WKT since there are no corresponding CF attributes. I believe it has been brought up on various CF discussions that it is likely a bad idea to re-invent a way of representing CRS information especially when it isn't all inclusive/supportive and the PROJ library already has a pretty good handle on it. I'm not sure what the end result of those discussions was though.

@dblodgett-usgs
Copy link

Great info. Thanks @djhoese -- UW Madison representing here, I love it! (I'm an '09 Civil Engineering grad)

@christophenoel
Copy link
Author

christophenoel commented Aug 18, 2023

I want to take this good news of a new expert partipating to summarize the question for novices like me. Please correct me if I'm wrong.

CF conventions (v1.10) allow to define the CRS implicitly as the geographic coordinate system based on a spherical Earth (so not WGS 84 ellipsoid) when declaring the axis or standard_name property. CF conventions also allow to express CRS definitions via the grid_mapping attribute which refers to a separate variable.

This grid mapping variable provides the description of the mapping via a collection of attached attributes. There are several ways to describe the CRS:

  • The attribute grid_mapping_name allows to set a standard name (listed in CF annex F) which map to a known CRS (e.g. lambert_conformal_conic, mercator, etc.) based on the associated properties.
  • The attribute crs_wkt also to express multiple CRS properties in WKT (used by GDAL/OGR) in the property crs_wkt
  • Since v1.7, the Grid Mapping attributes have been extended to support various other representations (such as OGC WKT, EPSG and PROJ.4). The mappings are provided on Mapping-from-CF-Grid-Mapping-Attributes-to-CRS-WKT-Elements

Note: spatial_ref is the property commonly used by GDAL to define the CRS in WKT. I believe therefore that GDAL ignores crs_wkt in grid_mapping variable.

My conclusion

From my understanding, CF advocates for using its standard properties to express any type of representation (WKT, PROJ.4, etc.). Since we want to base GeoZarr on CF, I would lean towards making the grid_mapping attributes as mandatory (using the mapping table above for PROJ.4, WKT representation) and encouraging crs_wkt to facilitate compatibility with tools that only support WKT (but GDAL will still have to implement the support of crs_wkt property)

@christophenoel
Copy link
Author

Another useful information reported by Roger Lott (OGC CRS SWG co-chair):

Please be aware that the OGC CRS SWG has been developing an exchange format for gridded geodetic data (GGXF). At the outset of this project we assumed that the netCDF-CF conventions would be used as a starting point for GGXF. We quickly found that they were not suitable for the rigorous geodetic requirements for CRS definitions, and have developed a separate set of GGXF Conventions.

On superficial look it does not seem that these address anything in your github issue, but if you are interested in the GGXF conventions they are in Annex B of OGC 22-051r2. r3 of this doc will be posted shortly but the changes are unlikely to be significant to you (mainly editorial following the OGC public comment review). A not-quite-complete draft of r3 is in the GGXF Github issue #59 (use the 2023-07-24 version towards the end of the issue chain).

A couple of other comments that may be more relevant to the thread:
• CF netCDF conventions requires the CRS-WKT to adhere to what is sometimes called "WKT2". (See CF conventions 5.6.1 for "The crs_wkt attribute should comprise a text string that conforms to the WKT syntax as specified in reference [OGC_WKT-CRS]"). The CF conventions quote adherence to 12-063r3 (=WKT2 v1) but that has been superseded by 18-010r11 (=WKT2 v2). The difference in the two versions of WKT2 is an upgrade for dynamic CRSs (coordinates that change with time). In the the snippet that you give in the Github issue, neither crs_wkt nor spatial_ref is WKT2 of any variety. They use one of the many flavours of WKT1 (that supported by Proj/GDAL). You can tell whether the string is WKT1 or WKT2 by examination of the first keyword. If it ends in CS (e.g. GEOGCS) it is WKT1, if it ends in CRS (e.g. GEOGCRS) it is WKT2. The snippet therefore is not compliant with the CF conventions.
• GGXF, like netCDF-CF, uses WKT2 for describing CRSs and transformations.
• There has been criticism of WKT2 by some. My understanding is that this is due to the need to escape certain characters, in particular the degree symbol (°) and the double quote character ("), when WKT2 strings are embedded in other languages. It is my conjecture, but perhaps this may be the concern that the Github issue is trying to capture.

@dwilson1988
Copy link

My conclusion

From my understanding, CF advocates for using its standard properties to express any type of representation (WKT, PROJ.4, etc.). Since we want to base GeoZarr on CF, I would lean towards making the grid_mapping attributes as mandatory (using the mapping table above for PROJ.4, WKT representation) and encouraging crs_wkt to facilitate compatibility with tools that only support WKT (but GDAL will still have to implement the support of crs_wkt property)

This solves a decent part of the original concern around the original issue (christophenoel#3) just to mention it here. I just want to make sure I'm understanding a few things:

  1. If a Zarr dataset is created in a spatial reference that the CF conventions does not explicitly support, there would be no path to create a compliant GeoZarr dataset, correct?

  2. If starting from an EPSG code or WKT/ProjJSON representation of a CRS, there is no straightforward way to map it to CF if I'm understanding the table you provided

  3. Just specifying the crs_wkt attribute WOULD NOT be a compliant GeoZarr dataset

The reason I bring this up specifically is we've been outputting Zarr from our geospatial system for the last few years and all of the CRS information is specified as EPSG Code, WKT, or ProjJSON. Unless there's a way to always (and reliably!) map from CRS-WKT to CF, supporting GeoZarr would be a nonstarter for us, but we'd VERY much like to adopt GeoZarr.

Any thoughts on this?

@djhoese
Copy link

djhoese commented Aug 18, 2023

Speaking from a user POV and not on the "standards"/specification design side of this: I would plan/hope to use the pyproj library in all of my Python code to do conversions between PROJ.4/WKT2/EPSG/CF. Not sure that is considered "straightforward" though.

I would hope that the issue in your point 1 above could be resolved in the future by suggesting to CF to adopt crs_wkt as an alternative to the CF grid mapping attributes (either the attributes or the crs_wkt or both with priority going to crs_wkt). I don't know how open they would be to that but the fact that not every projection is supported currently would make putting the responsibility on WKT more enticing. Similarly, add the functionality to GDAL to read crs_wkt in addition to the already supported spatial_ref.

@dwilson1988
Copy link

That's really the root of the rub here (correct me if I'm wrong): I can represent any CF grid mapping as WKT but not vice versa.

@dblodgett-usgs
Copy link

@christophenoel -- I don't think that this is quite accurate.

CF conventions (v1.10) allow to define the CRS implicitly as the geographic coordinate system based on a spherical Earth

This closed issue cf-convention/cf-conventions#410 indicates that there "is no default shape for the Earth in CF (spherical or otherwise)."

@dblodgett-usgs
Copy link

I would be in full support of mandating grid mapping and allowing use of only a crs_wkt in cases that the CF grid mapping attributes do not support the projection of the data in question.

@dwilson1988
Copy link

And to underscore @christophenoel's note above, RECOMMENDING that everyone also provide crs_wkt to ensure maximum interoperability.

@christophenoel
Copy link
Author

@christophenoel -- I don't think that this is quite accurate.

CF conventions (v1.10) allow to define the CRS implicitly as the geographic coordinate system based on a spherical Earth

This closed issue cf-convention/cf-conventions#410 indicates that there "is no default shape for the Earth in CF (spherical or otherwise)."

Does it has any impact (on a user point of view) of is it only a question of definition "geographic coordinate system based on a Earth" ?

@dblodgett-usgs
Copy link

The impact from a user's point of view is that when given no grid_mapping specifying the shape of the earth, CF is explicitly ambiguous (there is no default shape) rather than implicit with regard to the shape of the spheroid.

@christophenoel
Copy link
Author

Thank you very much for trying to explain. I think that my gaps in the geospatial field could lead to an endless discussion...

For me, if we provide latitude/longitude in degrees, I expect to be able to navigate on Earth based on the real shape of our Earth. So it works for a single point, however, I assume that you're suggesting that the gridded data is projected in any case, so we should know the exact projection (e.g. WGS84) to be able to interpret the data correctly.

@djhoese
Copy link

djhoese commented Aug 18, 2023

For me, if we provide latitude/longitude in degrees, I expect to be able to navigate on Earth based on the real shape of our Earth. So it works for a single point, however, I assume that you're suggesting that the gridded data is projected in any case, so we should know the exact projection (e.g. WGS84) to be able to interpret the data correctly.

I could be misunderstanding what you're saying or asking, but there is no "real shape of our Earth" when it comes to this stuff. You must have some definition/model of the Earth (a coordinate system) or else you have no guarantee that degrees (lon/lat) in your data are the same as the degrees (lon/lat) of someone else's data (different radius of the Earth, different "center" point of the coordinate system, etc). Software could always make an assumption or have some default Earth for a geographic (degrees) coordinate system, but if we're talking about accuracy then there is no guarantee unless it is explicitly defined. In my opinion nothing should be implied for new data file formats and everything should be explicit.

@dblodgett-usgs
Copy link

Exactly @djhoese.

@christophenoel -- every (lat/lon or projected) coordinate that references the earth must either declare (explicit) or assume (implicit) an earth spheroid model upon which the latitude / longitude coordinate system is defined. The GPS system is based around the WGS84 geodetic datum.

Note that it is common practice in some fields (weather and climate in particular) to use different earth models interchangeably. i.e. use weather stations with elliptical lat/lon as if they use a spherical earth assumption. This practice has
led to tremendous confusion when using CF data.

These slides may be of interest.

@christophenoel
Copy link
Author

Okay, understood. There's no notion of degrees without a model. Intuitively, when I think of degrees, it's implicitly WGS84, which assumes an ellipsoid.

So, to get back to a your statement, I get that CF doesn't guarantee that WGS84 is presupposed.

@dwilson1988
Copy link

Just wanted to circle back on this topic - what's the timeline look like for discussing/implementing some of these changes around CRS? Is there a general consensus about supporting crs_wkt in GeoZarr?

@christophenoel
Copy link
Author

@dwilson1988 : When the SWG is approved by the OGC, a specification document (following OGC standards) will be created with the necessary structure, allowing anyone interested to collaborate on topics of their choice. However, nothing prevents someone from starting work on the tickets or in the repository

@rouault
Copy link

rouault commented Jan 19, 2024

I haven't followed what GeoZarr has decided regarding CRS encoding, but FYI GDAL 3.9 per OSGeo/gdal#9108 will be able to infer CRS in a Zarr dataset using a CF-1 grid_mapping variable (basically raw conversion of netCDF CF-1 to Zarr)

@christophenoel
Copy link
Author

This is good news, and I assume that the library supports all the extensions such as:
Mapping-from-CF-Grid-Mapping-Attributes-to-CRS-WKT-Elements

The meetings were interrupted for several months to create the OGC SWG, and they will resume this week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants