My thoughts on coordinate #48

Open
martindurant opened this issue May 29, 2024 · 8 comments

Comments

@martindurant
Member

Sorry for getting distracted at the end of the geo-zarr meeting we just had (for those that were there). Here is a summary of what I was getting at.

(@rabernat , yes I know this has been discussed many times over - apologies)

There are two principal parts to the coordinates problem:

  • coordinate transform
  • parsing/reading coordinate definitions

Coordinate transform

This is a mechanism within zarr/xarray to find (each of) the coordinates of a given array position and, in the other direction, the (fractional) array location of a given set of coordinates. Both directions should be vectorized operations.
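To make the two directions concrete, here is a minimal sketch for the affine case using NumPy. The matrix values and the function names `index_to_coord` and `coord_to_index` are made up for illustration; nothing here is an existing zarr or xarray API.

```python
import numpy as np

# Hypothetical 2-D affine transform in homogeneous form:
# [x, y, 1]^T = A @ [col, row, 1]^T
A = np.array([
    [0.5, 0.0, 100.0],   # x = 100 + 0.5 * col
    [0.0, -0.5, 200.0],  # y = 200 - 0.5 * row
    [0.0, 0.0, 1.0],
])

def index_to_coord(A, idx):
    """Map an (N, 2) array of (col, row) positions to (N, 2) coordinates."""
    idx = np.asarray(idx, dtype=float)
    homogeneous = np.column_stack([idx, np.ones(len(idx))])
    return (homogeneous @ A.T)[:, :2]

def coord_to_index(A, coords):
    """Inverse mapping: coordinates to (possibly fractional) array positions."""
    return index_to_coord(np.linalg.inv(A), coords)

positions = np.array([[0, 0], [10, 4], [3, 7]])
coords = index_to_coord(A, positions)   # all points transformed in one call
back = coord_to_index(A, coords)        # recovers the original positions
```

A coordinate that falls between grid points maps back to a fractional position, for example `coord_to_index(A, [[100.25, 199.75]])` gives column 0.5 and row 0.5.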

Currently, xarray supports explicit coordinate value arrays via the netCDF model well (and "flexible" indexes whose internals I don't understand well).

  • I suggest that this should be an extension point, with each transform type associated with its own internal representation (e.g., affine is usually a square matrix, explicit arrays are usually one- or two-dimensional arrays with sizes determined by the data)
  • on day 1, we want to support explicit values and affine (linear transform)
  • other transforms should be pluggable, and eventually include for instance the large number of earth curvature models built into grib
  • whether we should have a single affine matrix across all dimensions (lon, lat, time = f(x, y, z)), or if we should split dimensions (lon, lat = f1(x, y); time = f2(z)) is a decision to be taken early.
  • the coordinates interface must support slicing and might support units.
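The points above could be sketched roughly as follows. This is only an illustration of the idea, not a proposed design: the class names, the `REGISTRY` dict and the `from_spec` helper are all hypothetical, it takes the "split dimensions" option (one 1-D transform per dimension), and it ignores units entirely.

```python
from abc import ABC, abstractmethod
import numpy as np

class CoordinateTransform(ABC):
    """Hypothetical per-dimension interface; domain-agnostic on purpose."""
    @abstractmethod
    def forward(self, index): ...
    @abstractmethod
    def inverse(self, coord): ...
    @abstractmethod
    def slice(self, start, stop): ...

class Linear(CoordinateTransform):
    """1-D affine case: coord = offset + scale * index."""
    def __init__(self, offset, scale):
        self.offset, self.scale = float(offset), float(scale)
    def forward(self, index):
        return self.offset + self.scale * np.asarray(index, dtype=float)
    def inverse(self, coord):
        return (np.asarray(coord, dtype=float) - self.offset) / self.scale
    def slice(self, start, stop):
        # The new origin is the coordinate of the first retained element.
        return Linear(self.forward(start), self.scale)

class Explicit(CoordinateTransform):
    """Explicit coordinate values, assumed strictly increasing."""
    def __init__(self, values):
        self.values = np.asarray(values, dtype=float)
    def forward(self, index):
        # Linear interpolation handles fractional positions.
        return np.interp(index, np.arange(len(self.values)), self.values)
    def inverse(self, coord):
        return np.interp(coord, self.values, np.arange(len(self.values)))
    def slice(self, start, stop):
        return Explicit(self.values[start:stop])

# Pluggability: third parties would add their own entries here.
REGISTRY = {"linear": Linear, "explicit": Explicit}

def from_spec(spec):
    """Build a transform from stored attributes, e.g. {'kind': 'linear', ...}."""
    kwargs = {k: v for k, v in spec.items() if k != "kind"}
    return REGISTRY[spec["kind"]](**kwargs)

lon = from_spec({"kind": "linear", "offset": -180.0, "scale": 0.25})
time = from_spec({"kind": "explicit", "values": [0.0, 1.0, 3.0, 7.0]})
```

Slicing returns a new transform of the same kind, so `lon.slice(4, 8).forward([0])` gives the same coordinate as `lon.forward([4])`. A single matrix across all dimensions would instead need one class holding the full N-dimensional affine, which is the early decision mentioned above.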

Crucially, I advocate that the transform mechanism is independent of the data domain, so that we don't treat "lon/lat" as special. This is because zarr and xarray are general purpose libraries, and we don't want to exclude microscopy, genetics and other fields with many users.

Coordinate definitions

In the meeting, a few specific (geo) coordinate definitions were mentioned:

  • gdal coefficients
  • tiff bounding box
  • CRS text/parameters

plus, of course, netCDF explicit arrays (with or without CF). I also mentioned astro WCS as a reference point (which supports explicit, affine, and various analytic forms for arbitrary dimensionality with no geo reference; interestingly, it also applies to fields of tables).

I would suggest that it is the job of geo-zarr to build the converters between these styles of definition and the transform's internal representation, such that you can round-trip coordinate information without losing accuracy.
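As one example of such a converter, the six GDAL geotransform coefficients can be rearranged into a 3x3 affine matrix and back without any arithmetic, so the round trip is exact. This sketch assumes the internal representation is a homogeneous matrix acting on (col, row, 1); the function names and the sample coefficients are invented for illustration.

```python
def gdal_to_matrix(gt):
    """GDAL geotransform (6 coefficients) -> 3x3 affine acting on (col, row, 1).

    GDAL defines: x = gt[0] + col * gt[1] + row * gt[2]
                  y = gt[3] + col * gt[4] + row * gt[5]
    """
    x0, dx, rx, y0, ry, dy = gt
    return ((dx, rx, x0),
            (ry, dy, y0),
            (0.0, 0.0, 1.0))

def matrix_to_gdal(m):
    """Inverse rearrangement: 3x3 affine matrix -> GDAL geotransform tuple."""
    (dx, rx, x0), (ry, dy, y0), _ = m
    return (x0, dx, rx, y0, ry, dy)

def apply(m, col, row):
    """Evaluate the affine matrix at a single (col, row) position."""
    x = m[0][0] * col + m[0][1] * row + m[0][2]
    y = m[1][0] * col + m[1][1] * row + m[1][2]
    return x, y

# Made-up north-up raster: 60 m pixels, origin at the top-left corner.
gt = (440720.0, 60.0, 0.0, 3751320.0, 0.0, -60.0)
m = gdal_to_matrix(gt)
round_tripped = matrix_to_gdal(m)
```

Because the conversion only reorders values, `round_tripped == gt` holds exactly. Converters that involve real computation (for example deriving the matrix from a bounding box and an array shape) would need more care to avoid losing accuracy.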

@dblodgett-usgs

Wish I had space to take part in this work more... sorry to pop into this issue out of the blue, but I can't resist.

I couldn't agree more, @martindurant.

A potential source for inspiration on this is the implementation of rectilinear, curvilinear, and discrete spatio-temporal array axes in the stars R package. @edzer may be able to weigh in / advise. https://r-spatial.github.io/stars/articles/stars4.html is probably a good place to start.

@mdsumner

mdsumner commented Jun 2, 2024

I'm also trying to find my feet in this Python heavy space. Shouldn't this be a Zarr topic? Non lonlat geography exists in "geo", and even xarray has recognized the need to move beyond degenerate rectilinear arrays as the most compact referencing model. Zarr itself needs these compact forms as well, it's more about graphics and model arrays than geo-anything. Ensuring and persisting the crs is more the geo part, in general terms the metadata and units of the coordinate system are crucial in any domain, independently of whether a NetCDF style or more general framework is used. I just worry this tent isn't broad enough, but I appreciate the importance (and brilliance) of Zarr. If it can get this smarter referencing for regular or graphics arrays, and not mix up regular-grids-devolved-to-longlat with real curvilinear cases it will truly be a general and future-proof framework.

@martindurant
Member Author

> Shouldn't this be a Zarr topic?

Yes, certainly it could be copied there; or maybe the coordinate interpreting discussion belongs in xarray? Maybe zarr simply presents the attributes defining coordinates mapping to other libraries, but personally I'd be happy to see the f(x, y, z, ...) and its inverse(s) defined in zarr.

> Ensuring and persisting the crs is more the geo part, in general terms the metadata and units of the coordinate system are crucial in any domain

Exactly. I particularly have in mind medical ("device" and "patient" coordinates, normally affine transforms) and astro (curvilinear celestial coordinates and physical units like wavelength), because of my background.

@christophenoel changed the title from "My thoughts" to "My thoughts on coordinate" on Jun 24, 2024
@christophenoel

I just want to drop this here: the OGC specification that deals with all types of coverage and their encoding is the OGC Coverage Implementation Schema 1.1: https://docs.ogc.org/is/09-146r6/09-146r6.html#39

@benbovy

benbovy commented Sep 18, 2024

> Currently, xarray supports explicit coordinate value arrays via the netCDF model well (and "flexible" indexes whose internals I don't understand well).

> Crucially, I advocate that the transform mechanism is independent of the data domain, so that we don't treat "lon/lat" as special. This is because zarr and xarray are general purpose libraries, and we don't want to exclude microscopy, genetics and other fields with many users.

This is outside of the geozarr scope and more specific to Xarray, but I'd be happy to help towards having some kind of CoordinateTransform class built in Xarray, which would provide an abstract interface + a minimal set of features (e.g., for data slicing) such that it can be easily reused in 3rd-party, domain-specific Xarray indexes.

(note: Xarray "flexible" indexes are still not well documented)

@christophenoel

@benbovy: Do you have any information about these flexible indexes? How do they work?

@benbovy

benbovy commented Sep 25, 2024

@christophenoel - If I had to summarize how it works in one sentence: Xarray coordinates are all about data (labels) and metadata (attributes) whereas Xarray Index provides an API that allows dealing with these data / metadata in a highly customizable way for most common Xarray operations (isel, sel, align, concat, stack...). Xarray indexes are also stateful objects that may hold and propagate additional information (arbitrarily structured) along with coordinate labels and attributes.

There's only very basic documentation here: https://docs.xarray.dev/en/stable/internals/how-to-create-custom-index.html

You can also have a look at the different examples collected in pydata/xarray#7041.

Also specific to this issue: pydata/xarray#9543

More to come soon(-ish), hopefully!

@rabernat

Everyone on this repo should check out Benoit's PR above. It's exactly what we need to move this forward.


6 participants