Evolve best practices on URLs #1325

Open
m-mohr opened this issue Sep 18, 2024 · 8 comments

Comments

@m-mohr
Collaborator

m-mohr commented Sep 18, 2024

The best practices on https://github.com/radiantearth/stac-spec/blob/v1.0.0/best-practices.md#use-of-links seem partially ambiguous and may have evolved a bit.

We have the following in the best practices:

  • Self-contained Catalogs: no self link, STAC links relative, assets absolute or relative
    • Self-contained Metadata Only: same as above, but assets absolute
    • Self-contained with Assets: same as above, but assets relative
  • Published Catalogs
    • Absolute Published Catalog: absolute self link in all files, STAC links and assets absolute
    • Relative Published Catalog: absolute self link in root file, STAC links and assets relative (but it doesn't require a root link in all files, which is not optimal?!)
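The cases above differ only in three properties of each file: the self link, the STAC links, and the asset hrefs. A small helper can summarize those three properties for one file. This is a hypothetical sketch, not part of any STAC library. It assumes an href is "absolute" only when it carries a URL scheme (https://, s3://, ...), and because it looks at one file at a time, telling some cases apart (e.g. which files carry a self link) still requires checking non-root files too.

```python
from urllib.parse import urlparse

# Link relations that point at other STAC documents (the self link is handled separately).
STAC_RELS = {"root", "parent", "child", "item", "collection"}


def is_absolute(href: str) -> bool:
    # Simplification: absolute means "has a URL scheme"; "./a.json" and "../b.tif" are relative.
    return bool(urlparse(href).scheme)


def href_style(hrefs: list[str]) -> str:
    """Collapse a list of hrefs into 'absolute', 'relative', 'mixed' or 'n/a'."""
    if not hrefs:
        return "n/a"
    flags = {is_absolute(h) for h in hrefs}
    if len(flags) == 2:
        return "mixed"
    return "absolute" if flags == {True} else "relative"


def describe_link_style(stac_obj: dict) -> dict:
    """Summarize how a single STAC JSON file (Catalog, Collection or Item) uses URLs."""
    links = stac_obj.get("links", [])
    self_hrefs = [link["href"] for link in links if link.get("rel") == "self"]
    stac_hrefs = [link["href"] for link in links if link.get("rel") in STAC_RELS]
    asset_hrefs = [asset["href"] for asset in stac_obj.get("assets", {}).values()]
    return {
        "self": "none" if not self_hrefs else href_style(self_hrefs),
        "stac_links": href_style(stac_hrefs),
        "assets": href_style(asset_hrefs),
    }


# An Item as it would appear in a "Self-contained Metadata Only" catalog (made-up URLs).
item = {
    "links": [
        {"rel": "root", "href": "../catalog.json"},
        {"rel": "parent", "href": "../catalog.json"},
    ],
    "assets": {"data": {"href": "https://example.com/data/scene-001.tif"}},
}
print(describe_link_style(item))
```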

The approach that I recommend is:

  • Absolute self link in all files
  • Relative STAC links
  • Relative assets unless hosted externally
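A rough check for the three rules above could look like the following sketch. The function name is hypothetical, and it assumes "hosted externally" means "on a different host than the self link", which is only one possible reading. The example Item is trimmed to the fields that matter here and uses made-up URLs.

```python
from urllib.parse import urlparse

# Relations that point at other STAC documents (the self link is checked separately).
STAC_RELS = {"root", "parent", "child", "item", "collection"}


def check_recommended_layout(stac_obj: dict) -> list[str]:
    """List deviations from: absolute self link, relative STAC links,
    relative asset hrefs unless the asset is hosted externally."""
    problems = []
    links = stac_obj.get("links", [])
    self_hrefs = [link["href"] for link in links if link.get("rel") == "self"]
    self_host = None
    if len(self_hrefs) == 1 and urlparse(self_hrefs[0]).scheme:
        self_host = urlparse(self_hrefs[0]).netloc
    else:
        problems.append("expected exactly one absolute self link")

    for link in links:
        if link.get("rel") in STAC_RELS and urlparse(link["href"]).scheme:
            problems.append(f"{link['rel']} link should be relative: {link['href']}")

    for key, asset in stac_obj.get("assets", {}).items():
        parsed = urlparse(asset["href"])
        # Assumption: "hosted externally" = on a different host than the self link.
        if parsed.scheme and self_host is not None and parsed.netloc == self_host:
            problems.append(f"asset '{key}' is on the self link's host; consider a relative href")
    return problems


# Trimmed example Item (a real Item also needs geometry, bbox, properties, ...).
item = {
    "type": "Feature",
    "id": "scene-001",
    "links": [
        {"rel": "self", "href": "https://example.com/stac/items/scene-001.json"},
        {"rel": "root", "href": "../catalog.json"},
        {"rel": "parent", "href": "../catalog.json"},
    ],
    "assets": {
        "data": {"href": "./scene-001.tif"},  # co-hosted, so relative
        "thumbnail": {"href": "https://cdn.example.org/thumbs/scene-001.png"},  # external, so absolute
    },
}

print(check_recommended_layout(item))  # []
```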

STAC Browser also recommends what I recommend (unsurprisingly ;-) ) or the Absolute Published Catalog. These two generally work best for STAC Browser, as it can generate absolute self links for an entity without an additional request for the root. It seems that this type is not reflected in the best practices, though. I feel like the best practices should evolve a bit and could even be simplified.
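Roughly why an absolute self link in every file helps a client: the self href serves as the base URL, so every relative STAC link can be resolved locally with standard URL resolution (here Python's `urljoin`), with no request to the root. This is a generic sketch under that assumption, not STAC Browser's actual code.

```python
from urllib.parse import urljoin


def resolve_links(stac_obj: dict) -> list[tuple[str, str]]:
    """Resolve every link href against the object's absolute self link."""
    self_href = next(
        (link["href"] for link in stac_obj.get("links", []) if link.get("rel") == "self"),
        None,
    )
    if self_href is None:
        # Without a self link, the base URL has to come from somewhere else,
        # e.g. where the file was fetched from or a root with an absolute self link.
        raise ValueError("no self link; cannot resolve relative hrefs locally")
    return [(link["rel"], urljoin(self_href, link["href"])) for link in stac_obj["links"]]


item = {
    "links": [
        {"rel": "self", "href": "https://example.com/stac/items/scene-001.json"},
        {"rel": "root", "href": "../catalog.json"},
        {"rel": "parent", "href": "../catalog.json"},
    ]
}
for rel, href in resolve_links(item):
    print(rel, href)
# self https://example.com/stac/items/scene-001.json
# root https://example.com/stac/catalog.json
# parent https://example.com/stac/catalog.json
```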

The cases above come up very naturally; the difference is usually in the self links. Everything else (STAC links and assets) should simply be used consistently. Depending on the hosting context, the assets are also often just absolute.

Thoughts? Otherwise, I'll try to create a PR soon...

@gadomski
Collaborator

The approach that I recommend is:

  • Absolute self link in all files
  • Relative STAC links
  • Relative assets unless hosted externally

I agree with this, except that I prefer absolute asset hrefs. I find that STAC is much more mobile than assets. You copy STAC down into some local data store, you load it into a backend, whatever, but the assets stay put. I even think that co-hosting STAC and assets might (hot take alert 🌶️) be an anti-pattern.
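The "STAC moves, assets stay put" scenario can be illustrated with plain URL resolution. In this sketch (all URLs made up), only the STAC JSON is copied to a mirror and its self link is updated; a relative asset href then resolves against the mirror, where no asset exists, while an absolute href keeps pointing at the original file.

```python
from urllib.parse import urljoin

# Hypothetical locations: assets live at the data provider, the STAC JSON is copied to a mirror.
ORIGINAL_SELF = "https://data.example.com/landsat/scene-001.json"
MIRROR_SELF = "https://stac-mirror.example.net/landsat/scene-001.json"

RELATIVE_ASSET = "./scene-001.tif"
ABSOLUTE_ASSET = "https://data.example.com/landsat/scene-001.tif"


def asset_url(self_href: str, asset_href: str) -> str:
    """Where a client will look for the asset, given the item's self link."""
    return urljoin(self_href, asset_href)


# Before the copy, both forms point at the same file.
assert asset_url(ORIGINAL_SELF, RELATIVE_ASSET) == ABSOLUTE_ASSET
assert asset_url(ORIGINAL_SELF, ABSOLUTE_ASSET) == ABSOLUTE_ASSET

# After copying only the STAC JSON, the relative href follows the STAC to the mirror
# (where the GeoTIFF does not exist); the absolute href still reaches the original file.
print(asset_url(MIRROR_SELF, RELATIVE_ASSET))  # https://stac-mirror.example.net/landsat/scene-001.tif
print(asset_url(MIRROR_SELF, ABSOLUTE_ASSET))  # https://data.example.com/landsat/scene-001.tif
```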

I don't think self-contained catalogs are very useful at all (I think @matthewhanson agrees with me on this, but I could be misremembering) and would be 👍🏼 on removing them from the best practices. I don't love "Absolute Published Catalogs" either; I don't really see what advantage they have over Matthias's Preferred Approach™. The tooling is generally good enough to resolve relative links, so I think the simplicity gained by removing "Absolute Published Catalogs" is worth it.

So, to summarize:

  • Remove "Self-contained Catalogs"
  • Remove "Absolute Published Catalogs"
  • Modify "Relative Published Catalogs" to be "Matthias's Preferred Catalogs™" 😂

I think there's an advantage in having "one preferred way" in the Best Practices, with some discussion of why you might tweak it, e.g. why you would choose absolute or relative asset hrefs.

@jbants
Collaborator

jbants commented Sep 19, 2024

I agree that, as a best practice, assets should stay put and have absolute links. STAC is a signpost to the data, after all.
Removing Self-contained and Absolute published catalogs and changing Relative published catalogs to MPC (I wonder if that acronym is taken) seems like a good idea. The spec doesn't prohibit these types of catalogs; they're just not optimal.

@m-mohr
Collaborator Author

m-mohr commented Sep 20, 2024

Haha, I guess we can find a better name :-P

Thanks for your inputs.

I don't feel strongly about relative vs. absolute assets. As you two advocate for it, what's the benefit of having absolute links? I haven't found a good reason yet why they should necessarily be absolute if there's an absolute self link. Could you elaborate on that? @gadomski @jbants

@gadomski
Collaborator

As you two advocate for it, what's the benefit of having absolute links?

I think the benefit is as a signalling mechanism, i.e. how we think STAC should be used. By recommending absolute asset hrefs but relative links, we tell folks "you can and should move STAC around as much as you want, but you probably want to keep your assets in one place." Relative asset hrefs make it feel like the STAC should live next to the assets, which (at least to me) feels a little wrong.

@m-mohr
Collaborator Author

m-mohr commented Sep 22, 2024

Regarding assets only: Okay, so it's the moving aspect. On the other hand, for me relative hrefs keep the maintenance burden down a bit, as you only have to update one href (the self link). Maybe we don't recommend anything and leave it up to users? So far I only see minor pros and cons for absolute and relative hrefs respectively, so maybe it's not worth a best practice yet?
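The maintenance argument can be made concrete by counting which hrefs embed the old location when an item and its assets are moved together. The helper below is a hypothetical sketch that simply treats any href starting with the old URL prefix as needing a rewrite; relative hrefs resolve against the updated self link and need no change.

```python
# Hypothetical layout: an item and its assets are moved together away from this prefix.
OLD_PREFIX = "https://old.example.com/stac/"


def hrefs_needing_update(item: dict, old_prefix: str) -> list[str]:
    """Hrefs that embed the old location and must be rewritten after a move.

    Relative hrefs are not returned: they resolve against the (updated) self link.
    """
    hrefs = [link["href"] for link in item.get("links", [])]
    hrefs += [asset["href"] for asset in item.get("assets", {}).values()]
    return [h for h in hrefs if h.startswith(old_prefix)]


relative_assets = {
    "links": [
        {"rel": "self", "href": OLD_PREFIX + "items/a.json"},
        {"rel": "parent", "href": "../catalog.json"},
    ],
    "assets": {"data": {"href": "./a.tif"}, "cog": {"href": "./a_cog.tif"}},
}
absolute_assets = {
    "links": relative_assets["links"],
    "assets": {
        "data": {"href": OLD_PREFIX + "items/a.tif"},
        "cog": {"href": OLD_PREFIX + "items/a_cog.tif"},
    },
}

print(len(hrefs_needing_update(relative_assets, OLD_PREFIX)))  # 1 (just the self link)
print(len(hrefs_needing_update(absolute_assets, OLD_PREFIX)))  # 3 (self link + two assets)
```

The flip side is the earlier scenario: if only the STAC moves and the assets stay, it is the relative asset hrefs that break.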

@gadomski
Collaborator

gadomski commented Sep 22, 2024

Sorry, just to be clear, I am advocating for relative links and absolute assets.

@m-mohr
Collaborator Author

m-mohr commented Sep 22, 2024

Sorry, also just to be clear: I was only talking about assets above. "Links" is highly ambiguous in this context (I didn't realize), so I replaced "links" with "hrefs" above. Please reread ;) (I guess I'm advocating for no best practice regarding assets.)

@gadomski
Collaborator

Here's my null-hypothesis best practice for asset hrefs:

  • Picking randomly for every asset href is bad, so you should pick one for your catalog (either absolute or relative)
  • If you choose relative, your maintenance burden is lower if you move your STAC items and assets together
  • If you choose absolute, your maintenance burden is lower if you move your STAC items and assets separately
  • So, if you think your assets are light (maybe less than 20MB?! who knows 🤷🏼), you should choose relative. If your assets are heavy (> 100MB), you should choose absolute

Because those break-points are so fuzzy, maybe our guidance should be "pick one (relative or absolute) for asset hrefs and stick with it, and if they're big (e.g. lidar) they should probably be absolute".
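The proposed guidance has two parts: use one style for all asset hrefs in a catalog, and lean towards absolute hrefs for big assets. A minimal sketch of both, with hypothetical function names; the ~20MB / ~100MB thresholds are the fuzzy guesses from the comment above, not agreed values, and "absolute" again just means "has a URL scheme".

```python
from urllib.parse import urlparse

MB = 1_000_000  # decimal megabytes; the thresholds below are rough guesses anyway


def asset_href_style(items: list[dict]) -> str:
    """'absolute', 'relative', 'mixed', or 'none' across all assets of all items."""
    styles = {
        "absolute" if urlparse(asset["href"]).scheme else "relative"
        for item in items
        for asset in item.get("assets", {}).values()
    }
    if not styles:
        return "none"
    return styles.pop() if len(styles) == 1 else "mixed"


def suggest_asset_href_style(typical_asset_bytes: int) -> str:
    """Size-based rule of thumb (~20 MB / ~100 MB); deliberately loose."""
    if typical_asset_bytes > 100 * MB:
        return "absolute"
    if typical_asset_bytes < 20 * MB:
        return "relative"
    return "either; pick one and use it consistently"


items = [
    {"assets": {"a": {"href": "./a.tif"}}},
    {"assets": {"b": {"href": "https://example.com/b.tif"}}},
]
print(asset_href_style(items))  # mixed -> the thing to avoid
print(suggest_asset_href_style(2_000 * MB))  # absolute (e.g. large lidar tiles)
```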
