
Classification of product changes as safe or unsafe seems overly strict #1617

Open
christophfriedrich opened this issue Jul 24, 2024 · 3 comments

Comments

christophfriedrich commented Jul 24, 2024

The way changes are assessed seems to be that a very limited set of changes is classified as safe, while all others default to unsafe. I did some digging and found this list of allowed changes:

updates_allowed = {
    ('description',): changes.allow_any,
    ('license',): changes.allow_any,
    ('metadata_type',): changes.allow_any,
    # You can safely make the match rules looser but not tighter.
    # Tightening them could exclude datasets already matched to the product
    # (which would make search results wrong).
    ('metadata',): changes.allow_truncation,
    # Some old storage fields should not be in the product definition any more: allow removal.
    ('storage', 'chunking'): changes.allow_removal,
    ('storage', 'driver'): changes.allow_removal,
    ('storage', 'dimension_order'): changes.allow_removal,
}
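As I understand it, the mechanism works roughly like the following sketch. To be clear, this is my simplified reconstruction of the pattern, not the actual datacube.utils.changes code: each key path maps to a predicate, and any change whose path matches no predicate (or whose predicate rejects it) is classified as unsafe.

```python
# Simplified sketch of a whitelist-based change classifier (assumed logic,
# not the actual datacube implementation).

MISSING = object()  # sentinel: key absent on one side of the diff

def allow_any(old, new):
    # Any change to this key is considered safe.
    return True

def allow_removal(old, new):
    # Only deleting the value is considered safe.
    return new is MISSING

def classify_change(path, old, new, updates_allowed):
    # A change is safe only if some rule prefix matches its path
    # and that rule's predicate accepts the old/new values.
    for prefix, predicate in updates_allowed.items():
        if path[:len(prefix)] == prefix and predicate(old, new):
            return "safe"
    return "unsafe"

rules = {
    ('description',): allow_any,
    ('storage', 'driver'): allow_removal,
}

print(classify_change(('description',), "old text", "new text", rules))          # safe
print(classify_change(('storage', 'driver'), "NetCDF CF", MISSING, rules))       # safe
print(classify_change(('measurements', 0, 'aliases'), MISSING, ['B01'], rules))  # unsafe
```

Note how the last call shows the behaviour I ran into: adding an alias falls through the whole rule list and defaults to unsafe.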

To me it seems this list was set up at some point by someone thinking "these should really be safe", without actually assessing all the others, which might be perfectly safe too. At least that's how I explain to myself that the datacube product update command has an --allow-unsafe option: in practice there seem to be enough situations where a change is prohibited by the checking mechanism but is actually safe, so experienced admins want to work around it.

IMO, it would be better to expand the list of allowed changes instead of routinely circumventing the checks against that list. I'm not an experienced admin who understands the system well enough to assess whether --allow-unsafe is appropriate for the case at hand. And even if I were, structured checking by code is still better than error-prone checking by humans. I'd like to be able to just rely on the checks, without of course being restricted unnecessarily.

For my specific case: why should adding a new alias to a measurement be unsafe? Why is this operation not in the list of allowed changes? I have filled out the rest of this template for this case.

Expected behaviour

Adding a new alias to a product's measurement is treated as a safe change (since it seems like a rather harmless, purely additive operation to me).

Actual behaviour

Running datacube product update with a YAML that adds a new alias is reported as an unsafe change.

Case where no alias was configured before and the first is to be added:

2024-07-24 13:01:00,161 948 datacube.index.postgres._products WARNING Unsafe change in measurements.0.aliases from missing to ['B01']

Case where one alias was configured before and a second is to be added:

2024-07-24 13:11:47,312 1037 datacube.index.postgres._products WARNING Unsafe change in measurements.13.aliases.1 from None to 'AOT'

Note that I'm only adding aliases, not changing or removing any (which of course might be unsafe).

Steps to reproduce the behaviour

Run datacube product update --dry-run foobar.yml with a YAML that is identical to the one before, except for an additional alias.

Environment information

  • Which datacube --version are you using? -> Open Data Cube core, version 1.8.18
  • What datacube deployment/environment are you running against? idk what that means (I'm not the main admin of this instance)
@SpacemanPaul
Contributor

I agree that adding a measurement alias should be "safe".

There are, however, a number of features of the current database schema that make a full and accurate automated assessment of whether a given change is "safe" or not a more complicated problem than it might first appear. The historic approach has been to err on the side of caution (but give the expert user enough rope to make their own call).

I'm expecting that this problem will become more manageable as we move to new database schemas and shed more legacy functionality in datacube-1.9 and 2.0. We probably won't be looking at addressing this before then - but I'm happy to look at a PR if you want to have a go.

@christophfriedrich
Author

Thanks for your feedback Paul. Since you also believe it should be safe, I simply executed the command with --allow-unsafe and haven't run into problems yet. (However, doing so didn't fix the problem that led me to look into this in the first place, which is why I came up with this workaround instead, which I might be PRing to odc-tools.)


But I'm happy to look at a PR

I'd be happy to supply one, depending on what exactly you had in mind here. Adding "adding a measurement alias" to the list of allowed operations? I had a look at /datacube/utils/changes.py and believe I could add "creations of and insertions into an array", which could then be used in /datacube/index/postgres/_products.py for something like:

updates_allowed = {
    ('measurements', 'aliases'): changes.allow_creation_or_addition,
    # ... existing entries ...
}
But for anything more generic, I would be too paranoid about overlooking something, so I would leave that to people who actually know the database schema, or wait for the new schema. Your approach of "err on the side of caution but give experts enough rope" is fully reasonable.
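For concreteness, the predicate I have in mind could look something like the sketch below, written in the spirit of the existing changes.allow_* helpers. The name and signature are my assumptions; nothing like this exists in the current datacube API yet.

```python
# Hypothetical predicate in the style of datacube/utils/changes.py.
# "allow_creation_or_addition" and its (old, new) signature are proposals,
# not part of the shipped library.

MISSING = object()  # sentinel: value absent before the change

def allow_creation_or_addition(old, new):
    """Safe if a list is newly created, or only extended (existing items untouched)."""
    if old is MISSING or old is None:
        # Creating the aliases list from scratch is purely additive.
        return isinstance(new, list)
    if isinstance(old, list) and isinstance(new, list):
        # All old entries must be preserved in order; new ones may be appended.
        return new[:len(old)] == old
    return False

print(allow_creation_or_addition(MISSING, ['B01']))         # True: first alias added
print(allow_creation_or_addition(['B01'], ['B01', 'AOT']))  # True: alias appended
print(allow_creation_or_addition(['B01'], ['AOT']))         # False: existing alias changed
```

This would make exactly the two warning cases from my report above (first alias created, second alias appended) pass as safe, while still rejecting edits or removals of existing aliases.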

@SpacemanPaul
Contributor

The approach you suggest above sounds good - I'll look forward to your PR.
