DBS runs API returns runs from invalid files #99

Open
germanfgv opened this issue Aug 3, 2023 · 5 comments

germanfgv commented Aug 3, 2023

Users have noticed that, after files are invalidated, DBS still reports block and dataset run data as if the files were still valid.

Let's use dataset /DisplacedJet/Run2023C-PromptReco-v2/MINIAOD as an example.
This dataset initially included files from run 367661, but they were all invalidated, as can be seen here:

The problem is that when users list the runs included in the dataset, the run still appears, even though the dataset contains no valid files from that run. Here you can see that run 367661 appears as one of the runs in /DisplacedJet/Run2023C-PromptReco-v2/MINIAOD:

Of course, the same information is provided to users when they use DAS: https://cmsweb.cern.ch/das/request?instance=prod/global&input=run+dataset%3D%2FDisplacedJet%2FRun2023C-PromptReco-v2%2FMINIAOD

After discussing this on MM, @amaltaro formulated the following questions:

  1. Did this behavior change between the old (Python) implementation and the new (Go) one?
  2. When DBS fetches the runs, should it exclude files that are invalid? Should the REST API accept a query string parameter to enable or disable this?

I would argue that the DBS API should have the option to only show runs from valid files, and that this should be the default behavior when querying from DAS.
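To make the requested behavior concrete, here is a minimal sketch of the filtering logic, not the actual DBS implementation. The `FileRecord` type, the `list_runs` helper, the file names, and run 367662 are all made up for illustration; only run 367661 comes from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical stand-in for a DBS file entry."""
    lfn: str
    run_num: int
    is_file_valid: bool


def list_runs(files, valid_file_only=False):
    """Return the sorted run numbers covered by `files`.

    With valid_file_only=True, runs that only appear in invalidated
    files are dropped from the result.
    """
    return sorted(
        {f.run_num for f in files if f.is_file_valid or not valid_file_only}
    )


# Illustrative data: one invalidated file from run 367661 and one
# valid file from a made-up run 367662.
files = [
    FileRecord("/store/hypothetical/a.root", 367661, False),
    FileRecord("/store/hypothetical/b.root", 367662, True),
]

print(list_runs(files))                        # [367661, 367662]
print(list_runs(files, valid_file_only=True))  # [367662]
```

The first call mirrors what users currently see (the invalidated run is still listed); the second is what the proposed option would return.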

Could DBS experts comment on this issue? @vkuznet @d-ylee

vkuznet commented Aug 14, 2023

Hi @germanfgv, to answer Alan's questions:

  1. There are no changes between the Go and Python implementations of the server.
  2. Yes, we may add an option to the DBS API to display runs for valid files only.

@d-ylee please take care of this request and provide a validFileOnly option in the runs API, which will be used to show runs only for valid files. Leave the default behavior as is, i.e. it should show all runs regardless of file status.
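For illustration, a client-side call with the proposed option might look like the sketch below. This assumes the option ends up as a `validFileOnly` query parameter on the runs endpoint; the base URL and the `runs_url` helper are assumptions made for this example, and the option did not exist in the runs API when this was written.

```python
from urllib.parse import urlencode

# Assumed base URL of the DBS reader service, for illustration only.
DBS_READER = "https://cmsweb.cern.ch/dbs/prod/global/DBSReader"


def runs_url(dataset, valid_file_only=None):
    """Build a runs query URL.

    The proposed validFileOnly parameter is only sent when explicitly
    requested, so the default (all runs, regardless of file status)
    stays unchanged.
    """
    params = {"dataset": dataset}
    if valid_file_only is not None:
        params["validFileOnly"] = int(valid_file_only)
    return f"{DBS_READER}/runs?{urlencode(params)}"


dataset = "/DisplacedJet/Run2023C-PromptReco-v2/MINIAOD"
print(runs_url(dataset))                        # default: no extra parameter
print(runs_url(dataset, valid_file_only=True))  # opt in: ...&validFileOnly=1
```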

@germanfgv

Thank you @vkuznet
I still have one question. After this is implemented, what would be the behaviour of DAS when querying for the runs in a dataset? i.e.
https://cmsweb.cern.ch/das/request?view=list&limit=50&instance=prod%2Fglobal&input=run+dataset%3D%2FDisplacedJet%2FRun2023C-PromptReco-v2%2FMINIAOD

The current behavior is causing confusion for some end users.

vkuznet commented Aug 14, 2023

By default, DAS shows valid files only, so we'll need to adjust its queries to use the new option in the runs API, i.e. it will get the list of runs for valid files but allow users to override this in DAS queries via the valid=0 option. That said, we'll need a separate issue in DAS to address this once the DBS side is corrected.
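A rough sketch of how that mapping could work on the DAS side, assuming the DBS option is called `validFileOnly` as requested earlier in the thread. The `dbs_runs_params` function is hypothetical and is not taken from das2go.

```python
def dbs_runs_params(dataset, das_options=None):
    """Translate DAS query options into parameters for the DBS runs API.

    DAS defaults to valid files only, so validFileOnly=1 is sent unless
    the user explicitly passes valid=0 in the DAS query.
    """
    options = das_options or {}
    params = {"dataset": dataset}
    if str(options.get("valid", "1")) != "0":
        params["validFileOnly"] = 1
    return params


dataset = "/DisplacedJet/Run2023C-PromptReco-v2/MINIAOD"
print(dbs_runs_params(dataset))                # valid files only (DAS default)
print(dbs_runs_params(dataset, {"valid": 0}))  # user override: all runs
```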

@germanfgv

@vkuznet I'll create a new issue then in the https://github.com/dmwm/das2go repository. Please let me know if this is not the right place.

vkuznet commented Aug 14, 2023

Yes, that is the correct place. Just reference the corresponding dbs2go issue in the new ticket so that the two are linked.
