Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change JSONFields to more specific Fields #3653

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

pedro-psb
Copy link
Member

@pedro-psb pedro-psb commented Jul 1, 2024

Closes: #3657


Previous Problem this was trying to solve

Problem

As mentioned in #3639, the DRF JSONFields are too generic and can be any JSON primitive (str, bool, etc), not just dicts or lists (as our codebase appears to be interpreting it).

After drf-spectacular fixed that interpretation on their side, our openapi schema broke because Object from JSONField became Any, which matches what JSON Fields are in DRF but not what we think it meant before.

Approaches considered

  1. Change all JSONField to some other fields that represents better what the field actually is. Example, repo_config is a DictField, some listy object should be ListField, etc.
  2. Use extend_schema to modify how the openapi schema will be generated. It affect only schema generation, AFAIK.
  3. Force JSONField to be a OA 'object', as suggested here. The downside is that the representation of our fields would continue to be inaccurate.

@dralley
Copy link
Contributor

dralley commented Jul 1, 2024

What do you mean by "The downside is that the representation of our fields would continue to be inaccurate."?

@pedro-psb
Copy link
Member Author

pedro-psb commented Jul 1, 2024

For example, the Repository.repo_config is a JsonFIeld. We want it be a json "Object Type" (or dict), but the JSONField serializer will happily accept a list or any json primitives, such as ints, booleans.

Edit: but given all those test failures, I'm thinking about doing the alternative fix and create another issue about reviewing the JSONFields, since using them has worked so far. Wdyt?

@pedro-psb pedro-psb marked this pull request as draft July 1, 2024 16:53
@github-actions github-actions bot added the multi-commit Added when a PR consists of more than one commit label Jul 1, 2024
@pedro-psb pedro-psb force-pushed the fix/json-fields-breaking-bindings branch 3 times, most recently from 117cb50 to 74f6803 Compare July 1, 2024 18:33
@github-actions github-actions bot removed the multi-commit Added when a PR consists of more than one commit label Jul 1, 2024
@pedro-psb pedro-psb marked this pull request as ready for review July 1, 2024 18:54
@dralley dralley marked this pull request as draft July 1, 2024 21:04
@dralley dralley marked this pull request as ready for review July 1, 2024 21:04
@mdellweg
Copy link
Member

mdellweg commented Jul 2, 2024

IMHO Option 3. is a non starter, because it leaks into the whole codebase and affects other plugins too.
Have you considered updating the bindings container? Maybe there is a newer version that knows how to deal with Any.
But anyway, if you know that repo-config is supposed to be a dictionary, than that is exactly the type of change we should do to the schema here.

@pedro-psb
Copy link
Member Author

IMHO Option 3. is a non starter, because it leaks into the whole codebase and affects other plugins too.

Hmm, I didnt know about that.

Have you considered updating the bindings container? Maybe there is a newer version that knows how to deal with Any.

Will check that out.

But anyway, if you know that repo-config is supposed to be a dictionary, than that is exactly the type of change we should do to the schema here.

Yes, it is. I started by naively changing all the fields, but that broke the tests. It'll require a more thoughtful approach.

@pedro-psb pedro-psb marked this pull request as draft July 2, 2024 18:35
@dralley
Copy link
Contributor

dralley commented Jul 5, 2024

For the sake of documenting it on Github, re: the discussion we had on Slack, is it correct to say that upgrading openapi-generator-cli to >=5.30 will resolve this issue?

(but with some risk of breakage it sounds like - in terms of backports)

@mdellweg
Copy link
Member

mdellweg commented Jul 5, 2024

For the sake of documenting it on Github, re: the discussion we had on Slack, is it correct to say that upgrading openapi-generator-cli to >=5.30 will resolve this issue?

(but with some risk of breakage it sounds like - in terms of backports)

From the looks of it, yes.
The risk with updating the openapi-generator templates is, that it immediately affects all new and old versions of all of Pulp plugins. I'm unsure if we are ready to do that.

@pedro-psb
Copy link
Member Author

I'm working on getting it pass our basic tests here: pulp/pulp-openapi-generator#101

About the break risk, we could ask katello to run some tests with the updated bindings before we do anything.

@dralley
Copy link
Contributor

dralley commented Jul 5, 2024

The risk with updating the openapi-generator templates is, that it immediately affects all new and old versions of all of Pulp plugins. I'm unsure if we are ready to do that.

Well that would seem to be a long-term problem, surely we need to update the templates eventually.

@mdellweg
Copy link
Member

mdellweg commented Jul 8, 2024

The risk with updating the openapi-generator templates is, that it immediately affects all new and old versions of all of Pulp plugins. I'm unsure if we are ready to do that.

Well that would seem to be a long-term problem, surely we need to update the templates eventually.

Absolutely. And actually I think we may need to take different versions of it for different versions of pulpcore.
My take is, improving the annotations of the individual fields here and now is a fast solution that is also future proof and worth doing in any scenario.

@pedro-psb
Copy link
Member Author

Given those considerations, I'll back to work on this as an immediate solution to the broken bindings.

I made progress on the work with fixing the bump regressions, but I reached a deadline there where I don't know where to go (without getting really deep into openapi stuff).

@pedro-psb pedro-psb force-pushed the fix/json-fields-breaking-bindings branch from 74f6803 to 300683f Compare July 9, 2024 16:54
@github-actions github-actions bot added the multi-commit Added when a PR consists of more than one commit label Jul 9, 2024
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test are passing, but it doesnt look right (see snippet below).

We have a clear definition of what this "json" object should be (here). Is there any reason not to create/add serializers that precisely describe its structure?

That's how it looks now with only the List specification and without knowing anything about its items:
(I've tried going a step further and using child=DictField(), but that broke stuff).

"files": [
  [
    null,
    "/tmp/",
    "duck.txt"
  ]
],
"requires": [
  [
    "cockateel",
    null,
    null,
    null,
    null,
    false
  ],
  [
    "lion",
    null,
    null,
    null,
    null,
    false
  ]
],
"provides": [
  [
    "duck",
    "EQ",
    "0",
    "0.6",
    "1",
    false
  ]
],

Copy link
Contributor

@dralley dralley Jul 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The schema is roughly speaking exactly as it is represented by createrepo_c, as that is what we use for parsing the metadata and also for publishing it.

Unfortunately that's not documented super well upstream either, but obviously you've seen the comments and here's an example w/ createrepo_c: https://github.com/rpm-software-management/createrepo_c/blob/master/examples/python/manual_repodata_parsing.py#L13C1-L35C63

It's not the ideal format for display to the user, but doing the minimum amount of work to transform the data during sync and publish was fairly important for performance. That's partly why we used JSONField also - it preserves order without needing to do sorts and joins during query-time. Coming from Pulp 2 where publish performance sucked (for different reasons but nonetheless), we were pretty sensitive to that concern.

To be honest that may or may not be an issue in practice nowadays. Ergonomically it is still a bit more painful though.

As for why not use sub-serializers - I think it was just simpler at the time, and survived from the prototype phase. If we can use sub-serializers and experience benefits from that without unacceptable drawbacks then I'd merge that patch.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What exactly do you mean by "doesn't look right"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the info, makes sense these are not db fields.

About the benefit of sub-serializers, maybe some of the fields could benefit more and be less risky of bringing significant drawbacks, such as the config ones (repo_config and copy config).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I've added an example of the config on the field help, but I actually think the right approach is defining a precise serializer (in this example, its this spec), so a correct sample will be generated by Redoc/OpenApi themselves.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That schema isn't used by OpenAPI at all, fwiw. It's validated by our own code in the serializer.

Unless you just mean that's an example of the sort of thing we'd need to provide for OpenAPI.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean that the serializer and its fields are used by openapi/redoc to generate the api docs page.

Notice the config: { } in Request Samples, and how it says its an Object type.

image

Copy link

stale bot commented Oct 12, 2024

This pull request has been marked 'stale' due to lack of recent activity. If there is no further activity, the PR will be closed in another 30 days. Thank you for your contribution!

@stale stale bot added the stale label Oct 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
multi-commit Added when a PR consists of more than one commit stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Drf JSONField usage is innacurate
3 participants