Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Schema Update #262

Merged
merged 11 commits into from
Aug 4, 2023
Merged

Schema Update #262

merged 11 commits into from
Aug 4, 2023

Conversation

willronchetti
Copy link
Member

  • Updates schema code to remove jsonschema-serialize-fork and increment our schema validator version to latest using the official jsonschema library
  • Support for $merge fields that reference fields in other schemas
  • TODO: test in Fourfront, CGAP, SMaHT

@@ -83,6 +82,7 @@ xlrd = "^1.0.0"
"zope.deprecation" = "^4.4.0"
"zope.interface" = ">=4.7.2,<6"
"zope.sqlalchemy" = "1.6"
jsonschema = "^4.18.4"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yay. i hope this works and the others (cgap-portal, encoded-core, smaht-portal, but apparently not fourfront?) can drop dependency on the fork, too.

@@ -22,7 +22,7 @@
],
"serverDefault": "uuid4",
"permission": "restricted_fields",
"requestMethod": "POST"
"requestMethod": ["POST"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the information that's happening here a narrowing of the permissions needed for "POST" so you can't do it without "restricted_fields" access? If so, should this include "PUT" and "PATCH" as well? Not due to the other changes you're making here, which seem only to ask for a syntactic change, but because that syntactic change presumably accommodates the need for multiple verbs and there have been long-absent other verbs, or that would have required an "anyOf" construct that apparently was not in play, such that adding "PUT" and "PATCH" might seem to me a fix for a long-standing lurking bug.

"description": "Test object that has calculated properties to test",
"type": "object",
"required": ["name", "foo", "bar", "nested", "nested2"],
"properties": {
"schema_version": {
"type": "object",
"type": "string",
"default": "1"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth specifying a syntax regexp?

@@ -3,7 +3,7 @@
import pytest

from base64 import b64encode
from jsonschema_serialize_fork import Draft4Validator
from jsonschema import Draft202012Validator
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if it would be worth the effort to put

C4Validator = Draft202012Validator

and import that validator to use instead. At some level, you might think this only would be an issue of how much string replacement is needed to later update it. But it would also mean that anyone using a named validator anywhere would be known to be deviating. And it would be easier to cross-check (at unit test time, see my other remark about test_misc.py) the particular chosen validator against what declarations are needed in the schemas dir.

This could also be done later. But I'm not sure it's super-hard to do and I like when code protects itself between consistency problems.

It would also make it easier for portal repos to pick up a change in the validator needed and for them to check (again at unit test time) that schemas downstream were being kept up-to-date with that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd somewhat prefer not to introduce indirection here just so it's very clear which we are using. It shouldn't be used outside of snovault so impact should be minimal.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't all schemas in all repos have to mention the version?

@@ -533,6 +533,7 @@ class TestingServerDefault(Item):

@collection('testing-dependencies')
class TestingDependencies(Item):
""" BREAKING CHANGE - dependencies --> dependentRequired in schema """
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like this remark belongs in a (dated) comment, not in the doc string. The doc string documents a point on a graph, and the comment is commenting on the slope of a line passing through the point (or, more properly, on the discontinuity that occurs at that point). The same comment read later will look out-of-place and confusing if not dated.

}
)
assert result == {'foo': 'thing'}
assert errors[0].message == "'name' is a required property"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there others? Why not just test

assert list(map(lambda x: x.message, errors)) == ["'name' is a required property"]

so that it's clear there's not other stuff going by unnoticed. or

[error] = errors
assert error.message == ["'name' is a required property"]

@@ -41,8 +39,10 @@ def server_default(func):


class NoRemoteResolver(RefResolver):
""" This has been modified to allow remote resolution that does not involve an http, https or ftp url
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is good form for a doc string, which should say what something does, not say how it's changed from something else, at least not without naming the something else.

Suggested change
""" This has been modified to allow remote resolution that does not involve an http, https or ftp url
"""
A refinement of RefResolver that disallows http, https or ftp urls.

@@ -41,8 +39,10 @@ def server_default(func):


class NoRemoteResolver(RefResolver):
""" This has been modified to allow remote resolution that does not involve an http, https or ftp url
ie: local remote resolution """
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer "i.e.," because it makes it clear it's an abbreviation, like "e.g.,"

But also, I think this clarifying line would not be needed if you picked a class name like LocalOnlyRefResolver or ReferResolverLocalOnly. The former name reads better in English but the latter is better for discovery on autocompletion. But I don't think NoRemoteResolver is even that graceful, and I think the loss of Ref in the name, out of context, loses useful information there's no reason to lose.

@@ -82,13 +82,13 @@ def favor_app_specific_schema_ref(schema_ref: str) -> str:
def json_file_contains_element(json_filename: str, json_element: str) -> bool:
"""
If the given JSON file exists and contains the given JSON element name then
returns True, otherwise returnes False. The given JSON element may or may
returns True, otherwise returnes False. The given JSON element may or may
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Semicolon before "otherwise" and comma after. I think recent versions of PyCharm probably remind you of things like this, as they seem to have connected a grammar checker.

Suggested change
returns True, otherwise returnes False. The given JSON element may or may
returns True; otherwise, returns False. The given JSON element may or may

Fixes typo "returnes", too.

Comment on lines +159 to +163
# Assumes we're only dealing with other JSON types
# like string, number, boolean, null, not other
# types like tuples, sets, functions, classes, etc.,
# which would require a deep copy.
resolved_data = data
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the comment, is any kind of warning warranted? Should there be a debugging mode? Is it ever desirable for this to happen? In json.dumps, there's a default argument that is a function to apply in this situation that does coercion. e.g., default=str is common for objects with datetimes in them to get them dumped as strings. I'm OK with this just staying as-is, but I wanted to at least provoke a little additional thought. I don't understand the situation well enough to make a more specific suggestion myself.

Copy link
Collaborator

@netsettler netsettler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comments are not blocking. I'm OK with this.

Copy link
Collaborator

@netsettler netsettler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I marked one of the three places I had concerns about circularity. Maybe there is no issue but I wanted to raise it just in case. Is it the case that the thing being merged has to already be a DAG and so can't have a circular reference?

Related to this, is there a consistency problem?

Other than just wanting you to double-check this, I'm OK with this.

@@ -104,6 +105,8 @@ def schema_mapping(field, schema, top_level=False, from_array=False):
TODO: rename 'lower_case_sort' to 'lowercase' and adjust search code
"""
ignored(top_level) # TODO: maybe wants to be used below, but isn't yet?
if '$merge' in schema:
schema = load_schema(schema)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a risk of circularity here? Is something checking that already?

@willronchetti willronchetti merged commit 9df20a9 into master Aug 4, 2023
3 checks passed
@willronchetti willronchetti deleted the wrr_schema branch August 4, 2023 18:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants