AvroTurf::SchemaStore does not follow the Avro specification when loading nested schemas #186

Closed
piotaixr opened this issue Jul 3, 2023 · 5 comments · Fixed by #203

Comments

@piotaixr
Contributor

piotaixr commented Jul 3, 2023

The Avro specification states the following at https://avro.apache.org/docs/1.10.2/spec.html#names:

In record, enum and fixed definitions, the fullname is determined in one of the following ways:

- A name and namespace are both specified. For example, one might use "name": "X", "namespace": "org.foo" to indicate the fullname org.foo.X.
- A fullname is specified. If the name specified contains a dot, then it is assumed to be a fullname, and any namespace also specified is ignored. For example, use "name": "org.foo.X" to indicate the fullname org.foo.X.
- A name only is specified, i.e., a name that contains no dots. In this case the namespace is taken from the most tightly enclosing schema or protocol. For example, if "name": "X" is specified, and this occurs within a field of the record definition of org.foo.Y, then the fullname is org.foo.X. If there is no enclosing namespace then the null namespace is used.
References to previously defined names are as in the latter two cases above: if they contain a dot they are a fullname, if they do not contain a dot, the namespace is the namespace of the enclosing definition.

This means that, if we have the following schema file:

foo/bar.avsc

{
  "type": "array",
  "namespace": "foo",
  "name": "bar",
  "items": "another_schema"
}

... then the another_schema schema MUST be in the foo namespace because, per the specification, "A name only is specified, i.e., a name that contains no dots. In this case the namespace is taken from the most tightly enclosing schema or protocol."
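The naming rules quoted above can be sketched in a few lines of Ruby. This is only an illustration of the specification text: `resolve_fullname` and its keyword arguments are hypothetical names, not part of the avro or avro_turf gems.

```ruby
# Hypothetical helper illustrating the fullname rules quoted from the Avro
# specification. It is not part of the avro or avro_turf gems.
def resolve_fullname(name, namespace: nil, enclosing_namespace: nil)
  # A dotted name is already a fullname; any namespace given is ignored.
  return name if name.include?(".")

  # An explicit namespace wins; otherwise fall back to the namespace of the
  # most tightly enclosing schema (which may be absent).
  effective = namespace || enclosing_namespace

  # No namespace at all means the null namespace: the name stands alone.
  (effective.nil? || effective.empty?) ? name : "#{effective}.#{name}"
end

resolve_fullname("X", namespace: "org.foo")                     # => "org.foo.X"
resolve_fullname("org.foo.X", namespace: "ignored")             # => "org.foo.X"
resolve_fullname("another_schema", enclosing_namespace: "foo")  # => "foo.another_schema"
resolve_fullname("another_schema")                              # => "another_schema"
```

Under this reading, the unqualified reference in foo/bar.avsc resolves to foo.another_schema rather than to a schema in the null namespace.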

Currently, AvroTurf::SchemaStore tries to load another_schema.avsc from the null namespace (at the root of the provided path) instead of from the foo folder.
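To make the difference concrete, here is a minimal sketch of how a fullname could map to a file on disk, assuming one directory per namespace component as in foo/bar.avsc. The helper name `schema_file_for` and the `schemas` root directory are assumptions made for illustration; this is not the actual SchemaStore code.

```ruby
# Hypothetical sketch: map a schema fullname to a file under the schema root,
# using one directory per namespace component (as with foo/bar.avsc).
def schema_file_for(root, fullname)
  # Everything before the last dot is the namespace; the rest is the name.
  *namespace_parts, schema_name = fullname.split(".")
  File.join(root, *namespace_parts, "#{schema_name}.avsc")
end

# The bare name is looked up at the root of the schema path ...
schema_file_for("schemas", "another_schema")      # => "schemas/another_schema.avsc"
# ... whereas the fullname required by the spec points into the foo folder.
schema_file_for("schemas", "foo.another_schema")  # => "schemas/foo/another_schema.avsc"
```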

I did not find a workaround for this, because:

  • If the another_schema.avsc file is at the root, Avro does not find the schema in the provided cache, which causes a double load, and Avro raises an "already in use" error.
  • If the another_schema.avsc file is in the foo folder, as it should be according to the spec, then the SchemaStore does not find it: the file name to load is extracted from the exception message, and this message does not contain the fullname, only the name of the schema being loaded.

The only solution is to explicitly provide the namespace for every nested schema reference that we expect the schema store to load, even though the specification allows us to omit it.
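As a rough illustration of that approach, the example schema could be written with the nested reference spelled out as a fullname. The snippet below only builds and prints the JSON; it assumes the referenced schema lives in foo/another_schema.avsc.

```ruby
require "json"

# The schema from foo/bar.avsc, with the nested reference written as a
# fullname so that it no longer depends on the enclosing namespace.
schema = {
  "type" => "array",
  "namespace" => "foo",
  "name" => "bar",
  "items" => "foo.another_schema" # was "another_schema"
}

# Print the JSON that would go into foo/bar.avsc.
puts JSON.pretty_generate(schema)
```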

@dasch
Owner

dasch commented Jul 5, 2023

I guess there’s nothing to do then?

@piotaixr
Contributor Author

piotaixr commented Jul 6, 2023

There is. I just opened https://issues.apache.org/jira/browse/AVRO-3790

@piotaixr
Contributor Author

PR opened in the Avro repo: apache/avro#2409

@github-actions

Stale issue message

@piotaixr
Contributor Author

piotaixr commented Oct 3, 2023

Avro 1.11.3 has been released, so I can now create a PR to fix this issue!
