Error reading JSON-LD from text IO stream #1484
I've created a test case for this:
And also a fix that seems to work (but see below): In file:
I have a clean version of all tests passing before I applied my changes. In particular, the sparql service tests were hanging. And a number of other unrelated tests appear to be failing. My changes and tests have been based on the Anyway, as far as I can tell, the remaining failing tests are not related to my changes.
I'm lucky enough to be able to run RDFLib master tests locally and get an all-clear. I've transcribed your fix and test into a separate branch, validated that the tests all pass and, pro tem, pushed it up to a spare org where GitHub Actions can pick up Iwan Aucamp's extremely useful “validate.yaml” workflow that does the biz, running the tests over a matrix of platforms/Python versions. Confirmatory results are here. Submitted PR is here. Thanks for the fix.
Thanks! In case it helps:
(This was catching an error that was otherwise showing up in the "roundtrip" tests.) Further comment: I think the logic around handling different source data has become a bit diffused - it seems some is in the xml reader classes imported from SAX, some in
I just noticed my fix had some code inconsistency left over from my experiments. The following cleanup still passes my new tests locally (changes to the lines following
Hi, we are getting a similar error to @gklyne's. We have been trying to parse simple JSON-LD with http://schema.org as context (the most used context that can be found all over the web...), but it has been quite challenging due to encoding errors that seem to belong to the 90's... Here is the basic JSON-LD we want to load in RDFLib:
{
"@context": "https://schema.org",
"@type": "Dataset",
"name": "ECJ case law text similarity analysis",
"description": "results from a study to analyse how closely the textual similarity of ECJ cases resembles the citation network of the cases.",
"version": "v2.0",
"url": "https://doi.org/10.5281/zenodo.4228652",
"license": "https://www.gnu.org/licenses/agpl-3.0.txt"
}
Here is the error we got:
File "/usr/local/lib/python3.8/site-packages/rdflib/plugins/shared/jsonld/context.py", line 377, in _prep_sources
new_ctx = self._fetch_context(
File "/usr/local/lib/python3.8/site-packages/rdflib/plugins/shared/jsonld/context.py", line 409, in _fetch_context
source = source_to_json(source_url)
File "/usr/local/lib/python3.8/site-packages/rdflib/plugins/shared/jsonld/util.py", line 35, in source_to_json
return json.load(StringIO(stream.read().decode("utf-8")))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
Here is the fix that we came up with:
import json

from pyld import jsonld

## TODO: quickfix to remove, should be fixed in the rdflib releases after 6.0.2
# nanopub_rdf is the JSON-LD document, already loaded as a dict.
if '@context' in nanopub_rdf and str(nanopub_rdf['@context']).startswith(('http://schema.org', 'https://schema.org')):
    # Regular content negotiation doesn't work with schema.org: https://github.com/schemaorg/schemaorg/issues/2578
    nanopub_rdf['@context'] = 'https://schema.org/docs/jsonldcontext.json'
# RDFLib JSON-LD has an issue with encoding: https://github.com/RDFLib/rdflib/issues/1416
# Expand with pyld so that no remote context is left for RDFLib to fetch.
nanopub_rdf = jsonld.expand(nanopub_rdf)
nanopub_rdf = json.dumps(nanopub_rdf, ensure_ascii=False)
It seems to be due to RDFLib not being able to read the Schema.org JSON-LD context available at https://schema.org/docs/jsonldcontext.json
I also tried with another
{
"@context": {
"adms": "http://www.w3.org/ns/adms#",
"dcat": "http://www.w3.org/ns/dcat#",
"dcterms": "http://purl.org/dc/terms/",
"foaf": "http://xmlns.com/foaf/0.1/",
"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
"schema": "http://schema.org/",
"vcard": "http://www.w3.org/2006/vcard/ns#",
"xsd": "http://www.w3.org/2001/XMLSchema#"
},
"@id": "http://nobelprize.org/datasets/dcat#ds1",
"@type": "dcat:Dataset",
"adms:contactPoint": {
"@id": "http://nobelprize.org/contacts/n1",
"@type": "foaf:Agent",
"foaf:name": "Vincent Emonet"
}
}
It would be really useful if some tests were added to check for common real-world use of JSON-LD (e.g. using a single "http://schema.org" string as context...). Everyone is using this approach for JSON-LD to publish schema.org-related metadata all over the web...
Also it would be really easy to do this: when a single context is provided, and you don't manage to get the @context at the given URL, then you just concatenate the namespace provided in the
I tried with 6.0.2 and the
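A note on the traceback above: byte 0x8b at position 1 matches the second byte of a gzip stream (gzip data starts with 0x1f 0x8b), so one plausible reading is that the context was served gzip-compressed and then decoded as if it were plain UTF-8. That is an inference from the error message, not something confirmed in this thread. A minimal sketch of a defensive decode, where the helper name decode_json_body is hypothetical and not part of RDFLib:

```python
import gzip
import json

# The first two bytes of any gzip stream.
GZIP_MAGIC = b"\x1f\x8b"


def decode_json_body(raw, encoding="utf-8"):
    """Decode a fetched body to JSON, un-gzipping it first if needed.

    Hypothetical helper for illustration; not RDFLib's implementation.
    """
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return json.loads(raw.decode(encoding))


# Illustrative data only: a tiny context, once plain and once gzipped.
context = {"@context": {"schema": "http://schema.org/"}}
plain = json.dumps(context).encode("utf-8")
compressed = gzip.compress(plain)

# Both forms decode to the same document.
from_plain = decode_json_body(plain)
from_compressed = decode_json_body(compressed)
```

Checking the magic bytes rather than trusting a Content-Encoding header keeps the sketch independent of how the body was fetched; a real fix might prefer to handle the header in the HTTP layer instead.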
I ran gklyne's tests against master, they now pass.
Testing my application with rdflib 6.1.1 -- my previously-failing tests are now all passing. Thanks all!
I logged an issue against rdflib-jsonld a little over 3 years ago (with a suggested fix):
RDFLib/rdflib-jsonld#55
There was also what appears to be a related issue:
RDFLib/rdflib-jsonld#91
Back then, this problem was blocking my migration to Python3. I'm now revisiting that project, and no longer have the option to stick with Python 2, and I'm seeing the same problem with rdflib==6.0.1, which of course now incorporates rdflib-jsonld. Are there any plans to incorporate a fix for this issue?
(I'm not reading from a file, but a stream generated by another software component, so I don't have the option to just open the file in binary mode.)
The error I'm seeing is this:
(Apologies if this is already recorded as an issue: I did look at existing issues, but couldn't find a match.)
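For anyone hitting the same problem, here is a minimal sketch of the general idea behind accepting both kinds of stream: read the data, and decode it only if it arrived as bytes. The helper name load_json_from_stream is hypothetical; it is not RDFLib's API and not necessarily the fix that was merged. It only illustrates why a text IO stream trips code that unconditionally calls .decode() on what it reads.

```python
import io
import json


def load_json_from_stream(stream, encoding="utf-8"):
    """Parse JSON from a stream that may yield either str or bytes.

    Hypothetical helper for illustration only; not part of RDFLib.
    """
    data = stream.read()
    # Binary streams return bytes, which need decoding. Text streams
    # already return str, and str has no .decode() method in Python 3.
    if isinstance(data, (bytes, bytearray)):
        data = data.decode(encoding)
    return json.loads(data)


# Illustrative document; the example.org IRIs are placeholders.
doc = '{"@id": "http://example.org/s", "http://example.org/p": "o"}'
from_text = load_json_from_stream(io.StringIO(doc))
from_bytes = load_json_from_stream(io.BytesIO(doc.encode("utf-8")))
```

With this shape, a stream produced by another component works whether it was opened in text or binary mode.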