Create version with newline in the description and abstract fails. #156

Open
white-gecko opened this issue Jan 3, 2024 · 11 comments
Labels
bug Something isn't working


@white-gecko
Contributor

Creating the following version fails:

{
  "@context": "https://databus.coypu.org/res/context.jsonld",
  "@graph": [
    {
      "@id": "https://databus.coypu.org/narndt/coypu",
      "@type": "Group",
      "title": "CoyPu"
    },
    {
      "@id": "https://databus.coypu.org/narndt/coypu/countries",
      "@type": "Artifact",
      "title": "Countries",
      "abstract": "Counties and regions",
      "description": "Counties and regions"
    },
    {
      "@type": [
        "Version",
        "Dataset"
      ],
      "@id": "https://databus.coypu.org/narndt/coypu/countries/2023-09-18T122214Z",
      "hasVersion": "2023-09-18T122214Z",
      "title": "Countries",
      "abstract": "Countries\n2023-09-18T12:22:14Z",
      "description": "Countries\n2023-09-18T12:22:14Z",
      "license": "https://dalicc.net/licenselibrary/Cc010Universal",
      "wasDerivedFrom": "https://metadata.coypu.org/dataset/wikidata-distribution\nWikidata Query Service\nhttps://query.wikidata.org/",
      "distribution": [
        {
          "@type": "Part",
          "formatExtension": "ttl",
          "compression": "none",
          "downloadURL": "https://databus.coypu.org/dav/narndt/coypu/countries/2023-09-18T122214Z/countries_freqency=static.ttl",
          "dcv:frequency": "static"
        }
      ]
    }
  ]
}

with the output:

PROTECT Authenticated request by narndt: /api/publish?fetch-file-properties=true&log-level=debug
GET /res/context.jsonld 200 1.805 ms - 3490
GET /res/context.jsonld 200 1.805 ms - 3490
Found 1 group graphs.
Processing group <https://databus.coypu.org/narndt/coypu>
2 triples selected via construct query.
Input has been processed by the auto-completer
SHACL validation successful
Context has been resubstituted with <https://databus.coypu.org/res/context.jsonld>
Saving group <https://databus.coypu.org/narndt/coypu> to narndt:coypu/group.jsonld
Found 1 artifact graphs.
Processing artifact <https://databus.coypu.org/narndt/coypu/countries>
4 triples selected via construct query.
Input has been processed by the auto-completer
SHACL validation successful
Context has been resubstituted with <https://databus.coypu.org/res/context.jsonld>
Saving artifact <https://databus.coypu.org/narndt/coypu/countries> to narndt:coypu/countries/artifact.jsonld
Found 1 version graphs.
Processing version <https://databus.coypu.org/narndt/coypu/countries/2023-09-18T122214Z>
Detected CV-graphs

/databus/server/node_modules/rdfstore/node_modules/jsonld/js/jsonld.js:6964
      throw new JsonLdError(
      ^
JsonLdError [jsonld.ParseError]: Error while parsing N-Quads; invalid quad.
    at _parseNQuads (/databus/server/node_modules/rdfstore/node_modules/jsonld/js/jsonld.js:6964:13)
    at /databus/server/node_modules/rdfstore/node_modules/jsonld/js/jsonld.js:4236:20
    at work (/databus/server/node_modules/rdfstore/node_modules/jsonld/js/jsonld.js:3932:14)
    at Normalize.doWork (/databus/server/node_modules/rdfstore/node_modules/jsonld/js/jsonld.js:3944:5)
    at /databus/server/node_modules/rdfstore/node_modules/jsonld/js/jsonld.js:3993:10
    at /databus/server/node_modules/rdfstore/node_modules/jsonld/js/jsonld.js:3982:9
    at work (/databus/server/node_modules/rdfstore/node_modules/jsonld/js/jsonld.js:3932:14)
    at Normalize.doWork (/databus/server/node_modules/rdfstore/node_modules/jsonld/js/jsonld.js:3944:5)
    at iterate (/databus/server/node_modules/rdfstore/node_modules/jsonld/js/jsonld.js:3981:19)
    at /databus/server/node_modules/rdfstore/node_modules/jsonld/js/jsonld.js:3935:9
    at iterate (/databus/server/node_modules/rdfstore/node_modules/jsonld/js/jsonld.js:3985:5)
    at /databus/server/node_modules/rdfstore/node_modules/jsonld/js/jsonld.js:4223:13
    at /databus/server/node_modules/rdfstore/node_modules/jsonld/js/jsonld.js:3982:9
    at work (/databus/server/node_modules/rdfstore/node_modules/jsonld/js/jsonld.js:3932:14)
    at Normalize.doWork (/databus/server/node_modules/rdfstore/node_modules/jsonld/js/jsonld.js:3944:5)
    at iterate (/databus/server/node_modules/rdfstore/node_modules/jsonld/js/jsonld.js:3981:19)
    at /databus/server/node_modules/rdfstore/node_modules/jsonld/js/jsonld.js:4223:13
    at /databus/server/node_modules/rdfstore/node_modules/jsonld/js/jsonld.js:3982:9
    at work (/databus/server/node_modules/rdfstore/node_modules/jsonld/js/jsonld.js:3932:14)
    at Normalize.doWork (/databus/server/node_modules/rdfstore/node_modules/jsonld/js/jsonld.js:3944:5)
    at iterate (/databus/server/node_modules/rdfstore/node_modules/jsonld/js/jsonld.js:3981:19)
    at /databus/server/node_modules/rdfstore/node_modules/jsonld/js/jsonld.js:4223:13 {
  details: { line: 8 }
}

When I remove all newlines (\n), parsing succeeds, but the request fails at a later stage.

@JJ-Author JJ-Author added the bug Something isn't working label Jan 3, 2024
@JJ-Author
Collaborator

We have to consider whether disallowing newlines in the abstract improves quality, since the abstract is intended to be short and concise, but it could be annoying for uploaders.
For descriptions, however, newlines should definitely be supported.

@manonthegithub
Collaborator

Hey @white-gecko, is this the same input that also causes the error in Gstore (the later stage), i.e. "virtuoso.jdbc4.VirtuosoException: SQ074: Line 38: SP030: SPARQL compiler, line 5: syntax error at '<' before 'https:'", when you remove the newlines from the version above?

@white-gecko
Contributor Author

Yes

@manonthegithub
Copy link
Collaborator

manonthegithub commented Jan 4, 2024

ref #158

@white-gecko
Contributor Author

white-gecko commented Jan 4, 2024

No, this is not correct. The problem described here is different.

The problem described in this issue is that the multi-line literals created from the abstract or description field:

      "abstract": "Countries\n2023-09-18T12:22:14Z",
      "description": "Countries\n2023-09-18T12:22:14Z",

are not represented correctly in N-Triples, i.e. the newlines are not encoded as \n in the RDF literal but appear as actual line breaks, which N-Triples does not allow.
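For reference, the RDF 1.1 N-Triples grammar forbids a raw double quote, backslash, line feed or carriage return inside a quoted literal; they must be written as the escapes \" \\ \n \r. A minimal sketch of the escaping a serializer is expected to apply, using a hypothetical helper `toNTriplesLiteral` (not part of the Databus or jsonld.js code):

```javascript
// Hypothetical helper: serialize a JS string as an N-Triples string literal.
// N-Triples forbids raw ", \, LF and CR inside a quoted literal, so they are
// written as the escape sequences \" \\ \n \r instead.
function toNTriplesLiteral(value) {
  const escaped = value
    .replace(/\\/g, "\\\\") // escape backslashes first so later escapes are not doubled
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n")
    .replace(/\r/g, "\\r");
  return `"${escaped}"`;
}

const abstract = "Countries\n2023-09-18T12:22:14Z";
// prints "Countries\n2023-09-18T12:22:14Z" on one line (backslash + n, not a line break)
console.log(toNTriplesLiteral(abstract));
```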

@manonthegithub manonthegithub reopened this Jan 4, 2024
@holycrab13 holycrab13 self-assigned this Jan 5, 2024
@holycrab13
Contributor

I was able to reproduce the bug; however, the cause is something else. The abstract and description can contain newlines; the newlines in the wasDerivedFrom field seem to be the issue here.

@holycrab13
Contributor

holycrab13 commented Feb 7, 2024

Since this is a wasDerivedFrom issue, it is linked to #158 (you were right, @manonthegithub).

The current model specifies the value to be a URI:
https://dbpedia.gitbook.io/databus/model/metadata/version#wasderivedfrom

Required fixes:

  • Improve server-side error handling: this input somehow manages to crash a cluster node.
  • Improve the UI: the current textarea suggests free-text input. Make it a single-line input field, possibly with URI regex checking.
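A minimal sketch of what such URI regex checking could look like for the wasDerivedFrom field, assuming a hypothetical function `isValidDerivedFrom`. The character exclusions follow the N-Triples IRIREF rule (no characters U+0000 to U+0020 and none of <>"{}|^`\), and a scheme is required so that only absolute IRIs pass:

```javascript
// Hypothetical check for the wasDerivedFrom input field.
// Excludes control characters and space (U+0000 to U+0020) as well as
// <>"{}|^`\ (the characters the N-Triples IRIREF rule forbids), and
// requires a leading scheme such as "https:" so only absolute IRIs pass.
const ABSOLUTE_IRI = /^[A-Za-z][A-Za-z0-9+.-]*:[^\u0000-\u0020<>"{}|^`\\]+$/;

function isValidDerivedFrom(value) {
  return ABSOLUTE_IRI.test(value);
}

console.log(isValidDerivedFrom("https://query.wikidata.org/")); // true
console.log(
  isValidDerivedFrom(
    "https://metadata.coypu.org/dataset/wikidata-distribution\nWikidata Query Service\nhttps://query.wikidata.org/"
  )
); // false
```

A regex is used here rather than the WHATWG `URL` constructor because the URL parser silently removes tab and newline characters from its input, so `new URL(...)` would accept the value from this issue.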

@white-gecko
Contributor Author

Are you sure it works with newlines in abstract and description?

@holycrab13
Contributor

Yes, it does work

@holycrab13
Contributor

holycrab13 commented Feb 7, 2024

A newline character in any URI will crash the current cluster node. The error happens in an asynchronous call inside a third-party library and apparently cannot be caught within the Databus backend.

Currently, inputs are processed in the following sequence:

  • Construct Query to select relevant triples in input
  • Auto-Complete
  • SHACL

Hoping that SHACL with sh:nodeKind sh:IRI would catch the error, I changed the sequence to:

  • Auto-Complete
  • SHACL
  • Construct

This did not help, but it could be a solution if we add a regex (sh:pattern) that excludes newlines to each sh:nodeKind sh:IRI constraint.

Alternatively, I tried rearranging some function calls in the Construct Query module: I converted the JSON-LD input to quads before inserting it into the in-memory store. This conversion drops any malformed URIs with a warning, although the warning only says that the URI is not absolute:
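A minimal sketch of such a regex, assuming the hypothetical pattern `^\S+$` (no whitespace at all, which is slightly stricter than "no newlines" but spaces are not legal in IRIs either). `\S`, `^` and `$` are supported by both the XPath/SPARQL regular expressions used by sh:pattern and JavaScript, so the same string could also be reused client-side. In the shapes graph it would sit next to the existing constraint, e.g. `sh:nodeKind sh:IRI ; sh:pattern "^\\S+$"` (the backslash doubled for the Turtle string literal):

```javascript
// Hypothetical pattern for an sh:pattern constraint on IRI-valued properties
// such as wasDerivedFrom. The string is compiled with JavaScript's RegExp here
// only to check what it accepts.
const IRI_NO_WHITESPACE_PATTERN = "^\\S+$";
const iriNoWhitespace = new RegExp(IRI_NO_WHITESPACE_PATTERN);

// SHACL matches sh:pattern against the string form of the value node,
// which for an IRI node is the IRI itself.
function passesPattern(iri) {
  return iriNoWhitespace.test(iri);
}

const failing =
  "https://metadata.coypu.org/dataset/wikidata-distribution\nWikidata Query Service\nhttps://query.wikidata.org/";
console.log(passesPattern(failing)); // false
console.log(passesPattern("https://query.wikidata.org/")); // true
```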

{
  event: {
    type: [ 'JsonLdEvent' ],
    code: 'relative @id reference',
    level: 'warning',
    message: 'Relative @id reference found.',
    details: {
      id: 'https://metadata.coypu.org/dataset/wikidata-distribution\n' +
        'Wikidata Query Service\n' +
        'https://query.wikidata.org/',
      expandedId: 'https://metadata.coypu.org/dataset/wikidata-distribution\n' +
        'Wikidata Query Service\n' +
        'https://query.wikidata.org/'
    }
  },
  next: [Function: next]
}

This is from the latest jsonld.js code, which also powers the JSON-LD Playground.

When the construct-query issue is bypassed, Gstore finally returns a correct error message from Jena:

Saving dataset to janfo:coypu/countries/2023-09-18T122214Z/dataid.jsonld
StatusCodeError: 400 - {"message":"Wrong input data. SQ074: Line 22: syntax error. Error saving data, potentially caused by: \nBad IRI: <https://metadata.coypu.org/dataset/wikidata-distribution\nWikidata Query Service\nhttps://query.wikidata.org/> Spaces are not legal in URIs/IRIs."}

Fixing this cleanly in the backend turns out to be tricky. It feels wrong that the input passes all server-side checks and is only rejected, with the correct error, by the database.

It would be possible to add a small validator module that goes over all "@id" fields and checks their values for newlines.
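A minimal sketch of such a validator, assuming a hypothetical function `findInvalidIds`. It should run on expanded JSON-LD (for example the result of jsonld.js `expand`), because in the compact input wasDerivedFrom is a plain string that only becomes an IRI through the context; after expansion it appears as an `{"@id": ...}` object. The sample document is hand-written for illustration, and its property IRI is an assumption:

```javascript
// Hypothetical validator: recursively walks expanded JSON-LD and collects
// every "@id" value containing whitespace (newlines included), which is
// never legal in an IRI. Returns a list of { path, id } entries.
function findInvalidIds(node, path = "$", problems = []) {
  if (Array.isArray(node)) {
    node.forEach((item, i) => findInvalidIds(item, `${path}[${i}]`, problems));
  } else if (node !== null && typeof node === "object") {
    for (const [key, value] of Object.entries(node)) {
      const childPath = `${path}.${key}`;
      if (key === "@id" && typeof value === "string" && /\s/.test(value)) {
        problems.push({ path: childPath, id: value });
      } else {
        findInvalidIds(value, childPath, problems);
      }
    }
  }
  return problems;
}

// Hand-written expanded form of the failing version (property IRI assumed).
const expanded = [
  {
    "@id": "https://databus.coypu.org/narndt/coypu/countries/2023-09-18T122214Z",
    "http://www.w3.org/ns/prov#wasDerivedFrom": [
      {
        "@id":
          "https://metadata.coypu.org/dataset/wikidata-distribution\nWikidata Query Service\nhttps://query.wikidata.org/",
      },
    ],
  },
];

console.log(findInvalidIds(expanded));
```

Rejecting the request with a 400 whenever the returned list is non-empty could surface the problem before the construct query runs, instead of letting the third-party library crash the node.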

@holycrab13
Contributor

This will be fixed by running SHACL validation earlier in the input processing; related to #167.

Projects
None yet
Development

No branches or pull requests

4 participants