Access to line,column position for triples #95

Open
phillord opened this issue Jul 5, 2022 · 3 comments

Comments

@phillord
Contributor

phillord commented Jul 5, 2022

I am currently using the RdfXmlParser to parse OWL files. While this works well, debugging my parser or the OWL file is fairly hard because my error messages are poor. I'd like to improve this. The main advance would be to get some line position information from the parser as it goes. Currently, I can't see any mechanism for getting this information out of the RdfXmlParser -- the underlying quick_xml parser could at least provide access to the buffer position, which would be easy enough to expose.

I am not sure if this is a good solution or not, although it is cheap and cheerful. So I thought to ask before I sent in a PR to see if there is a better design.

@Tpt
Collaborator

Tpt commented Jul 5, 2022

Hi! Indeed, having proper error locations would be amazing. Thank you! It's sad that quick-xml does not provide column/row positions. Using the buffer position also seems to me the least bad way to go. Feel free to submit a minimal, non-working PR so we can agree on a design before you spend too much time on it.

The Turtle parser already has positioning support; it might be a useful reference.
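If only a byte offset into the buffer is available, line and column can be recovered afterwards from the source text. The following is a minimal sketch, assuming the whole document is held in memory as UTF-8; `line_col` is a hypothetical helper and not part of rio or quick-xml.

```rust
/// Convert a byte offset into a 1-based (line, column) pair.
/// Hypothetical helper: not part of rio or quick-xml.
/// Columns count Unicode scalar values, not bytes.
/// Returns `None` if the offset is past the end of `source`
/// or falls in the middle of a multi-byte character.
fn line_col(source: &str, offset: usize) -> Option<(usize, usize)> {
    if !source.is_char_boundary(offset) {
        return None;
    }
    let before = &source[..offset];
    // Every newline before the offset starts a new line.
    let line = before.matches('\n').count() + 1;
    // The current line starts just after the last newline, or at 0.
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    let column = before[line_start..].chars().count() + 1;
    Some((line, column))
}
```

This rescans the text on every call, which is fine for occasional error messages; for frequent lookups a precomputed table of line-start offsets with a binary search would likely be a better fit.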

@phillord
Contributor Author

phillord commented Jul 5, 2022

I think quick-xml doesn't track column/row information for the data it passes through, but, yes, it's a PITA.

But the Turtle approach isn't enough for me. I need access to the position all the time, even during normal functioning. Most of my errors come not from the RDF parsing (my files are generally valid) but because the OWL that is produced is not valid for one reason or another. As I make sense of the triples after I have parsed them all, I have no choice but to collect the buffer positions as I go and keep them alongside the triples.
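The "collect positions as I go" approach described above could look roughly like this. It is only a sketch under assumptions: `Triple` and `Located` are hypothetical stand-ins rather than rio types, and the offset is whatever byte position the parser would report when it emits the triple.

```rust
/// Hypothetical stand-in for a parsed triple (not a rio type).
#[derive(Debug, Clone, PartialEq)]
struct Triple {
    subject: String,
    predicate: String,
    object: String,
}

/// A triple paired with the byte offset reported when it was emitted.
#[derive(Debug, Clone, PartialEq)]
struct Located {
    triple: Triple,
    offset: usize,
}

/// Run a check over all triples after parsing has finished and build
/// one message per failing triple, including its recorded offset.
fn report_invalid<F>(triples: &[Located], is_valid: F) -> Vec<String>
where
    F: Fn(&Triple) -> bool,
{
    triples
        .iter()
        .filter(|l| !is_valid(&l.triple))
        .map(|l| {
            format!(
                "invalid triple at byte {}: {} {} {}",
                l.offset, l.triple.subject, l.triple.predicate, l.triple.object
            )
        })
        .collect()
}
```

Because the offset travels with each triple, a check that runs long after parsing can still point back into the source file.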

@andrefs

andrefs commented Dec 14, 2022

Hello, what about position access even when there are no errors?

I would like to know how many bytes have already been read from a graph, or the start and end offset of a triple in the parse_step callback (perfect for grabbing the relevant text directly from the source file or for knowing how much of the file has been read already).

I'm mainly interested in the Turtle/NTriples parsers but I imagine this might be helpful for other parsers as well.
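To illustrate what start and end offsets would enable, here is a small sketch. It assumes a parser that reports a half-open byte range per triple; `Span`, `source_text` and `progress` are hypothetical names, not an existing rio API.

```rust
/// Half-open byte range `[start, end)` of one triple in the source.
/// Hypothetical type: not part of rio.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    start: usize,
    end: usize,
}

/// Return the exact source text of a triple, or `None` if the span is
/// out of bounds, reversed, or not aligned to character boundaries.
fn source_text(source: &str, span: Span) -> Option<&str> {
    source.get(span.start..span.end)
}

/// Fraction of the input consumed so far, in the range 0.0 to 1.0
/// as long as `bytes_read` does not exceed `total_bytes`.
fn progress(bytes_read: usize, total_bytes: usize) -> f64 {
    if total_bytes == 0 {
        1.0 // nothing to read, so treat the input as fully consumed
    } else {
        bytes_read as f64 / total_bytes as f64
    }
}
```

With an N-Triples input of `"<a> <b> <c> .\n<d> <e> <f> .\n"`, a span of 14 to 27 would select the second statement without its trailing newline.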
