
Inline-citation questions #1

Open
wyzhhhh opened this issue Sep 2, 2024 · 1 comment

Comments

@wyzhhhh

wyzhhhh commented Sep 2, 2024

How did the authors extract the inline-citation information from the S2ORC dataset?

@anirudhajith
Collaborator

Hi @wyzhhhh,

The full S2ORC dataset releases include an "annotations" field along with the paper data. This field records the start and end indices (character offsets into the paper's plaintext) of its various parts, e.g., the title, abstract, author names, and individual paragraphs.

Here's an illustration of the S2ORC schema:
[Screenshot of the S2ORC schema]
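To make the offset idea concrete, here is a minimal Python sketch that slices annotated spans out of a paper's plaintext. It assumes a hypothetical layout in which each entry under "annotations" is either a list or a JSON-encoded string of objects with integer-like "start" and "end" keys. The function name `annotation_spans` and the toy record are illustrative only, and the exact layout may differ between S2ORC releases.

```python
import json


def annotation_spans(text, annotations, kind):
    """Return the substrings of `text` covered by annotations of type `kind`.

    Assumed (hypothetical) layout: annotations[kind] is a list, or a JSON
    string encoding a list, of {"start": ..., "end": ...} objects whose
    offsets index into `text`.
    """
    raw = annotations.get(kind)
    if raw is None:
        return []  # this annotation type is absent for the paper
    spans = json.loads(raw) if isinstance(raw, str) else raw
    # int() tolerates offsets stored as either numbers or numeric strings
    return [text[int(s["start"]):int(s["end"])] for s in spans]


# Toy record with made-up values, for illustration only
text = "A Study of Citations\nCitations matter [1]."
annotations = {
    "title": json.dumps([{"start": 0, "end": 20}]),
    "paragraph": json.dumps([{"start": 21, "end": 42}]),
}

print(annotation_spans(text, annotations, "title"))      # ['A Study of Citations']
print(annotation_spans(text, annotations, "paragraph"))  # ['Citations matter [1].']
```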

We used the indices listed under the "bibref" annotations to isolate the positions of inline citations. These annotations also usually included a "matched_paper_id" field that we could use to match an inline citation from a source paper to a cited target paper within the S2ORC dataset.
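A minimal Python sketch of that extraction step follows. It assumes a hypothetical record with a `corpusid` and `content.text`, plus a "bibref" annotation list in which each span carries `start`, `end`, and `attributes.matched_paper_id`. The key names, and where the matched ID is stored, may differ between releases, so check them against your copy of the data. `inline_citations` and `corpus_ids` are illustrative names, not part of any S2ORC tooling.

```python
import json


def inline_citations(record, corpus_ids):
    """Yield (source_id, cited_text, start, end, target_id) per inline citation.

    Assumed (hypothetical) record layout:
      record["corpusid"], record["content"]["text"],
      record["content"]["annotations"]["bibref"] -> list (or JSON string) of
      {"start", "end", "attributes": {"matched_paper_id": ...}}.
    Citations with no matched target, or a target outside `corpus_ids`,
    are skipped.
    """
    content = record["content"]
    text = content["text"]
    raw = (content.get("annotations") or {}).get("bibref")
    if not raw:
        return  # paper has no inline-citation annotations
    spans = json.loads(raw) if isinstance(raw, str) else raw
    for span in spans:
        start, end = int(span["start"]), int(span["end"])
        target = (span.get("attributes") or {}).get("matched_paper_id")
        # Keep only citations that resolve to a paper inside the corpus
        if target is None or int(target) not in corpus_ids:
            continue
        yield record["corpusid"], text[start:end], start, end, int(target)


# Toy record with made-up IDs and offsets, for illustration only
record = {
    "corpusid": 1,
    "content": {
        "text": "Prior work [1] and [2] differ.",
        "annotations": {
            "bibref": json.dumps([
                {"start": 11, "end": 14, "attributes": {"matched_paper_id": 111}},
                {"start": 19, "end": 22, "attributes": {"matched_paper_id": 222}},
            ]),
        },
    },
}

for row in inline_citations(record, corpus_ids={111}):
    print(row)  # (1, '[1]', 11, 14, 111)
```

Restricting targets to `corpus_ids` mirrors the point above about matching cited papers within the S2ORC dataset; citations that cannot be matched are simply dropped in this sketch.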

I hope this answers your question. Let us know if you have any more!
