Add bibtex references in the docstrings to be shown in the README #855

Merged
merged 9 commits into from
Aug 6, 2024

plaguss
Contributor

@plaguss plaguss commented Aug 5, 2024

Description

This PR adds a new docstring section called "Citations" that can be parsed to obtain the BibTeX citations/references, when provided. For example:

from distilabel.steps import Step


class DummyStep(Step):
    """This is a dummy function.

    And this is still a dummy function, but with a longer description.

    References:
        - [Argilla](https://argilla.io)

    Citations:

        ```
        @misc{xu2024magpie,
            title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
            author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
            year={2024},
            eprint={2406.08464},
            archivePrefix={arXiv},
            primaryClass={cs.CL}
        }
        ```

        ```
        @misc{liu2024apigenautomatedpipelinegenerating,
            title={APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets}, 
            author={Zuxin Liu and Thai Hoang and Jianguo Zhang and Ming Zhu and Tian Lan and Shirley Kokane and Juntao Tan and Weiran Yao and Zhiwei Liu and Yihao Feng and Rithesh Murthy and Liangwei Yang and Silvio Savarese and Juan Carlos Niebles and Huan Wang and Shelby Heinecke and Caiming Xiong},
            year={2024},
            eprint={2406.18518},
            archivePrefix={arXiv},
            primaryClass={cs.CL},
            url={https://arxiv.org/abs/2406.18518}, 
        }
        ```
    """
    ...
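
As a rough illustration of how such a section could be parsed, here is a minimal sketch. It is not the code from this PR: `extract_citations`, `FENCE` and `EXAMPLE` are hypothetical names, and the sketch assumes each BibTeX entry sits in its own fenced block under a `Citations:` header, as in the docstring above.

```python
import re
from inspect import cleandoc

# Triple backtick, built programmatically so this snippet can itself be fenced.
FENCE = "`" * 3


def extract_citations(docstring: str) -> list[str]:
    """Return the BibTeX entries under "Citations:" (hypothetical helper, not distilabel's API)."""
    doc = cleandoc(docstring)
    # Keep only the text that follows the "Citations:" header, if there is one.
    _, header, section = doc.partition("Citations:")
    if not header:
        return []
    # Each entry lives in its own fenced block; capture the text between the fences.
    pattern = re.compile(
        re.escape(FENCE) + r"[^\n]*\n(.*?)" + re.escape(FENCE), re.DOTALL
    )
    return [cleandoc(body) for body in pattern.findall(section)]


# A shortened docstring with the same layout as the example above.
EXAMPLE = "\n".join([
    "This is a dummy function.",
    "",
    "    Citations:",
    "",
    "        " + FENCE,
    "        @misc{xu2024magpie,",
    "            year={2024},",
    "        }",
    "        " + FENCE,
])

for entry in extract_citations(EXAMPLE):
    print(entry)
```

The sketch splits on the first occurrence of `Citations:`, so a docstring that mentions that word earlier in its prose would need a stricter header match.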

If the "Citations" section is provided, its entries will be inserted in the Distiset automatically. If it is not provided but the references include a URL pointing to arXiv, we will try to obtain the citation by making a request to https://arxiv.org.
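
The arXiv fallback could look roughly like the sketch below. This is an assumption-laden illustration, not the PR's implementation: `arxiv_ids`, `fetch_bibtex`, `ARXIV_URL` and `BIBTEX_ENDPOINT` are hypothetical names, and the endpoint path is an assumption, since the description only says that a request is made to https://arxiv.org.

```python
import re
import urllib.request
from typing import Optional

# Matches new-style arXiv identifiers (e.g. 2406.08464) in abs/ or pdf/ URLs.
ARXIV_URL = re.compile(r"arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5})(?:v\d+)?")

# ASSUMPTION: an endpoint that serves a plain-text BibTeX entry for an identifier.
# The PR description only states that a request goes to https://arxiv.org.
BIBTEX_ENDPOINT = "https://arxiv.org/bibtex/{arxiv_id}"


def arxiv_ids(references: list[str]) -> list[str]:
    """Collect the arXiv identifiers mentioned in a list of reference strings."""
    ids: list[str] = []
    for ref in references:
        for arxiv_id in ARXIV_URL.findall(ref):
            if arxiv_id not in ids:  # keep first-seen order, drop duplicates
                ids.append(arxiv_id)
    return ids


def fetch_bibtex(arxiv_id: str, timeout: float = 10.0) -> Optional[str]:
    """Best-effort download of a BibTeX entry; returns None on any network error."""
    url = BIBTEX_ENDPOINT.format(arxiv_id=arxiv_id)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8").strip()
    except OSError:  # URLError, HTTPError and timeouts are all OSError subclasses
        return None
```

Returning `None` on failure keeps the fallback best-effort, so a step whose citation cannot be fetched would simply end up without one.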

References in the docs: [image]

And as part of the tags: [image]

Dummy repo to see an example

@plaguss plaguss self-assigned this Aug 5, 2024
@plaguss plaguss added the enhancement and documentation labels Aug 5, 2024

github-actions bot commented Aug 5, 2024

Documentation for this PR has been built. You can view it at: https://distilabel.argilla.io/pr-855/

codspeed-hq bot commented Aug 5, 2024

CodSpeed Performance Report

Merging #855 will not alter performance

Comparing citation-in-readme (27711c3) with develop (4e1f2bc)

Summary

✅ 1 untouched benchmarks

@plaguss plaguss marked this pull request as ready for review August 5, 2024 16:43
@plaguss plaguss linked an issue Aug 6, 2024 that may be closed by this pull request
Member

@gabrielmbmb gabrielmbmb left a comment

LGTM!

docs/sections/how_to_guides/advanced/distiset.md (outdated, resolved)
src/distilabel/distiset.py (outdated, resolved)
@plaguss plaguss added this to the 1.3.0 milestone Aug 6, 2024
@plaguss plaguss merged commit 44bd633 into develop Aug 6, 2024
6 of 7 checks passed
@plaguss plaguss deleted the citation-in-readme branch August 6, 2024 11:03
Labels: documentation, enhancement

Linked issue: [FEATURE] Include papers citations automatically in dataset README.md

2 participants