Close #921 (Flambe Dependency Fix) #922

Merged
8 commits merged into bigscience-workshop:main on Jul 18, 2024

@raissinging (Contributor) commented Jun 5, 2024

closes #921

There is a dependency issue in the recently merged flambe dataset: its loader imports the BigBIO schemas from the wrong module. This PR should fix that by importing the schemas from the local .bigbiohub module instead of from bigbio.utils, which lives in a different folder and, I believe, is what causes the ImportError.
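The failure mode can be reproduced with a minimal, self-contained sketch. It assumes the hub layout in which each dataset folder ships its own copy of bigbiohub.py next to the loader script. The package name flambe_demo, the two loader modules, and the text_features stand-in are hypothetical and are not taken from the actual flambe diff. The relative import resolves to the sibling file, while the absolute bigbio.utils import fails whenever the bigbio package is not installed.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_fake_hub_repo(root: Path) -> None:
    """Write a hypothetical dataset folder mirroring the hub layout:
    the loader script sits next to its own copy of bigbiohub.py."""
    pkg = root / "flambe_demo"  # hypothetical package name
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    # Stand-in for bigbiohub.py; the real file defines the BigBIO schemas and configs.
    (pkg / "bigbiohub.py").write_text("text_features = {'id': 'string', 'text': 'string'}\n")
    # Broken loader: absolute import from a package that is not on sys.path.
    (pkg / "broken_loader.py").write_text("from bigbio.utils import schemas\n")
    # Fixed loader: relative import of the sibling bigbiohub.py module.
    (pkg / "fixed_loader.py").write_text("from .bigbiohub import text_features\n")


def try_import(module_name: str) -> str:
    """Import a module and report 'ok' or the ImportError message."""
    try:
        importlib.import_module(module_name)
        return "ok"
    except ImportError as err:  # ModuleNotFoundError is a subclass of ImportError
        return f"ImportError: {err}"


with tempfile.TemporaryDirectory() as tmp:
    build_fake_hub_repo(Path(tmp))
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()  # make the freshly written files visible to the importer
    try:
        broken = try_import("flambe_demo.broken_loader")
        fixed = try_import("flambe_demo.fixed_loader")
    finally:
        sys.path.remove(tmp)

print(broken)  # an ImportError message unless bigbio happens to be installed
print(fixed)   # ok
```

Importing the loader as flambe_demo.fixed_loader matters: a relative import such as from .bigbiohub only works when the module is loaded as part of its package, which is how the hub loads dataset scripts.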

Checkbox

  • Confirm that this PR is linked to the dataset issue.
  • Create the dataloader script hub/hub_repos/my_dataset/my_dataset.py (please use only lowercase and underscore for dataset naming).
  • Provide values for the _CITATION, _DATASETNAME, _DESCRIPTION, _HOMEPAGE, _LICENSE, _URLs, _SUPPORTED_TASKS, _SOURCE_VERSION, and _BIGBIO_VERSION variables.
  • Implement _info(), _split_generators() and _generate_examples() in the dataloader script.
  • Make sure that the BUILDER_CONFIGS class attribute is a list with at least one BigBioConfig for the source schema and one for a bigbio schema.
  • Confirm that the dataloader script works with the datasets.load_dataset function.
  • Confirm that your dataloader script passes the test suite run with python -m tests.test_bigbio_hub <dataset_name> [--data_dir /path/to/local/data] --test_local.
  • If my dataset is local, I have provided the output of the unit tests in the PR (please copy and paste it). This is OPTIONAL for public datasets, as we can test these without access to the data files.
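Several of the structural checklist items can be checked mechanically before opening a PR. The sketch below is hypothetical and is not part of the BigBIO test suite, which remains the authoritative check via tests.test_bigbio_hub. It inspects a loader module for the listed constants, checks that the builder class defines the three required methods, and checks that BUILDER_CONFIGS contains at least one source-schema config and one bigbio-schema config. FakeConfig stands in for BigBioConfig and assumes that the real class exposes a schema field whose value is "source" or starts with "bigbio_".

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import List

# Module-level constants named in the PR checklist.
REQUIRED_CONSTANTS = [
    "_CITATION", "_DATASETNAME", "_DESCRIPTION", "_HOMEPAGE", "_LICENSE",
    "_URLs", "_SUPPORTED_TASKS", "_SOURCE_VERSION", "_BIGBIO_VERSION",
]
REQUIRED_METHODS = ["_info", "_split_generators", "_generate_examples"]


@dataclass
class FakeConfig:
    """Hypothetical stand-in for BigBioConfig; only the fields checked here."""
    name: str
    schema: str


def check_loader(module, builder_cls) -> List[str]:
    """Return a list of checklist problems; an empty list means every check passed."""
    problems = []
    for const in REQUIRED_CONSTANTS:
        if not hasattr(module, const):
            problems.append(f"missing constant {const}")
    for method in REQUIRED_METHODS:
        if not callable(getattr(builder_cls, method, None)):
            problems.append(f"missing method {method}()")
    configs = getattr(builder_cls, "BUILDER_CONFIGS", None)
    if not isinstance(configs, list):
        problems.append("BUILDER_CONFIGS must be a list")
        return problems
    schemas = [getattr(cfg, "schema", None) for cfg in configs]
    if "source" not in schemas:
        problems.append("no config for the source schema")
    if not any(isinstance(s, str) and s.startswith("bigbio_") for s in schemas):
        problems.append("no config for a bigbio schema")
    return problems


class DemoDataset:  # hypothetical builder; a real one subclasses datasets.GeneratorBasedBuilder
    BUILDER_CONFIGS = [
        FakeConfig(name="demo_source", schema="source"),
        FakeConfig(name="demo_bigbio_text", schema="bigbio_text"),
    ]

    def _info(self): ...
    def _split_generators(self, dl_manager): ...
    def _generate_examples(self, filepath): ...


demo_module = SimpleNamespace(**{name: "placeholder" for name in REQUIRED_CONSTANTS})
print(check_loader(demo_module, DemoDataset))  # []
```

A check like this only catches missing pieces; whether the generated examples actually match the schemas still has to be confirmed with datasets.load_dataset and the hub test suite.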

@phlobo merged commit f614bae into bigscience-workshop:main on Jul 18, 2024

@phlobo (Collaborator) commented Jul 18, 2024

@raissinging Thank you for contributing this dataset! Unrelated to this issue, I noticed that your dataset has NER labels, but does not use the bigbio_kb schema. Do you think it would be possible to convert your IOB tags to character offsets and use the bigbio_kb schema instead?
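The suggested conversion can be sketched as follows. It assumes that each example provides the original text, its tokens in order, and one IOB tag per token (B-TYPE, I-TYPE or O). Token offsets are recovered by searching for each token from left to right, which assumes that the tokens appear verbatim in the text. The output dictionaries are assumed to follow the bigbio_kb entity fields (id, type, text, offsets, normalized); the helper names and the id format are hypothetical, and normalized is left empty because IOB tags carry no database identifiers.

```python
from typing import Dict, List, Optional, Tuple


def token_offsets(text: str, tokens: List[str]) -> List[Tuple[int, int]]:
    """Find each token's (start, end) character span, scanning left to right."""
    offsets = []
    cursor = 0
    for tok in tokens:
        start = text.find(tok, cursor)
        if start == -1:
            raise ValueError(f"token {tok!r} not found after position {cursor}")
        end = start + len(tok)
        offsets.append((start, end))
        cursor = end
    return offsets


def iob_to_kb_entities(text: str, tokens: List[str], tags: List[str], doc_id: str) -> List[Dict]:
    """Merge IOB-tagged tokens into entity dicts with character offsets."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    spans = []  # each span is [type, start, end]
    current: Optional[list] = None
    for (start, end), tag in zip(token_offsets(text, tokens), tags):
        if tag == "O":
            current = None
            continue
        prefix, _, ent_type = tag.partition("-")
        if prefix == "I" and current is not None and current[0] == ent_type:
            current[2] = end  # extend the open entity to cover this token
        else:
            # A B- tag, or an I- tag that does not continue an entity of the same type.
            current = [ent_type, start, end]
            spans.append(current)
    return [
        {
            "id": f"{doc_id}_T{i}",  # hypothetical id scheme
            "type": ent_type,
            "text": [text[start:end]],
            "offsets": [[start, end]],
            "normalized": [],
        }
        for i, (ent_type, start, end) in enumerate(spans)
    ]


text = "Mutations in BRCA1 cause breast cancer ."
tokens = text.split()
tags = ["O", "O", "B-Gene", "O", "B-Disease", "I-Disease", "O"]
for ent in iob_to_kb_entities(text, tokens, tags, doc_id="doc1"):
    print(ent["type"], ent["offsets"], ent["text"])
# Gene [[13, 18]] ['BRCA1']
# Disease [[25, 38]] ['breast cancer']
```

Slicing the original text with the computed offsets, rather than joining tokens, keeps the entity text identical to the source, including the whitespace inside multi-token mentions such as "breast cancer".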

phlobo pushed a commit to davidkartchner/biomedical that referenced this pull request Oct 21, 2024
…kshop#922)

* initial working draft of flambe

* added ned data

* changed comments

* format

* fixed a bug

* added abstracts with bigbio text schema

* fixed bigbio dependency bug

Development: merging this pull request closes the linked issue Flambe Dependency Bug.