chore: bump version to 0.11.0 #281

Merged
percevalw merged 1 commit into master from v0.11.0
Mar 29, 2024

percevalw
Member

Changelog

Added

  • Support for a filesystem parameter in all edsnlp.data.read_* and edsnlp.data.write_* functions

  • Pipes of a pipeline are now easily accessible with nlp.pipes.xxx instead of nlp.get_pipe("xxx")

  • Support for built-in Span attributes in the converters' span_attributes parameter, e.g.:

    import edsnlp
    
    nlp = ...
    nlp.add_pipe("eds.sentences")
    
    data = edsnlp.data.from_xxx(...)
    data = data.map_pipeline(nlp)
    data.to_pandas(converters={"ents": {"span_attributes": ["sent.text", "start", "end"]}})

  • Support for assigning Brat AnnotatorNotes as span attributes: edsnlp.data.read_standoff(..., notes_as_span_attribute="cui")

  • Support for mapping full batches in edsnlp.processing pipelines with the map_batches lazy collection method:

    import edsnlp
    
    data = edsnlp.data.from_xxx(...)
    data = data.map_batches(lambda batch: do_something(batch))
    data.to_pandas()

  • New data.map_gpu method to map a deep learning operation over some data and take advantage of EDS-NLP's multi-GPU inference capabilities

  • Added average precision computation to the edsnlp span_classification scorer

  • You can now add pipes to your pipeline by instantiating them directly, which brings several advantages, such as auto-completion, introspection and type checking:

    import edsnlp, edsnlp.pipes as eds
    
    nlp = edsnlp.blank("eds")
    nlp.add_pipe(eds.sentences())
    # instead of nlp.add_pipe("eds.sentences")

    The previous way of adding pipes is still supported.

  • New eds.span_linker deep-learning component to match entities with their concepts in a knowledge base, in synonym-similarity or concept-similarity mode.
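
The dotted names accepted by span_attributes in the list above (such as "sent.text") can be read as attribute paths on each span. The snippet below is a minimal, library-free sketch of that idea, not EDS-NLP's actual converter code: resolve_attribute and span_to_record are hypothetical helpers, the SimpleNamespace objects are stand-ins for spaCy spans, and using the path string itself as the output key is an assumption.

```python
from functools import reduce
from types import SimpleNamespace


def resolve_attribute(obj, dotted_name):
    """Follow a dotted path such as "sent.text" one attribute at a time."""
    return reduce(getattr, dotted_name.split("."), obj)


def span_to_record(span, span_attributes):
    """Build a flat dict with one key per requested attribute path.

    Assumption: keys are the path strings themselves; the real converter
    may name its output columns differently.
    """
    return {name: resolve_attribute(span, name) for name in span_attributes}


# Hypothetical stand-ins for a spaCy Span and its enclosing sentence.
sentence = SimpleNamespace(text="Patient admitted for asthma.")
span = SimpleNamespace(text="asthma", start=3, end=4, sent=sentence)

record = span_to_record(span, ["sent.text", "start", "end"])
print(record)
```

With real spans, "sent.text" only resolves once sentence boundaries are set, which is presumably why the changelog example adds eds.sentences to the pipeline first.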

Changed

  • nlp.preprocess_many now uses lazy collections to enable parallel processing
  • ⚠️ Breaking change. Improved and simplified eds.span_qualifier: combination groups were not supported before, so this feature has been scrapped for now. Splitting the values of a single qualifier between different span labels is now also supported.
  • Optimized edsnlp.data batching, especially for large batch sizes (removed a quadratic loop)
  • ⚠️ Breaking change. By default, the name of components added to a pipeline is now the default name defined in their class __init__ signature. For most components of EDS-NLP, this will change the name from "eds.xxx" to "xxx".
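
To make the last breaking change concrete, here is a rough sketch of how a default component name could be read from a class's __init__ signature using the standard inspect module. It does not claim to reproduce how EDS-NLP does this internally: the Sentences and Unnamed classes and the default_component_name helper are hypothetical, used only for illustration.

```python
import inspect


class Sentences:
    """Hypothetical component whose default name lives in its __init__ signature."""

    def __init__(self, nlp=None, name="sentences"):
        self.nlp = nlp
        self.name = name


class Unnamed:
    """Hypothetical component that declares no `name` parameter at all."""

    def __init__(self, nlp=None):
        self.nlp = nlp


def default_component_name(component_cls):
    """Return the default value of the `name` parameter, or None if there is none."""
    params = inspect.signature(component_cls.__init__).parameters
    param = params.get("name")
    if param is None or param.default is inspect.Parameter.empty:
        return None
    return param.default


resolved = default_component_name(Sentences)
missing = default_component_name(Unnamed)
print(resolved, missing)
```

In practice, code that looked up a component by its old registered name (for example "eds.sentences") may need to use the shorter name ("sentences") after upgrading, unless a name is passed explicitly when adding the pipe.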

Fixed

  • Flatten list outputs (such as "ents" converter) when iterating: nlp.map(data).to_iterable("ents") is now a list of entities, and not a list of lists of entities
  • Allow the span pooler to choose between multiple base embedding spans (as likely produced by eds.transformer) by sorting them by Dice overlap score.
  • EDS-NLP no longer raises an error when saving a model to an existing but empty directory
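
For readers unfamiliar with the Dice overlap mentioned in the span pooler fix, the following sketch ranks candidate spans by their Dice score against a target span. It is an illustration under stated assumptions, not the pooler's implementation: spans are modelled as half-open (start, end) token ranges, and dice_overlap and rank_candidates are hypothetical names.

```python
def dice_overlap(a, b):
    """Dice score between two half-open token ranges given as (start, end).

    Dice = 2 * |A ∩ B| / (|A| + |B|), which is 1.0 for identical ranges
    and 0.0 for disjoint ones.
    """
    intersection = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    total = (a[1] - a[0]) + (b[1] - b[0])
    return 2 * intersection / total if total else 0.0


def rank_candidates(target, candidates):
    """Sort candidate ranges from best to worst Dice overlap with the target."""
    return sorted(candidates, key=lambda c: dice_overlap(target, c), reverse=True)


target = (10, 14)
# Hypothetical overlapping windows, e.g. as a windowed embedding might produce.
candidates = [(0, 12), (8, 16), (12, 30)]

ranked = rank_candidates(target, candidates)
print(ranked[0])
```

Here the window (8, 16) fully contains the target and is comparatively tight around it, so it scores highest (2 * 4 / 12), ahead of the two windows that cover only half of the target.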

codecov bot commented Mar 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.43%. Comparing base (d5dc0d8) to head (5ad4784).

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #281   +/-   ##
=======================================
  Coverage   97.43%   97.43%           
=======================================
  Files         262      262           
  Lines        9090     9090           
=======================================
  Hits         8857     8857           
  Misses        233      233           


@percevalw percevalw merged commit fdae338 into master Mar 29, 2024
14 checks passed
@percevalw percevalw deleted the v0.11.0 branch March 29, 2024 17:37