chore: bump version to 0.10.6 #268

Merged
merged 3 commits into from
Feb 24, 2024

Conversation

percevalw
Member

Changelog

v0.10.6

Added

  • Added batch_by, split_into_batches_after, sort_chunks, chunk_size, disable_implicit_parallelism parameters to processing (simple and multiprocessing) backends to improve performance
    and memory usage. Sorting chunks can yield up to a 2x speedup in some cases.
  • The deep learning cache mechanism now supports multitask models with weight sharing in multiprocessing mode.
  • Added max_tokens_per_device="auto" parameter to eds.transformer to estimate memory usage and automatically split the input into chunks that fit into the GPU.
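
To illustrate why sorting chunks can help, here is a minimal sketch that does not use EDS-NLP itself. It assumes that each batch is padded to the length of its longest document, and that sorting happens inside fixed-size chunks before batching. The function names and the exact semantics given to chunk_size here are hypothetical and only serve the illustration.

```python
from typing import List


def padded_cost(lengths: List[int], batch_size: int) -> int:
    """Count token slots when every batch is padded to its longest item."""
    total = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        total += max(batch) * len(batch)
    return total


def padded_cost_sorted_chunks(
    lengths: List[int], chunk_size: int, batch_size: int
) -> int:
    """Sort documents by length inside each chunk, then batch (assumed behaviour)."""
    total = 0
    for start in range(0, len(lengths), chunk_size):
        chunk = sorted(lengths[start:start + chunk_size])
        total += padded_cost(chunk, batch_size)
    return total


# Hypothetical document lengths (in tokens), alternating short and long.
doc_lengths = [5, 120, 8, 110, 6, 130, 7, 125]

unsorted_cost = padded_cost(doc_lengths, batch_size=2)
sorted_cost = padded_cost_sorted_chunks(doc_lengths, chunk_size=8, batch_size=2)

print(unsorted_cost)  # 970 padded token slots
print(sorted_cost)    # 528 padded token slots
```

With these made-up lengths, grouping documents of similar size nearly halves the number of padded slots, which is consistent with the "up to 2x" figure quoted above but is not a measurement of the library.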

Changed

  • Improved speed and memory usage of the eds.text_cnn pipe by running the CNN on a non-padded version of its input: expect a speedup of up to 1.3x in real-world use cases.
  • Deprecated the converters' (especially for BRAT/Standoff data) bool_attributes
    parameter in favor of the more general default_attributes. This new mapping describes how to
    set attributes on spans for which no attribute value was found in the input format.
    This is especially useful for negation, or frequent attribute values (e.g. "negated"
    is often False, "temporal" is often "present"), that annotators may not want to
    annotate every time.
  • Default eds.ner_crf window is now set to 40 and stride to 20, as this doesn't
    affect throughput compared to the previous default (window set to 20) and improves accuracy.
  • New default overlap_policy='merge' option and parameter renaming in
    eds.span_context_getter (which replaces eds.span_sentence_getter)
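
The idea behind default_attributes can be sketched in a few lines of plain Python. This is not the converters' actual code: the helper name apply_default_attributes and the dict-based span representation are assumptions made for illustration. Only the mapping's role (filling attributes that are absent from the input) comes from the changelog entry above.

```python
from typing import Any, Dict


def apply_default_attributes(
    span_attributes: Dict[str, Any],
    default_attributes: Dict[str, Any],
) -> Dict[str, Any]:
    """Fill in attributes missing from the annotation with their defaults.

    Values present in the annotation always win over the defaults.
    """
    merged = dict(default_attributes)
    merged.update(span_attributes)
    return merged


# Defaults taken from the examples in the changelog entry.
defaults = {"negated": False, "temporal": "present"}

# The annotator only marked the negation, not the temporality.
annotated = {"negated": True}

result = apply_default_attributes(annotated, defaults)
print(result)  # {'negated': True, 'temporal': 'present'}
```

Annotators then only need to mark the uncommon values, and every span still ends up with a complete set of attributes.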

Fixed

  • Improved error handling in multiprocessing backend (e.g., no more deadlock)
  • Various improvements to the data processing related documentation pages
  • Begin of sentence / end of sentence transitions of the eds.ner_crf component are now
    disabled when windows are used (i.e., when window is neither 1, which is equivalent to softmax,
    nor 0, which is equivalent to the default full-sequence Viterbi decoding)
  • eds tokenizer now inherits from spacy.Tokenizer to avoid typing errors
  • Only match 'ne' negation pattern when not part of another word, to avoid false positives such as u[ne] cure de 10 jours
  • Disabled pipes are now correctly ignored in the Pipeline.preprocess method
  • Added "eventuel*" patterns to eds.hypothesis
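
The 'ne' fix above can be illustrated with a word-boundary regular expression. This is a minimal sketch using Python's standard re module. The pattern below is a hypothetical stand-in, not the pattern actually shipped in the negation component.

```python
import re

# Hypothetical pattern: "ne" only counts when it is a standalone word.
NE_PATTERN = re.compile(r"\bne\b", flags=re.IGNORECASE)


def has_ne_negation(text: str) -> bool:
    """Return True if "ne" appears as its own word in the text."""
    return NE_PATTERN.search(text) is not None


print(has_ne_negation("une cure de 10 jours"))     # False: "ne" is inside "une"
print(has_ne_negation("le patient ne fume pas"))   # True: standalone "ne"
```

Without the `\b` anchors, the first sentence would match on the last two letters of "une", which is the false positive described in the changelog entry.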

Checklist

  • If this PR is a bug fix, the bug is documented in the test suite.
  • Changes were documented in the changelog (pending section).
  • If necessary, changes were made to the documentation (eg new pipeline).

codecov bot commented Feb 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.00%. Comparing base (6bfa4cc) to head (a859540).

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #268   +/-   ##
=======================================
  Coverage   97.00%   97.00%           
=======================================
  Files         255      255           
  Lines        8618     8618           
=======================================
  Hits         8360     8360           
  Misses        258      258           

@percevalw percevalw force-pushed the v0.10.6 branch 2 times, most recently from 4224411 to 27f19ce Compare February 24, 2024 01:34
@percevalw percevalw merged commit 69d9966 into master Feb 24, 2024
14 checks passed
@percevalw percevalw deleted the v0.10.6 branch February 24, 2024 22:06