Update dependency tesseract.js to v5 #30

Merged
merged 1 commit into from
Sep 30, 2023

renovate[bot]

@renovate renovate bot commented Sep 28, 2023

This PR contains the following updates:

Package       Change
tesseract.js  ^4.1.0 -> ^5.0.0

Release Notes

naptha/tesseract.js (tesseract.js)

v5.0.0

Compare Source

What's Changed

Major New Features

  1. Significantly smaller file sizes
    1. 54% smaller file sizes for English, 73% smaller for Chinese (see #​806 for details)
    2. This results in a ~50% decrease in runtime for first-time users (who do not yet have the data downloaded/cached)
  2. Significantly lower memory usage
    1. Worker memory utilization in the web benchmark is reduced from 311 MB to 164 MB (47% reduction)
    2. The lower memory footprint makes it feasible to use more workers, significantly improving performance for projects that utilize schedulers for parallel processing
  3. Compatible with iOS 17 (using default settings)
    1. iOS 17 broke compatibility with Tesseract.js v4; upgrading to v5 should resolve the issue
      1. See discussion section below for details
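
The memory figures above suggest a rough way to size a scheduler's worker pool. The sketch below is an illustration only, not part of the release notes: `workersForBudget` and `recognizeAll` are hypothetical helper names, the 1000 MB budget in the usage note is an arbitrary example, and the Tesseract.js module is passed in as a parameter so the snippet does not assume how it is imported. It relies on `createScheduler`, `addWorker`, `addJob`, and `terminate` from the Tesseract.js scheduler API.

```javascript
// Per-worker memory from the web benchmark quoted above, in MB.
const MB_PER_WORKER = { v4: 311, v5: 164 };

// Hypothetical helper: how many workers fit in a memory budget (at least one).
function workersForBudget(budgetMB, perWorkerMB) {
  return Math.max(1, Math.floor(budgetMB / perWorkerMB));
}

// Hypothetical helper: recognize several images in parallel with a scheduler.
// `Tesseract` is the imported tesseract.js v5 module.
async function recognizeAll(Tesseract, images, workerCount) {
  const scheduler = Tesseract.createScheduler();
  try {
    for (let i = 0; i < workerCount; i++) {
      scheduler.addWorker(await Tesseract.createWorker("eng"));
    }
    // The scheduler hands each job to the next idle worker.
    const results = await Promise.all(
      images.map((image) => scheduler.addJob("recognize", image))
    );
    return results.map((result) => result.data.text);
  } finally {
    // Terminating the scheduler also terminates the workers added to it.
    await scheduler.terminate();
  }
}
```

With a 1000 MB budget, `workersForBudget(1000, MB_PER_WORKER.v4)` gives 3 workers and `workersForBudget(1000, MB_PER_WORKER.v5)` gives 6.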

Breaking Changes Impacting Many Users

  1. createWorker arguments changed
    1. Setting non-default language and OEM now happens in createWorker
      1. E.g. createWorker("chi_sim", 1)
  2. worker.initialize and worker.loadLanguage functions now do nothing and can be deleted from code
    1. Loading the language and initialization now occurs in createWorker
    2. Workers can be re-initialized with different settings using worker.reinitialize

In other words, code should be modified from this:

const worker = await Tesseract.createWorker();
await worker.loadLanguage('eng');
await worker.initialize('eng');
const ret = await worker.recognize(file);

To this:

const worker = await Tesseract.createWorker("eng");
const ret = await worker.recognize(file);
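
As a further illustration of the new signature, the sketch below creates a worker with a non-default language and OEM, then switches the same worker to another language with `worker.reinitialize`. `recognizeInTwoLanguages` is a hypothetical helper name, the Tesseract.js module is passed in as a parameter rather than imported, and the snippet assumes `reinitialize` takes the same language and OEM arguments as `createWorker`.

```javascript
// Hypothetical helper: run OCR on one file in two languages with one worker.
// `Tesseract` is the imported tesseract.js v5 module.
async function recognizeInTwoLanguages(Tesseract, file) {
  // Language and OEM are now set when the worker is created.
  const worker = await Tesseract.createWorker("chi_sim", 1);
  try {
    const chinese = await worker.recognize(file);
    // Switch the existing worker to English instead of creating a new one.
    await worker.reinitialize("eng", 1);
    const english = await worker.recognize(file);
    return { chi_sim: chinese.data.text, eng: english.data.text };
  } finally {
    // Free the worker's memory even if recognition throws.
    await worker.terminate();
  }
}
```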

Breaking Changes Impacting Fewer Users

  1. Users who manually set corePath will need to update the contents of their corePath directory
    1. corePath should point to a directory that contains all 4 of the files below from Tesseract.js-core v5:
      1. tesseract-core.wasm.js
      2. tesseract-core-simd.wasm.js
      3. tesseract-core-lstm.wasm.js
      4. tesseract-core-simd-lstm.wasm.js
    2. Tesseract.js will automatically select the correct version to use
  2. worker.detect function disabled by default
    1. Orientation + script detection is a function of the Legacy model only, which is no longer included by default
    2. To enable, set arguments legacyCore: true and legacyLang: true in createWorker options
      1. E.g. Tesseract.createWorker("eng", 1, {legacyCore: true, legacyLang: true});
  3. Language of progress logs standardized
    1. This should only impact users who parse status logs (e.g. to update a loading bar)

Non-Breaking Changes

  1. Language data loaded from jsdelivr by default (rather than GitHub pages)
    1. This should result in improved performance and uptime
  2. Separate "development" build (that produced tesseract.dev.js and worker.dev.js) removed
  3. Documentation and examples were modified to prevent new users from using Tesseract.recognize and Tesseract.detect
    1. Users who already use these functions are encouraged to modify their code to use worker.recognize and worker.detect instead
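
One practical benefit of the worker API is reuse: a worker can be created once and used for many images. The sketch below shows that pattern under a few assumptions. `recognizeEach` is a hypothetical helper name, and the Tesseract.js module is passed in as a parameter rather than imported.

```javascript
// Hypothetical helper: recognize a list of images with a single worker.
// `Tesseract` is the imported tesseract.js v5 module.
async function recognizeEach(Tesseract, images) {
  const worker = await Tesseract.createWorker("eng");
  const texts = [];
  try {
    // Process images one after another on the same worker.
    for (const image of images) {
      const { data } = await worker.recognize(image);
      texts.push(data.text);
    }
  } finally {
    // Terminate the worker once all images are done (or on error).
    await worker.terminate();
  }
  return texts;
}
```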

Full Changelog: naptha/tesseract.js@v4.1.3...v5.0.0

v4.1.4

Compare Source

What's Changed

  • Restored compatibility with certain versions of Node.js v14

Full Changelog: naptha/tesseract.js@v4.1.3...v4.1.4

v4.1.3

Compare Source

What's Changed

New Contributors

Full Changelog: naptha/tesseract.js@v4.1.2...v4.1.3

v4.1.2

Compare Source

What's Changed

  • Fixed bug causing excessive memory use when using FS + writeFile function (#​812)
  • Fixed bug where setting output option debug: true was forcing recognition to be run (#​788)
  • Added warning message when setParameters is used to set options that can only be set during initialize (#​816)
  • Minor edits to reduce memory use (#​815)
  • Minor changes to documentation, types, and example code (#​575, #​791, #​803, #​805, #​810, #​817)

New Contributors

Full Changelog: naptha/tesseract.js@v4.1.1...v4.1.2

v4.1.1

Compare Source

What's Changed

  • Fixed detection of image orientation metadata (#​783)
    • Allows Tesseract.js to work with images taken on iOS devices
  • Minor changes to documentation and types (#​781, #​782, #​778)

New Contributors

Full Changelog: naptha/tesseract.js@v4.1.0...v4.1.1


Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

This PR has been generated by Mend Renovate. View repository job log here.

@AlexProgrammerDE AlexProgrammerDE merged commit 6dd59af into main Sep 30, 2023
2 checks passed
@renovate renovate bot deleted the renovate/tesseract.js-5.x branch September 30, 2023 15:03