epic: dictionary-based word-breakers 🔬 #12142

mcdurdin · 2024-08-09T05:45:59Z

No description provided.

keymanapp-test-bot · 2024-08-09T05:46:04Z

User Test Results

Test specification and instructions

ERROR: user tests have not yet been defined

Test Artifacts

Android
Developer
iOS
- Keyman for iOS (simulator image)
- FirstVoices Keyboards for iOS (simulator image)
- TestFlight internal PR build version - 18.0.131 (0.12142.12197)
Keyboards
- Test Keyboards
Web
- KeymanWeb Test Home
Windows

jahorton · 2024-08-23T03:14:09Z

Jotting down notes of lingering ideas before this goes on pause for a while:

Once we build a model for the user-dictionary data, we may want to 'hash' the source data (or similar) and cache the compiled model. Of course, the data is subject to change as the user adds new contacts, etc., but if our 'hash' can indicate that there has been no change to the source data, there's no need to rebuild the user-dictionary model.
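
A minimal sketch of that hash-then-skip idea, not a proposed implementation. Everything named here (`UserDictEntry`, `fingerprint`, `getOrBuild`, the `build` callback) is hypothetical, and FNV-1a over UTF-16 code units is only a stand-in for whatever 'hash' would actually be chosen:

```typescript
// Hypothetical shape for one user-dictionary entry; not taken from the Keyman codebase.
interface UserDictEntry {
  word: string;
  count: number;
}

// 32-bit FNV-1a over UTF-16 code units: a small non-cryptographic hash,
// used purely as a placeholder for the real change-detection mechanism.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16);
}

// Sorts the serialized entries first so the fingerprint does not depend on
// the order in which the host app happens to report them.
function fingerprint(entries: UserDictEntry[]): string {
  const lines = entries.map((e) => `${e.word}\t${e.count}`).sort();
  return fnv1a(lines.join("\n"));
}

interface CachedModel<M> {
  sourceHash: string;
  model: M;
}

// Reuses the cached model when the source fingerprint is unchanged;
// otherwise calls the (hypothetical) compile step and returns a fresh cache entry.
function getOrBuild<M>(
  entries: UserDictEntry[],
  cache: CachedModel<M> | null,
  build: (entries: UserDictEntry[]) => M
): CachedModel<M> {
  const sourceHash = fingerprint(entries);
  if (cache !== null && cache.sourceHash === sourceHash) {
    return cache;
  }
  return { sourceHash, model: build(entries) };
}
```

A 32-bit hash can collide, so a real design would probably want a stronger digest or a secondary check (entry count, last-modified time) before trusting a match.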

Potential design paths:

- Rig up the model compiler within its own WebView or a separate worker; have the host app compile the user-dict model, then pass that in.
  - We don't want the operation to be 'blocking'.
- Build a JSON object or array that can be passed into the... compiler's thread... in order to build the user-dict model.
  - "compiler's thread": it could be the predictive-text worker, a separate worker, or even the main thread of a separate WebView. We haven't explicitly made a design-decision here yet.
  - If it's essentially a JSON-encoding of what would be a .tsv file, the JSON parse should be relatively simple and straightforward.
  - Since the model compiler is in TS, compiled down to JS... we do have to solve the issue of data transfer in one manner or other.
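
To illustrate the "JSON-encoding of what would be a .tsv file" option, here is a rough sketch. The row shape (`[word, count]`), the `#` comment handling, the default count of 1, and the function names are all assumptions made for the example rather than the model compiler's actual input contract:

```typescript
// Hypothetical row shape: [word, count].
type WordlistRow = [string, number];

// Host side: turn TSV text into a JSON string that could be posted to
// whichever thread ends up running the compiler.
function tsvToJson(tsv: string): string {
  const rows: WordlistRow[] = [];
  for (const line of tsv.split(/\r?\n/)) {
    // Assumption: blank lines and lines starting with '#' carry no entries.
    if (line.trim() === "" || line.startsWith("#")) {
      continue;
    }
    const [word, count] = line.split("\t");
    const parsed = Number.parseInt(count ?? "", 10);
    // Assumption: a missing or unparseable count defaults to 1.
    rows.push([word, Number.isNaN(parsed) ? 1 : parsed]);
  }
  return JSON.stringify(rows);
}

// Compiler side: parse and validate before handing the rows to the model builder.
function jsonToRows(json: string): WordlistRow[] {
  const data: unknown = JSON.parse(json);
  if (!Array.isArray(data)) {
    throw new Error("expected a JSON array of rows");
  }
  return data.map((row): WordlistRow => {
    if (!Array.isArray(row) || typeof row[0] !== "string" || typeof row[1] !== "number") {
      throw new Error("malformed wordlist row");
    }
    return [row[0], row[1]];
  });
}
```

Because the payload is a plain string, it could cross a `postMessage` boundary or be fetched as a file without any structured-clone concerns.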

jahorton · 2024-08-23T03:26:13Z

I got to wondering if there are any "relatively simple" ways to avoid spinning up a WebView to run the model-compiler, should we decide to keep the user-dictionary compilation completely separate from the keyboard.

After a bit of searching, I found this: https://github.com/nodejs-mobile - a library for running Node-oriented JS scripts on mobile devices. That said, it'd be a new dependency.

jahorton · 2024-08-23T03:41:31Z

Other notable thoughts:

We should probably not associate a language code with user-dictionary data. That is, we collate the data once and use that with any language supporting predictive-text.

My original strategy (as of #11994) was to blend the models into a single, "traversable" model.

- This would require that the standard lexical model for each language implements the LexiconTraversal interface, though - which is not strictly required for all custom models.
  - We'd need an alternate strategy to support scenarios where a language-specific custom model lacks this feature.
- Thinking ahead, we'd want a similar strategy to be in place once we start doing 'learning', which would adjust a model's probability data to better suit the user's actual typing patterns.

@mcdurdin previously suggested instead doing multiple correction-searches and picking the best from among their results after applying relative weighting. This would work, though it would also require support for multiple correction searches that does not yet exist.

- The searches should likely share the same allotment for total execution time... likely requiring some form of load balancing.
- We'd likely need the ability to pause and resume whichever search is currently returning 'more likely' paths at the time.
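
As a rough illustration of the "pick the best from among their results after applying relative weighting" step only (it does not address the shared time budget or pause/resume), here is a sketch. The `Suggestion` and `WeightedResults` types, the weights, and `mergeResults` are hypothetical and do not reflect the predictive-text engine's real interfaces:

```typescript
// Hypothetical result shape for one correction search.
interface Suggestion {
  text: string;
  p: number; // probability reported by the search that produced it
}

interface WeightedResults {
  weight: number; // relative trust placed in this source
  suggestions: Suggestion[];
}

// Scales each source's probabilities by its weight, keeps the best score
// per distinct text, and returns the top `limit` entries.
function mergeResults(sources: WeightedResults[], limit: number): Suggestion[] {
  const best = new Map<string, number>();
  for (const { weight, suggestions } of sources) {
    for (const { text, p } of suggestions) {
      const scaled = p * weight;
      if (scaled > (best.get(text) ?? 0)) {
        best.set(text, scaled);
      }
    }
  }
  return Array.from(best, ([text, p]) => ({ text, p }))
    .sort((a, b) => b.p - a.p)
    .slice(0, limit);
}

// Example: a language model weighted 0.8 and a user dictionary weighted 0.2.
const merged = mergeResults(
  [
    { weight: 0.8, suggestions: [{ text: "the", p: 0.5 }, { text: "then", p: 0.2 }] },
    { weight: 0.2, suggestions: [{ text: "Theo", p: 0.9 }, { text: "the", p: 0.1 }] },
  ],
  2
);
```

Taking the maximum for duplicate texts is one choice among several; summing the scaled probabilities would instead reward suggestions that both sources agree on.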

mcdurdin · 2024-08-23T11:17:53Z

> we do have to solve the issue of data transfer in one manner or other.

Data transfer into the webview could be via local file: or http: request. This opens up a number of extensibility questions for KeymanWeb itself and how we could make the web-based experience consistent with the Keyman Android/iOS app experience.

jahorton and others added 21 commits August 9, 2024 09:40

- 425a0a0 feat(common/models/wordbreakers): starting on dictionary-based wordbreaking
- 7b456c1 feat(common/models/wordbreakers): actual first-pass implementation
  Only wordbreaks anything AFTER the last space / ZWNJ. Doesn't bother with anything before it.
- 2da7b7c feat(common/models/wordbreakers): pass 2 - should now tokenize full context
- 5568167 feat(common/models/wordbreakers): dict-breaker helper unit tests (BMP)
- d40a09d feat(common/models/wordbreakers): dict-breaker helper unit tests (non-BMP)
- 10f91e0 feat(common/models/wordbreakers): dict-breaker unit tests with simple mocked-fixture
- 1d07e9f fix(common/models/wordbreakers): blocks empty span output on empty context
- 3e82434 fix(common/models): update re base branch change
- 56e4052 docs(common/models): updates dict-breaker comments
- 66c156e change(common/models/wordbreakers): allows wordbreaker unit tests to use Tries
- 3181a4c feat(common/models/wordbreakers): baby's first khmer wordbreaking test
- 07f6747 chore(common/models/wordbreakers): comment tweak
- 8c5e54a feat(common/models/wordbreakers): rejoins adjacent single-point spans from unmatched chars in path
- 766992d fix(common/models/wordbreakers): handling of penalty transitions
- 99e1f4b change(common/models): use spread operator to split on codepoints
  Addresses #10973 (comment)
- 51b62e6 chore(common/models/wordbreakers): Merge branch 'feat/common/models/wordbreakers/dict-breaker-start' into change/common/models/wordbreakers/unit-test-trie-access
- 295c70b chore(common/models/wordbreakers): Merge base branch into feat/common/models/wordbreakers/fuse-dict-unmatched-chars
- 8ae6a62 chore(common/models): drops unit tests for replaced func
- 258763b chore(common/models/wordbreakers): Merge branch 'feat/common/models/wordbreakers/dict-breaker-start' into change/common/models/wordbreakers/unit-test-trie-access
- 8942883 chore(common/models/wordbreakers): Merge base branch fixes into feat/common/models/wordbreakers/fuse-dict-unmatched-chars
- 001126b chore: establish dictionary breakers epic

keymanapp-test-bot bot added the user-test-missing User tests have not yet been defined for the PR label Aug 9, 2024

keymanapp-test-bot bot added the epic-dict-breaker label Aug 9, 2024

keymanapp-test-bot bot added this to the A18S8 milestone Aug 9, 2024

github-actions bot added common/ common/models/ common/models/wordbreakers/ labels Aug 9, 2024

darcywong00 modified the milestones: A18S8, A18S9 Aug 17, 2024

mcdurdin and others added 3 commits August 22, 2024 08:19

- 1b3dfda Merge pull request #12253 from keymanapp/chore/merge-master-into-dict-breaker
  chore: merge master into dict-breaker
- fcbffc6 chore(common/models/wordbreakers): Merge branch 'epic/dict-breaker' into feat/common/models/wordbreakers/dict-breaker-start
- d38ed7c chore(common/models/wordbreakers): Merge branch 'feat/common/models/wordbreakers/dict-breaker-start' into change/common/models/wordbreakers/unit-test-trie-access
- c33c10d chore(common/models): Apply suggestions from code review
  Co-authored-by: Marc Durdin <[email protected]>

mcdurdin modified the milestones: A18S9, A18S19 Aug 27, 2024

jahorton and others added 9 commits August 27, 2024 10:15

- 9ca5d0a Merge pull request #12139 from keymanapp/feat/common/models/wordbreakers/dict-breaker-start
  feat(common/models/wordbreakers): begin development of dictionary-based wordbreaking algorithm 🔬
- b0aaf1e Merge pull request #12140 from keymanapp/change/common/models/wordbreakers/unit-test-trie-access
  change(common/models/wordbreakers): allow wordbreaker tests to access TrieModel implementation 🔬
- 2e7e9d7 Merge pull request #12141 from keymanapp/feat/common/models/wordbreakers/fuse-dict-unmatched-chars
  feat(common/models/wordbreakers): fuse adjacent unmatched characters when dictionary-breaking 🔬
- 1df99e5 chore: Merge branch 'epic/dict-breaker' into chore/merge-master-into-dict-breaker
- cf43c0e chore: move dict.ts into src
- dad7a6c Merge pull request #12317 from keymanapp/chore/merge-master-into-dict-breaker
  chore: merge master into dict-breaker 🔬
- c2e80b4 chore: Merge branch 'epic/dict-breaker' into chore/merge-master-into-dict-breaker
- c4991a7 chore: fixup dependency path
- 367d32e Merge pull request #12411 from keymanapp/chore/merge-master-into-dict-breaker
  chore: merge master into dict-breaker 🔬

github-actions bot added web/ common/ and removed common/ labels Sep 14, 2024

mcdurdin added 2 commits October 11, 2024 01:20

- 39430bf Merge branch 'epic/dict-breaker' into chore/merge-master-into-dict-breaker
- 04cab9d Merge pull request #12530 from keymanapp/chore/merge-master-into-dict-breaker
  chore: merge master into dict-breaker 🔬

github-actions bot added common/ and removed common/ labels Oct 11, 2024

mcdurdin added 2 commits October 25, 2024 03:42

- 0db0b26 Merge branch 'epic/dict-breaker' into chore/merge-master-into-dict-breaker
- c3f9917 Merge pull request #12575 from keymanapp/chore/merge-master-into-dict-breaker
  chore: merge master into dict-breaker 🔬

github-actions bot added common/ and removed common/ labels Oct 25, 2024
