changes to how tokenizer hashes are handled
- The directory in which we look for the hash file can now be overridden with `set_tokenizer_hashes_path`. This is useful when writing inside the package installation directory is not possible. I tried making it a path relative to `data/MazeTokenizerModular_hashes.npz`, but this broke so many things and is honestly a worse idea.
- `MazeTokenizerModular.__hash__` now calls `MazeTokenizerModular.hash_int()`. The same method backs `MazeTokenizerModular.hash_b64`, whose output should be more concise for filenames; `hash_b64` is used in `tests/all_tokenizers/test_all_tokenizers.py`.
- Added an option for a more informative assert mode in `is_tested_tokenizer` (I broke some tests, and it was useful for debugging).
- `demo_mazetokenizermodular.ipynb` now also asserts that the number of loaded hashes equals the number of created tokenizers.
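The hashing and path changes listed above can be illustrated with a minimal, stdlib-only sketch. It is not the package's implementation: `TokenizerSketch`, `get_tokenizer_hashes_file`, `is_tested_tokenizer_sketch`, the `do_assert` flag, and the choice of SHA-256 over the tokenizer's name are all hypothetical stand-ins. Only the names `set_tokenizer_hashes_path`, `hash_int`, `hash_b64`, and the `MazeTokenizerModular_hashes.npz` filename come from the commit. The sketch shows why routing `__hash__` through a deterministic `hash_int()` helps: Python's built-in `hash()` of a `str` is salted per process, so it cannot be stored in a file and compared across runs.

```python
import base64
import hashlib
from pathlib import Path
from typing import Optional, Set

# Filename taken from the commit message; the default directory below is a
# hypothetical stand-in for the package's data directory.
HASHES_FILENAME = "MazeTokenizerModular_hashes.npz"
_DEFAULT_HASHES_DIR = Path.cwd()
_hashes_dir: Optional[Path] = None  # None means "use the default directory"


def set_tokenizer_hashes_path(path) -> None:
    """Override the directory searched for the hashes file (sketch)."""
    global _hashes_dir
    _hashes_dir = Path(path)


def get_tokenizer_hashes_file() -> Path:
    """Hypothetical helper: resolve the hashes file, honoring any override."""
    base = _hashes_dir if _hashes_dir is not None else _DEFAULT_HASHES_DIR
    return base / HASHES_FILENAME


class TokenizerSketch:
    """Hypothetical stand-in for MazeTokenizerModular, identified by a name."""

    def __init__(self, name: str) -> None:
        self.name = name

    def hash_int(self) -> int:
        # SHA-256 is stable across processes, unlike hash() on str.
        # Keep the first 8 bytes as an unsigned 64-bit integer.
        digest = hashlib.sha256(self.name.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def __hash__(self) -> int:
        # Delegate to the deterministic integer hash.
        return self.hash_int()

    def __eq__(self, other) -> bool:
        return isinstance(other, TokenizerSketch) and self.name == other.name

    def hash_b64(self) -> str:
        # URL-safe base64 uses only [A-Za-z0-9_-], so it is filename-safe;
        # 8 bytes encode to 12 chars, 11 once the "=" padding is stripped.
        raw = self.hash_int().to_bytes(8, "big")
        return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def is_tested_tokenizer_sketch(
    tok: TokenizerSketch, known_hashes: Set[int], do_assert: bool = False
) -> bool:
    """Membership check; `do_assert=True` fails loudly with a helpful message."""
    h = tok.hash_int()
    if do_assert:
        assert h in known_hashes, (
            f"hash {h} of {tok.name!r} not among {len(known_hashes)} known hashes"
        )
        return True
    return h in known_hashes
```

In use, two tokenizers with the same name hash identically in every process, `hash_b64()` gives a short filename-safe string, and `set_tokenizer_hashes_path("/some/writable/dir")` redirects where the hashes file is looked up. The `do_assert` mode trades a plain `False` for an `AssertionError` that says which tokenizer was missing, which is the kind of debugging aid the commit describes.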
Showing 3 changed files with 144 additions and 43 deletions.