HuggingFaceTokenizer took 90ms to process 10^5 length text #2846
Closed · Rfank2021 started this conversation in Development · 4 comments, 5 replies
-
What's your expectation, and what are you comparing against? Do you have benchmarks for both the Python and DJL implementations?
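For concreteness, here is a minimal timing sketch using DJL's `HuggingFaceTokenizer` (the model id and input size are assumptions based on the thread title); the same text can be timed with the Python `tokenizers` package for comparison:

```java
import ai.djl.huggingface.tokenizers.Encoding;
import ai.djl.huggingface.tokenizers.HuggingFaceTokenizer;

public class TokenizerBenchmark {

    public static void main(String[] args) throws Exception {
        try (HuggingFaceTokenizer tokenizer =
                HuggingFaceTokenizer.newInstance("roberta-base")) {
            // roughly 10^5 characters, mirroring the input size in the title
            String text = "hello world ".repeat(8000);

            // warm up so JIT compilation and lazy loading don't skew the timing
            for (int i = 0; i < 10; ++i) {
                tokenizer.encode(text);
            }

            long start = System.nanoTime();
            Encoding encoding = tokenizer.encode(text);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println(encoding.getIds().length + " tokens in " + elapsedMs + " ms");
        }
    }
}
```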
-
I don't know the details of the algorithm, but why not stop once you already have 256 tokens?
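A sketch of what that configuration might look like with DJL's builder (the `opt*` option names are my assumption of the API). One caveat: in the underlying HuggingFace tokenizers library, truncation is applied after the full text has been tokenized, so this bounds the output length but may not reduce latency:

```java
import java.io.IOException;

import ai.djl.huggingface.tokenizers.Encoding;
import ai.djl.huggingface.tokenizers.HuggingFaceTokenizer;

public class TruncationExample {

    public static void main(String[] args) throws IOException {
        try (HuggingFaceTokenizer tokenizer = HuggingFaceTokenizer.builder()
                .optTokenizerName("roberta-base") // model id assumed from this thread
                .optTruncation(true)              // discard tokens beyond maxLength
                .optMaxLength(256)                // keep at most 256 tokens
                .build()) {
            String longText = "very long input ".repeat(6000);
            Encoding encoding = tokenizer.encode(longText);
            System.out.println(encoding.getIds().length); // at most 256
        }
    }
}
```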
-
I'm able to reproduce your issue. Here is what I found:
-
I created a PR to address this issue: #2857
-
I feel it's slow for the roberta-base model; maybe the tokenizer is better suited to batch tokenization than to single-text use.
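If batch throughput is the goal, something like the sketch below should work; the `batchEncode` method name is from DJL's tokenizer API as I recall it, so treat the exact signature as an assumption:

```java
import java.util.Arrays;
import java.util.List;

import ai.djl.huggingface.tokenizers.Encoding;
import ai.djl.huggingface.tokenizers.HuggingFaceTokenizer;

public class BatchTokenizeExample {

    public static void main(String[] args) throws Exception {
        try (HuggingFaceTokenizer tokenizer =
                HuggingFaceTokenizer.newInstance("roberta-base")) {
            List<String> inputs = Arrays.asList(
                    "first sentence to tokenize",
                    "second sentence to tokenize");
            // a single call that encodes the whole batch
            Encoding[] encodings = tokenizer.batchEncode(inputs);
            for (Encoding encoding : encodings) {
                System.out.println(encoding.getIds().length + " tokens");
            }
        }
    }
}
```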