[dbnl] Redesign frog component in pipeline for performance #13

Closed
proycon opened this issue Feb 5, 2018 · 1 comment

proycon commented Feb 5, 2018

The DBNL pipeline starts a single Frog process with many threads, which does not work as expected: Frog is far too slow (LanguageMachines/frog#45), and its parallelisation in fact slows things down rather than speeding them up.

The simple alternative would be to run a Frog instance for every document, but that would bring initialisation times back into the equation, so it is not ideal.

Instead: redesign the dbnl pipeline to cut the input batch into N batches and start N Frog processes in parallel, one per batch. This is not the ideal form of parallelisation (the batches won't finish at the same time), but it is probably the best and most realistic choice for now.
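
A minimal sketch of that idea in Python, not the pipeline's actual code. The names `split_into_batches`, `run_batches` and `command_prefix` are hypothetical, and since the real Frog invocation is not shown in this issue, the command is left as a parameter to which each batch's file names are appended.

```python
import subprocess


def split_into_batches(documents, n):
    """Split documents into at most n contiguous batches of near-equal size."""
    if not documents:
        return []
    n = max(1, min(n, len(documents)))  # never create empty batches
    size, remainder = divmod(len(documents), n)
    batches, start = [], 0
    for i in range(n):
        # The first `remainder` batches each take one extra document.
        end = start + size + (1 if i < remainder else 0)
        batches.append(documents[start:end])
        start = end
    return batches


def run_batches(batches, command_prefix):
    """Start one worker process per batch, then wait for all of them.

    command_prefix is a list such as [program, option, ...]; it is an
    assumption of this sketch, not Frog's documented command line.
    """
    procs = [subprocess.Popen(command_prefix + batch) for batch in batches]
    return [proc.wait() for proc in procs]  # exit codes, in batch order
```

Each process pays the initialisation cost once per batch instead of once per document. The batches are fixed up front, so a batch that happens to contain long documents will finish later than the others.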


proycon commented Feb 5, 2018

Oops, now I have M batches of size N (M >> N) instead of N batches...
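
This looks like the usual mix-up between batch size and batch count, though that is an assumption, since the offending code is not shown here. A small illustration with hypothetical helper names, assuming `size` and `count` are at least 1:

```python
def chunk_by_size(items, size):
    """Batches of at most `size` items each; the number of batches grows with the input."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def chunk_by_count(items, count):
    """Exactly `count` batches (fewer if there are not enough items), assigned round-robin."""
    count = min(count, len(items))
    return [items[i::count] for i in range(count)]


docs = list(range(1000))
print(len(chunk_by_size(docs, 4)))   # 250 batches of 4 documents
print(len(chunk_by_count(docs, 4)))  # 4 batches of 250 documents
```

With one process per batch, the first variant would start 250 processes for 1000 documents, the second only 4.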
