Make Frog options available in PiCCL just as in Frog web app #25
Comments
Yeah, that would be possible.
+1! It would be much appreciated if at least the dependency parser could be disabled. Running Frog with all options enabled on a book of several hundred pages requires so much memory that it is inadvisable in production environments. Currently a PICCL job that includes TICCL and Frog runs out of memory and crashes when given a pre-OCR'ed PDF of the book Max Havelaar (downloaded from Google Books, in case you want to reproduce this yourself). Monitoring the machine's memory usage during this job shows that TICCL does not use much memory at all, but that Frog ends up consuming all 12 gigabytes of the machine's memory plus the 800 megabytes of available swap space shortly before it crashes. I have attached a memory usage graph. The axis labels are in Dutch (sorry for that). The blue line represents the machine's total memory usage minus the idle-state offset (determined as the minimum memory usage over the measured interval). The orange line represents the swap usage. The vertical red lines mark, respectively, the start of the job, the transition from TICCL to Frog, and (less visibly) the moment the job crashes according to the log file.
Hmm, I thought I had already disabled the dependency parser by default in the current version, but indeed that seems not to be the case. I'll implement this right away then. Thanks for the graph, that's quite insightful. Even without the dependency parser, Frog remains a memory-based system, so memory usage will be on the higher end. For our purposes I consider 12GB quite a low amount of memory and wouldn't really suggest a system with less than 32GB (our production server has 512GB, although shared with all other services). I think the speed could be improved here as well (on a proper multicore system); we might have a recurrence of #13 here. I'll see if I can improve the pipeline a bit, as it wasn't optimized yet here.
Wow, that's a fast response, thanks! From Frog's help text I think it should be as simple as passing
Yes, indeed, it's a matter of passing a simple option. I have now implemented this in the latest development version (git master branch), but haven't tested it yet. I implemented it the other way round: users can select which modules they want to run, with a few preselected.
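To illustrate the "other way round" design described above, here is a minimal Python sketch that turns a set of selected modules into a skip argument for everything left unselected. It is not the actual PICCL implementation. The module names, the `skip_argument` helper, and the default selection are hypothetical, and the `--skip=` letter codes are an assumption that should be checked against `frog --help` for your Frog version.

```python
# Hypothetical mapping from module names to single-letter skip codes.
# ASSUMPTION: the letters mirror what Frog's help text lists for --skip;
# verify against `frog --help` before relying on them.
MODULE_CODES = {
    "tokenizer": "t",
    "lemmatizer": "l",
    "morphological_analyzer": "a",
    "chunker": "c",
    "mwu": "m",
    "ner": "n",
    "parser": "p",
}

# Hypothetical preselection: everything except the memory-hungry parser.
DEFAULT_SELECTED = frozenset(MODULE_CODES) - {"parser"}


def skip_argument(selected=DEFAULT_SELECTED):
    """Return a skip argument covering every module NOT in `selected`."""
    unknown = set(selected) - set(MODULE_CODES)
    if unknown:
        raise ValueError(f"unknown modules: {sorted(unknown)}")
    # Walk the mapping in definition order so the output is deterministic.
    skipped = [code for name, code in MODULE_CODES.items() if name not in selected]
    return "--skip=" + "".join(skipped) if skipped else ""


if __name__ == "__main__":
    print(skip_argument())                            # default selection
    print(skip_argument({"tokenizer", "lemmatizer"}))  # minimal run
```

Selecting modules positively and deriving the skip list from that keeps the user-facing choice simple, while the default selection decides what runs when the user changes nothing.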
I have not tested it either, but looking at the diff I think it should work 🙂 I've got one small tip about Python though: dictionaries have a
Yes, that's what I thought too, but that is usually when things go wrong in my experience ;) I know about the
Hi proycon,
Right now, if one chooses Frog in the PICCL workflow, it runs all it has.
Can you not replicate Frog option selection/deselection in the PICCL workflow, as you have it in the Frog web application?
Thanks!
Martin