Parallelise makeresource commands #1309

Open
wants to merge 13 commits into
base: develop

eAlasdair
Contributor

@eAlasdair eAlasdair commented Jan 21, 2020

The problem:

  • The makeresources command, which generates PDFs (Printables) for resources, takes so long that the task has to be split into 11 different jobs on Travis.
    • Each new job adds further overhead: the system needs to be built and started on each job before the required bit of work can be done.

Main Changes:

  • Partially parallelise the makeresources command
  • Switch to update_lite instead of update for making resources in dev_deploy scripts that use the makeresources command
    (prod_deploy is unchanged presently)

Secondary Changes:

Effects:

On my PC (6 cores, 12 threads), using 6 threads:

  • ~30% reduction in the time to complete the full makeresources command (details below)
  • ~50% reduction in the time to complete the full makeresourcethumbnails command
  • ~35% reduction in the time to complete the full update command
  • I don't see any changes to the final result

image

  • Six resources threw an sqlite3.ProgrammingError when attempting to run over multiple threads. I couldn't figure out why these six had an sqlite3 object that the others didn't have, so I added code that reverts to processing in series when a problem is detected. Further improvements should be possible if we can find a nicer solution.
  • The Binary to Alphabet resource is the only one that is successfully parallelised yet consistently takes longer to generate.
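
The fall-back-to-series behaviour described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: `generate_all` and `fake_generate` are hypothetical names, and the demo reproduces the error by sharing an sqlite3 connection created on the main thread, which is one known way to trigger sqlite3.ProgrammingError and may not be what happens in the six affected resources.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def generate_all(items, generate, max_workers=6):
    """Run generate(item) on a thread pool.

    Any item whose worker raises sqlite3.ProgrammingError is rerun in
    series on the calling thread. Returns (results, retried_items).
    """
    results = {}
    retry_in_series = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {item: pool.submit(generate, item) for item in items}
        for item, future in futures.items():
            try:
                results[item] = future.result()
            except sqlite3.ProgrammingError:
                # The worker touched an sqlite3 object owned by another thread.
                retry_in_series.append(item)
    for item in retry_in_series:
        results[item] = generate(item)
    return results, retry_in_series


if __name__ == "__main__":
    # Hypothetical stand-in for resource generation: one item uses a
    # connection created on the main thread, so it fails inside a worker.
    conn = sqlite3.connect(":memory:")

    def fake_generate(name):
        if name == "needs_db":
            conn.execute("SELECT 1")
        return name.upper()

    results, retried = generate_all(["a", "needs_db"], fake_generate)
    print(results, retried)
```

Retrying only the failed items keeps the rest of the batch parallel, at the cost of starting the failing work twice.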

Travis

Sadly, I have no way to check in advance how any of these changes will affect our actual deployment.

  • Using update_lite instead of update is a ~90% reduction in time to complete for (as far as I can tell) the same result
  • I have changed nothing else with the deployment settings; I want to see how these changes affect the existing jobs before attempting to rearrange them
  • From what I can find, Travis only uses 2 cores per job, whereas my PC has 6. I'm concerned that this will result in a significantly smaller improvement, and I may need to reduce the number of assigned threads. The performance on Travis will also depend on its I/O capacity.
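
If the thread count does need to come down on Travis, one option is to cap it by the number of cores the machine reports. The sketch below assumes a hypothetical helper, `choose_worker_count`, that is not part of this PR; whether capping at the core count is the right choice also depends on how I/O-bound the work is, which I have not measured.

```python
import os


def choose_worker_count(requested=6, available=None):
    """Return the number of worker threads to use.

    Caps the requested count at the number of available cores and never
    returns less than 1. os.cpu_count() may return None, so fall back to 1.
    """
    if available is None:
        available = os.cpu_count() or 1
    return max(1, min(requested, available))


if __name__ == "__main__":
    print(choose_worker_count(6, available=2))   # capped by a 2-core machine
    print(choose_worker_count(6, available=12))  # capped by the request
```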

Known Issues

  • When Travis runs tests on this branch (continuous-integration/travis-ci/pr, continuous-integration/travis-ci/push), it can get a 'file already exists' error when running the makeresources part of the tests.
    My theory is that it's related to Modify tests to not write to build/static/staticfiles directories #700 – Travis runs these test jobs in parallel, so it might be possible to (for example) have the management tests fail when creating a file because the test backwards had already created the file and not yet removed it. This would explain why I haven't seen any test fail when run in series. However, entirely different builds shouldn't interfere with each other, so I don't know for sure.

  • None of the tests replicate the issue that prevents multithreading when creating resources/thumbnails, so the exception handlers don't get hit in codecov
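
To illustrate the 'file already exists' theory: if two test runs create the same filename in a shared directory, the second creation fails, whereas giving each run its own temporary directory avoids the collision. This is a minimal sketch under that assumption; `write_exclusive` and `run_isolated` are hypothetical helpers, not functions from this repository, and the real tests may create files differently.

```python
import os
import tempfile


def write_exclusive(directory, filename, data):
    """Create filename in directory; raises FileExistsError if it exists."""
    path = os.path.join(directory, filename)
    with open(path, "xb") as f:  # "x" mode refuses to overwrite
        f.write(data)
    return path


def run_isolated(filename, data):
    """Write into a fresh temporary directory that is removed afterwards."""
    with tempfile.TemporaryDirectory() as directory:
        path = write_exclusive(directory, filename, data)
        return os.path.exists(path)


if __name__ == "__main__":
    # Two runs writing the same filename do not collide when isolated.
    print(run_isolated("resource.pdf", b"first"))
    print(run_isolated("resource.pdf", b"second"))
```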

@eAlasdair eAlasdair added the suggestion, content: printables (Related to the printable resources), and infrastructure (Related to the infrastructure running this software) labels Jan 21, 2020
@eAlasdair eAlasdair self-assigned this Jan 21, 2020
@eAlasdair eAlasdair changed the title Parallelise makeresource commands DRAFT: Parallelise makeresource commands Jan 21, 2020
@eAlasdair eAlasdair marked this pull request as draft August 5, 2020 00:49
@eAlasdair eAlasdair changed the title DRAFT: Parallelise makeresource commands Parallelise makeresource commands Aug 5, 2020
@eAlasdair eAlasdair requested a review from courtneycb October 8, 2020 05:34
@eAlasdair eAlasdair marked this pull request as ready for review October 8, 2020 05:34
Labels
content: printables (Related to the printable resources), infrastructure (Related to the infrastructure running this software), suggestion
Development

Successfully merging this pull request may close these issues.

Parallelise makeresourcethumbnails command
1 participant