Update `setup.py` to build object files in parallel if requested #105
Hi again!
The best way to install `cython-blis` is to compile it from source, so that the build can take advantage of the machine architecture. On our HPC cluster, I end up re-installing `cython-blis` on each node executor at the start of each job to make sure I'm using optimized code, but this takes a bit of time.

Given that BLIS has a lot of source files, the build process can be parallelized easily. I changed the logic of `ExtensionBuilder.compile_objects` so that it invokes the compiler to build objects in parallel with a `ThreadPool`. The job count is taken from the `parallel` flag of the command line (which is a default `build_ext` option) or from the `MAX_JOBS` environment variable (similar to what `torch` and `flash-attn` are doing).

By default, I left the job count at `1`, so that parallel compilation happens only if enabled. Using 4 threads, the compilation is about twice as fast:

`MAX_JOBS="4" pip install blis --no-binary=blis`
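For readers who want a picture of the approach, here is a minimal sketch of the idea, not the code in this PR. The helper names `resolve_job_count` and `compile_one` are hypothetical, and the precedence (the `parallel` flag first, then `MAX_JOBS`, then `1`) is an assumption, since the description does not say which source wins when both are set.

```python
import os
from multiprocessing.pool import ThreadPool


def resolve_job_count(parallel=None, environ=None):
    """Return the number of compile jobs (hypothetical helper).

    Assumed precedence: explicit `parallel` value, then MAX_JOBS, then 1.
    """
    environ = os.environ if environ is None else environ
    if parallel:
        return max(1, int(parallel))
    value = environ.get("MAX_JOBS", "").strip()
    if value:
        return max(1, int(value))
    return 1  # serial build unless parallelism is requested


def compile_objects(sources, compile_one, jobs=1):
    """Compile every source and return the object paths in input order.

    `compile_one` is a hypothetical callable that compiles a single source
    file (for example by spawning the compiler) and returns the object path.
    """
    if jobs <= 1:
        return [compile_one(src) for src in sources]
    # Threads are enough when each task mostly waits on an external
    # compiler process rather than running Python code.
    with ThreadPool(jobs) as pool:
        return pool.map(compile_one, sources)
```

With this shape, `compile_objects(sources, compile_one, jobs=resolve_job_count())` stays serial unless a job count is requested, and `pool.map` keeps the object list in the same order as the sources, which keeps the later link step deterministic.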