HPCIC 2024: Updates and Fixes DYAD Component of Tutorial #43

Merged: 6 commits, Aug 28, 2024

ilumsden
Collaborator

Same as #40, but re-targeted to be merged into master.

hariharan-devarajan and others added 6 commits August 20, 2024 20:57
This commit corrects logic in the PyTorch data loader for DYAD.
It also makes various corrections to the text in the DYAD notebook.

The flux-sched image for Ubuntu Jammy has a system install of
UCX 1.12.0. However, we want to use UCX 1.13.1 with DYAD.
This commit updates LD_LIBRARY_PATH to point to UCX 1.13.1 to prevent
runtime issues with DYAD.

In light of the name change of DLIO Profiler to DFTracer, this commit
updates the env file created in the DYAD notebook to use the new names
for environment variables.

This commit fixes a bug in the DYAD PyTorch data loader
that causes 'brokers_per_node' to be referenced before it is set.

This commit tweaks the DLIO config file to use forking for
multiprocessing instead of spawning.

This commit changes cpu-affinity to off when running DLIO for
training, for consistency.
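
For readers unfamiliar with the class of bug named in the 'brokers_per_node' commit, here is a minimal sketch of how a variable ends up referenced before it is set, and one common fix. It is not the DYAD data loader code: the function names, the `BROKERS_PER_NODE` key, and the default of 1 are all hypothetical, chosen only for illustration.

```python
def brokers_per_node_buggy(env):
    # Hypothetical illustration, not the DYAD loader: the name is bound
    # only inside the branch, so the return fails when the branch is skipped.
    if "BROKERS_PER_NODE" in env:  # hypothetical key name
        brokers_per_node = int(env["BROKERS_PER_NODE"])
    return brokers_per_node  # raises UnboundLocalError if the key is absent


def brokers_per_node_fixed(env, default=1):
    # Bind the name on every path before it is read.
    # The default of 1 is an assumption made for this sketch.
    brokers_per_node = default
    if "BROKERS_PER_NODE" in env:
        brokers_per_node = int(env["BROKERS_PER_NODE"])
    return brokers_per_node
```

Calling `brokers_per_node_buggy({})` raises `UnboundLocalError`, while `brokers_per_node_fixed({})` falls back to the default. Whether a default or an explicit error is the right behavior depends on the loader's actual configuration logic.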

vsoch (Member) commented Aug 21, 2024

@ilumsden and @hariharan-devarajan thank you for your work on this! I'll try to review it before the end of the week.

vsoch (Member) commented Aug 28, 2024

I won't get a chance to test this - just too much for the performance study. I'm going to merge, and we will test the containers tomorrow when we do the setup. Thanks to you both!

vsoch merged commit d90c83a into flux-framework:master Aug 28, 2024
6 checks passed