Introduce DALI proxy #5726
CI MESSAGE: [20865442]: BUILD STARTED
CI MESSAGE: [20865442]: BUILD PASSED
I see there are no tests except for the resnet50 example. I believe we should have normal TL0 tests as well.
Did you test error propagation between the processes? Multiprocessing doesn't automagically propagate exceptions afaik. Maybe we should have tests to check how the errors in particular processes are reported?
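To make the error-propagation concern concrete, here is a minimal standard-library sketch, not the DALI Proxy implementation. It assumes a platform where the `fork` start method is available; `run_in_process`, `_child_main`, and the two task functions are hypothetical names used only for illustration. The child catches its own exception, sends the formatted traceback over a pipe, and the parent re-raises it.

```python
import multiprocessing as mp
import traceback


def ok_task():
    return 42


def failing_task():
    raise ValueError("bad sample")


def _child_main(task, conn):
    """Runs in the child: report either the result or the formatted traceback."""
    try:
        conn.send(("ok", task()))
    except Exception:
        conn.send(("error", traceback.format_exc()))
    finally:
        conn.close()


def run_in_process(task):
    """Run task in a child process and re-raise its failure in the parent."""
    ctx = mp.get_context("fork")  # assumption: fork is available (Linux/macOS)
    parent_conn, child_conn = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_child_main, args=(task, child_conn))
    proc.start()
    child_conn.close()  # parent keeps only the receiving end
    status, payload = parent_conn.recv()
    proc.join()
    if status == "error":
        raise RuntimeError(f"worker process failed:\n{payload}")
    return payload
```

A test along these lines would call `run_in_process(failing_task)` and check that the parent sees a `RuntimeError` whose message contains the child's `ValueError` traceback.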
docs/examples/use_cases/pytorch/efficientnet/image_classification/dataloaders.py
caf2d0b to 0472490
Posting a few comments as the code has started moving.
I also feel like we should limit the scope of the API and hide most of the implementation.
c4a7f74 to 1a50983
CI MESSAGE: [21050826]: BUILD STARTED
CI MESSAGE: [21050826]: BUILD FAILED
3bdfd37 to 014032d
CI MESSAGE: [21052236]: BUILD STARTED
CI MESSAGE: [21052236]: BUILD FAILED
014032d to a802a29
I still haven't read all the tests, but I'm posting more comments on the implementation.
docs/plugins/pytorch_dali_proxy.rst
**DALI Proxy** is a tool designed to integrate NVIDIA DALI pipelines with PyTorch data workers while maintaining the simplicity of PyTorch's dataset logic. The key features of DALI Proxy include:
- **Efficient GPU Utilization**: DALI Proxy ensures GPU data processing occurs on the same process running the main loop. This avoids performance degradation caused by multiple CUDA contexts for the same GPU.
Suggested change:
- **Efficient GPU Utilization**: DALI Proxy ensures GPU data processing occurs in the same process running the main loop. This avoids performance degradation caused by multiple CUDA contexts for the same GPU.
?
docs/plugins/pytorch_dali_proxy.rst
- Each data worker invokes the proxy, which returns a **reference to a future processed sample**.
- During batch collation, the proxy groups data into a batch and sends it to the server for execution.
- The server processes the batch asynchronously and outputs the actual data to an output queue.
- The PyTorch DataLoader retrieves either the processed data or references to pending pipeline runs. If it encounters pipeline run references, it queries the DALI server for the actual data, waiting if necessary until the data becomes available in the output queue.
This sentence is too complicated. I would stick with the simplified user POV: you call the proxy in the worker to offload data for processing with DALI and put a placeholder for the result. When the data loader returns the processed data, it replaces the placeholders with the actual results from the DALI pipeline. Skip the "If it encounters pipeline run references" and the waiting parts here.
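The placeholder flow described in this thread can be sketched with a toy, thread-based model. This is only an illustration under stated assumptions, not the DALI Proxy code: `ToyProxyServer`, `submit`, and `resolve` are hypothetical names, and a background thread stands in for the DALI server.

```python
import itertools
import queue
import threading


class ToyProxyServer:
    """Toy stand-in for the server: processes submitted batches asynchronously."""

    def __init__(self, process_fn):
        self._process_fn = process_fn
        self._requests = queue.Queue()
        self._results = {}
        self._ready = threading.Condition()
        self._ids = itertools.count()
        self._thread = threading.Thread(target=self._serve, daemon=True)
        self._thread.start()

    def submit(self, batch):
        """Queue a batch and return a placeholder (here: a run id)."""
        run_id = next(self._ids)
        self._requests.put((run_id, list(batch)))
        return run_id

    def _serve(self):
        while True:
            request = self._requests.get()
            if request is None:  # stop signal
                return
            run_id, batch = request
            outputs = [self._process_fn(sample) for sample in batch]
            with self._ready:
                self._results[run_id] = outputs
                self._ready.notify_all()

    def resolve(self, run_id):
        """Swap a placeholder for the real outputs, waiting if they are not ready."""
        with self._ready:
            self._ready.wait_for(lambda: run_id in self._results)
            return self._results.pop(run_id)

    def stop(self):
        self._requests.put(None)
        self._thread.join()


server = ToyProxyServer(lambda sample: sample * 2)
placeholder = server.submit([1, 2, 3])  # what the collate step would hand back
batch = server.resolve(placeholder)     # what the data loader would return
server.stop()
```

From the user's point of view only `submit` and the final batch matter; the waiting inside `resolve` is an implementation detail, which is the point of the review comment.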
docs/plugins/pytorch_dali_proxy.rst
**1. DALI Pipeline**
The DALI pipeline defines the data processing steps. Input data is fed using ``fn.external_source``.
As we are expanding, I guess this might be a nice place to mention the mapping between the external_source and the names of the parameters. Please mention that we require at least one input, and that it is the input to the proxy.
You can link to the operator doc and the argument with something like this, AFAIR:
:meth:`~nvidia.dali.fn.external_source`
:paramref:`~nvidia.dali.fn.external_source.source`
docs/plugins/pytorch_dali_proxy.rst
**5. Integration with PyTorch DataLoader**
The ``DataLoader`` wrapper provided by DALI Proxy simplifies the integration process.
Could you expose the dali_proxy DataLoader and DALIServer via autoclass here and link to those sections whenever you mention them? We have docstrings there, but we don't show them here.
Maybe mention that one can start and stop the server by hand, but that the context manager is the recommended way?
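The difference between the two lifecycles can be illustrated with a toy class. This is a sketch only: `ToyLifecycleServer` and its `start`/`stop` methods are hypothetical stand-ins, and the real DALIServer interface may differ.

```python
class ToyLifecycleServer:
    """Toy server with explicit start/stop plus context-manager support."""

    def __init__(self):
        self.running = False

    def start(self):
        self.running = True

    def stop(self):
        self.running = False

    def __enter__(self):
        self.start()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.stop()
        return False  # do not swallow exceptions from the with-body


# Recommended style: the server is stopped even if the body raises.
with ToyLifecycleServer() as managed:
    was_running_in_with = managed.running

# Manual style: the caller has to guarantee stop() with try/finally.
manual = ToyLifecycleServer()
manual.start()
try:
    was_running_manually = manual.running
finally:
    manual.stop()
```

The context-manager form is shorter and harder to get wrong, which is one reason to document it as the recommended way.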
@@ -56,22 +56,36 @@ export PATH_TO_IMAGENET=/imagenet
export RESULT_WORKSPACE=./
# synthetic benchmark
-python multiproc.py --nproc_per_node 8 ./main.py --amp --static-loss-scale 128 --batch-size 128 --epochs 1 --prof 1000 --no-checkpoints --training-only --data-backend synthetic --workspace $RESULT_WORKSPACE --report-file bench_report_synthetic.json $PATH_TO_IMAGENET
+python multiproc.py --nproc_per_node 8 ./main.py --amp --static-loss-scale 128 --batch-size 128 --epochs 3 --prof 1000 --no-checkpoints --training-only --data-backend synthetic --workspace $RESULT_WORKSPACE --report-file bench_report_synthetic.json $PATH_TO_IMAGENET
Does it work correctly with more than one epoch? With the synthetic benchmark, the concept of an epoch didn't really exist as far as I can remember, which is why it just did 1k iterations. Dunno if it will now do 3k or explode, but making it longer doesn't really give us much.
The iterator does have a size, so it works.
raise RuntimeError("The provided pipeline doesn't have any inputs")
pipe_input_names_set = set(pipe_input_names)
input_names_set = set(input_names or [])
if len(input_names_set) != len(input_names_set):
Suggested change:
if len(input_names_set) != len(input_names):
This will fail if you allow input_names to be None. Also, now it tests equality of the same thing.
pipe_input_names_set = set(pipe_input_names)
input_names_set = set(input_names or [])
if len(input_names_set) != len(input_names_set):
    raise RuntimeError("``input_names`` argument should not contain any duplicated values")
Suggested change:
    raise RuntimeError(f"``input_names`` argument should not contain any duplicated values, got {input_names}.")
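One way to address both review points at once (the self-comparison and the `None` case) is to normalize `input_names` to a list before comparing lengths. The helper below is a hypothetical sketch, not the code that was merged; `check_input_names` is an illustrative name.

```python
def check_input_names(input_names):
    """Return input_names as a list, rejecting duplicates; None means no names."""
    names = list(input_names or [])  # normalize first, so len() never sees None
    if len(set(names)) != len(names):  # compare the set against the list, not itself
        raise RuntimeError(
            "``input_names`` argument should not contain any duplicated values, "
            f"got {input_names}."
        )
    return names
```

With this shape, `check_input_names(None)` returns an empty list, while `check_input_names(["a", "a"])` raises and reports the offending value.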
call_impl.__signature__ = inspect.Signature(parameters)
_DALIProxy.__call__ = call_impl
Cool 😎
Does it have a chance of working with an IDE or Jupyter?
I think for an IDE to work, we need the __call__ visible statically; there is no chance of injecting a proper signature stub, so we probably still need to have a static __call__(self, *inputs, **kwargs) with a docstring defined, and replace it with this hook.
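The combination suggested here, a statically visible `__call__` that is replaced at runtime by a hook carrying a generated signature, could look roughly like the sketch below. It is an assumption-laden illustration: `ProxyBase` and `make_proxy_class` are hypothetical names, and building a subclass per set of inputs is a design choice of this sketch, not necessarily what the PR does.

```python
import inspect


class ProxyBase:
    def __call__(self, *inputs, **kwargs):
        """Static entry point, visible to IDEs; replaced per pipeline at runtime."""
        raise NotImplementedError


def make_proxy_class(input_names):
    """Build a subclass whose __call__ advertises the given input names."""
    kind = inspect.Parameter.POSITIONAL_OR_KEYWORD
    parameters = [inspect.Parameter("self", kind)]
    parameters += [inspect.Parameter(name, kind) for name in input_names]
    signature = inspect.Signature(parameters)

    def call_impl(self, *inputs, **kwargs):
        # bind() raises TypeError for missing, extra, or duplicated arguments
        bound = signature.bind(self, *inputs, **kwargs)
        arguments = dict(bound.arguments)
        arguments.pop("self")
        return arguments

    call_impl.__signature__ = signature
    call_impl.__doc__ = ProxyBase.__call__.__doc__
    return type("Proxy", (ProxyBase,), {"__call__": call_impl})


Proxy = make_proxy_class(["images", "labels"])
proxy = Proxy()
call_result = proxy(1, labels=2)
```

Runtime tools that rely on `inspect.signature` (such as `help()` in a notebook) would see `images` and `labels`, while static analyzers would still only see the `*inputs, **kwargs` form and its docstring on the base class.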
dali/python/nvidia/dali/plugin/pytorch/experimental/proxy/__init__.py
Signed-off-by: Joaquin Anton Guirao <[email protected]>
…rray) Signed-off-by: Joaquin Anton Guirao <[email protected]>
75f5921 to 32983fe
CI MESSAGE: [21819002]: BUILD STARTED
32983fe to 18e0781
be48c3d to fb43cf3
CI MESSAGE: [21823100]: BUILD STARTED
Category:
New feature
Description:
Co-author: @mdabek-nvidia
Additional information:
Affected modules and functionalities:
Key points relevant for the review:
dali/python/nvidia/dali/plugin/pytorch/__init__.py
Tests:
Added options to run with DALI proxy to RN50 and EfficientNet examples
Checklist
Documentation
TODO
DALI team only
Requirements
REQ IDs: N/A
JIRA TASK: N/A