
Introduce DALI proxy #5726

Open · wants to merge 22 commits into main from dali_proxy2

Conversation

@jantonguirao (Contributor) commented Nov 27, 2024

Category:

New feature

Description:

  • DALI proxy is a new way to integrate DALI pipelines with existing PyTorch data loading pipelines.
  • The idea is that PyTorch data worker processes send data to the main process, where it is processed by DALI before being handed over to the training loop (see the sketch below).
  • The solution allows mixing data loading in PyTorch with partial processing in DALI.
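A rough end-to-end sketch of the intended usage pattern. The `DALIServer` and `DataLoader` names come from the discussion below; the `dali_proxy` import path and the `dali_server.proxy` per-sample callable are assumptions for illustration, not the confirmed API:

```python
# Hypothetical usage sketch -- not the confirmed API surface of this PR.
import numpy as np
import torch.utils.data
from nvidia.dali import pipeline_def, fn, types
from nvidia.dali.plugin.pytorch import dali_proxy  # import path assumed

@pipeline_def
def decode_pipeline():
    # Encoded images arrive from the PyTorch workers through this named input.
    jpegs = fn.external_source(name="images", no_copy=True)
    images = fn.decoders.image(jpegs, device="mixed", output_type=types.RGB)
    return fn.resize(images, size=[224, 224])

class RawJpegDataset(torch.utils.data.Dataset):
    """Plain PyTorch dataset: reads encoded files and hands them to the proxy."""
    def __init__(self, filenames, labels, proxy):
        self.filenames, self.labels, self.proxy = filenames, labels, proxy
    def __len__(self):
        return len(self.filenames)
    def __getitem__(self, idx):
        encoded = np.fromfile(self.filenames[idx], dtype=np.uint8)
        # The proxy call returns a placeholder; DALI runs later in the main process.
        return self.proxy(encoded), self.labels[idx]

filenames, labels = ["img0.jpg", "img1.jpg"], [0, 1]  # example data
pipe = decode_pipeline(batch_size=64, num_threads=4, device_id=0)

with dali_proxy.DALIServer(pipe) as dali_server:
    dataset = RawJpegDataset(filenames, labels, dali_server.proxy)
    loader = dali_proxy.DataLoader(dali_server, dataset, batch_size=64, num_workers=4)
    for images, targets in loader:
        ...  # training step on already-decoded, resized batches
```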

Co-author: @mdabek-nvidia

Additional information:

Affected modules and functionalities:

  • Torch plugin

Key points relevant for the review:

dali/python/nvidia/dali/plugin/pytorch/__init__.py

Tests:

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A
    Added options to run the RN50 and EfficientNet examples with DALI proxy

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • [?] RST
    • [?] Jupyter
    • Other
  • N/A
    TODO

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: N/A

@dali-automaton (Collaborator): CI MESSAGE: [20865442]: BUILD STARTED

@dali-automaton (Collaborator): CI MESSAGE: [20865442]: BUILD PASSED

@jantonguirao jantonguirao changed the title Dali proxy2 Introduce DALI proxy Nov 28, 2024
@jantonguirao jantonguirao marked this pull request as ready for review November 28, 2024 09:21
dali/python/nvidia/dali/plugin/pytorch/__init__.py — 10 outdated review threads, resolved
@szkarpinski (Collaborator):
I see there are no tests except for the resnet50 example. I believe we should have normal TL0 tests as well.

@szkarpinski (Collaborator):
Did you test error propagation between the processes? Multiprocessing doesn't automagically propagate exceptions, AFAIK. Maybe we should have tests that check how errors in particular processes are reported?
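For context, a minimal generic illustration of the point above (plain Python, not DALI code): an exception raised in a multiprocessing worker does not reach the parent process unless it is explicitly captured and forwarded, e.g. through a queue.

```python
import multiprocessing as mp
import traceback

def worker(error_queue):
    try:
        raise ValueError("something went wrong in the worker")
    except Exception:
        # Without this explicit forwarding, the parent would only observe a
        # finished (or dead) process, not the original traceback.
        error_queue.put(traceback.format_exc())

if __name__ == "__main__":
    errors = mp.Queue()
    p = mp.Process(target=worker, args=(errors,))
    p.start()
    p.join()
    if not errors.empty():
        print("worker failed:\n" + errors.get())
```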

@jantonguirao jantonguirao force-pushed the dali_proxy2 branch 2 times, most recently from caf2d0b to 0472490, on November 29, 2024 at 10:28
@klecki (Contributor) left a comment:
Posting a few comments as the code started moving.
I also feel like we should limit the scope of the API and hide most of the implementation.

dali/python/setup.py.in — outdated review thread, resolved
dali/python/nvidia/dali/plugin/pytorch/__init__.py — 2 outdated review threads, resolved
@jantonguirao jantonguirao force-pushed the dali_proxy2 branch 7 times, most recently from c4a7f74 to 1a50983, on December 3, 2024 at 17:52
@dali-automaton (Collaborator): CI MESSAGE: [21050826]: BUILD STARTED

@dali-automaton (Collaborator): CI MESSAGE: [21050826]: BUILD FAILED

@jantonguirao jantonguirao force-pushed the dali_proxy2 branch 2 times, most recently from 3bdfd37 to 014032d, on December 3, 2024 at 18:32
@dali-automaton (Collaborator): CI MESSAGE: [21052236]: BUILD STARTED

@dali-automaton (Collaborator): CI MESSAGE: [21052236]: BUILD FAILED

@klecki (Contributor) left a comment:
I still haven't read all the tests, but I'm posting more comments on the implementation.


**DALI Proxy** is a tool designed to integrate NVIDIA DALI pipelines with PyTorch data workers while maintaining the simplicity of PyTorch's dataset logic. The key features of DALI Proxy include:

- **Efficient GPU Utilization**: DALI Proxy ensures GPU data processing occurs on the same process running the main loop. This avoids performance degradation caused by multiple CUDA contexts for the same GPU.
Contributor:
Suggested change
- **Efficient GPU Utilization**: DALI Proxy ensures GPU data processing occurs on the same process running the main loop. This avoids performance degradation caused by multiple CUDA contexts for the same GPU.
- **Efficient GPU Utilization**: DALI Proxy ensures GPU data processing occurs in the same process running the main loop. This avoids performance degradation caused by multiple CUDA contexts for the same GPU.

?

- Each data worker invokes the proxy, which returns a **reference to a future processed sample**.
- During batch collation, the proxy groups data into a batch and sends it to the server for execution.
- The server processes the batch asynchronously and outputs the actual data to an output queue.
- The PyTorch DataLoader retrieves either the processed data or references to pending pipeline runs. If it encounters pipeline run references, it queries the DALI server for the actual data, waiting if necessary until the data becomes available in the output queue.
Contributor:
This sentence is too complicated. I would stick with the simplified user POV: you call the proxy in the worker to offload data for processing with DALI and put a placeholder for the result. When the data loader returns the processed data, it replaces the placeholders with the actual results from the DALI pipeline. Skip the "if it encounters pipeline run references" and the waiting parts here.
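To make the described flow concrete, here is a small self-contained toy sketch of the placeholder/future mechanism outlined above. It uses a thread and plain Python objects and is in no way the actual DALI implementation:

```python
# Toy illustration of the placeholder/future flow -- not the DALI implementation.
import threading
import queue
from dataclasses import dataclass

@dataclass(frozen=True)
class PendingRun:
    """Placeholder returned to the data worker; resolved later by the loader."""
    run_id: int

class ToyServer:
    def __init__(self):
        self._requests = queue.Queue()   # batches waiting to be processed
        self._outputs = {}               # run_id -> processed batch
        self._done = threading.Condition()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            run_id, batch = self._requests.get()
            processed = [x * 2 for x in batch]   # stand-in for DALI processing
            with self._done:
                self._outputs[run_id] = processed
                self._done.notify_all()

    def submit(self, run_id, batch):
        self._requests.put((run_id, batch))
        return PendingRun(run_id)

    def wait(self, pending):
        with self._done:
            while pending.run_id not in self._outputs:
                self._done.wait()
            return self._outputs.pop(pending.run_id)

# "Collation": group raw samples into a batch, submit it, keep a placeholder.
server = ToyServer()
placeholder = server.submit(run_id=0, batch=[1, 2, 3, 4])
# "DataLoader" side: replace the placeholder with the actual processed batch.
print(server.wait(placeholder))   # [2, 4, 6, 8]
```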


**1. DALI Pipeline**

The DALI pipeline defines the data processing steps. Input data is fed using ``fn.external_source``.
Contributor:
As we are expanding this, I guess this might be a nice place to mention the mapping between the external_source name and the names of the parameters. Please mention that we require at least one input, and that it is the input to the proxy.

You can link to the operator doc and the argument with something like this AFAIR

:meth:`~nvidia.dali.fn.external_source`
:paramref:`~nvidia.dali.fn.external_source.source`
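A minimal sketch of such a pipeline, assuming (as the comment above suggests) that the ``external_source`` ``name`` is what the proxy's call parameters map to; the parameter names shown are illustrative:

```python
from nvidia.dali import pipeline_def, fn, types

@pipeline_def
def example_pipeline():
    # Each named external_source becomes one input fed by the proxy.
    jpegs = fn.external_source(name="images", no_copy=True)
    labels = fn.external_source(name="labels")
    images = fn.decoders.image(jpegs, device="mixed", output_type=types.RGB)
    return images, labels

# Conceptually, the proxy would then be invoked per sample as
#   proxy(images=<encoded bytes>, labels=<label>)
# with keyword names matching the external_source names above (assumption).
```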


**5. Integration with PyTorch DataLoader**

The ``DataLoader`` wrapper provided by DALI Proxy simplifies the integration process.
Contributor:
Could you expose the dali_proxy DataLoader and DALIServer via autoclass here and link to those sections whenever you mention them? We have docstrings there, but we don't show them here.

Maybe mention that one can start and stop the server by hand, but that the context manager is the recommended way?

@@ -56,22 +56,36 @@ export PATH_TO_IMAGENET=/imagenet
export RESULT_WORKSPACE=./

# synthetic benchmark
python multiproc.py --nproc_per_node 8 ./main.py --amp --static-loss-scale 128 --batch-size 128 --epochs 1 --prof 1000 --no-checkpoints --training-only --data-backend synthetic --workspace $RESULT_WORKSPACE --report-file bench_report_synthetic.json $PATH_TO_IMAGENET
python multiproc.py --nproc_per_node 8 ./main.py --amp --static-loss-scale 128 --batch-size 128 --epochs 3 --prof 1000 --no-checkpoints --training-only --data-backend synthetic --workspace $RESULT_WORKSPACE --report-file bench_report_synthetic.json $PATH_TO_IMAGENET
Contributor:
Does it work correctly with more than one epoch? With the synthetic benchmark the concept of an epoch didn't really exist, as far as I can remember; that's why it just did 1k iterations. Dunno if it will now do 3k or explode, but making it longer doesn't really give us much.

Contributor (Author):
The iterator does have a size, so it works.

raise RuntimeError("The provided pipeline doesn't have any inputs")
pipe_input_names_set = set(pipe_input_names)
input_names_set = set(input_names or [])
if len(input_names_set) != len(input_names_set):
Contributor:
Suggested change
if len(input_names_set) != len(input_names_set):
if len(input_names_set) != len(input_names):

This will fail if you allow input_names to be None. Also, as written now, the check compares the same value to itself.

pipe_input_names_set = set(pipe_input_names)
input_names_set = set(input_names or [])
if len(input_names_set) != len(input_names_set):
raise RuntimeError("``input_names`` argument should not contain any duplicated values")
Contributor:
Suggested change
raise RuntimeError("``input_names`` argument should not contain any duplicated values")
raise RuntimeError(f"``input_names`` argument should not contain any duplicated values, got {input_names}.")
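For reference, a consolidated sketch folding in both suggestions above; the variable names come from the quoted snippet, and the surrounding `if not pipe_input_names:` guard is an assumption about the context:

```python
# Hypothetical consolidation of the two suggestions above; not the final code.
if not pipe_input_names:
    raise RuntimeError("The provided pipeline doesn't have any inputs")
pipe_input_names_set = set(pipe_input_names)
input_names_set = set(input_names or [])
# Guard against None before comparing lengths, and report the offending value.
if input_names is not None and len(input_names_set) != len(input_names):
    raise RuntimeError(
        f"``input_names`` argument should not contain any duplicated values, got {input_names}."
    )
```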

Comment on lines 413 to 473
call_impl.__signature__ = inspect.Signature(parameters)
_DALIProxy.__call__ = call_impl
Contributor:
Cool 😎
Does it have a chance of working with an IDE or Jupyter?

I think for an IDE to work, we need __call__ visible statically; there is no way of injecting a proper signature stub, so we probably still need a static __call__(self, *inputs, **kwargs) with a docstring defined, and then replace it with this hook.
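A standalone sketch of the pattern suggested above: keep a statically visible `__call__` stub for IDEs and docs, then swap in an implementation carrying an explicit `inspect.Signature`. The names here are illustrative, not the actual DALI proxy code:

```python
import inspect

class _DALIProxyDemo:
    def __call__(self, *inputs, **kwargs):
        """Statically visible stub so IDEs and help() show something sensible."""
        raise NotImplementedError

def _make_call_impl(input_names):
    def call_impl(self, *args, **kwargs):
        # Bind against the injected signature and map inputs by name.
        bound = inspect.signature(call_impl).bind(self, *args, **kwargs)
        bound.apply_defaults()
        return {name: bound.arguments[name] for name in input_names}
    params = [inspect.Parameter("self", inspect.Parameter.POSITIONAL_OR_KEYWORD)]
    params += [
        inspect.Parameter(name, inspect.Parameter.POSITIONAL_OR_KEYWORD)
        for name in input_names
    ]
    call_impl.__signature__ = inspect.Signature(params)
    return call_impl

# Replace the static stub with the dynamically-signed implementation:
_DALIProxyDemo.__call__ = _make_call_impl(["images", "labels"])
proxy = _DALIProxyDemo()
print(inspect.signature(proxy.__call__))   # (images, labels)
print(proxy(1, labels=2))                  # {'images': 1, 'labels': 2}
```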

Signed-off-by: Joaquin Anton Guirao <[email protected]> (19 commits)
@jantonguirao jantonguirao force-pushed the dali_proxy2 branch 3 times, most recently from 75f5921 to 32983fe, on December 24, 2024 at 12:16
@dali-automaton (Collaborator): CI MESSAGE: [21819002]: BUILD STARTED

Signed-off-by: Joaquin Anton Guirao <[email protected]> (2 commits)
@dali-automaton (Collaborator): CI MESSAGE: [21823100]: BUILD STARTED

Labels: none yet · Projects: none yet · 5 participants