PyTorch dataset use multiprocessing #1383
I think your code snippet looks good. Only if DDP and multiple workers are used …

Additionally, you have to add a …

Yes, if you are missing the …

This is just copied & pasted from the original docs (what I linked), so of course I would hope that this snippet is ok. Have you tried that in RETURNN? Why did you not make a PR?

But they will all get a different random seed, as we do it with DDP, so it's not really a problem, right? It just changes what an "epoch" means now. I just wanted to verify this. If you actually use …
Now that we use `DataLoader` (v1) again (c0ac991, fixed #1382), we can directly use the `num_workers` option.

In 5b569b3, I added the config option `torch_dataloader_opts`, which you can set like:
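```python
# The exact snippet was lost in extraction; presumably, per the surrounding
# text, this is a dict of DataLoader keyword arguments, e.g.:
torch_dataloader_opts = {"num_workers": 1}
```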
`num_workers = 1` will use a single worker only, but then this should just behave as before w.r.t. the epoch size. Otherwise (`num_workers > 1`) it would duplicate the data over the workers, because we do not do any sharding. (If this is an issue for you, please open a new issue about it.)

`num_workers = 1` should in principle also be fast enough in most cases (in all our TF-based experiments, we also only had a single worker, and computation time was always close to 100%, i.e. the dataset was never a bottleneck).

The computation time should tell you in the end whether you have a bottleneck with the dataset or not. Now with this option, I also see around 98% computation time (on demo-rf-pt-benchmark with `num_workers=1`). So I guess this issue can be closed now.
Note that you can also use `DataLoader` with `num_workers=1` and additionally use `MultiProcDataset` with a higher number of workers, because `MultiProcDataset` does handle the sharding correctly.
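For illustration, a sketch of such a combined setup; it assumes `MultiProcDataset`'s usual options (`dataset`, `num_workers`, `buffer_size`), and the inner dataset dict is only a placeholder:

```python
# RETURNN config sketch (assumed option names, adapt to your setup):
train = {
    "class": "MultiProcDataset",
    "dataset": {"class": "OggZipDataset"},  # placeholder: your actual dataset opts
    "num_workers": 4,   # MultiProcDataset shards correctly across these workers
    "buffer_size": 10,  # prefetch buffer between worker procs and the main proc
}
torch_dataloader_opts = {"num_workers": 1}  # keep the DataLoader itself single-worker
```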
For TorchData `DataLoader2`: I think it needs some code like this:
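The original snippet was not preserved here; what follows is a minimal sketch of what such sharding code could look like, assuming the torchdata 0.6-era API (`sharding_filter`, `MultiProcessingReadingService`):

```python
from torchdata.datapipes.iter import IterableWrapper
from torchdata.dataloader2 import DataLoader2, MultiProcessingReadingService

# Toy datapipe standing in for the RETURNN dataset wrapper.
datapipe = IterableWrapper(range(100))
# sharding_filter() makes each worker keep only every num_workers-th element,
# so the workers do not all yield duplicate copies of the data.
datapipe = datapipe.sharding_filter()

reading_service = MultiProcessingReadingService(num_workers=4)
dataloader = DataLoader2(datapipe, reading_service=reading_service)
for item in dataloader:
    pass  # training step would go here
```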
Via …

(Related is #1382; however, until that is resolved, we should probably implement this here anyway for now.)
Some things which need to be clarified:

- `num_workers` times more data now? Just like in DDP training? -> Yes. But there is something like `sharding_filter`.

Also note that we have another alternative: `MultiProcDataset`. This one keeps the original epochs, i.e. it implements sharding.

Some options to implement the sharding logic:
- `sharding_filter`, how does this work?
- `ReturnnDatasetIterDataPipe`, `ReturnnDatasetResetDefaultEpochCounterCallback` or `ReturnnDatasetResetMpSharedEpochCallback`.
- `ShardingDataset` wrapping dataset, similar to `MultiProcDataset` but without the multi-proc logic.
- `MetaDataset`. (But I don't like to extend `MetaDataset` more and more with such unrelated logic... I prefer to have this separate.)
- `Dataset.get_seq_order_for_epoch` (see the sketch after this list).
- `Dataset.partition_epoch`.
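As a rough illustration of the `Dataset.get_seq_order_for_epoch` option, a hypothetical sketch; `shard_index` and `num_shards` are invented names, not existing RETURNN settings:

```python
# Compute the usual sequence order for the epoch, then keep only this
# shard's slice, so each worker sees a disjoint subset of the epoch.
def get_sharded_seq_order(dataset, epoch, num_seqs, shard_index, num_shards):
    seq_order = dataset.get_seq_order_for_epoch(epoch=epoch, num_seqs=num_seqs)
    return seq_order[shard_index::num_shards]
```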
In all cases, the user might actually be interested in switching between these logics, just like `horovod_dataset_distribution="random_seed_offset"` vs `horovod_dataset_distribution="shard"`.
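To make the analogy concrete, a hypothetical PyTorch-side equivalent; the option name `torch_dataset_distribution` is invented here, not an existing setting:

```python
# "shard": each worker gets a disjoint slice of the epoch;
# "random_seed_offset": each worker shuffles the full epoch with its own seed.
torch_dataset_distribution = "shard"  # or "random_seed_offset"
```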