Now `torch_dataloader_opts = dict(num_workers=1)` provides another way to use PyTorch multiprocessing for the dataset (#1383). This gives me 98% computation time (as reported in the RETURNN training log), i.e. the dataset is not a bottleneck anymore.
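For reference, a minimal sketch of how this looks in a RETURNN config. Only `torch_dataloader_opts` is the option discussed here; the surrounding keys (`backend`, the `train` dataset dict with a hypothetical `HDFDataset`) are illustrative assumptions, not taken from this issue:

```python
# Sketch of the relevant part of a RETURNN config (illustrative).
backend = "torch"

# Hypothetical dataset; any RETURNN dataset dict works here.
train = {"class": "HDFDataset", "files": ["train.hdf"]}

# Passed through to torch.utils.data.DataLoader. With num_workers=1,
# the dataset and the whole Torch data pipeline run in a single
# DataLoader worker subprocess.
torch_dataloader_opts = dict(num_workers=1)
```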
This was not the case with `num_workers=0` and `MultiProcDataset`, which gave me only about 75% computation time. So it seems there is still some overhead in `MultiProcDataset`? Or maybe the overhead is not in `MultiProcDataset` itself but in our Torch data pipeline (`ReturnnDatasetIterDataPipe`, ..., `BatchingIterDataPipe`): with `num_workers>0`, I think even that part runs in the subprocess, while with `MultiProcDataset`, it still runs in the main process.
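For comparison, a sketch of the `MultiProcDataset` variant discussed above. The `dataset`, `num_workers`, and `buffer_size` keys and their values are my best guess at a typical setup, not taken from this issue:

```python
# Sketch: wrapping the dataset in MultiProcDataset instead.
# The workers only compute the dataset itself; the Torch data pipeline
# (ReturnnDatasetIterDataPipe, ..., BatchingIterDataPipe) still runs
# in the main process when num_workers=0.
train = {
    "class": "MultiProcDataset",
    "dataset": {"class": "HDFDataset", "files": ["train.hdf"]},  # hypothetical inner dataset
    "num_workers": 2,   # dataset worker subprocesses
    "buffer_size": 10,  # prefetch buffer per worker
}
torch_dataloader_opts = dict(num_workers=0)  # the case that gave only ~75% computation time
```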
I'm not sure if we can do much about it, but I just wanted to document this.
I don't plan to do anything about this now, and as said, I'm not sure we can do much about it anyway. Using `num_workers=1` is a good solution in any case, so I'm closing this now.