
size of trainloader #39

Open
Catecher98 opened this issue Mar 3, 2022 · 4 comments

@Catecher98

Dear author,
I have some questions about the training phase of LUCIR+AANets. When I run main.py on CIFAR-100, the reported size of the training set in the 0-th phase is only 156, and 55, 55, 55, 55, 56 for the remaining phases, respectively. I was wondering how many images are trained in each epoch: the 0-th phase needs to train 50 classes, and on CIFAR-100, 50 classes means 50*500 images in total.

@yaoyao-liu (Owner) commented Mar 3, 2022

Hi @Catecher98,

Thanks for your interest in our work.
We will indeed observe 50*500 images in the zeroth phase. Could you please give me more hints on how you got the number "156"? Based on the information you provided, I cannot understand why you would only get 156 images in the zeroth phase.

Have a nice day!

Best,
Yaoyao

@Catecher98 (Author)

Well, thanks for your reply. I'm sorry, I misremembered the size of the training set in the 0-th phase; it is actually 196. My computer trains one epoch very fast (about 10 seconds per epoch), which is why I had this doubt. The training set size displayed in the terminal for each epoch is 196. Why is it so small? Shouldn't it be 50*500?

(screenshot: mmexport1646298318622.png)

@yaoyao-liu (Owner) commented Mar 3, 2022

In the screenshot you provided, 196 is the value of len(trainloader):

    print('Train set: {}, train loss1: {:.4f}, train loss2: {:.4f}, '
          'train loss3: {:.4f}, train loss: {:.4f} accuracy: {:.4f}'.format(
              len(trainloader), train_loss1/(batch_idx+1),
              train_loss2/(batch_idx+1), train_loss3/(batch_idx+1),
              train_loss/(batch_idx+1), 100.*correct/total))

It is not the number of training samples in the dataset. Instead, it is the number of batches we have in each epoch.
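For a quick illustration of the difference, here is a minimal sketch using the standard torchvision CIFAR-100 loader (the setup below is for illustration only; it is not the code of this repository):

    # Sketch: len(dataset) counts samples, len(dataloader) counts batches.
    import torch
    from torchvision import datasets, transforms

    trainset = datasets.CIFAR100(root='./data', train=True, download=True,
                                 transform=transforms.ToTensor())
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=128)

    print(len(trainloader.dataset))  # 50000 samples in the full train split
    print(len(trainloader))          # 391 batches, i.e., ceil(50000 / 128)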
It is computed by the following function:

    def __len__(self) -> int:
        if self._dataset_kind == _DatasetKind.Iterable:
            # NOTE [ IterableDataset and __len__ ]
            #
            # For `IterableDataset`, `__len__` could be inaccurate when one naively
            # does multi-processing data loading, since the samples will be duplicated.
            # However, no real use case should be actually using that behavior, so
            # it should count as a user error. We should generally trust user
            # code to do the proper thing (e.g., configure each replica differently
            # in `__iter__`), and give us the correct `__len__` if they choose to
            # implement it (this will still throw if the dataset does not implement
            # a `__len__`).
            #
            # To provide a further warning, we track if `__len__` was called on the
            # `DataLoader`, save the returned value in `self._len_called`, and warn
            # if the iterator ends up yielding more than this number of samples.

            # Cannot statically verify that dataset is Sized
            length = self._IterableDataset_len_called = len(self.dataset)  # type: ignore[assignment, arg-type]
            if self.batch_size is not None:  # IterableDataset doesn't allow custom sampler or batch_sampler
                from math import ceil
                if self.drop_last:
                    length = length // self.batch_size
                else:
                    length = ceil(length / self.batch_size)
            return length
        else:
            return len(self._index_sampler)
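For example, assuming a batch size of 128 (an assumption here; the exact value depends on your config), the zeroth-phase numbers work out to exactly 196:

    # Sanity check: 50 classes x 500 images each, batch size 128 (assumed).
    from math import ceil

    num_samples = 50 * 500                 # 25000 training images
    batch_size = 128                       # assumed batch size
    print(ceil(num_samples / batch_size))  # 196 batches per epoch

That matches the "Train set: 196" line in your screenshot.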

You may refer to the source code of the PyTorch DataLoader for more details.

If you have further questions, please feel free to add comments to this issue.

@Catecher98 (Author)

OK, thanks for your detailed reply.
