Fix the issue that len(indices) and num_samples might not be equal #1339

sunjq1 · 2024-11-15T12:47:47Z

What changes were proposed in this pull request?

Modified the definition of total_size in the load_state_dict function and the definition of indices in the __iter__ function so that assert len(indices) == self.num_samples holds.

Why are the changes needed?

The previous ElasticDistributedSampler code did not consider the case where completed_num is not divisible by num_replicas, which could cause an AssertionError when resuming training from a checkpoint.
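
The mismatch can be reproduced with plain list slicing. The sketch below is a simplified model, not the actual sampler code: the helper names shard_length and expected_num_samples, the example sizes, and the trimming remedy at the end are all assumptions made for illustration. It assumes each rank takes a strided slice that starts at rank + completed_num, and that num_samples is the floor of the remaining sample count divided by num_replicas.

```python
# Simplified model of per-rank index splitting after resuming from a checkpoint.
# All names and sizes here are hypothetical; this is not the sampler's real code.


def shard_length(total_size, completed_num, num_replicas, rank):
    """Number of indices a rank gets from a strided slice that skips completed samples."""
    indices = list(range(total_size))
    return len(indices[rank + completed_num : total_size : num_replicas])


def expected_num_samples(total_size, completed_num, num_replicas):
    """Per-rank sample count computed with floor division (assumed definition)."""
    return (total_size - completed_num) // num_replicas


total_size, num_replicas, completed_num = 12, 4, 6

expected = expected_num_samples(total_size, completed_num, num_replicas)
lengths = [
    shard_length(total_size, completed_num, num_replicas, rank)
    for rank in range(num_replicas)
]
# The 6 remaining samples are not a multiple of 4, so some ranks receive more
# indices than expected and a check like len(indices) == num_samples would fail.
mismatched_ranks = [rank for rank, n in enumerate(lengths) if n != expected]

# One possible remedy (an assumption, not necessarily what this PR does):
# shrink the total so that the remaining part is an exact multiple of num_replicas.
trimmed_total = completed_num + expected * num_replicas
trimmed_lengths = [
    shard_length(trimmed_total, completed_num, num_replicas, rank)
    for rank in range(num_replicas)
]

print(expected, lengths, mismatched_ranks, trimmed_lengths)
```

With these example values, ranks 0 and 1 each receive two indices while the expected count is one; after trimming, every rank receives exactly one.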

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit tests (UT) and a training test.

BalaBalaYi

Please add a UT to cover the issue mentioned.

sunjq1 · 2024-11-26T07:39:12Z

> please add ut to cover the issue mentioned

Since I adjusted the logic for splitting indices after loading the checkpoint, I corrected the num values in the UT file.

codecov · 2024-11-27T07:05:26Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.82%. Comparing base (f03b769) to head (81a0fd2).
Report is 113 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1339      +/-   ##
==========================================
- Coverage   80.48%   79.82%   -0.66%     
==========================================
  Files         219      240      +21     
  Lines       20208    22578    +2370     
==========================================
+ Hits        16264    18023    +1759     
- Misses       3944     4555     +611


sunjq1 · 2024-12-06T06:12:59Z

@BalaBalaYi Hi, the code formatting changes have been completed.

Fix the issue that len(indices) and num_samples might not be equal when importing a checkpoint to resume training

5a524a7

sunjq1 requested review from workingloong, samplise, BalaBalaYi and majieyue as code owners November 15, 2024 12:47

BalaBalaYi requested changes Nov 19, 2024

Correct the numerical values in the UT file

81a0fd2

sunjq1 requested a review from BalaBalaYi November 26, 2024 07:42

Optimize code formatting

a309b25

sunjq1 and others added 3 commits December 6, 2024 16:08

Merge branch 'master' into fix_sampler

911ff5b

reformatted sampler.py

d77d375

Merge branch 'master' into fix_sampler

51ea8ad
