Add Random time-based sampler #255

NicolasHug · 2024-10-10T09:57:02Z

This PR adds a random time-based sampler, and also updates our benchmark.

Unlike clips_at_regular_timestamps(), which takes seconds_between_clip_starts, the random sampler must take a num_clips parameter.

This means _generic_time_based_sampler takes both num_clips and seconds_between_clip_starts as parameters, and they are mutually exclusive. This is not user-facing.

The _implem.py file is growing large. When this PR is merged, I think it'll be time to reorganize and split the implementation into different files. Typically, the policies could be kept separate.


ahmadsharif1

Approving to unblock, but take a look at the comments to see if you can address them.

ahmadsharif1 · 2024-10-11T13:36:32Z

src/torchcodec/samplers/_implem.py

+        num_clips is not None and seconds_between_clip_starts is not None
+    ):
+        # This is internal only and should never happen
+        raise ValueError("Bad, bad programmer!")


Can you add more details to this message, like the actual values of num_clips and seconds_between...

I will change the error message to something more conventional, but to repeat the comment: this is non-user-facing code; it's purely internal and should never be triggered.

ahmadsharif1 · 2024-10-11T13:42:23Z

test/samplers/test_samplers.py

-    # 2 different clips when teh sampling range is 2 seconds.
+    # range to be 1 second or 2 seconds. Since we set
+    # seconds_between_clip_starts to 1 we expect exactly one clip with the
+    # sampling range is of size 1, and 2 different clips when teh sampling range


s/teh/the

Also why was this comment change made? Text seems the same?

I just fixed a typo: seconds_between_clip_start became seconds_between_clip_starts.

ahmadsharif1 · 2024-10-11T13:43:46Z

test/samplers/test_samplers.py

@@ -252,10 +274,10 @@ def test_sampling_range_default_behavior_random_sampler():

    num_clips = 20
    num_frames_per_clip = 15
-    sampling_range_start = -20
+    sampling_range_start = -20 if sampler is clips_at_random_indices else 11


Maybe add a comment saying negative ranges are fine because they are like python indexes?

Honestly, I don't think this is relevant to add in this test. This should just be part of the docstring of the samplers (which is still to be done).
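To make the analogy with Python indexes concrete, here is a minimal sketch. The helper `resolve_sampling_range_start` is hypothetical, and the thread does not show how torchcodec actually resolves negative values; the sketch only assumes the usual Python sequence convention the reviewer alludes to.

```python
def resolve_sampling_range_start(sampling_range_start: int, num_frames: int) -> int:
    """Hypothetical helper: map a possibly negative index to a non-negative one."""
    if sampling_range_start < 0:
        # Same convention as Python sequences: -1 refers to the last frame.
        return sampling_range_start + num_frames
    return sampling_range_start


# Illustrative frame count only; it is not the length of the test video.
num_frames = 100
assert resolve_sampling_range_start(-20, num_frames) == list(range(num_frames))[-20]
```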

ahmadsharif1 · 2024-10-11T13:45:16Z

test/samplers/test_samplers.py

@@ -321,14 +343,19 @@ def test_sampling_range_default_regular_sampler(sampler):
        partial(
            clips_at_regular_indices, sampling_range_start=-1, sampling_range_end=1000
        ),
-        # Note: the hard-coded value of sampling_range_start=12 is because we know
+        # Note: the hard-coded value of sampling_range_start=13 is because we know


Why is this not exactly 13.01? Can you explain the epsilon value in a comment?

Assuming exactly 13.01 corresponds to end_stream_seconds, then 13.01 would be an invalid value here.

We just need to set a value that is < end_stream_seconds so that the clip start is valid, but large enough so that the clip span is such that it goes beyond end_stream_seconds and thus triggers the "error" policy.

Similarly for the index-based case, we set this value to -1, thus enforcing the clip to start on the last frame.

I will slightly update the comment so as to not claim that the video duration is exactly 13.01.
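The reasoning can be checked with a little arithmetic. The sketch below is an assumption-laden illustration: the function names, the frame spacing, and the clip length are made up, and 13.01 is only the duration assumed in this discussion. It shows a clip whose start is valid but whose later frames fall past the end of the stream, which is the situation the "error" policy is meant to catch.

```python
from typing import List


def clip_timestamps(
    clip_start_seconds: float,
    num_frames_per_clip: int,
    seconds_between_frames: float,
) -> List[float]:
    """Timestamps of the frames in one clip, assuming evenly spaced frames."""
    return [
        clip_start_seconds + i * seconds_between_frames
        for i in range(num_frames_per_clip)
    ]


def clip_exceeds_stream(timestamps: List[float], end_stream_seconds: float) -> bool:
    """True if any frame timestamp is at or past the end of the stream."""
    return any(t >= end_stream_seconds for t in timestamps)


# Illustrative numbers only: 13.01 is the duration assumed in the discussion.
end_stream_seconds = 13.01
timestamps = clip_timestamps(13.0, num_frames_per_clip=5, seconds_between_frames=0.5)
assert timestamps[0] < end_stream_seconds  # the clip start itself is valid
assert clip_exceeds_stream(timestamps, end_stream_seconds)  # later frames are not
```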

ahmadsharif1 · 2024-10-11T13:48:15Z

test/samplers/test_samplers.py

+    if sampler is clips_at_random_timestamps:
+        with pytest.raises(
+            ValueError,
+            match=re.escape("num_clips (0) must be > 0"),
+        ):
+            sampler(decoder, num_clips=0)
+    else:


Maybe it's just me but parameterizing tests and then adding if conditions for particular values with specific results seems like an anti-pattern.

It makes the test harder to read. Why not have separate tests so it's much more explicit?

One of the reasons these tests exist is to show the user what not to do in a very clear and explicit manner.

I agree that parametrization can lead to some quirks like this. If you feel strongly that these 2 unique checks should be written as 2 new separate test functions, I'll do it as a follow-up.

However, I expect these error checks to be a "write once and never go back to it" thing, so I'm personally OK with the if/else awkwardness.
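A side note on the quoted snippet: the `match` argument of `pytest.raises` is treated as a regular expression and checked with `re.search`, which is why the expected message is wrapped in `re.escape`. The small standard-library sketch below shows the difference without needing pytest.

```python
import re

message = "num_clips (0) must be > 0"

# Unescaped, "(0)" is a capturing group that matches the bare character "0",
# so the pattern does not match the literal parentheses in the message.
assert re.search(message, message) is None

# Escaped, every metacharacter is matched literally.
assert re.search(re.escape(message), message) is not None
```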

ahmadsharif1 · 2024-10-11T13:52:43Z

src/torchcodec/samplers/_implem.py

+        sampling_range_width = sampling_range_end - sampling_range_start
+        # torch.rand() returns in [0, 1)
+        # which ensures all clip starts are < sampling_range_end
+        clip_start_seconds = (


I wonder if we should sort these clip starts by start_timestamps?

Do you know how the existing samplers do it and what the user expectations are here?

Existing samplers don't sort: https://github.com/pytorch/vision/blob/ed55b0309fc3ed7d8abc4e4172b8a3c9852ef454/torchvision/datasets/samplers/clip_sampler.py#L156-L170
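As a rough illustration of the sampling step and the sorting question, here is a minimal sketch. It is not the PR's code: the function name and the `sort` flag are hypothetical, and Python's `random.Random.random()` stands in for `torch.rand()` because both return values in [0, 1).

```python
import random
from typing import List


def random_clip_starts(
    num_clips: int,
    sampling_range_start: float,
    sampling_range_end: float,
    rng: random.Random,
    sort: bool = False,
) -> List[float]:
    """Hypothetical sketch: draw clip starts uniformly from the sampling range."""
    sampling_range_width = sampling_range_end - sampling_range_start
    # rng.random() returns a float in [0.0, 1.0), like torch.rand(), so in
    # exact arithmetic every start lies in [sampling_range_start, sampling_range_end).
    starts = [
        rng.random() * sampling_range_width + sampling_range_start
        for _ in range(num_clips)
    ]
    # Sorting is optional; the thread notes that existing samplers do not sort.
    return sorted(starts) if sort else starts
```

With `sort=False` the starts come back in draw order, which matches the behavior described for the existing samplers; `sort=True` shows what the alternative would look like for callers who prefer chronological clips.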

NicolasHug · 2024-10-11T14:54:57Z

The Linux CUDA test is red, but it looks like an unrelated infra issue, so I'll merge. Thanks for the review!

NicolasHug added 13 commits October 8, 2024 06:10

Add time-based regular sampler

6471c2a

Refac, some comments

e00fce7

More tests

cca1ff9

More tests

0afbb97

Fix mypy

58bbd7f

Typo

24cebf8

Minor simpification

5d44147

Typo

da5d3be

Fix _assert_regular_sampler

6550af9

Address remaining comments

bdf08a1

Add random time-based sampler

db48a87

Merge branch 'main' of github.com:pytorch/torchcodec into random_time…

dda564a

…_based

fix mypy

29e4d94

facebook-github-bot added the CLA Signed label Oct 10, 2024

Fix

4ef46f6

NicolasHug changed the title ~~Add Random time-based~~ Add Random time-based sampler Oct 10, 2024

Update benchmark

ff97ea7

ahmadsharif1 approved these changes Oct 11, 2024


Address comments

6183234

NicolasHug merged commit cff9492 into main Oct 11, 2024
21 of 22 checks passed

NicolasHug deleted the random_time_based branch October 11, 2024 14:55
