Refactor seeking to only store pts as int64 timestamp #262

scotts · 2024-10-15T20:11:00Z

Refactor that:

Replaces std::optional<double> maybeDesiredPts_ in VideoDecoder with:
a. bool hasDesiredPts_ in VideoDecoder; and
b. setting each stream info's discardFramesBeforePts directly in VideoDecoder::setCursorPtsInSeconds(double second).
Adds VideoDecoder::setCursorPts(int64_t pts) which we can call when we have an int-based pts that came directly from FFmpeg. This is a private API, meant only to be used when we're round-tripping pts values from FFmpeg. We don't expose it to users.

Note the FIXME comment. It seems like we're "consuming" the values of discardFramesBeforePts in maybeSeekToBeforeDesiredPts(). We reset hasDesiredPts_ after calling it (and we previously reset maybeDesiredPts_). It seems like we should also reset all of the discardFramesBeforePts values.

Thinking about it more, I'm also unsure if we need discardFramesBeforePts to be an optional. Because we initialize it to be 0 and we never "clear" the value out, that implies two things:

0 is a reasonable default, and when there is a reasonable default, it's probably not worth having an optional value.
The value is always present in our current code.

If 0 is not a reasonable default, then we can just use the alternative case when the optional is not present in getNextRawDecodedOutputNoDemux(): INT64_MIN.

ahmadsharif1 · 2024-10-16T12:35:05Z

Can you run benchmarks to ensure there are no regressions?

Also explain why this change is being done in the first place, and whether it is intended to keep existing behavior or change it. Will there be any scenarios where behavior is changed?

scotts · 2024-10-16T13:00:46Z

@ahmadsharif1, sure I'll get some benchmark numbers.

As a pure refactor, we're keeping existing behavior - hence no new tests. Any behavior change would be a bug. And good question on the motivation. There are two main motivations:

We introduce the potential for precision loss when we convert and store timestamp values as doubles. This is inevitable for when we want users to provide us seconds as doubles. But there are several instances where we start with a timestamp we got directly from FFmpeg as an int, and we want to seek to that position. Converting that to a double introduces the possibility of precision loss, and is confusing as we're always going to compare it against FFmpeg's int-based timestamps. We should instead store timestamps internally in the same format FFmpeg does, as ints. We then only receive timestamps as doubles from users. This enables us to only use int-based timestamps in some internal cases.
When digging into how to do the above, I discovered we have some redundant and unnecessary code. We should clean that up.

ahmadsharif1

I think we should add a random seek test before changing this code because seeking is hard to get right in terms of correctness and performance.

The test should seek forwards/backwards in a loop and grab the next frame and make sure it's at or higher than the seeked value.

It can also add streams in the middle.

We should also test seeking to beginning/end to make sure edge cases are handled correctly.

The original code is as lazy as it can be -- meaning the conversion from user's double to per-stream int64 happens as late as possible. That's actually more robust because user can seek first and add stream later. It's also more efficient.

Unless there is a compelling behavior change, I am inclined to keep the original behavior.

ahmadsharif1 · 2024-10-16T12:40:38Z

src/torchcodec/decoders/_core/VideoDecoder.cpp

+  hasDesiredPts_ = true;
+}
+
+void VideoDecoder::setCursorPts(int64_t pts) {


I don't know that this function makes sense without all streams having a common timebase

Since it's internal, and we always know a stream index when calling it, I think it's fine if this takes a stream index.

ahmadsharif1 · 2024-10-16T12:52:04Z

src/torchcodec/decoders/_core/VideoDecoder.cpp

@@ -1179,7 +1178,18 @@ VideoDecoder::DecodedOutput VideoDecoder::getNextDecodedOutputNoDemux() {
 }

 void VideoDecoder::setCursorPtsInSeconds(double seconds) {
-  maybeDesiredPts_ = seconds;
+  for (int streamIndex : activeStreamIndices_) {


I like the original change because its lazier -- when the user seeks a bunch of times only the last seek value should affect anything. The new change sets things eagerly

More importantly, if the user adds a stream after calling seek it doesn't seek that stream, so it's less robust too

ahmadsharif1 · 2024-10-16T12:53:01Z

src/torchcodec/decoders/_core/VideoDecoder.h

-  std::optional<double> maybeDesiredPts_;
+  // True when the user wants to seek. The actual pts values to seek to are
+  // stored in the per-stream metadata in discardFramesBeforePts.
+  bool hasDesiredPts_ = false;


Seeking is hard to implement correctly and I am not confident about this change.

I think at least for debugging purposes we should keep the original double value around and when returning a frame after a seek we should ensure it's >= the value passed in here

scotts · 2024-10-16T14:27:11Z

@ahmadsharif1, the laziness of the old behavior is a good point - I'm not sure how to retain that with my current approach. I'm going to drop the bigger part of the refactor, but I want to address discardFramesBeforePts not needing to be optional. I'm going to put up a separate PR for it.

Refactor seeking to only store pts as int64 timestamp

2ea149d

facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Oct 15, 2024

Merge branch 'main' of github.com:pytorch/torchcodec into refactor_seek

3485205

scotts force-pushed the refactor_seek branch from f16a6cd to 3485205 Compare October 16, 2024 03:33

scotts marked this pull request as ready for review October 16, 2024 03:56

scotts requested a review from ahmadsharif1 October 16, 2024 12:29

ahmadsharif1 reviewed Oct 16, 2024

View reviewed changes

scotts closed this Oct 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor seeking to only store pts as int64 timestamp #262

Refactor seeking to only store pts as int64 timestamp #262

scotts commented Oct 15, 2024 •

edited

Loading

ahmadsharif1 commented Oct 16, 2024

scotts commented Oct 16, 2024

ahmadsharif1 left a comment

ahmadsharif1 Oct 16, 2024

scotts Oct 16, 2024

ahmadsharif1 Oct 16, 2024

ahmadsharif1 Oct 16, 2024

scotts commented Oct 16, 2024

Refactor seeking to only store pts as int64 timestamp #262

Refactor seeking to only store pts as int64 timestamp #262

Conversation

scotts commented Oct 15, 2024 • edited Loading

ahmadsharif1 commented Oct 16, 2024

scotts commented Oct 16, 2024

ahmadsharif1 left a comment

Choose a reason for hiding this comment

ahmadsharif1 Oct 16, 2024

Choose a reason for hiding this comment

scotts Oct 16, 2024

Choose a reason for hiding this comment

ahmadsharif1 Oct 16, 2024

Choose a reason for hiding this comment

ahmadsharif1 Oct 16, 2024

Choose a reason for hiding this comment

scotts commented Oct 16, 2024

scotts commented Oct 15, 2024 •

edited

Loading