Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor seeking to only store pts as int64 timestamp #262

Closed
wants to merge 2 commits into from

Conversation

scotts
Copy link
Contributor

@scotts scotts commented Oct 15, 2024

Refactor that:

  1. Replaces std::optional<double> maybeDesiredPts_ in VideoDecoder with:
    a. bool hasDesiredPts_ in VideoDecoder; and
    b. setting each stream info's discardFramesBeforePts directly in VideoDecoder::setCursorPtsInSeconds(double second).
  2. Adds VideoDecoder::setCursorPts(int64_t pts) which we can call when we have an int-based pts that came directly from FFmpeg. This is a private API, meant only to be used when we're round-tripping pts values from FFmpeg. We don't expose it to users.

Note the FIXME comment. It seems like we're "consuming" the values of discardFramesBeforePts in maybeSeekToBeforeDesiredPts(). We reset hasDesiredPts_ after calling it (and we previously reset maybeDesiredPts_). It seems like we should also reset all of the discardFramesBeforePts values.

Thinking about it more, I'm also unsure if we need discardFramesBeforePts to be an optional. Because we initialize it to be 0 and we never "clear" the value out, that implies two things:

  1. 0 is a reasonable default, and when there is a reasonable default, it's probably not worth having an optional value.
  2. The value is always present in our current code.

If 0 is not a reasonable default, then we can just use the alternative case when the optional is not present in getNextRawDecodedOutputNoDemux(): INT64_MIN.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Oct 15, 2024
@scotts scotts marked this pull request as ready for review October 16, 2024 03:56
@ahmadsharif1
Copy link
Contributor

Can you run benchmarks to ensure there are no regressions?

Also explain why this change is being done in the first place, and whether it is intended to keep existing behavior or change it. Will there be any scenarios where behavior is changed?

@scotts
Copy link
Contributor Author

scotts commented Oct 16, 2024

@ahmadsharif1, sure I'll get some benchmark numbers.

As a pure refactor, we're keeping existing behavior - hence no new tests. Any behavior change would be a bug. And good question on the motivation. There are two main motivations:

  1. We introduce the potential for precision loss when we convert and store timestamp values as doubles. This is inevitable for when we want users to provide us seconds as doubles. But there are several instances where we start with a timestamp we got directly from FFmpeg as an int, and we want to seek to that position. Converting that to a double introduces the possibility of precision loss, and is confusing as we're always going to compare it against FFmpeg's int-based timestamps. We should instead store timestamps internally in the same format FFmpeg does, as ints. We then only receive timestamps as doubles from users. This enables us to only use int-based timestamps in some internal cases.
  2. When digging into how to do the above, I discovered we have some redundant and unnecessary code. We should clean that up.

Copy link
Contributor

@ahmadsharif1 ahmadsharif1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add a random seek test before changing this code because seeking is hard to get right in terms of correctness and performance.

The test should seek forwards/backwards in a loop and grab the next frame and make sure it's at or higher than the seeked value.

It can also add streams in the middle.

We should also test seeking to beginning/end to make sure edge cases are handled correctly.

The original code is as lazy as it can be -- meaning the conversion from user's double to per-stream int64 happens as late as possible. That's actually more robust because user can seek first and add stream later. It's also more efficient.

Unless there is a compelling behavior change, I am inclined to keep the original behavior.

hasDesiredPts_ = true;
}

void VideoDecoder::setCursorPts(int64_t pts) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know that this function makes sense without all streams having a common timebase

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it's internal, and we always know a stream index when calling it, I think it's fine if this takes a stream index.

@@ -1179,7 +1178,18 @@ VideoDecoder::DecodedOutput VideoDecoder::getNextDecodedOutputNoDemux() {
}

void VideoDecoder::setCursorPtsInSeconds(double seconds) {
maybeDesiredPts_ = seconds;
for (int streamIndex : activeStreamIndices_) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the original change because its lazier -- when the user seeks a bunch of times only the last seek value should affect anything. The new change sets things eagerly

More importantly, if the user adds a stream after calling seek it doesn't seek that stream, so it's less robust too

std::optional<double> maybeDesiredPts_;
// True when the user wants to seek. The actual pts values to seek to are
// stored in the per-stream metadata in discardFramesBeforePts.
bool hasDesiredPts_ = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeking is hard to implement correctly and I am not confident about this change.

I think at least for debugging purposes we should keep the original double value around and when returning a frame after a seek we should ensure it's >= the value passed in here

@scotts
Copy link
Contributor Author

scotts commented Oct 16, 2024

@ahmadsharif1, the laziness of the old behavior is a good point - I'm not sure how to retain that with my current approach. I'm going to drop the bigger part of the refactor, but I want to address discardFramesBeforePts not needing to be optional. I'm going to put up a separate PR for it.

@scotts scotts closed this Oct 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Meta Open Source bot.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants