Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent stuck videos from carrying over to next download session ["unavailable fragments" ?] #193

Merged
merged 10 commits into from
Jun 19, 2024
7 changes: 7 additions & 0 deletions cps/tasks/download.py
Original file line number Diff line number Diff line change
Expand Up @@ -95,6 +95,12 @@ def run(self, worker_thread):
if error:
log.error("[xklb] An error occurred while trying to download %s: %s", error[1], error[0])
self.message = f"{error[1]} failed to download: {error[0]}"
else:
log.error("No error found in the database, likely the video failed due to unavailable fragments.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal should be very complete diagnostic hints in both... logs and in "Tasks" view error messages:

@deldesir: what log.error message is best?

Suggested change
log.error("No error found in the database, likely the video failed due to unavailable fragments.")
log.error("%s failed to download: No path or error found in the database (likely the video failed due to unavailable fragments?)", self.media_url_link)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Best, but the use of self.media_url_link here is not appropriate in a log message. Use self.media_url instead.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, go ahead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @deldesir: please enact self.media_url if that's best, and test!

self.message = f"{self.media_url_link} failed to download: No path found in the database"
deldesir marked this conversation as resolved.
Show resolved Hide resolved
media_id = conn.execute("SELECT id FROM media WHERE webpath = ?", (self.media_url,)).fetchone()[0]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is one record with this webpath guaranteed to exist in the media table, even when the query at line 89 returned no records?

Since the query above has the extra condition ... AND path NOT LIKE 'http%', it's possible that this less restrictive query might return one result (or more?).

If the query fails, trying to access field [0] will throw, and you may need to add an extra check:

results = conn.execute("SELECT id FROM media WHERE webpath = ?", (self.media_url,)).fetchone()
if results:
  media_id = results[0]
  conn.execute("DELETE ...")

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand your point well. I am confident one record with this webpath will always be present because it's in fact "the" trigger of the actual download task. It's mandatory. However the condition AND path NOT LIKE 'http%' might not be met under certain circumstances, i.e when the video fails to have it path updated with a path.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed how the media table is created and updated by xklb, and it seems that a record with webpath = media_url will be inserted by the lb dl child process at some point...

Could there be cases where the child process fails before updating the database?

If it's possible (even if rare), then this query will return 0 records, and this code should defensively check before trying to access the result.

Copy link
Collaborator Author

@deldesir deldesir Jun 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The running of the child process alone attests the existence of webpath = media_url. Yes there are cases where lb dl can fail, for example live videos, videos without an available/suitable format, videos stuck due to unavailable fragments or network issues. But it will never return 0 record because it's "metadata fetch" that creates the record exists and ensures this specific record is used per

xklb_full_cmd="lb dl ${XKLB_DB_FILE} --video --search ${URL_OR_SEARCH_TERM} ${FORMAT_OPTIONS} --write-thumbnail --subs -o ${OUTTMPL} ${VERBOSITY}"
(notice the --search option)

conn.execute("DELETE FROM media WHERE webpath = ?", (self.media_url,))
conn.execute("DELETE FROM captions WHERE media_id = ?", (media_id,))
return
except sqlite3.Error as db_error:
log.error("An error occurred while trying to connect to the database: %s", db_error)
Expand All @@ -111,6 +117,7 @@ def run(self, worker_thread):
# 2024-02-17: Dedup Design Evolving... https://github.com/iiab/calibre-web/pull/125
conn.execute("UPDATE media SET path = ? WHERE webpath = ?", (new_video_path, self.media_url))
conn.execute("UPDATE media SET webpath = ? WHERE path = ?", (f"{self.media_url}&timestamp={int(datetime.now().timestamp())}", new_video_path))
conn.commit()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I verified this too: commit() is required if the sqlite3 connection is in autocommit=False mode, which is the recommended value.

But it seems that autocommit =False is not the current default:
https://docs.python.org/3/library/sqlite3.html#sqlite3.Connection.autocommit

No change required here, but it's probably best to set autocommit=False on all sqlite connections? Otherwise, a crash happening between these two UPDATE statements would leave the database in an inconsistent state.

I also noticed that the conn.close() at line 127 can be removed:

with sqlite3.connect(XKLB_DB_FILE) as conn:
    ...transactions done here...
    # conn is implicitly closed when leaving the with block

conn.close()  # this additional close has no effect and can be removed

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No change required here, but it's probably best to set autocommit=False on all sqlite connections?

FYI, sqlite3.connect(... autocommit =False) requires Python 3.12, so it's probably best to avoid it while there are still users on older Python versions.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reading the docs on how sqlite3.Connection objects interact with context managers, I realize that it doesn't work the same of file.open() and other simple I/O objects.

with sqlite3.connect(XKLB_DB_FILE) as conn:
    # ^-- the above context manager will implicitly start an SQL transaction
    ...queries executed here are part of the transaction...
    # The transaction is committed, equivalent to `conn.commit()`

conn.close()  # Still needed because the with block leaves the connection open

See: https://docs.python.org/3/library/sqlite3.html#how-to-use-the-connection-context-manager

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other words: you don't need to add conn.commit() when you're already inside a with block and you will exit the block cleanly (e.g. without throwing an exception or crashing).

self.progress = 1.0
else:
log.error("Failed to send the requested file to %s", self.original_url)
Expand Down