Prevent stuck videos from carrying over to next download session ["unavailable fragments" ?] #193

deldesir · 2024-06-18T15:34:43Z

Videos that fail due to unavailable fragments are no longer kept in the database. Those that fail for other reasons are left in place but will not carry over to the next download session.

Tested on Ubuntu 24.04 with #194

holta · 2024-06-18T17:12:57Z

Possibly related?

@EMG70's YouTube playlist downloading problems 2024-06-10: (1) (This is taking longer than expected) (2) Metadata Fetch: [YouTube video URL] failed: 'NoneType' object cannot be interpreted as an integer [@avni results: (3) failed to download: None (4) it keeps trying to redownload failed videos] #178
Why ~100 minute ETA to complete "Metadata Fetch" stage (seems very slow) for a playlist of 35 short videos? [Metadata Fetch: ... failed: unsupported operand type(s) for /: 'NoneType' and 'int'] [hundreds of videos stuck in xklb-metadata.db, painstakingly processed every time, but never downloading?] #189
Overall "Download to IIAB" workflow #190

holta · 2024-06-18T18:53:40Z

@codewiz can you review?

holta · 2024-06-18T18:59:36Z

@deldesir can you recommend any YouTube URL (video or playlist) known to regularly pollute /library/calibre-web/xklb-metadata.db ?

(So we can all use this test case URL to confirm improved behavior!)

deldesir · 2024-06-18T19:25:18Z

Pollution and "carrying over" are 2 different things. The former happens when a download session is cancelled prematurely, for example if you click the "cancel" button in the video row (Tasks page) or restart Calibre-Web while a playlist download is in progress. The latter happens when a download fails due to unavailable fragments or because the video is/was a live stream.

To answer your question, just restart Calibre-Web or click on cancel before the download task kicks in.

holta · 2024-06-18T19:37:00Z

Thanks @deldesir for explaining!

Ready to merge (PRs #193 and #194) if @codewiz agrees code is clean / safe / readable ✅

codewiz · 2024-06-18T22:01:34Z

cps/tasks/download.py

+                            else:
+                                log.error("No error found in the database, likely the video failed due to unavailable fragments.")
+                                self.message = f"{self.media_url_link} failed to download: No path found in the database"
+                                media_id = conn.execute("SELECT id FROM media WHERE webpath = ?", (self.media_url,)).fetchone()[0]


Is one record with this webpath guaranteed to exist in the media table, even when the query at line 89 returned no records?

Since the query above has the extra condition ... AND path NOT LIKE 'http%', it's possible that this less restrictive query might return one result (or more?).

If the query returns no row, fetchone() returns None and trying to access field [0] will throw, so you may need to add an extra check:

results = conn.execute("SELECT id FROM media WHERE webpath = ?", (self.media_url,)).fetchone()
if results:
    media_id = results[0]
    conn.execute("DELETE ...")

I am not sure I understand your point well. I am confident one record with this webpath will always be present, because it is in fact "the" trigger of the actual download task. It's mandatory. However, the condition AND path NOT LIKE 'http%' might not be met under certain circumstances, i.e. when the video fails to have its path updated.

I reviewed how the media table is created and updated by xklb, and it seems that a record with webpath = media_url will be inserted by the lb dl child process at some point...

Could there be cases where the child process fails before updating the database?

If it's possible (even if rare), then this query will return 0 records, and this code should defensively check before trying to access the result.
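A minimal sketch of the defensive check being discussed, run against an in-memory database. The table layout, the example URLs, and the helper name `delete_media_if_present` are assumptions for illustration; this is not the actual xklb schema or the code in download.py.

```python
import sqlite3

def delete_media_if_present(conn, media_url):
    """Delete the media row matching media_url; return its id, or None if absent."""
    row = conn.execute(
        "SELECT id FROM media WHERE webpath = ?", (media_url,)
    ).fetchone()
    if row is None:
        # fetchone() returns None when nothing matches, so row[0] would raise TypeError.
        return None
    media_id = row[0]
    conn.execute("DELETE FROM media WHERE id = ?", (media_id,))
    return media_id

# Hypothetical, simplified table for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE media (id INTEGER PRIMARY KEY, webpath TEXT, path TEXT)")
conn.execute(
    "INSERT INTO media (webpath, path) VALUES (?, ?)",
    ("https://example.com/v1", "https://example.com/v1"),
)

found = delete_media_if_present(conn, "https://example.com/v1")
missing = delete_media_if_present(conn, "https://example.com/unknown")
remaining = conn.execute("SELECT COUNT(*) FROM media").fetchone()[0]
conn.close()
```

Even if the record is expected to always exist, the explicit `None` check turns a confusing `TypeError` into a case the caller can log and report.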

The fact that the child process runs at all attests to the existence of webpath = media_url. Yes, there are cases where lb dl can fail, for example live videos, videos without an available/suitable format, videos stuck due to unavailable fragments, or network issues. But it will never return 0 records, because it's "metadata fetch" that creates the record and ensures this specific record is used, per:

calibre-web/scripts/lb-wrapper

Line 84 in b854132

xklb_full_cmd="lb dl ${XKLB_DB_FILE} --video --search ${URL_OR_SEARCH_TERM} ${FORMAT_OPTIONS} --write-thumbnail --subs -o ${OUTTMPL} ${VERBOSITY}"

(notice the --search option)

codewiz · 2024-06-18T23:59:57Z

cps/tasks/download.py

@@ -111,6 +118,7 @@ def run(self, worker_thread):
                        # 2024-02-17: Dedup Design Evolving... https://github.com/iiab/calibre-web/pull/125
                        conn.execute("UPDATE media SET path = ? WHERE webpath = ?", (new_video_path, self.media_url))
                        conn.execute("UPDATE media SET webpath = ? WHERE path = ?", (f"{self.media_url}&timestamp={int(datetime.now().timestamp())}", new_video_path))
+                        conn.commit()


I verified this too: commit() is required if the sqlite3 connection is in autocommit=False mode, which is the recommended value.

But it seems that autocommit=False is not the current default:
https://docs.python.org/3/library/sqlite3.html#sqlite3.Connection.autocommit

No change required here, but it's probably best to set autocommit=False on all sqlite connections? Otherwise, a crash happening between these two UPDATE statements would leave the database in an inconsistent state.
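To illustrate the atomicity concern, here is a small sketch in which a failure between two UPDATE statements rolls the first one back, because both run inside a `with conn:` block. It assumes the sqlite3 module's default transaction handling; the table, URL, file path, and simulated failure are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE media (id INTEGER PRIMARY KEY, webpath TEXT, path TEXT)")
conn.execute(
    "INSERT INTO media (webpath, path) VALUES (?, ?)",
    ("https://example.com/v1", "https://example.com/v1"),
)
conn.commit()

try:
    with conn:  # commits on clean exit, rolls back if an exception escapes
        conn.execute(
            "UPDATE media SET path = ? WHERE webpath = ?",
            ("/tmp/v1.mp4", "https://example.com/v1"),
        )
        # Simulated failure before the second UPDATE would have run.
        raise RuntimeError("simulated crash between the two UPDATEs")
except RuntimeError:
    pass

# The first UPDATE was rolled back, so the row is unchanged.
path_after = conn.execute("SELECT path FROM media").fetchone()[0]
conn.close()
```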

I also noticed that the conn.close() at line 127 can be removed:

with sqlite3.connect(XKLB_DB_FILE) as conn:
    ...transactions done here...
# conn is implicitly closed when leaving the with block
conn.close()  # this additional close has no effect and can be removed

No change required here, but it's probably best to set autocommit=False on all sqlite connections?

FYI, sqlite3.connect(..., autocommit=False) requires Python 3.12, so it's probably best to avoid it while there are still users on older Python versions.

After reading the docs on how sqlite3.Connection objects interact with context managers, I realize that they don't work the same way as open() on files and other simple I/O objects.

with sqlite3.connect(XKLB_DB_FILE) as conn:
    # ^-- the above context manager will implicitly start an SQL transaction
    ...queries executed here are part of the transaction...
# The transaction is committed, equivalent to `conn.commit()`
conn.close()  # Still needed because the with block leaves the connection open

See: https://docs.python.org/3/library/sqlite3.html#how-to-use-the-connection-context-manager

In other words: you don't need to add conn.commit() when you're already inside a with block and you will exit the block cleanly (e.g. without throwing an exception or crashing).
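One possible way to get both behaviors (commit on success, and a guaranteed close) is to nest the connection's own context manager inside `contextlib.closing`. This is only a sketch on an in-memory database with a made-up table, not a proposed change to download.py.

```python
import sqlite3
from contextlib import closing

with closing(sqlite3.connect(":memory:")) as conn:
    with conn:  # transaction scope: commit on success, rollback on exception
        conn.execute("CREATE TABLE media (id INTEGER PRIMARY KEY, webpath TEXT)")
        conn.execute("INSERT INTO media (webpath) VALUES ('https://example.com/v1')")
    # The inner block committed but left the connection open, so it is still usable.
    count = conn.execute("SELECT COUNT(*) FROM media").fetchone()[0]

# closing() has now called conn.close(); further use raises ProgrammingError.
try:
    conn.execute("SELECT 1")
    closed = False
except sqlite3.ProgrammingError:
    closed = True
```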

codewiz · 2024-06-19T02:42:24Z

All review issues addressed, merge at your convenience.

holta · 2024-06-19T04:31:23Z

Let's try to add a bit more "Defensive Programming" to download.py before merging:

Hopefully @deldesir has time Wednesday morning early, to explain/suggest options here, also on a call if possible.
My personal opinion is that we should programmatically enforce assumptions (preconditions) — try hard to provide pertinent error messages and logging — even when (especially when!) lb-wrapper, xklb, xklb-metadata.db, yt-dlp (ETC) happen to be broken! 💔 🚒
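As one possible shape for such a precondition check, the sketch below verifies that the media table has the columns the download task relies on before any query runs. The helper name `missing_media_columns` and the exact set of required columns are assumptions for illustration.

```python
import sqlite3

# Assumed subset of columns, based on the ones mentioned in this thread.
REQUIRED_COLUMNS = {"id", "webpath", "path", "error"}

def missing_media_columns(conn):
    """Return the required columns the media table lacks (all of them if the table is absent)."""
    rows = conn.execute("PRAGMA table_info(media)").fetchall()
    present = {row[1] for row in rows}  # index 1 of each table_info row is the column name
    return REQUIRED_COLUMNS - present

# Hypothetical database that was created without the error column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE media (id INTEGER PRIMARY KEY, webpath TEXT, path TEXT)")

missing = missing_media_columns(conn)
message = ""
if missing:
    # A pertinent message for logs and the Tasks view, instead of a raw OperationalError.
    message = f"media table is missing columns: {', '.join(sorted(missing))}"
conn.close()
```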

cps/tasks/download.py

Co-authored-by: A Holt <[email protected]>

holta · 2024-06-19T16:29:45Z

cps/tasks/download.py

@@ -95,6 +95,12 @@ def run(self, worker_thread):
                            if error:
                                log.error("[xklb] An error occurred while trying to download %s: %s", error[1], error[0])
                                self.message = f"{error[1]} failed to download: {error[0]}"
+                            else:
+                                log.error("No error found in the database, likely the video failed due to unavailable fragments.")


The goal should be very complete diagnostic hints in both... logs and in "Tasks" view error messages:

@deldesir: what log.error message is best?

Suggested change

- log.error("No error found in the database, likely the video failed due to unavailable fragments.")
+ log.error("%s failed to download: No path or error found in the database (likely the video failed due to unavailable fragments?)", self.media_url_link)

Best, but the use of self.media_url_link here is not appropriate in a log message. Use self.media_url instead.

Sounds good, go ahead.

Thanks @deldesir: please enact self.media_url if that's best, and test!
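A small sketch of the distinction, assuming `media_url_link` holds HTML markup intended for the Tasks view (its exact shape here is a guess) while `media_url` is the plain URL. The logger name and the list-collecting handler are only there to make the example self-contained.

```python
import logging

records = []

class ListHandler(logging.Handler):
    """Collect formatted log messages in a list instead of printing them."""
    def emit(self, record):
        records.append(record.getMessage())

log = logging.getLogger("download_sketch")
log.addHandler(ListHandler())
log.setLevel(logging.ERROR)
log.propagate = False

media_url = "https://example.com/v1"
# Assumed shape of the HTML link shown in the Tasks view.
media_url_link = f'<a href="{media_url}" target="_blank">{media_url}</a>'

# Log file: plain URL, passed as a %s argument so logging does the formatting.
log.error(
    "%s failed to download: No path or error found in the database "
    "(likely the video failed due to unavailable fragments?)",
    media_url,
)

# Tasks view: the HTML link is appropriate here.
message = f"{media_url_link} failed to download: No path or error found in the database"
```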

holta · 2024-06-19T19:23:38Z

@deldesir: Tested as safe enough to merge?

deldesir · 2024-06-19T19:46:41Z

Yes, tested on Ubuntu 24.04

Prevent stuck videos from carrying over next download session

4731503

deldesir self-assigned this Jun 18, 2024

deldesir marked this pull request as draft June 18, 2024 15:34

Commit changes made in database

106bc84

holta added the bug and enhancement labels Jun 18, 2024

deldesir added 2 commits June 18, 2024 13:30

Ensure the right id is used to remove failed video captions

6bf33f1

Delete only if no error found

0f9d435

deldesir mentioned this pull request Jun 18, 2024

Adjust message to warn about problematic videos [e.g. "live" videos, that might not really be live anymore, but still fail to download] #194

Merged

deldesir marked this pull request as ready for review June 18, 2024 18:47

deldesir requested a review from holta June 18, 2024 18:47

deldesir mentioned this pull request Jun 18, 2024

Ensure error and live_status columns are created from the start #191

Merged

codewiz reviewed Jun 18, 2024

deldesir added 3 commits June 18, 2024 20:35

Remove unnecessary close connection

2992049

Close the database connection

fefdbff

Remove unnecessary connection commit

59e671e

holta reviewed Jun 19, 2024

holta changed the title ~~Prevent stuck videos from carrying over next download session~~ Prevent stuck videos from carrying over to next download session ["unavailable fragments" ?] Jun 19, 2024

Enhance status message

5d604e6

Co-authored-by: A Holt <[email protected]>

holta reviewed Jun 19, 2024

deldesir added 2 commits June 19, 2024 14:37

Enhance log message

1855e33

Remove one more unnecessary commit

b482073

holta merged commit ad9c02a into iiab:master Jun 19, 2024

deldesir deleted the deldesir-patch-46 branch July 1, 2024 21:40

holta mentioned this pull request Jul 1, 2024

(1) download.py: "sqlite3.OperationalError: no such column: error" after "TypeError: 'NoneType' object is not subscriptable" (2) editbooks.py: SAWarning: Object of type <Books> not in session, add operation along 'Authors.books' won't proceed #186

Open

holta mentioned this pull request Aug 26, 2024

Prevent [upcoming and truly live YouTube] videos from carrying over [to] subsequent download tasks [& discussion of UTC for code safety — avoid TZ code complexity wherever possible!?] #212

Open
