-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use timezone-aware objects [mandate UTC in low-level vars for code safety, using Python 3.2's datetime.now(timezone.utc)
] [should Python 3.3's time.monotonic()
be used to approximate elapsed time?]
#217
base: master
Are you sure you want to change the base?
Changes from 3 commits
7fd516c
fd79ef0
9f52798
6c9e200
c97f550
2a12fbc
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,7 +2,7 @@ | |
import re | ||
import requests | ||
import sqlite3 | ||
from datetime import datetime | ||
from datetime import datetime, timezone | ||
from flask_babel import lazy_gettext as N_, gettext as _ | ||
|
||
from cps.constants import XKLB_DB_FILE, MAX_VIDEOS_PER_DOWNLOAD | ||
|
@@ -23,7 +23,7 @@ def __init__(self, task_message, media_url, original_url, current_user_name): | |
self.original_url = self._format_original_url(original_url) | ||
self.is_playlist = None | ||
self.current_user_name = current_user_name | ||
self.start_time = self.end_time = datetime.now() | ||
self.start_time = self.end_time = datetime.now(timezone.utc) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is already being done in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @deldesir do you agree? (Do revise if so!) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is the reason I made the previous change to consider the time set in the constructor. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
@deldesir Plz clarify the exact line number to help me understand? Thx! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
self.stat = STAT_WAITING | ||
self.progress = 0 | ||
self.columns = None | ||
|
@@ -110,7 +110,7 @@ def _update_metadata(self, requested_urls): | |
requested_urls = {url: requested_urls[url] for url in requested_urls.keys() if "shorts" not in url and url not in failed_urls} | ||
|
||
def _calculate_views_per_day(self, requested_urls, conn): | ||
now = datetime.now() | ||
now = datetime.now(timezone.utc) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The advice of using Both >>> import time
>>> time.monotonic()
180162.46153954
>>> time.time()
1720317895.1978724 Thus, subtracting a monotonic clock from a system clock is a serious bug. |
||
for requested_url in requested_urls.keys(): | ||
try: | ||
view_count = conn.execute("SELECT view_count FROM media WHERE path = ?", (requested_url,)).fetchone()[0] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. On line 117, what's the type of the time_uploaded field in the database? Also, datestamp.utcfromtimestamp() creates a naive datestamp, so if this is really a UNIX epoch, we should write: time_uploaded_raw = conn.execute("...").fetchone()[0]
time_uploaded = datetime.fromtimestamp(time_uploaded_raw, timezone.utc) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I found the code writing conn.execute("UPDATE playlists SET path = ? WHERE path = ?", (f"{self.media_url}×tamp={int(datetime.now().timestamp())}", self.media_url)) Since Note that most SQL databases have types like TIMESTAMP and DATETIME, but SQLite by convention uses integer POSIX timestamps. So we're already storing the correct number here, but we could directly use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
|
@@ -151,7 +151,7 @@ def _add_download_tasks_to_worker(self, requested_urls): | |
def run(self, worker_thread): | ||
self.worker_thread = worker_thread | ||
log.info("Starting to fetch metadata for URL: %s", self.media_url) | ||
self.start_time = self.end_time = datetime.now() | ||
self.end_time = datetime.now(timezone.utc) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Previously this was also setting start_time. No longer needed? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @deldesir take a look, Thanks! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Here I made this change to make the elapsed time start immediately when the task is called instead of when the worker thread is started. Not significant though, but more realistic IMHO. |
||
self.stat = STAT_STARTED | ||
self.progress = 0 | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are start_time and end_time exclusively used to compute the elapsed time in seconds?
If so, you don't need to use datetime objects, which are complex classes meant to represent calendar dates and localized time.
Internally, Linux and Windows represent time in (fractions of) seconds since a certain epoch. This system clock is the source for datetime objects, but you can also use it directly whenever you want to compute the delta:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important caveat from the docs of time.time():
This problem also affects
datetime.datetime.now(datetime.timezone.utc)
which you're using now, because the time can jump due to NTP, daylight saving or the system administrator simply setting the clock.A more robust way to compute time differences is using
time.monotonic()
, which exists since Python 3.3:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very useful knowledge: thanks @codewiz.
Maybe everyone sleep on this overnight to consider whether
datetime.now(timezone.utc)
IS-OR-IS-NOT appropriate for these Prevent [upcoming and truly live YouTube] videos from carrying over [to] subsequent download tasks [& discussion of UTC for code safety — avoid TZ code complexity wherever possible!?] #212 use cases ?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@deldesir above is a good reminder that mandating UTC is a "step forward" — but possibly might not be quite enough... ?
time.monotonic()
(or similar) better for some of these uses cases?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. They're mainly to render the localized task status based on the runtime per https://github.com/iiab/calibre-web/blob/master/cps/tasks_status.py
I agree elapsed time computation can and must be simplified, however we're dealing with a codebase in which mainly naive datetime objects are used. Just lookup
datetime.nowutc
and see for yourself.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also be aware of a 4th way IIAB system clock can change (go forward but not backwards!) rather automatically:
(e.g. when someone browses IIAB from a nearby phone / mobile device in a school!)
Summary of above PR from May 2019:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're all aware of the counterargument of course:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't understand why "elapsed time" calculations became the central point of this discussion. This PR does only one thing, convert naive datetime objects into timezone-aware datetime objects, nothing else. The "elapsed time" calculations were always there and we can see their usefulness in the "tasks" page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deeper Question: what should be the goal of this PR?
If the answer is safer time handling, that might be achieved in ~5 main ways:
RECAP: So far this PR transitions to using the local machine's best guess at UTC time — which helps solve 1. and 2.
(But debatably this PR might not be quite enough to address considerations 3., 4. and 5. ?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick RECAP of @codewiz's suggestion from ~2 days ago:
Python 3.3's
time.monotonic()
should at least be considered for the "Run Time" column of "Tasks" view — and any other situations where "elapsed time" (or similar) are important... ?cat /proc/uptime
uptime