Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Download "top 100 videos" from YouTube channel or playlist, sorted by views-per-day #139

Closed
wants to merge 9 commits into from

Conversation

deldesir
Copy link
Collaborator

@deldesir deldesir commented Mar 11, 2024

🚀 Pull Request Overview:

This PR adds views-per-day to the metadata so videos can be sorted automatically based on the playlist. If the playlist contains more than 100, a bookshelf with the top 100 videos (based on their views per day) is created. For playlists with less than 100 videos, the sorting still happens automatically.

📋 Checklist:

image

📌 Testing scenarios:
See Issue #97

cc @EMG70

@deldesir deldesir requested a review from holta March 11, 2024 15:02
@deldesir deldesir self-assigned this Mar 11, 2024
@holta
Copy link
Member

holta commented Mar 11, 2024

  1. Great Progress!!

videos can be sorted automatically

  1. Are videos auto-sorted for users by views-per-day (highest-to-lowest) as soon as they look at the bookshelf? (Or not yet?)

Top 50 videos bookshelves are created for playlists with more than 50 videos and top 100 for ones with more than 100.

  1. That's too confusing, for new users especially.

    Please replace this "step function" (steps at 10, 50, 100) with one fixed number (e.g. 100) that is easily changeable in a config file (to 10 or 50 or whatever) by reading instructions at https://github.com/iiab/calibre-web/wiki

    Let's strongly consider a universal default of 100 (for starters, aiming to assist educators and parents etc) ensuring everything is very easy to explain and understand — with a cautionary warning for implementers so they know ABOUT how long the channel/playlist might take to download — into their Calibre-Web bookshelf. ⏱️

  2. Obviously Quality is what truly matters, not Quantity — but getting there is extremely hard. In the interim, if many communities require "Top 1GB" or "Top 10GB" subsets then that's one (of several) options that will be considered later.

@deldesir deldesir marked this pull request as draft March 12, 2024 17:44
time_uploaded = datetime.utcfromtimestamp(time_uploaded)
now = datetime.now()
# calculate views per day
days_since_publish = (now - time_uploaded).days
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we force days_since_publish to be 1 at minimum (i.e. 1 day) to avoid divide-by-zero errors?

@holta
Copy link
Member

holta commented Mar 12, 2024

NUMBER_OF_VIDEOS = 100

@deldesir please use a vividly descriptive variable/constant name. Possibly something like:

  • MAX_VIDEOS_PER_DOWNLOAD

Likewise in future, we might need to consider:

  • MAX_GB_PER_DOWNLOAD

@holta holta changed the title Sort videos by views per day Download "top 100 videos" from YouTube channel or playlist, sorted by views-per-day Mar 12, 2024
@holta
Copy link
Member

holta commented Mar 12, 2024

@deldesir thanks for clarifying:

  • Does this PR work with YouTube channels yet?

  • Is every video (mentioned by every playlist in a YouTube channel!) ranked by views-per-day?

    1. If yes, does this mean that the "Top 100 Videos" derived from a channel... sometimes include videos not actually created/owned/uploaded by that channel?

    2. How hard would it be to eliminate such "external" videos arising from other channels / other YouTube creators etc? (e.g. if implementer communities later decide that eliminating "external to channel" videos is important.)

@deldesir deldesir marked this pull request as ready for review March 13, 2024 00:15
@deldesir
Copy link
Collaborator Author

deldesir commented Mar 13, 2024

  1. Are videos auto-sorted for users by views-per-day (highest-to-lowest) as soon as they look at the bookshelf? (Or not yet?)

Yes, this is done automatically.

  1. That's too confusing, for new users especially.
    Please replace this "step function" (steps at 10, 50, 100) with one fixed number (e.g. 100) that is easily changeable in a config file (to 10 or 50 or whatever) by reading instructions at https://github.com/iiab/calibre-web/wiki
    Let's strongly consider a universal default of 100 (for starters, aiming to assist educators and parents etc) ensuring everything is very easy to explain and understand — with a cautionary warning for implementers so they know ABOUT how long the channel/playlist might take to download — into their Calibre-Web bookshelf. ⏱️

Done. Please see https://github.com/iiab/calibre-web/wiki/Home/_compare/8376933a218f164fa18f3cf378ce491501dd94e7...2222461e9ca347c017c0cf68173d4b61e3c02cc5

  1. Obviously Quality is what truly matters, not Quantity — but getting there is extremely hard. In the interim, if many communities require "Top 1GB" or "Top 10GB" subsets then that's one (of several) options that will be considered later.

I agree. For this we'll need to know the approximate size of the videos before downloading them.

@deldesir
Copy link
Collaborator Author

deldesir commented Mar 18, 2024

Does this PR work with YouTube channels yet?
Is every video (mentioned by every playlist in a YouTube channel!) ranked by views-per-day?

This PR doesn't work with YouTube channels yet. I am working on it.

  1. If yes, does this mean that the "Top 100 Videos" derived from a channel... sometimes include videos not actually created/owned/uploaded by that channel?

N/A

  1. How hard would it be to eliminate such "external" videos arising from other channels / other YouTube creators etc? (e.g. if implementer communities later decide that eliminating "external to channel" videos is important.)

To eliminate "external" videos, I'll need to filter all videos based on their "uploader" ids. We should be covered once I am done supporting channels.

@holta
Copy link
Member

holta commented Mar 18, 2024

To eliminate "external" videos, I'll need to filter all videos based on their "uploader" ids. We should be covered once I am done supporting channels.

Thanks for explaining:

We can also defer that, as videos that originate "external to the channel" are generally a rare/corner case — that individual IIAB implementers can (and probably should!) manually/quickly solve on their own.

@holta
Copy link
Member

holta commented Mar 19, 2024

@deldesir what should @EMG70 be testing to confirm this PR is solid-or-close?

Also:

  • What follow-up is most urgent in other (small!) PRs?
  • e.g. What's needed to get "Download to IIAB" working well with YouTube channels-not-just-playlists?
  • What other key workflows / aspects need to become more bulletproof?

Thanks!

Useful when other users are downloading videos or testing with lb-wrapper...
@deldesir
Copy link
Collaborator Author

deldesir commented Mar 19, 2024

@holta, downloading top videos from channels now works. There is currently an issue with shorts being duplicated in xklb-metadata.db after their metadata has been updated. I work around this for now by filtering them. Hopefully, @chapmanjacobd can help resolve this so they can be included in top videos lists.

@EMG70, please test this PR following the instructions found in https://github.com/iiab/calibre-web/wiki#setting-the-maximum-number-of-videos-to-download. You can test with playlists, playlists of playlists, or channels.

--extra option make key metadata available without needing to download
@holta holta mentioned this pull request Mar 22, 2024
1 task
@holta holta added the enhancement New feature or request label Mar 22, 2024
@EMG70
Copy link

EMG70 commented Mar 25, 2024

SUDO IIAB-DIAGNOSTICS - http://sprunge.us/mqtZDA?en
Just checking if all is OK,after running PR#139 .No video is downloading at all.Could Blondel be working on it at the same time?
Screenshot from 2024-03-25 15-18-24

@holta
Copy link
Member

holta commented Mar 25, 2024

Thanks @EMG70 for the valuable feedback:

As this PR (or similar!) needs to be tightened up with solid testing, and merged quickly after ~2 weeks now 👍

@deldesir
Copy link
Collaborator Author

deldesir commented Mar 25, 2024

SUDO IIAB-DIAGNOSTICS - http://sprunge.us/mqtZDA?en
Just checking if all is OK,after running PR#139 .No video is downloading at all.Could Blondel be working on it at the same time?
Screenshot from 2024-03-25 15-18-24

I am investigating... I'll be back with an explanation.

@deldesir
Copy link
Collaborator Author

deldesir commented Mar 26, 2024

Thanks @EMG70 for pointing out the issue. I don't understand yet what background limitations causing this, but I just fixed it by structuring the code better. This PR will be closed and remade in to smaller ones for clarity. Thanks for your patience.

image

@deldesir deldesir closed this Mar 26, 2024
@holta
Copy link
Member

holta commented Mar 26, 2024

Is the issue related to playlists and channels with many videos?

e.g. @EMG70's playlist with 151 videos?

In comparison with playlists containing <= 50 videos?

@deldesir
Copy link
Collaborator Author

deldesir commented Mar 26, 2024

The issue is tied to the latest commit I made to align with xklb v2.5.018. This commit ensured the database was fully updated in a single lb tubeadd run. When I reverted the code to the state where basic metadata is fetched first before updating to retrieve the key ones, top 100 videos were extracted successfully. However, I had to adjust the code to handle an "unavailable video" error for one of the playlists @EMG70 tested with (https://www.youtube.com/playlist?list=PLjxrf2q8roU23XGwz3Km7sQZFTdB996iG)

Regarding the Ed's failed test, I suspect the issue is indeed related to playlists and channels with many videos. It worked for a small playlist of 18 videos (with MAX_VIDEOS_PER_DOWNLOAD = 3).

@holta
Copy link
Member

holta commented Mar 26, 2024

Thanks for explaining.

Please lay out different kinds of playlists/channels (or different kinds of sanity-tests?) that we can all help with — either revising or replacing these testing scenarios:

@deldesir
Copy link
Collaborator Author

deldesir commented Mar 26, 2024

Done in #97 (comment)

@holta
Copy link
Member

holta commented Mar 27, 2024

@deldesir deldesir deleted the deldesir-patch-17 branch July 1, 2024 21:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants