
Not deleting media older than "days to keep" #206

Open
rrediske opened this issue Jan 19, 2022 · 21 comments · May be fixed by #598

Comments

@rrediske

First, thank you for this wonderful way to download videos automatically!

The problem I am having is that TubeSync isn't removing videos from any of the four channels it is watching, so I will eventually run out of disk space if I can't find a way to delete old videos. I used your basic 13-line docker-compose.yml, so I have nothing special in my configuration. When adding each channel, I set the "download cap" to 1 week and "days to keep" to 10 days, but TubeSync still has every video it has ever downloaded, going back to December 13th of last year (I'm now at 63 videos).

In a previous install on another machine, I tried deleting videos manually from the mounted volume "tubesync-downloads", but that seemed to cause tubesync to stop downloading anything else, so I wound up moving to a new machine to start over. It takes almost 2 full days for tubesync to index all the videos of the 4 channels I watch, so I really want to avoid having to do that.

Any ideas? Am I doing something wrong?

@meeb
Owner

meeb commented Jan 19, 2022

Thanks for the comments! That could well be a bug. I'll leave this open and investigate. I'm assuming you're on normal-ish Linux and not running it with weird WSL paths on Windows or anything?

@rrediske
Author

rrediske commented Jan 19, 2022

openSUSE Leap 15.3, so... weird enough, but normal-ish :) I'm running Home Assistant and Nextcloud Docker images on the same machine, and they have been fine for a few months. It's a VM on a Dell R710 running ESXi, and the VM has 8 GB RAM and a 120 GB disk.

@meeb
Owner

meeb commented Jan 20, 2022

Thanks for the details. If you can easily search the container logs, are there any errors? If for some reason it's attempting to clean up files that don't exist due to an invalid path or a similar issue, there should be a log entry for it.

@rrediske
Author

I ran docker logs 8703 >& abc, then ran grep ERROR on that file, and got 39 errors in the last 24 hours of the form:

2022-01-19 19:46:49,576 [tubesync/ERROR] ERROR: [youtube] 7wLxM7oNN1s: This video is unavailable on this device.

39 sounds like the number of videos that should be getting deleted, though that might be a coincidence.

Here's more context:

Rescheduling task Downloading metadata for "21e4dd9c-6e1e-4bff-ac85-df44d526a059" for 5:45:41 later at 2022-01-19 11:35:05.305812+00:00
2022-01-18 23:49:28,602 [tubesync/DEBUG] [youtube] 7wLxM7oNN1s: Downloading webpage
2022-01-18 23:49:29,099 [tubesync/DEBUG] [youtube] 7wLxM7oNN1s: Downloading android player API JSON
2022-01-18 23:49:29,312 [tubesync/ERROR] ERROR: [youtube] 7wLxM7oNN1s: This video is unavailable on this device.
Rescheduling Downloading metadata for "672d9be1-840c-49ed-825a-ae1e12a43fc4"
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/background_task/tasks.py", line 43, in bg_runner
    func(*args, **kwargs)
  File "/app/sync/tasks.py", line 227, in download_media_metadata
    metadata = media.index_metadata()
  File "/app/sync/models.py", line 1235, in index_metadata
    return indexer(self.url)
  File "/app/sync/youtube.py", line 50, in get_media_info
    raise YouTubeError(f'Failed to extract_info for "{url}": No metadata was '
sync.youtube.YouTubeError: Failed to extract_info for "https://www.youtube.com/watch?v=7wLxM7oNN1s": No metadata was returned by youtube-dl, check for error messages in the logs above. This task will be retried later with an exponential backoff.
Rescheduling task Downloading metadata for "672d9be1-840c-49ed-825a-ae1e12a43fc4" for 5:45:41 later at 2022-01-19 11:35:10.319907+00:00

@meeb
Owner

meeb commented Jan 20, 2022

That's a "normal" error, that video ( https://www.youtube.com/watch?v=7wLxM7oNN1s ) does seem to be actually unavailable so TubeSync is correct there. Thanks, I'll check the old media cleanup code.

@rrediske
Author

I could share my screen via something like Discord or Jitsi if that helps, and type whatever you like so you can see the output. That's the only type of ERROR line. The log file abc is 24,700 lines long just for the last day, lol.

@meeb
Owner

meeb commented Jan 20, 2022

Thanks for the offer, but it's probably not too useful right now. If there was an attempt to delete the wrong path in a cleanup it would have left a note in the error log. It should be relatively easy to trace why the clean-up isn't firing.

@rrediske
Author

grep -i clean shows 13 lines with the word clean, but they're all in video titles.

@rrediske
Author

I don't know if this helps, but if I open Skipped media, it shows 55 pages of 144 videos each, so that's almost 8,000 videos for it to index. Maybe that's too many for it to handle? One channel by itself has around 5,000 videos.

@meeb
Owner

meeb commented Jan 28, 2022

That amount of media should be fine; it's more likely an issue detecting whether the file exists, or some other sort of path issue. If the media has already been downloaded and exists on your local disk, it must have been indexed properly, so the max-age deletion should pick it up.

@rrediske
Author

rrediske commented Feb 1, 2022

I decided to remove some of the downloads manually to recover disk space yesterday around 10:30 AM server time. I did a docker exec into the container and then rm 2021*, rm 2022-01-0* and rm 2022-01-1* in each directory inside /downloads (removing anything older than 11 days, all the sources are configured in tubesync to remove items older than 10 days).

Here's the log file for the last two days: https://docs.rediske.org/2022-02-01.txt
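For reference, the manual clean-up above relies on TubeSync's filenames starting with a YYYY-MM-DD date, as in the file listings later in this thread. A minimal dry-run sketch that lists, rather than deletes, files whose date prefix is older than a cutoff might look like this. The function name files_older_than is hypothetical, and the date in the filename is not necessarily the download_date that TubeSync's own expiry uses. Since deleting files by hand previously seemed to stop TubeSync downloading, reviewing the list before removing anything seems prudent.

```python
import re
from datetime import date, timedelta
from pathlib import Path

# Assumption: filenames begin with an ISO date, e.g. "2022-01-19_channel_title_key.mkv".
DATE_PREFIX = re.compile(r"^(\d{4})-(\d{2})-(\d{2})")


def files_older_than(root: Path, days: int, today: date) -> list[Path]:
    """Return files under root whose filename date is more than `days` days before today.

    Dry run only: nothing is deleted.
    """
    cutoff = today - timedelta(days=days)
    old = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue  # Skip directories.
        m = DATE_PREFIX.match(path.name)
        if m is None:
            continue  # Skip files that don't follow the dated naming pattern.
        file_date = date(int(m[1]), int(m[2]), int(m[3]))
        if file_date < cutoff:
            old.append(path)
    return old
```

Something like `for p in files_older_than(Path("/downloads"), 11, date.today()): print(p)` would print the candidates from inside the container.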

@mcinj

mcinj commented Feb 1, 2022

@rrediske, in the UI on the media tab, do your episodes that you expect to be deleted show a "downloading" text?

@rrediske
Author

rrediske commented Feb 1, 2022

I'll have to give tubesync some time and look tomorrow, the oldest media there is dated 1/22/22, 10.5 days ago. Nothing shows downloading right now.

[screenshot]

@rrediske
Author

rrediske commented Feb 3, 2022

The dashboard no longer shows videos older than 1/23:

[screenshot]

But the list of files still goes back to 1/20 (I manually deleted everything older than 1/19):

[screenshot]

@MatthK

MatthK commented Jan 26, 2023

I just discovered that my TubeSync also fails to delete older content. I had it set to 10 days, but I still have all the content from the last six months.
I changed "days to keep", but that didn't trigger a delete either. I went through the log file, but searching for "clean" only turned up hits in video names. There are a lot of lines with "error", but the few dozen I checked were all related to videos it couldn't download.
I opened a terminal session and tried rm filename, and the file was deleted immediately, so I assume it is not a file permissions issue (my download directory is a mounted directory). TubeSync also runs in a Docker container. I updated the container to the latest version this week, but that does not seem to have fixed the issue.
It's not urgent and the disk still has space remaining, but should I just delete the old videos manually or keep them for "testing"?

@meeb
Owner

meeb commented Jan 26, 2023

I'm not aware of this not functioning in the current release, however the logic might not be entirely clear. As per:

https://github.com/meeb/tubesync/blob/main/tubesync/sync/tasks.py#L134

The cleanup_old_media() function is called every time a source is indexed. Media is deleted with the following log message (which will be in the container logs):

                log.info(f'Deleting expired media: {media.source} / {media} '
                         f'(now older than {media.source.days_to_keep} days / '
                         f'download_date before {delta})')

The media deletion should be triggered if the following conditions are met:

  1. The media is downloaded
  2. The media download date is not null
  3. The source of the media has "delete old media" enabled
  4. The source of the media has "days to keep" set to an integer
  5. The media download date is older than the current date minus the days to keep

The clean-up code is relatively simple and I can't obviously see any issues with it. If you can confirm the above 5 prerequisites are met and your media still isn't being deleted let me know and I'll pop a bunch of debug logging into the tasks to work out what isn't firing on your installation.
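The five conditions can be read as a single predicate. The sketch below uses hypothetical dataclasses standing in for TubeSync's Django models; the field names delete_old_media, downloaded and download_date are assumptions based on the wording above, not copied from tasks.py. It only illustrates the logic, not the actual database query.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Source:  # Hypothetical stand-in for TubeSync's Source model.
    delete_old_media: bool
    days_to_keep: Optional[int]


@dataclass
class Media:  # Hypothetical stand-in for TubeSync's Media model.
    downloaded: bool
    download_date: Optional[datetime]
    source: Source


def is_expired(media: Media, now: datetime) -> bool:
    """Return True only if all five deletion conditions hold."""
    if not media.downloaded:                            # 1. media is downloaded
        return False
    if media.download_date is None:                     # 2. download date is not null
        return False
    if not media.source.delete_old_media:               # 3. "delete old media" enabled
        return False
    if not isinstance(media.source.days_to_keep, int):  # 4. "days to keep" is an integer
        return False
    cutoff = now - timedelta(days=media.source.days_to_keep)
    return media.download_date < cutoff                 # 5. older than now minus days to keep
```

With days_to_keep=14 and now at 2023-01-25 01:38:33 UTC, the cutoff is 2023-01-11 01:38:33 UTC, which lines up with the "download_date before" value in the log excerpt that follows.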

@MatthK

MatthK commented Jan 26, 2023

I checked the log again for "Deleting expired" and found the following entry (among others):

2023-01-25T01:38:33.709029102Z 2023-01-25 09:38:33,708 [tubesync/INFO] Deleting expired media: Formel 1 / wycpxkxIWk0 (now older than 14 days / download_date before 2023-01-11 01:38:33.708724+00:00)
2023-01-25T01:38:33.709975984Z 2023-01-25 09:38:33,709 [tubesync/INFO] Deleting tasks for media: Als Ayrton Senna fast für Ferrari gefahren wäre!
2023-01-25T01:38:33.712439282Z 2023-01-25 09:38:33,712 [tubesync/INFO] Scheduling media server updates
2023-01-25T01:38:33.716321202Z 2023-01-25 09:38:33,716 [tubesync/INFO] Deleting expired media: Formel 1 / b3QtosB64Jg (now older than 14 days / download_date before 2023-01-11 01:38:33.716211+00:00)
2023-01-25T01:38:33.717626928Z 2023-01-25 09:38:33,717 [tubesync/INFO] Deleting tasks for media: 10 F1-Rekorde, die "Schumi" 2023 verlieren könnte
2023-01-25T01:38:33.719752753Z 2023-01-25 09:38:33,719 [tubesync/INFO] Scheduling media server updates
...
2023-01-26T01:38:29.186077595Z 2023-01-26 09:38:29,185 [tubesync/INFO] Deleting completed tasks older than 7 days (run_at before 2023-01-19 01:38:29.185964+00:00)
2023-01-26T01:38:30.710721535Z 2023-01-26 09:38:30,710 [tubesync/INFO] Deleting expired media: Formel 1 / DjpAsYab0n0 (now older than 14 days / download_date before 2023-01-12 01:38:30.710454+00:00)
2023-01-26T01:38:30.711765598Z 2023-01-26 09:38:30,711 [tubesync/INFO] Deleting tasks for media: „Du kommst hier nicht rein“: warum die F1 Andretti blockiert!
2023-01-26T01:38:30.714079051Z 2023-01-26 09:38:30,713 [tubesync/INFO] Scheduling media server updates

So although TubeSync appears to be deleting the media, the files still exist on disk. When I click "View media linked to this source", however, I only see three episodes, and all the previous ones now appear as "Skipped".
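One way to check this systematically is to cross-reference the log with the disk. This sketch parses "Deleting expired media" lines, using the message format quoted earlier from tasks.py, and reports files under the downloads directory whose names still contain the media key. It assumes, as in the excerpts here, that {media} renders as the YouTube video ID and that filenames include that ID; both function names are hypothetical.

```python
import re
from pathlib import Path

# Matches the log message format quoted from tasks.py. In the excerpts above,
# "{media}" renders as the video ID (e.g. "wycpxkxIWk0").
EXPIRED = re.compile(
    r"Deleting expired media: (?P<source>.+?) / (?P<key>\S+) "
    r"\(now older than (?P<days>\d+) days"
)


def expired_keys(log_lines):
    """Yield (source, key) for every 'Deleting expired media' log line."""
    for line in log_lines:
        m = EXPIRED.search(line)
        if m:
            yield m["source"], m["key"]


def leftover_files(log_lines, downloads: Path) -> dict:
    """Map each expired key to the files under `downloads` whose names still contain it."""
    leftovers = {}
    for _source, key in expired_keys(log_lines):
        hits = sorted(p for p in downloads.rglob(f"*{key}*") if p.is_file())
        if hits:
            leftovers[key] = hits
    return leftovers
```

Feeding it the saved output of docker logs and the mounted downloads directory would list every expired item whose files were left behind.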

@onedayfishsale

onedayfishsale commented Feb 9, 2023

I'm seeing this as well on 0.12.0. Running in Docker and using a host-mounted NFS share for /downloads.

tubesync  | 2023-02-08 08:30:13,091 [tubesync/INFO] Deleting expired media: Munro Live / Ehnjhj8WFG4 (now older than 14 days / download_date before 2023-01-25 13:30:13.091025+00:00)
$ ls *Ehnjhj8WFG4*
2023-01-24_munro-live_100-million-lines-of-code-the-state-of-automotive-software-ces-2023_Ehnjhj8WFG4_1080p-vp9-opus.info.json
2023-01-24_munro-live_100-million-lines-of-code-the-state-of-automotive-software-ces-2023_Ehnjhj8WFG4_1080p-vp9-opus.jpg
2023-01-24_munro-live_100-million-lines-of-code-the-state-of-automotive-software-ces-2023_Ehnjhj8WFG4_1080p-vp9-opus.mkv
2023-01-24_munro-live_100-million-lines-of-code-the-state-of-automotive-software-ces-2023_Ehnjhj8WFG4_1080p-vp9-opus.nfo
$ 

@eocx

eocx commented Jun 17, 2023

I am facing the same issue in TubeSync version 0.12.1 running in a Docker container with media files located on the host system.

@meeb, you explained

As per:

https://github.com/meeb/tubesync/blob/main/tubesync/sync/tasks.py#L134

The cleanup_old_media() function is called every time a source is indexed. Media is deleted with the following log message (which will be in the container logs):

                log.info(f'Deleting expired media: {media.source} / {media} '
                         f'(now older than {media.source.days_to_keep} days / '
                         f'download_date before {delta})')

With commit 410906ad8eeec03c34723cda18eba21f8c742cab, the media file deletion was removed from the media_pre_delete() function in tubesync/sync/signals.py. Since that change, the associated media files are deleted in various functions in tubesync/sync/views.py, which only cover the interactive (user-initiated) scenarios.

Presumably, cleanup_old_media() should now also explicitly delete the thumbnail, media, NFO and JSON files?
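If cleanup_old_media() (or a pre_delete signal handler) were to delete the files itself, a helper along these lines could remove the media file together with its sidecars. This is a hypothetical sketch, not TubeSync code: it guesses sidecar names from the extensions seen in the directory listing above (.jpg, .nfo, .info.json) rather than asking the Media model for its thumbnail and NFO paths, which would likely be more robust.

```python
from pathlib import Path

# Assumption: sidecar extensions as observed in the directory listing above,
# not read from TubeSync's settings.
SIDECAR_EXTENSIONS = (".jpg", ".nfo", ".info.json")


def delete_media_files(media_path: Path) -> list[Path]:
    """Delete a media file and any sidecar files that share its base name.

    Hypothetical helper; returns the paths that were actually removed.
    """
    # media_path.stem drops only the final suffix (e.g. ".mkv").
    candidates = [media_path] + [
        media_path.with_name(media_path.stem + ext) for ext in SIDECAR_EXTENSIONS
    ]
    removed = []
    for path in candidates:
        try:
            path.unlink()
        except FileNotFoundError:
            continue  # Already gone (e.g. removed by hand); not treated as an error.
        removed.append(path)
    return removed
```

Ignoring FileNotFoundError keeps the helper idempotent, so running it on media someone already removed by hand is harmless.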

@meeb
Owner

meeb commented Jun 18, 2023

Yeah, file deletion can probably be put back into signals now. If I recall correctly, some logic was moved because some users reported it was erroneously deleting media and causing issues. There are also problems with tasks firing reliably, which is a much larger, ongoing issue tied to replacing the entire tasks system.

@meeb
Owner

meeb commented Aug 3, 2024

This should have been resolved for quite some time. I'll close this for now. Please create a new issue if you still experience this.

@tcely linked a pull request on Dec 24, 2024 that will close this issue

6 participants