[DupFileManager][Feature] Swap files based on codec #431

Open
un-hash opened this issue Sep 10, 2024 · 16 comments

@un-hash

un-hash commented Sep 10, 2024

@David-Maisonave thank you for the handy plugin.
Do you plan to integrate the ability to swap files based on codec or even bitrate?
Would be awesome.

un-hash changed the title from "[Feature] Swap files based on codec" to "[DupFileManager][Feature] Swap files based on codec" Sep 10, 2024
@David-Maisonave
Contributor

David-Maisonave commented Sep 11, 2024

Do you plan to integrate the ability to swap files based on codec or even bitrate?

I can do that, but to be honest, I'm not that knowledgeable about which codecs are better.
I can do a Google search on the topic, but I'd like to get your input as to what you would consider a better codec.

bit_rate looks like an easy option to add, because the higher the number, the better the video. The same can be said for frame_rate.

But video_codec is not that simple. It doesn't list a number; instead it lists a type. I need to find a good source showing a ranking for video_codec, so the code can determine which codec is better.

In my Stash library I have over 30000 video files, and I just ran a SQL query to get all the unique values of video_codec:

    SELECT DISTINCT video_codec FROM video_files

The above SQL query gave me the following results:

h264
wmv3
av1
mpeg4
wmv2
mpeg1video
msmpeg4v3
vp6f
msmpeg4v2
wmv1
msmpeg4v1
flv1
mpeg2video
h263i
vc1
png
gif
vp9
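
The same query can also be run from Python. This is only a minimal sketch, assuming the library data sits in a SQLite database with the video_files table and video_codec column used in the query above. The in-memory table is a stand-in with made-up rows, and the function name distinct_video_codecs is hypothetical.

```python
import sqlite3


def distinct_video_codecs(conn: sqlite3.Connection) -> list[str]:
    """Return the unique video_codec values, sorted alphabetically."""
    rows = conn.execute(
        "SELECT DISTINCT video_codec FROM video_files ORDER BY video_codec"
    ).fetchall()
    return [row[0] for row in rows]


# Stand-in database for illustration only; a real library has far more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE video_files (file_id INTEGER, video_codec TEXT)")
conn.executemany(
    "INSERT INTO video_files VALUES (?, ?)",
    [(1, "h264"), (2, "av1"), (3, "h264"), (4, "vp9")],
)

print(distinct_video_codecs(conn))  # ['av1', 'h264', 'vp9']
```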

I would have to rank the above list, and check whether there are other codecs that are not in my collection.
This link, video-codecs, seems to say that h264 is better than h265. Even though the link is dated 2024, it doesn't list h263i, which is a codec in my video collection.

If you're familiar with codecs and know their ranking, that would help.

@DogmaDragon
Contributor

There isn't a real way to determine which source file is better based on video codec or bitrate.

The video codec matters for determining whether your device can play the file, which varies with hardware and browser choice.

Bitrate, while an important quality measure overall, doesn't really help you determine quality on its own. It is determined by file size, which is affected by the video and audio codecs, resolution, etc.

@David-Maisonave
Contributor

There isn't a real way to determine which source file is better based on video codec or bitrate.

I especially agree with respect to the video codec.
But I did a Google search to try to get a consensus, and if I had to rank the codecs, I would rank them in the following order:

h266
vp9
av1
h265
h264
h263
h263i
vp8
vp6f
mpeg4
msmpeg4v3
msmpeg4v2
msmpeg4v1
vc-1
vc1
AVC
mpeg2
mpeg2video
wmv3
wmv2
wmv1
mpeg1
mpeg1video
flv1  (not a video codec)
png
gif

What I could do is include this in the DupFileManager_config.py, where the user can both enable codec ranking and modify the desired ranking order.
Example:

    # If enabled, favor videos with better codec according to codecRanking
    "favorCodecRanking" : False,
    # Codec Ranking in order of preference
    "codecRanking" : ["h266", "vp9", "av1", "h265", "h264", "h263", "h263i", "vp8", "vp6f", "mpeg4", "msmpeg4v3", "msmpeg4v2", "msmpeg4v1", "vc-1", "vc1", "AVC", "mpeg2", "mpeg2video", "wmv3", "wmv2", "wmv1", "mpeg1", "mpeg1video", "flv1", "png", "gif" ],
    # If enabled, favor videos with higher value bit rate
    "favorBitRate" : False,
    # If enabled, favor videos with higher value frame rate
    "favorFrameRate" : False,

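To illustrate how a list like codecRanking could drive the comparison, here is a minimal sketch. It is not the plugin's actual implementation: the helper names codec_rank and has_better_codec are hypothetical, the list is shortened, and treating unlisted codecs as worst is an assumption.

```python
def codec_rank(codec: str, ranking: list[str]) -> int:
    """Position of codec in ranking (0 is best); unlisted codecs rank last."""
    lowered = [name.lower() for name in ranking]
    try:
        return lowered.index(codec.lower())
    except ValueError:
        # Assumption: a codec missing from the list is treated as the worst.
        return len(lowered)


def has_better_codec(candidate: str, current: str, ranking: list[str]) -> bool:
    """True if candidate appears earlier in the ranking than current."""
    return codec_rank(candidate, ranking) < codec_rank(current, ranking)


# Shortened copy of the example ranking, best first.
codec_ranking = ["h266", "vp9", "av1", "h265", "h264", "mpeg4", "wmv3"]

print(has_better_codec("h265", "h264", codec_ranking))  # True
print(has_better_codec("gif", "wmv3", codec_ranking))   # False, gif is unlisted
```

One thing to watch: ffprobe reports H.265 as hevc, so if the stored codec names come straight from ffprobe, the ranking would need that name for the lookup to match.
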
@DogmaDragon
Contributor

The ranking is flawed by design. It prioritizes maximum potential efficiency, which is situational and varies by usage type. On top of that, each encoder has its own quality settings when encoding the video, which simply reading the codec type doesn't take into account.

@David-Maisonave
Contributor

David-Maisonave commented Sep 11, 2024

The ranking is flawed by design. It prioritizes maximum potential efficiency,

That's why I implemented this in the configuration file, where the user can decide what is most important and which codec is preferred.
Also, this feature will be disabled by default. The user can enable it if they want this type of comparison.

FYI: I like the way you phrased it, and I'm putting that in the comment in the config file to state the default order.

    # Codec Ranking in order of preference (default is order of ranking based on maximum potential efficiency)
    "codecRanking" : ["h266", "vp9", "av1", "h265", "h264", "h263", "h263i", "vp8", "vp6f", "mpeg4", "msmpeg4v3", "msmpeg4v2", "msmpeg4v1", "vc-1", "vc1", "AVC", "mpeg2", "mpeg2video", "wmv3", "wmv2", "wmv1", "mpeg1", "mpeg1video", "flv1", "png", "gif" ],

@un-hash
Author

un-hash commented Sep 11, 2024

Yea, agreed, the codec discussion can be opinionated or sometimes even esoteric.
I think every user has their own preferences. For example, I prefer h265 over h264 because all my tools support h265. However, if you don't want to live transcode and your devices don't support h265, you probably prefer h264. Compatibility is perhaps the sole reason h264 even still exists.

So yea, being able to configure your codec ranking is in my opinion the best solution. Maybe grouping them would make it easier for the user? Something like High efficiency group: h265, vp9, h266, av1... Compatibility group: h264, vp8, mpeg4...

Same for bit rate. The best solution here, I guess, would be to give two options: prefer fidelity or efficiency. So you would first decide by codec and then, within a codec, rank by bit rate (given the files are the same resolution).
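
A rough sketch of that two-step comparison (codec first, then bit rate with a fidelity or efficiency preference) might look like the following. Everything here is hypothetical: the VideoFile class, the prefer argument and the function names are made up for illustration, and resolution is assumed to be equal across the files.

```python
from dataclasses import dataclass


@dataclass
class VideoFile:
    path: str
    video_codec: str
    bit_rate: int  # bits per second


def preference_key(file: VideoFile, ranking: list[str], prefer: str = "fidelity") -> tuple[int, int]:
    """Sort key: codec rank first, then bit rate in the preferred direction."""
    if prefer not in ("fidelity", "efficiency"):
        raise ValueError("prefer must be 'fidelity' or 'efficiency'")
    # Unlisted codecs sort after every listed codec (an assumption).
    rank = ranking.index(file.video_codec) if file.video_codec in ranking else len(ranking)
    # Fidelity favors the higher bit rate, efficiency the lower one.
    bit_rate_key = -file.bit_rate if prefer == "fidelity" else file.bit_rate
    return (rank, bit_rate_key)


def pick_preferred(files: list[VideoFile], ranking: list[str], prefer: str = "fidelity") -> VideoFile:
    """Return the file with the smallest preference key."""
    return min(files, key=lambda f: preference_key(f, ranking, prefer))


files = [
    VideoFile("a.mp4", "h264", 8_000_000),
    VideoFile("b.mkv", "h265", 4_000_000),
    VideoFile("c.mkv", "h265", 2_000_000),
]
ranking = ["h265", "h264"]  # one user's preference, best first

print(pick_preferred(files, ranking, "fidelity").path)    # b.mkv
print(pick_preferred(files, ranking, "efficiency").path)  # c.mkv
```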

Another approach could be similar to what Sonarr/Whisparr does: Filesize/Duration...
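
One possible reading of the Filesize/Duration idea is a simple bytes-per-second figure. The sketch below only illustrates that reading; it makes no claim about how Sonarr or Whisparr actually compute it, and the function name bytes_per_second is hypothetical.

```python
def bytes_per_second(size_bytes: int, duration_seconds: float) -> float:
    """Average number of bytes stored per second of video."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return size_bytes / duration_seconds


# Two made-up one-hour files of the same scene.
print(bytes_per_second(900_000_000, 3600))    # 250000.0
print(bytes_per_second(1_800_000_000, 3600))  # 500000.0
```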

@David-Maisonave
Contributor

Same for bit rate. best solution here I guess would be to give two options: Prefer fidelity or efficiency. So you would first decide which codec and then within codecs you rank by bit rate.

I'm not sure I understand. Are you suggesting an option to favor a lower bit_rate over a higher bit_rate?

@un-hash
Author

un-hash commented Sep 11, 2024

Yea. It would just be a nice-to-have, though. I personally prefer a higher bit rate. But I would guess there are users who need to watch their file sizes or who exclusively watch on phones etc...

@stg-annon
Contributor

Compression will affect different videos in different ways, so actually comparing the input and the output would be the best way to compare. The only "objective" way I know of would be something like VMAF, but that requires the unaltered source to compare against.

Most people will just care about compatibility for codec, so H.264 may be more important. As for quality, I personally like to use bitrate per pixel to get a rough estimate of quality across resolutions.
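
As a rough illustration of bitrate per pixel, the sketch below divides the bit rate by the frame area. It is an assumption about how such a metric could be computed, not the code from any particular plugin, and the function name is hypothetical.

```python
def bitrate_per_pixel(bit_rate: int, width: int, height: int) -> float:
    """Bits per second divided by the number of pixels in one frame."""
    pixels = width * height
    if pixels <= 0:
        raise ValueError("width and height must be positive")
    return bit_rate / pixels


# Made-up example: a 1080p file at 8 Mbit/s and a 720p file at 5 Mbit/s.
full_hd = bitrate_per_pixel(8_000_000, 1920, 1080)
hd = bitrate_per_pixel(5_000_000, 1280, 720)

# The 720p file spends more bits on each pixel despite its lower bit rate.
print(hd > full_hd)  # True
```

A metric like this lets files of different resolutions be compared on the same scale, which plain bit rate does not.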

Best option is to provide a customization and a reasonable default which is kind of the whole premise behind phashDuplicateTagger

@David-Maisonave
Contributor

David-Maisonave commented Sep 11, 2024

Best option is to provide a customization and a reasonable default which is kind of the whole premise behind phashDuplicateTagger

Interesting plugin. It's similar to DupFileManager. I like the compare_bitrate_per_pixel function.

Why is generate_phash needed? Why would PHASHs be missing in some scenes?

FYI:
CODEC_PRIORITY in config_example.py has fewer than half the codecs I found in my Stash library.

@David-Maisonave
Contributor

Best option is to provide a customization and a reasonable default which is kind of the whole premise behind phashDuplicateTagger

The phashDuplicateTagger plugin would be hard for a non-programmer to use. You want customization that the average user can configure without having to know Python.
The plugin ships its config file under the wrong name, so it fails to run unless the user renames config_example.py to config.py.
Some users won't even know to look in the log file to see why the plugin fails to run.

@stg-annon
Contributor

Why is generate_phash needed? Why would PHASHs be missing in some scenes?

It can be if the scene has corruption or the generate phash flag was not selected on scan.

CODEC_PRIORITY is not intended to be comprehensive; it's just there to prioritize specific codecs that the user might want.

config.py is intentionally untracked, because otherwise the plugin manager would overwrite any customization the user provides on every update. It seems there is no exception that specifically tells the user to rename the config. Most of the things I make are for myself, and people just ask me to share them.
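
For the missing-config case, one option is to turn the import failure into an explicit message. This is only a sketch under assumptions: load_user_config is a hypothetical helper, not part of either plugin, and the file names are taken from the discussion above.

```python
import importlib
from types import ModuleType


def load_user_config(module_name: str = "config",
                     example_file: str = "config_example.py") -> ModuleType:
    """Import the user's config module, or explain how to create it."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as err:
        # Re-raise unchanged if some other module (imported by the config) is missing.
        if err.name != module_name:
            raise
        raise RuntimeError(
            f"{module_name}.py was not found. Copy {example_file} to "
            f"{module_name}.py and edit it before running the plugin."
        ) from err


# Demonstration with a module name that should not exist.
try:
    load_user_config("no_such_config_for_demo")
except RuntimeError as err:
    print(err)
```

Because the user's copy stays untracked, updates would still leave it alone, while a first-time user gets a message that says what to do.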

@David-Maisonave
Contributor

Why is generate_phash needed? Why would PHASHs be missing in some scenes?
It can be if the scene has corruption or the generate phash flag was not selected on scan.

CODEC_PRIORITY is not intended to be comprehensive it's just to prioritize specific codecs that the user might want

Thanks for the reply, but I think you grabbed the wrong quotes from my comments, or I don't understand how the reply relates to my question.
I'm trying to figure out in what context the following code is needed.

    stash.metadata_generate({"phashes": True})

The phashDuplicateTagger task associated with this states the following: Generate PHASHs for all scenes where they are missing
Why would some scenes have PHASH and others have it missing?

@stg-annon
Contributor

stg-annon commented Sep 12, 2024

Why would some scenes have PHASH and others have it missing?

Generating a PHASH for each scene is not a requirement for Stash. You can have situations where scenes don't have a PHASH, either by configuration or, most likely, when ffmpeg cannot open the file or errors out for some reason during the generate process.

I seem to remember the task was a request. The plugin primarily uses PHASHes to detect dupes, so it was added as a shortcut for those who don't know how to run the generate task.

@David-Maisonave
Contributor

un-hash,

The swap codec logic has been added to Axter-Stash->DupFileManager.
It'll get added to CommunityScripts->DupFileManager after further testing.

Can you please close this item?

@DogmaDragon
Contributor

I'll close the issue once it's added to this repo.
