Command-line utility for duplicate image finding and processing #1520

Open
porridge opened this issue Sep 13, 2024 · 9 comments · May be fixed by #1524
@porridge

Setup (please complete the following information):

  • Geeqie version [geeqie --version]: irrelevant

What is your question

Would you be open to shipping an additional binary that reuses the code in similar.cc for comparing images, but is meant for automated or semi-automated workflows to find and eliminate duplicate images?

Additional context

I find the duplicate image logic in geeqie to be of excellent quality. 🤩

However, the GUI is not convenient or flexible enough for my use case. I was looking for a tool such as findimagedupes, which makes it possible to plug in code that decides which files in a set to delete, but I found its comparison logic to be less than stellar.

After dusting off my C++ knowledge (and commenting out a few lines in similar.cc), I managed to come up with an alternative frontend to your code that, although a bit naive/inefficient, works exactly as I imagined and has started saving me literally hours of time per year.

This code currently weighs 155 lines of C++, and 50 or so lines of example Python code to automate the decision of whether to delete files or ask for confirmation.

I would like to share it with others, and I can either:

  1. start my own project and copy similar.* with minimal modifications over,
  2. ask you to add it to geeqie 😊

I'm asking this preliminary question first, rather than sending a PR straight away, because I would need to spend some more time on proper option processing before sending a well-formed PR. I'm happy to take a stab at this, but would rather avoid doing so if the answer is "no" anyway 😄

@caclark
Collaborator

caclark commented Sep 13, 2024

From my point of view, adding extra command line options is not a problem. If people are going to use such a feature it would be better for them to do it via Geeqie rather than via another project.

Although, if implemented, there will be a configuration option to disable it, I am not sure about the implications of making Python a dependency. Perhaps @gusnan has an opinion.

The only Python dependency in the project so far is involved with running tests, so it is not part of the distribution.

@porridge
Author

Sorry, I should have pointed out that the Python part is completely optional. Basically how the main program works is that it runs some command once for each group of duplicates (passing them as arguments). By default the command is just echo but you can specify anything you like.
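The behaviour described above (one command invocation per group of duplicates, with the group's files passed as arguments, defaulting to echo) could be sketched roughly like this. It is only an illustration in Python, not the actual C++ frontend; the function name `run_for_each_group` and the idea of receiving the groups as ready-made lists are assumptions made for the example.

```python
import subprocess
from typing import Iterable, List, Sequence


def run_for_each_group(
    groups: Iterable[Sequence[str]],
    command: Sequence[str] = ("echo",),
) -> List[int]:
    """Run `command` once per duplicate group, with the group's files as arguments.

    `groups` is assumed to come from some earlier similarity comparison step,
    which is not shown here. Returns the exit code of each invocation, in order.
    """
    exit_codes = []
    for group in groups:
        # One process per group: command file1 file2 ...
        completed = subprocess.run([*command, *group], check=False)
        exit_codes.append(completed.returncode)
    return exit_codes
```

With the default command each group is simply printed on its own line; passing a different `command` hands each group to whatever program should decide what to do with it.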

I just happened to use Python to create a script that accepts the filenames as parameters and, if their names match certain patterns (in my case this hints at their source), deletes the smaller files without asking for confirmation.
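A decision script along those lines might look something like the sketch below. It is not the script mentioned in this thread: the filename patterns, the function names and the `dry_run` safety switch are all hypothetical, chosen only to show the shape of the idea (auto-delete the smaller files when every name matches a known pattern, otherwise leave the group for a human).

```python
import os
import re
import sys
from typing import List, Optional, Sequence

# Hypothetical patterns standing in for "names that hint at the file's source".
KNOWN_SOURCE_PATTERNS = [
    re.compile(r"^IMG_\d+\.jpe?g$", re.IGNORECASE),
    re.compile(r"^WhatsApp Image .*\.jpe?g$", re.IGNORECASE),
]


def matches_known_source(path: str) -> bool:
    """True if the file's base name matches one of the known patterns."""
    name = os.path.basename(path)
    return any(pattern.match(name) for pattern in KNOWN_SOURCE_PATTERNS)


def files_to_delete(paths: Sequence[str]) -> Optional[List[str]]:
    """Return the smaller files of a duplicate group, or None if a human should decide."""
    if len(paths) < 2 or not all(matches_known_source(p) for p in paths):
        return None
    # Keep the largest file; on a size tie, max() keeps the first one listed.
    keep = max(paths, key=os.path.getsize)
    return [p for p in paths if p != keep]


def main(argv: Sequence[str], dry_run: bool = True) -> int:
    """Handle one duplicate group given as a list of filenames."""
    doomed = files_to_delete(argv)
    if doomed is None:
        print("needs manual review:", *argv)
        return 0
    for path in doomed:
        print("would delete" if dry_run else "deleting", path)
        if not dry_run:
            os.remove(path)
    return 0


if __name__ == "__main__":
    main(sys.argv[1:])
```

The sketch defaults to a dry run so that nothing is removed until the patterns have been checked against real data; calling `main(paths, dry_run=False)` performs the deletion.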

@porridge
Author

FTR, here is the standalone version. I'll try to come up with a PR against geeqie soon. I understand you'd prefer to make this a mode of geeqie itself, rather than a separate executable? 🤔

@caclark
Collaborator

caclark commented Sep 14, 2024

> I understand you'd prefer to make this a mode of geeqie itself, rather than a separate executable?

It is your choice if you want a separate project/executable. I think it would be useful if it were available within Geeqie.

main.cc and remote.cc are probably the only files affected.

Please note CODING.md, but ignore the bits that everyone else ignores.
When you make a pull request to Geeqie on GitHub, static tests are automatically run.
You can run the tests locally first by running ./scripts/test-all.sh

The Help file/User Manual will need an update. I can do that.

[I have noticed that very occasionally the similarity algorithm gets it completely wrong. I am not sure I would use an automatic delete without a visual check. But everyone has a choice....]

@porridge
Author

> > I understand you'd prefer to make this a mode of geeqie itself, rather than a separate executable?

> It is your choice if you want a separate project/executable. I think it would be useful if it were available within Geeqie.

I agree about integrating it in the Geeqie project. I was asking whether it should be an option within /usr/bin/geeqie or a separate executable (e.g. /usr/bin/geeqie-duplicate-finder or such).

@caclark
Collaborator

caclark commented Sep 14, 2024

Ah, sorry.
I was assuming an additional option:

geeqie --remote --duplicates-program=<prog name> --duplicates-threshold=<n> --duplicates-files=<file list>
geeqie -r -p <prog name> -t <n> -m <file list>

Maybe the -m option is not necessary - just take whatever file list is appended on the command line.

Geeqie now has bash command line completion which makes things easier with long-options.

[I am working on making Geeqie a GtkApplication, which will make command line processing easier. The --remote option will disappear]

Keeping everything together makes distribution easier.
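To make the option layout proposed in this comment concrete, here is a small sketch using Python's argparse. It is not Geeqie's parser (which is C++) and says nothing about how Geeqie will implement this; it only mirrors the proposed option names, treats any trailing arguments as the file list instead of using -m, and leaves out --remote since that is expected to disappear. The program name `geeqie-sketch`, the echo default and the integer type for the threshold are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the proposed duplicates options (illustration only)."""
    parser = argparse.ArgumentParser(prog="geeqie-sketch")
    parser.add_argument(
        "-p", "--duplicates-program", default="echo", metavar="PROG",
        help="program to run once per group of duplicates (default assumed: echo)",
    )
    parser.add_argument(
        "-t", "--duplicates-threshold", type=int, default=None, metavar="N",
        help="similarity threshold; an integer is assumed here",
    )
    # Whatever is left on the command line is taken as the file list.
    parser.add_argument("files", nargs="*", metavar="FILE")
    return parser
```

For example, `build_parser().parse_args(["-p", "./decide.py", "-t", "95", "a.jpg", "b.jpg"])` yields the program, the threshold and the two remaining filenames as separate attributes.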

@virtadpt

@caclark I was really hoping it would be implemented this way.

@porridge
Author

> Maybe the -m option is not necessary - just take whatever file list is appended on the command line.

I'm having trouble figuring out how to get access to this "remainder" file list from within the callback I added for --duplicates-program 🤔
Can you maybe provide some hints?

@caclark
Collaborator

caclark commented Sep 23, 2024

The command line processing is a mess...

The attached diff might help you to get something to work in some way or other, but it is not really suitable to put in the repo.

Run ./build/src/geeqie from one terminal window, and from a second terminal window run:
./build/src/geeqie -r <some file list>
./build/src/geeqie -r -m

I will work on making the command line processing more logical.

remote-duplicates-1.diff.gz

porridge linked a pull request on Sep 28, 2024 that will close this issue