An app that clusters a given set of images and displays the results via a simple JavaFX GUI. First, Perceptual Hashing is used to map the images to binary feature vectors. Then Agglomerative Hierarchical Clustering, with Hamming distance as the distance measure, is used to group similar binary vectors.
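As a rough illustration of the hashing step, here is a minimal sketch of a DCT-based perceptual hash producing a 64-bit feature vector, plus the Hamming distance between two such vectors. It is not the project's code. The object `PHashSketch`, its methods, the 32x32 downscale, the 8x8 low-frequency block and the median threshold are all assumptions chosen to show the general technique:

```scala
import java.awt.image.BufferedImage

// Hypothetical sketch of a DCT-based perceptual hash; the project's actual
// hashing code and parameters may differ.
object PHashSketch {
  val Size = 32    // assumed: image is downscaled to Size x Size
  val LowFreq = 8  // assumed: keep the top-left 8x8 DCT block -> 64-bit hash

  // Grayscale + downscale by nearest-neighbour sampling (kept simple on purpose).
  def grayscale(img: BufferedImage): Array[Array[Double]] =
    Array.tabulate(Size, Size) { (y, x) =>
      val rgb = img.getRGB(x * img.getWidth / Size, y * img.getHeight / Size)
      val r = (rgb >> 16) & 0xff
      val g = (rgb >> 8) & 0xff
      val b = rgb & 0xff
      0.299 * r + 0.587 * g + 0.114 * b
    }

  // Naive 2D DCT-II, computing only the low-frequency coefficients we keep.
  def dct(px: Array[Array[Double]]): Array[Array[Double]] =
    Array.tabulate(LowFreq, LowFreq) { (u, v) =>
      var sum = 0.0
      for (y <- 0 until Size; x <- 0 until Size)
        sum += px(y)(x) *
          math.cos((2 * y + 1) * u * math.Pi / (2 * Size)) *
          math.cos((2 * x + 1) * v * math.Pi / (2 * Size))
      sum
    }

  // Bit i is set when coefficient i lies above the median of the 8x8 block.
  def hash(img: BufferedImage): Long = {
    val coeffs = dct(grayscale(img)).flatten
    val median = coeffs.sorted.apply(coeffs.length / 2)
    coeffs.zipWithIndex.foldLeft(0L) { case (h, (c, i)) =>
      if (c > median) h | (1L << i) else h
    }
  }

  // Hamming distance: number of bit positions in which two hashes differ.
  def hamming(a: Long, b: Long): Int = java.lang.Long.bitCount(a ^ b)
}
```

Because the hash only keeps coarse low-frequency structure, a resized copy of an image should map to the same or a nearby hash, which is what makes Hamming distance a meaningful similarity measure for the clustering step.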
Note: we use a low hard-coded `cutHeight` value of 8.0 in `HCluster` in order to cut the dendrogram tree into small clusters with a low number of outliers. You might experiment with different values of `cutHeight` in `HCluster` depending on your dataset size and the required 'quality' of the clustering.
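To make the role of the cut height concrete, here is a naive agglomerative clustering sketch over 64-bit hashes with complete linkage and Hamming distance. It is only an illustration: the project appears to delegate this step to the Smile library, and the names `CutSketch`, `cluster` and `completeLinkage` are hypothetical, not the `HCluster` API. With complete linkage the merge heights never decrease, so stopping as soon as the closest pair of clusters is farther apart than `cutHeight` gives the same clusters as cutting the full dendrogram at that height:

```scala
// Hypothetical, naive O(n^3) complete-linkage clustering with a height cut.
object CutSketch {
  def hamming(a: Long, b: Long): Int = java.lang.Long.bitCount(a ^ b)

  // Complete linkage: distance between clusters = largest pairwise distance.
  def completeLinkage(a: Vector[Long], b: Vector[Long]): Int =
    (for (x <- a; y <- b) yield hamming(x, y)).max

  // Repeatedly merge the closest pair until it is farther apart than cutHeight.
  def cluster(hashes: Seq[Long], cutHeight: Double): Vector[Vector[Long]] = {
    var clusters = hashes.map(Vector(_)).toVector
    var done = clusters.size < 2
    while (!done) {
      val pairs = for {
        i <- clusters.indices
        j <- (i + 1) until clusters.size
      } yield (completeLinkage(clusters(i), clusters(j)), i, j)
      val (d, i, j) = pairs.minBy(_._1)
      if (d > cutHeight) done = true
      else {
        val merged = clusters(i) ++ clusters(j)
        // Remove j first (j > i) so index i stays valid, then append the merge.
        clusters = clusters.patch(j, Nil, 1).patch(i, Nil, 1) :+ merged
        done = clusters.size < 2
      }
    }
    clusters
  }
}
```

Lowering `cutHeight` yields more, tighter clusters (down to one singleton per image at 0); raising it merges more images into fewer, looser clusters.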
Build the project with `sbt assembly`. This will generate a `phash-hierarchical-clustering-assembly-<version>.jar` uberjar file in the `target/scala-<scalaVersion>` subdirectory (where `<version>` is the current version defined in `build.sbt`).
Run the application from the `.jar` with the `java -jar` command, e.g.:
java -jar target/scala-2.12/phash-hierarchical-clustering-assembly-1.0.jar <imageDirectory>
This might take a while the first time, since the app needs to compute the pHash value for every image in `<imageDirectory>`, the folder where the images are stored (use as many images as possible for better results).
- Sample clusters from a dataset consisting of 5K images with the Apple logo
- A dendrogram illustrating the result of Hierarchical Clustering used with the `complete` agglomeration method (see Smile docs for more details)
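For reference, the complete agglomeration method defines the distance between two clusters $A$ and $B$ as the largest pairwise distance between their members; here that is the Hamming distance $d_H$ between binary hash vectors. The 64-bit hash length below is an assumption, not something stated for this project:

$$
d_{\text{complete}}(A, B) = \max_{a \in A,\ b \in B} d_H(a, b), \qquad d_H(a, b) = \sum_{k=1}^{64} \mathbb{1}[a_k \neq b_k]
$$

Assuming `cutHeight` is applied to this dendrogram, a cut at 8.0 means every pair of images within a resulting cluster has hashes differing in at most 8 bits.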