A simple tool to detect if there is a signature in an image or a PDF file.
Installing from PyPI is the quickest way to use this tool. The signature-detect package contains the code in src.
pip install signature-detect
Working from the source repository is the recommended way to explore this tool. It provides notebooks for playing around.
- Install Anaconda.
- Install the dependencies:
  conda create --name <env> --file conda.txt
- Image:
  python demo.py --file my-image.jpeg
- PDF file:
  python demo.py --file my-file.pdf
All the code in src is covered by tests.
cd tests
coverage run -m unittest
coverage report -m
We use the following image as an example. The full example is in the demo notebook.
The loader reads the file and creates a mask.
The mask is a numpy array. The bright parts are set to 255, the rest is set to 0. It contains ONLY these 2 numbers.
- low_threshold = (0, 0, 250)
- high_threshold = (255, 255, 255)
They control the creation of the mask and are used in the function cv.inRange.
Here, yellow is 255 and purple is 0.
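As a rough illustration of the thresholding step, the sketch below uses plain NumPy to mimic the range test that cv.inRange performs: a pixel becomes 255 when every channel lies between low_threshold and high_threshold (inclusive), and 0 otherwise. The function name make_mask and the tiny sample array are hypothetical. The color space of the input is not stated here, so the three channel values are treated as opaque numbers.

```python
import numpy as np

def make_mask(image, low_threshold=(0, 0, 250), high_threshold=(255, 255, 255)):
    """Return a uint8 mask: 255 where every channel is within range, else 0."""
    low = np.asarray(low_threshold)
    high = np.asarray(high_threshold)
    # A pixel is "in range" only if all of its channels pass both bounds.
    in_range = np.all((image >= low) & (image <= high), axis=-1)
    return np.where(in_range, 255, 0).astype(np.uint8)

# Hypothetical 1x3 image with three channels per pixel.
image = np.array([[[10, 20, 255], [10, 20, 100], [0, 0, 250]]], dtype=np.uint8)
mask = make_mask(image)
print(mask.tolist())  # [[255, 0, 255]]
```

Only the last channel is constrained by the default thresholds, so the middle pixel (last channel 100) is the only one mapped to 0.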
The extractor first generates the regions from the mask.
Then it removes the small and the big regions, because a signature is neither too big nor too small.
The process is as follows.
- label the image
  skimage.measure.label labels the connected regions of an integer array. It returns a labeled array, where all connected regions are assigned the same integer value.
- calculate the average size of the regions
  Here, the size means the number of pixels in a region.
  We accumulate the number of pixels in all the regions, total_pixels. The average size is total_pixels / nb_regions.
  If the size of a region is smaller than min_area_size, this region is ignored. min_area_size is given by the user.
- calculate the size of the small outlier
  small_size_outlier = average * outlier_weight + outlier_bias
  outlier_weight and outlier_bias are given by the user.
- calculate the size of the big outlier
  big_size_outlier = small_size_outlier * amplfier
  amplfier is given by the user.
- remove the small and big outliers
- outlier_weight = 3
- outlier_bias = 100
- amplfier = 10 (15 is used in the demo)
- min_area_size = 10
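The outlier arithmetic can be sketched in a few lines of plain Python, starting from a list of region sizes (pixel counts) rather than a labeled image. The helper names outlier_sizes and keep_regions and the sample sizes are hypothetical. Two details are assumptions, not confirmed by the description: regions below min_area_size are left out of both total_pixels and nb_regions, and a region is kept only when its size lies strictly between the two outlier sizes.

```python
def outlier_sizes(region_sizes, outlier_weight=3, outlier_bias=100,
                  amplfier=10, min_area_size=10):
    """Return (average, small_size_outlier, big_size_outlier)."""
    # Assumption: regions smaller than min_area_size do not count toward the average.
    kept = [s for s in region_sizes if s >= min_area_size]
    average = sum(kept) / len(kept) if kept else 0.0
    small_size_outlier = average * outlier_weight + outlier_bias
    big_size_outlier = small_size_outlier * amplfier
    return average, small_size_outlier, big_size_outlier

def keep_regions(region_sizes, **kwargs):
    """Keep the sizes that are neither small nor big outliers (strict bounds assumed)."""
    _, small, big = outlier_sizes(region_sizes, **kwargs)
    return [s for s in region_sizes if small < s < big]

# Hypothetical page: 98 specks of text, one tiny dot, one stroke-sized region, one huge blob.
sizes = [20] * 98 + [5, 5040, 100000]
print(outlier_sizes(sizes))  # (1070.0, 3310.0, 33100.0)
print(keep_regions(sizes))   # [5040]
```

With the default attributes, only the region of 5040 pixels survives: the specks fall below the small outlier size and the blob exceeds the big one.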
The cropper finds the contours of the regions in the labeled masks and crops them.
Suppose (h, w) = region.shape.
- min_region_size = 10000
  If h * w < min_region_size, then this region is ignored.
- border_ratio: float
  border = min(h, w) * border_ratio
  The border will be removed if this attribute is not 0.
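The two cropper attributes can be illustrated with a small NumPy sketch. The function name crop_border is hypothetical. It is an assumption that the border is truncated to whole pixels and trimmed equally from all four sides, since only the formula for the border width is given above.

```python
import numpy as np

def crop_border(region, border_ratio, min_region_size=10000):
    """Return the region without its border, or None if the region is too small."""
    h, w = region.shape
    if h * w < min_region_size:
        return None  # the region is ignored
    # Assumption: the border width is truncated to an integer number of pixels.
    border = int(min(h, w) * border_ratio)
    if border == 0:
        return region
    # Assumption: the same border is trimmed from every side.
    return region[border:h - border, border:w - border]

region = np.zeros((100, 200), dtype=np.uint8)
print(crop_border(region, border_ratio=0.1).shape)  # (80, 180)
print(crop_border(np.zeros((50, 50)), border_ratio=0.1))  # None
```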
The judger reads the cropped mask and identifies whether it's a signature or not.
Suppose (h, w) = cropped_mask.shape.
- size_ratio: [low, high]
  low < max(h, w) / min(h, w) < high.
- max_pixel_ratio: [low, high]
  low < the number of 0 / the number of 255 < high.
  The mask should only have 2 values, 0 and 255.
By default:
- size_ratio = [1, 4]
- max_pixel_ratio = [0.01, 1]
For the example image:
- max(h, w) / min(h, w) = 3.48
- number of 0 / number of 255 = 0.44
So, this image is signed.
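Both judger checks can be written as a short NumPy sketch. The function name is_signed and the synthetic masks are hypothetical, and the strict inequalities follow the conditions stated above. The first mask only roughly imitates the proportions of the example (shape ratio 3.48, pixel ratio 0.45); it is not the example image itself.

```python
import numpy as np

def is_signed(mask, size_ratio=(1, 4), max_pixel_ratio=(0.01, 1)):
    """Apply the size-ratio and pixel-ratio checks to a 0/255 mask."""
    h, w = mask.shape
    ratio = max(h, w) / min(h, w)
    if not (size_ratio[0] < ratio < size_ratio[1]):
        return False
    n_zero = int(np.count_nonzero(mask == 0))
    n_bright = int(np.count_nonzero(mask == 255))
    if n_bright == 0:
        return False  # avoid dividing by zero on an all-dark mask
    pixel_ratio = n_zero / n_bright
    return max_pixel_ratio[0] < pixel_ratio < max_pixel_ratio[1]

# 50 x 174 mask: shape ratio 3.48, with 2700 zeros and 6000 bright pixels.
mask = np.full((50, 174), 255, dtype=np.uint8)
mask[:, :54] = 0
print(is_signed(mask))  # True

# A very elongated mask fails the size-ratio check (300 / 50 = 6).
wide = np.full((50, 300), 255, dtype=np.uint8)
wide[:, :100] = 0
print(is_signed(wide))  # False
```

For the values reported above, 1 < 3.48 < 4 and 0.01 < 0.44 < 1, so both conditions hold.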