This general-purpose integer factorization software is released into the public domain. It is provided "as is" without any warranty, express or implied.
- is immediately compatible with Microsoft Windows and Linux.
- is thread-safe and simply has no dependencies, not even
math.h
. - use Pollard's Rho algorithm for factoring 64-bit integers.
- comes with a Big Number library for handling large numbers.
- includes a Self-Initializing Quadratic Sieve (SIQS).
- use AVL trees to assist the Quadratic Sieve in data management.
- is capable of batch factor numbers with JSON or CSV output.
- has been tested and produces reproducible results.
- reads numbers to be factored from either a file or the command line.
- performs preliminary checks (e.g., trial division, perfect power) before invoking the algorithms.
- verifies the factorization (the product of factors should be equal to the input number).
- returns a
0
exit code when all numbers are fully factored, non-zero otherwise.
The factor.exe
file is available for Windows users. Otherwise, to get an executable, follow these steps:
- Ubuntu: Install a C compiler with the command:
sudo apt install build-essential
. - Windows: Install MinGW, which will provide a GCC compiler similar to Ubuntu.
Once you have the compiler, use Terminal on Ubuntu or PowerShell on Windows to navigate to the downloaded directory using the cd
command:
- Compile the program by executing:
gcc -Wall -pedantic -O3 main.c -o factor
. GCC will create the appropriate executable for your device.
Compilation should take just a few seconds, and then you can use the software:
- Run the executable with a command like:
./factor 2840222005804112554242019945562782034485263657432907
The software will quickly display the factors:
Number: 2840222005804112554242019945562782034485263657432907
Factors: 186565459987 (prime), 123384476380949634131^2
These options are designed to facilitate your interaction with the software:
Name | Short | Goal | Example |
---|---|---|---|
--input-file |
-i |
Factor the entire file. | -i input.txt |
--output-file |
-o |
Write results to a file. | -o output.txt |
--timeout |
-t |
Interrupt the Quadratic Sieve after a specified duration in seconds. | -t 1 |
--force |
-f |
Override the default limits: 8191 digits for a number and 220-bit for the Quadratic Sieve. | -f |
--verbose |
-v |
Show more or fewer details, such as Quadratic Sieve progress, depending on the verbosity level. The default value is 1. | • -v 0 • -v 1 • -v 2 • -v 3 • -v 4 |
--output-csv |
-c |
Format results as CSV. | -c |
--output-json |
-j |
Format results as JSON. | -j |
--output-json-compact |
-J |
Same as --output-json , but disable the pretty print. |
-J |
--output-sql |
-sql |
Format results as SQL. | -sql |
--rand-seed |
-r |
Specify a custom XorShift random seed (64-bit). | -r 123 |
--generate |
-g |
Designed for testing, this option create a generated.txt suitable as input file, containing non-trivial sample numbers. Its parameters, all optional, are -g <bits-min> <bits-max> <count> . |
• -g • -g 150 • -g 130 150 • -g 150 150 1000 |
--help |
-h |
Print the help. | -h |
You can enforce parameters to the Quadratic Sieve :
Quadratic Sieve option | Typical value range |
---|---|
--qs-multiplier <value> |
from 1 to 150 |
--qs-base-size <value> |
from 1000 to 50000 |
--qs-large-prime <value> |
from 300000 to 30000000 |
--qs-alloc-mb <value> |
from 1 to a few hundred |
--qs-sieve <value> |
31744 by default |
--qs-threshold <value> |
from 60 to 110 |
--qs-error-bits <value> |
from 15 to 35 |
--qs-laziness <value> |
from 80 to 120 (in percentage) |
Navigate through the source code to see their default value and usage.
The software is designed to be a general-purpose factorizer for integers, also supporting negative numbers. The algorithm can factor large numbers (e.g., a 1500-bit highly composite number or 10,000!), but performance can vary significantly depending on the structure of the number. For every additional 3 decimal digits (or 10 bits) to the input number, the Quadratic Sieve duration roughly doubles.
-
Default target performance
- The software can factor 170-bit RSA integers in a second.
- The software can factor 220-bit RSA integers in approximately 30 seconds.
-
Performance with basic tuning (sometimes without tuning)
- Successfully factors RSA numbers greater than 250 bits.
- Factored a 300-bit RSA number in 2 hours.
- Factored the 321-bit RSA number related to the bank card case.
- Factored RSA-100, a 330-bit number.
You can use the built-in generator to easily produce various reproducible samples (consistent across all platforms) in a generated.txt
file. These examples are for Ubuntu:
Example of a testing command that factor 10 numbers of 150-bit based on the seed 1
, taking a few seconds:
./factor --generate 150 150 10 --rand-seed 1
./factor -i generated.txt -c -v 2 && echo "All the numbers have been fully factored."
- You can use
--rand-seed 2
to get another set of sample numbers. - In Ubuntu, use
--rand-seed $(date +%s%N)
to systematically get another set of sample numbers. - In Windows the
&&
operator isn't always supported, and the factorizer often has a.exe
extension. - When the software returns a
0
exit code, it means that all numbers have been fully factored.
Example of a verbose testing command that factor 1189625923578446633052664796204789022410184682730580575561447
, a 200-bit number in 15 seconds :
./factor --generate 200
./factor -i generated.txt -v 4 && echo "The number have been fully factored."
Example of a testing command that factor sample numbers ranging from 120-bit to 170-bit, taking 20 seconds:
./factor --generate 120 170
./factor -i generated.txt -v 2 && echo "All the numbers have been fully factored."
Example of a testing command that shows in a minute that the Quadratic Sieve selects a good multiplier (better than 1 meaning no multiplier), which speeds up the average execution time by a factor 2:
./factor --generate 170 170 10
./factor -v 2 -c -i generated.txt
./factor -v 2 -c -i generated.txt --qs-multiplier 1
Example of the testing command that generated the provided .csv
and .json
samples of output files. To give you an idea of the environment, it took 11 minutes per file on the machine that tested the software:
./factor --generate
./factor --input-file generated.txt --output-file example-result.csv --output-csv
./factor --input-file generated.txt --output-file example-result.json --output-json
Example of a quick testing command based on the OEIS list of highly composite numbers:
wget -qO- https://oeis.org/A002182/b002182.txt | cut -d' ' -f 2 > oeis-list.txt
./factor -i oeis-list.txt -c && echo "All the numbers have been fully factored."
- it reads options from the command line.
- it prepares a session capable of handling the largest number.
- it recursively factors each number using the available algorithms.
- it reuses the session to factor all numbers.
Before starting, the input integers are parsed; if an error is detected, the factorization will not begin, and you will be notified via the standard error output. The software commit the output after each factorization by clearing buffers.
- The manager allocates a few memory (a session) based on the largest number.
- The Quadratic Sieve manages its memory independently, based on its input.
- Memory used by the Quadratic Sieve is freed as soon as it has factored its input.
- The manager reuse its session, the memory is feeed once all numbers are factored.
- Valgrind was used to check for memory leaks, no leaks were found.
Memory allocations are relatively small (typically a few dozen megabytes) and are passed to assert
. In case of memory allocation errors (e.g., insufficient available memory), the software will exit, and it is recommended to restart your device.
The cint
library (released "as is", into the public domain, without any warranty, express or implied) is provided for handling large integers. It includes a variety of basic and advanced mathematical functions to support your calculations.
The avl
functions (released "as is", into the public domain, without any warranty, express or implied) manage thousands of integers in AVL trees, making Quadratic Sieve relations storage and retrieval efficient and straightforward.
The Block Lanczos algorithm (released "as is", into the public domain, without any warranty, express or implied) solve a large, sparse system of linear equations, and identify dependencies among the rows of the matrix formed by the relations identified by the Quadratic Sieve.
Your software's source code is organized into the following files:
Filename | Purpose |
---|---|
64-bits-factorization.c |
Contains functions for factoring classical integers, including Pollard's Rho algorithm and a Miller-Rabin primality checker. |
avl-trees.c |
Contains the AVL Trees functions, originally a separate project from the integer factorization project. |
basic-math.c |
Includes a prime number generator capable of yielding the first 326,983 prime numbers, along with the Kronecker symbol function, Tonelli-Shanks function, and modular inverse function. |
big-num.c |
Contains the cint library, also initially a separate project from the integer factorization project. |
block-lanczos.c |
Provides the linear algebra part of the Quadratic Sieve, including a matrix reducer and a manager for this task. |
headers.h |
Contains definitions for types, structures, functions, and DEBUG macros used throughout the project. |
main.c |
Contains the program's entry point and transfers control to the factorization manager after about 30 lines. |
manager.c |
Includes I/O functions, help documentation, shortcuts commonly used by the Quadratic Sieve, and the factorization manager source code. |
quadratic-sieve.c |
Contains the Quadratic Sieve implementation, divided into around thirty functions, with a main function that manages the factorization process in about 30 lines. Search for "debug" in this file to find the debugging elements displayed. |
The software :
- uses a stateful approach and is thread-safe (no global variables are used).
- uses reproductible XorShift random values that depend on the
--rand-seed
(default to0
). - produces reproducible results regardless of the order in which the factorizations are batched.
The Quadratic Sieve file structure is as follows:
- the ~40 lines function that allows to see the algorithm structure
- the 2 important loop conditions
- the algorithm parameters
- the functions approximately in the order they are called
The input N is duplicated and called "kN" after this function complete :
- multiply N by a prime number to reach 120 bits
- apply a multiplier to N intended to optimize runtime
The preparation_part_3_default_proposition
function provide a multiplier that make the algorithm completes faster on average. A preparation_part_3_alternative_proposition
is also provided for the same purpose but is unused.
Variable | Information |
---|---|
N | Prime factors are removed from N, and N is updated until N = 1 . |
kN | The algorithm computes with kN, which remains constant. |
See of how multipliers impact the execution of ./factor 10146246660331135529849066552891483667878618078493758287
:
Option used | Makes the algorithm factoring | Comment on the multiplier | Took |
---|---|---|---|
--qs-multiplier 7 |
kN = 7 * N |
Selected by the algorithm | 3.1 s |
--qs-multiplier 1 |
kN = 1 * N |
No multiplier | 6.3 s |
--qs-multiplier 113 |
kN = 113 * N |
Large multipliers are sometimes good | 8.2 s |
--qs-multiplier 29 |
kN = 29 * N |
One of the worst | 19.4 s |
--qs-multiplier 35 |
kN = 35 * N |
One of the worst | 26.3 s |
While the proposed method doesn't always select the absolute best multiplier, it finds a balance that improves average runtime by a factor 2. The best multiplier can be up to 10 times faster than the worst. Can you guess which multiplier achieves this ? For those interested, the Knuth-Schroeppel analysis explores why choosing the right multiplier makes such a significant difference (Learn more).
Good parameters can improve the speed.
- define the algorithm parameters
- allocates a block of memory for the quadratic sieve computations
- prepare constants, variables, buffers, data arrays ...
There is a struct inside the qs_sheet called mem :
- it holds the base entry point of the allocated memory
- it holds a now void* pointer which represent the current available memory
qs_sheet holds 3 AVL tree manager :
- one to store the regular relations
- one to store the relations that wait to be paired (partials)
- one to store the known divisors of N
With small precautions you are supposed to be able to store anything in now, then to update now accordingly to what you stored. now is always supposed to contain only zero until its end. This is the main memory management technique used by this software. mem_align
aims to provide aligned pointers, wasting a few bits if necessary.
Fill the manager's base array with prime numbers provided by a constant expression :
- verify that kN is a square mod prime, ignore it otherwise
- associates the prime with square root of kN mod prime
- associates the prime with its size (log2)
- computes invariants like D used to generate the A polynomial coefficient
- can restore the relations previously saved by Lanczos algorithm
- can be used to perform analysis, save/restore factorization to file or other periodic actions
coefficient | is a constant after | is a constant until |
---|---|---|
A | iteration_part_1 |
inner_continuation_condition completion |
B | iteration_part_4 |
for loop final expression |
C | iteration_part_6 |
for loop final expression |
- Polynomial and related data are computed until
iteration_part_6
complete - The algorithm prepares data that will help generate polynomial values that will be multiples of the factor base
iteration_part_7
anditeration_part_8
are used for sieving
register_relations
searches sieve for relations, it reads the sieve. When the function finds interesting to define the variable X , calculates the value of the polynomial in X then divides the value with the factor base, it thus tries to establish relations.
Buffered knowledge is structured by register_regular_relation
and register_partial_relation
:
- informations potentially useful are saved into a struct qs_relation*
register_regular_relation
immediately build a matrix of regular full qs_relation* using AVL treeregister_partial_relation
combine (single large primes variation) qs_relation* together using AVL tree
The AVL trees are used to identify duplicates and retrieve their associated data.
Decides if sieving shoud continue or break, the relation counter is usually the condition.
This algorithm find a subset of all exponent vectors such that the sum of their exponent vectors is the zero vector :
- the process need memory, block-lanczos.c have its own array builder
- all memory taken in mem.now is zeroed and reusable after calculations
- finding the matrix eigenvalues is usually fast, it can still take 400+ iterations with large N
- the algorithm is probabilistic with high success rate, generally it is lucky enough
- several tries with and without reduce_matrix are performed before giving up
The reduce_matrix
function helps lanczos_block
and could be called unconditionally.
The finalization_part_1
function performs the square root + GCD step based on the lanczos_block
answer, it leads to a factorization of N.
The qs_register_divisor
function :
- perform checks (such as primality, perfect power and GCD between known divisors)
- register the appropriate divisors (prime factors of N) to the caller's routine
This function is used to decide if the algorithm should return the control, or return to sieving.
- normal case is to return the control, answer was found
- unusual case is when N isn't fully factored (maybe parameters was wrong)
So before giving up, the algorithm searches 10%, 25% then 50% more relations.
In short: Given two partial relations with the same large prime, the register_partial_relation
function merges them to obtain a full relation.
This step is optional for completing a factorization but can significantly speed up the process. To disable this feature, simply avoid calling the register_partial_relation
function.
During the execution of register_relations
, after trial division by the factor base primes (cint_remove
), if the quotient remains greater than 1 but is less than the large prime bound, the software registers this partial relation with register_partial_relation
. This function stores the new qs_relation
in a separate location from full relations.
When the large prime collected matches a prime from another partial relation (detected via the AVL tree), the software combines these two relations and then calls the register_regular_relation
function, which registers all regular full relations. At this point, the newly created relation is treated like any other full relation.
N bits | Relations added immediately | Relations added by single large primes variation | Relations needs |
---|---|---|---|
150 | 2551 | 3 | 2554 |
170 | 3168 | 659 | 3827 |
190 | 3212 | 1786 | 4998 |
210 | 3409 | 3012 | 6421 |
230 | 5659 | 4686 | 10345 |
250 | 10051 | 9793 | 19844 |
270 | 14026 | 15484 | 29510 |
290 | 12820 | 20201 | 33021 |
- Dario Alpern's Integer Factorization Calculator: This tool processes the factorization directly in your browser.
- Number Empire Factoring Calculator: This tool performs factorization on the server.
There are many people to thank :
- Carl Pomerance
- Jason Papadopoulos
- William Hart
- my professors at University of Franche-Comté ❤️
- GitHub and SourceForge users reporting issues
You can report any issues, and we'll investigate to correct the source files.