GenId offers few algorithms to generate mostly non-conflicting and time-ordered IDs (mostly for databases/workflows) without a central coordinator.
A tiny library making it easy to generate most of the UUID flavors zoo.
At the lower level it provides a facility to easy combine user defined bit-fields with different semantics (e.g. random numbers, machine id, timestamp etc.) inside Integers. Combining them allows to construct specific UUID generators/parsers in just few lines of code. It also implements widely used (de-)serialization schemes.
Finally it offers example implementations for the following specific UUID schemes used in the industry:
- 128-bit UUIDv7 proposal/variant with machine id;
- 128-bit Nano ID using modified Base 64 text encoding;
- 128-bit ULID using Crockford Base 32;
- 128-bit XID using modified Base 64 text encoding;
- 128-bit Firebase Push ID using modified Base 64 text encoding;
- 64-bit Snowflake ID using Crockford Base 32;
- 64-bit Instagram ID using Crockford Base 32;
In distributed and/or IOT systems, the latency for acquiring unique IDs (e.g. for primary/technical keys, sequences in databases, queue middleware etc.) between different nodes/threads and a single coordinator (database/service etc.) is sometimes higher than desirable. In such contexts Universally Unique IDentifiers (UUIDs) can be used as they offer some uniqueness guarantees across number of machines/threads without round-trips to a central authority.
Different flavors of UUIDs have different trade-offs around performance, security, number of bytes used, uniqueness guarantees, (de-)serialization choices etc.
Julia currently offers implementations of UUID v1, v4 and v5 (see UUIDs in the Standard Library). While these provide industry standard algorithms and representations of the IDs (see RFC 4122), they are not always ideal for usage in databases as they can introduce unwanted side effects like index fragmentation/write amplification or require some configuration of the clients generating them in advance.
There are number of new UUID proposals (see New UUID Formats) which try to address under different trade-offs some of these shortcomings. Below are few examples:
- Brief history of UUIDs;
- The best UUID type for database keys;
- The primary key dillema: IDs vs UUIDs and some practical solutions;
- How to not use TSID factories
As well about some security constraints/implications:
- Support for 64- and 128-bit UUIDs;
- Small DSL for defining the structure & semantic of a UUID (no syntax/but data oriented through data definitions)
- Support for fields representing widely used UUID components like machine id, random number, timestamp etc.;
- Ability to declaratively combine them in a single Integer with custom offsets and bit lengths;
- Custom implementations of Base 16 (Hex), Base 32, Crockford Base 32 and Base 64 text encoding schemes to allow for phonetic sorting for UUIDs having a timestamp component (e.g. to use the IDs as keys in a database);
- The text encodings are produced/parsed back in big endian byte order;
- Allows to get back field values from UUIDs;
- Ability to use own text encoding dictionaries with declarative case sensitivity for the simple ones;
- Currently all UUID examples mask the highest bit, to allow transparent storage in databases which don't have large unsigned integer types.
Add the package to your project
julia> Pkg.add("GenId")
and import it.
using GenId, Dates
See https://en.wikipedia.org/wiki/Snowflake_ID
# SnowflakeIdDefinition(epoch_start_dt::DateTime, machine_id::Int64)
# 41 bits timestamp (ms), 10 bits machine id, 12 bits sequence numbers per machine
julia> iddef = SnowflakeIdDefinition(DateTime(2020, 1, 1, 0, 0, 0, 0), 1)
...
julia> iddef.name == :SnowflakeIdDefinition
true
julia> tsid_generate(iddef)
489485826766409729
julia> tsid_generate_string(iddef)
"DJR0RGDG0401"
# en-/decoded using Crockford Base 32
julia> tsid_to_string(iddef, 489485826766409729)
"DJR0RGDG0401"
julia> tsid_int_from_string(iddef, "DJR0RGDG0401")
489485826766409729
See https://github.com/arturictus/firebase_pushid
Uses modified Base 64 text encoding.
# FirebasePushIdDefinition()
# 48 bits timestamp (ms), 72 randomness
julia> iddef = FirebasePushIdDefinition()
...
julia> iddef.name == :FirebasePushIdDefinition
true
julia> tsid_generate(iddef)
301430602692632926610578560781911544
julia> tsid_generate_string(iddef)
"EWsj5l65EXH2G1Qfc0Nu"
julia> tsid_to_string(iddef, 301430602692632926610578560781911544)
"EWsj5l65EXH2G1Qfc0Nu"
julia> tsid_int_from_string(iddef, "EWsj5l65EXH2G1Qfc0Nu")
301430602692632926610578560781911544
Define an UUID structure
# The machine id for the current process unique in the infrastructure for the specific application (e.g. machine id in a cluster or VPC etc.).
# Can come from a configuration file, environment variable, last digits of an IP address etc.
machine_id = 1
# the actual UUID definition
iddef = TsIdDefinition(
# Data type used for storage of the ID
Int64;
# Number of bits used for the timestamp section.
bits_time=41,
# Number of bits used for the machine section.
bits_group_1=10,
# Number of bits for the tail section.
# Can be a random number or a local machine/thread specific sequence.
bits_tail=12,
# Increment tail bits globally (independent of thread ids) for the node (machine/server)
tail_algorithm=:machine_increment,
# Use group_1 as the machine_id
group_1=machine_id,
# Start of the epoch for this UUID scheme.
# Time before that can't be represented.
epoch_start_dt=DateTime(2020, 1, 1, 0, 0, 0, 0),
# End of the epoch for this UUID scheme.
# Time after that can't be represented.
epoch_end_dt=DateTime(2070, 12, 31, 23, 59, 59, 999)
)
# The sum of the desired bits must match the word size of the specified data type (e.g. 41+10+12=63, which (currently) are the number of bits used in a Int64 type).
# Start and end of the epoch of the ID must fit in the desired number of bits for time.
# or
iddef = SnowflakeIdDefinition(
# Start of the epoch for this UUID scheme.
SOME_EPOCH_START_2020,
# the machine_id
1
)
Generate an ID using the ID definition:
julia> tsid_generate(iddef)
489485826766409729
# the ID is produced in the integer type of desired size
julia> typeof(489485826766409729)
Int64
And convert it if necessary to a used friendly text:
julia> tsid_to_string(489485826766409729)
"DJR0RGDG0401"
Or use a lower level function to customize the output:
julia> crockford32_encode_int64(489485826766409729, with_checksum=true)
"DJR0RGDG04014"
Once you have an ID, you can extract back its components:
julia> tsid_timestamp(iddef, 489485826766409729)
2023-09-12T17:21:55.308
julia> typeof(tsid_timestamp(iddef, 489485826766409729))
DateTime
julia> tsid_machine_id(iddef, 489485826766409729)
1
julia> tsid_machine_tail(iddef, 489485826766409729)
1
One can also decode it from a text representation:
julia> tsid_from_string(iddef, "DJR0RGDG0401")
489485826766409729
julia> crockford32_decode_int64("DJR0RGDG0401")
489485826766409729
julia> crockford32_decode_int64("DJR0-RGDG-0401")
489485826766409729
julia> crockford32_decode_int64("DJR0-RGDG-0401-4", with_checksum=true)
489485826766409729
Used in production.
Few unpolished nuances around (un-)signed integers and (un-)signed fields at first position.
A design choice and not a necessity, between trade-offs at this moment.
A wrapper type would help distinguish between UUIDs and other integer in an application which is useful. Lack of a wrapper allows for transparent passing around UUIDs between an application and databases/drivers without explicit (de-)serialization, while errors around UUIDs used as keys are enought profound for a system, to discover them rather early then late. But, a wrapper type is planned, not just for higher type safety, but also for easier support of larger than UInt128 UUIDs (e.g. KSUID)
Stock Base 32/64 are not correctly sortable under standard ASCII or UTF variants (even under big endian schemes). The library uses encodings where at first come ASCII numbers, then capital letters, then small letters and finally punctuation characters, which allows for lexicographic sorting of encoded strings. E.g. in a standard (as per RFC 4648 and earlier RFC 3548), characters in the encoding table of the Base 64 encoding are ordered like "ABCDE....abcde...01223...+/", while we use "0123...ABCD...abcd...+-", which is in line with integer codes in ASCII/UTF variants.
- More human readable and less error prone to dictation than some others (e.g. Base32, Base64, Base58 etc.), while still compressing a bit over Hex encoding for example (each character in Crockford Base 32 corresponds to 5 bits of input);
- Simple, efficient;
- Support in other languages (see Crockford 32 on Github).
- Add a wrapper type, which will allow for:
- Typed UUIDs instead of flavors of Ints only;
- Compile UUID definitions to minimal set of bit-shifts for calculations and (de-)encoding
This library is Open Source software released under the LGPL 3.0+ license.
Enjoy! 😃