# Compression for Java
[![Maven Central](https://img.shields.io/maven-central/v/io.airlift/aircompressor.svg?label=Maven%20Central)](https://search.maven.org/#search%7Cga%7C1%7Cg%3A%22io.airlift%22%20AND%20a%3A%22aircompressor%22)

This library provides a set of compression algorithms implemented in pure Java and,
where possible, as native implementations. The Java implementations use `sun.misc.Unsafe`
to provide fast access to memory. The native implementations use `java.lang.foreign`
to interact directly with native libraries without the need for JNI.

# Usage

Each algorithm provides a simple block compression API using the `io.airlift.compress.Compressor`
and `io.airlift.compress.Decompressor` classes. Block compression is the simplest form of
compression: it compresses a single block of data provided as a `byte[]` or, more generally, a
`java.lang.foreign.MemorySegment`. Each algorithm may also have one or more streaming formats,
which typically produce a sequence of block-compressed chunks.

## byte array API
```java
byte[] data = ...

Compressor compressor = new Lz4JavaCompressor();
byte[] compressed = new byte[compressor.maxCompressedLength(data.length)];
int compressedSize = compressor.compress(data, 0, data.length, compressed, 0, compressed.length);

Decompressor decompressor = new Lz4JavaDecompressor();
byte[] uncompressed = new byte[data.length];
int uncompressedSize = decompressor.decompress(compressed, 0, compressedSize, uncompressed, 0, uncompressed.length);
```

## MemorySegment API
```java
Arena arena = ...
MemorySegment data = ...

Compressor compressor = new Lz4JavaCompressor();
MemorySegment compressed = arena.allocate(compressor.maxCompressedLength(toIntExact(data.byteSize())));
int compressedSize = compressor.compress(data, compressed);
compressed = compressed.asSlice(0, compressedSize);

Decompressor decompressor = new Lz4JavaDecompressor();
MemorySegment uncompressed = arena.allocate(data.byteSize());
int uncompressedSize = decompressor.decompress(compressed, uncompressed);
uncompressed = uncompressed.asSlice(0, uncompressedSize);
```
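
The elided `Arena` and input segment can be created with the standard `java.lang.foreign`
API (Java 22+); for example, using a confined arena:
```java
// Standard java.lang.foreign API: Arena, MemorySegment, ValueLayout.
try (Arena arena = Arena.ofConfined()) {
    byte[] bytes = "example input".getBytes(StandardCharsets.UTF_8);

    // Copy the bytes into an off-heap segment owned by the arena;
    // the arena frees all of its segments when closed.
    MemorySegment data = arena.allocate(bytes.length);
    MemorySegment.copy(bytes, 0, data, ValueLayout.JAVA_BYTE, 0, bytes.length);

    // ... compress and decompress as shown above ...
}
```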

# Algorithms

## [Zstandard (Zstd)](https://www.zstd.net/) **(Recommended)**
Zstandard is the recommended algorithm for most compression tasks. It provides
superior compression and performance at all levels compared to zlib, making it
an excellent choice for most use cases, especially storage and bandwidth-constrained
network transfer.

The native implementation of Zstandard is provided by the `ZstdNativeCompressor` and
`ZstdNativeDecompressor` classes. The Java implementation is provided by the
`ZstdJavaCompressor` and `ZstdJavaDecompressor` classes.

The Zstandard streaming format is supported by the `ZstdInputStream` and `ZstdOutputStream` classes.
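
For example, a minimal round-trip sketch (assuming the streams wrap an ordinary
`OutputStream` and `InputStream`, as is conventional for stream wrappers):
```java
byte[] input = ...

// Compress to the Zstandard streaming format in memory.
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
try (ZstdOutputStream out = new ZstdOutputStream(buffer)) {
    out.write(input);
}

// Decompress the stream back to the original bytes.
try (ZstdInputStream in = new ZstdInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
    byte[] uncompressed = in.readAllBytes();
}
```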

## [LZ4](https://www.lz4.org/)
LZ4 is an extremely fast compression algorithm that provides compression ratios comparable
to Snappy and LZO. LZ4 is an excellent choice for applications that require high-performance
compression and decompression.

The native implementation of LZ4 is provided by the `Lz4NativeCompressor` and `Lz4NativeDecompressor`
classes. The Java implementation is provided by the `Lz4JavaCompressor` and `Lz4JavaDecompressor` classes.
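
Both variants implement the `Compressor` and `Decompressor` interfaces used in the usage
examples above, so swapping in the native implementation is a one-line change:
```java
// Drop-in replacement for the Lz4JavaCompressor/Lz4JavaDecompressor used above.
Compressor compressor = new Lz4NativeCompressor();
Decompressor decompressor = new Lz4NativeDecompressor();
```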

## [Snappy](https://google.github.io/snappy/)
Snappy is not as fast as LZ4, but provides a guarantee on memory usage that makes it a good
choice for extremely resource-limited environments (e.g. embedded systems like a network
switch). If your application is not highly resource constrained, LZ4 is a better choice.

The native implementation of Snappy is provided by the `SnappyNativeCompressor` and `SnappyNativeDecompressor`
classes. The Java implementation is provided by the `SnappyJavaCompressor` and `SnappyJavaDecompressor` classes.

The Snappy framed format is supported by the `SnappyFramedInputStream` and `SnappyFramedOutputStream`
classes.
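
For example, decompressing a Snappy framed file might look like this (a minimal sketch;
the single-argument `InputStream` constructor is an assumption):
```java
// Sketch: read a Snappy framed file in one pass.
// Assumes SnappyFramedInputStream wraps an ordinary InputStream.
try (InputStream in = new SnappyFramedInputStream(Files.newInputStream(Path.of("data.sz")))) {
    byte[] uncompressed = in.readAllBytes();
}
```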

## [LZO](https://www.oberhumer.com/opensource/lzo/)
LZO is provided only for compatibility with existing systems that use LZO. We recommend
re-encoding LZO data using Zstandard or LZ4.

The Java implementation of LZO is provided by the `LzoJavaCompressor` and `LzoJavaDecompressor` classes.
Due to licensing issues, LZO has only a Java implementation, which is based on the LZ4 codebase.

## Deflate
Deflate is the block compression algorithm used by the `gzip` and `zlib` libraries. Deflate is
provided for compatibility with existing systems that use Deflate. We recommend re-encoding
Deflate data using Zstandard, which provides superior compression and performance.

The implementation of Deflate is provided by the `DeflateCompressor` and `DeflateDecompressor` classes.
These use the built-in Java library (`java.util.zip`), which internally uses native code.

# Hadoop Compression

In addition to the raw block encoders, there are implementations of Hadoop Streams
for the above algorithms. Implementations of GZIP and BZip2 are also provided, so
all standard Hadoop algorithms are available.

The `HadoopStreams` class provides a factory for creating `InputStream` and `OutputStream`
implementations without the need for any Hadoop dependencies. For environments
that have Hadoop dependencies, each algorithm also provides a `CompressionCodec` class.
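
For illustration, usage might look like the following sketch; the `Lz4HadoopStreams`
factory name and the `createInputStream`/`createOutputStream` signatures are assumptions
based on the description above, not confirmed API:
```java
// Sketch only: factory name and method signatures are assumptions.
byte[] data = ...

HadoopStreams streams = new Lz4HadoopStreams();
try (OutputStream out = streams.createOutputStream(Files.newOutputStream(Path.of("data.lz4")))) {
    out.write(data);
}
try (InputStream in = streams.createInputStream(Files.newInputStream(Path.of("data.lz4")))) {
    byte[] uncompressed = in.readAllBytes();
}
```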

# Requirements

This library requires a Java 22+ virtual machine containing the `sun.misc.Unsafe` interface running on a little endian platform.

# Users

