-
Notifications
You must be signed in to change notification settings - Fork 7
/
CITATION.cff
73 lines (69 loc) · 2.98 KB
/
CITATION.cff
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
# Validate with:
# python3 -m pip install --user cffconvert
# cffconvert --validate
cff-version: 1.2.0
title: Rapidgzip
message: >-
If you use this software, please cite it using the metadata from this file.
type: software
authors:
- given-names: Maximilian
family-names: Knespel
orcid: "https://orcid.org/0000-0001-9568-3075"
email: [email protected]
repository-code: "https://github.com/mxmlnkn/indexed_bzip2"
repository: "https://github.com/mxmlnkn/rapidgzip"
abstract: >-
A replacement for gzip for decompressing gzip files using
multiple threads.
keywords:
- Gzip
- Decompression
- Parallel Algorithm
- Performance
- Random Access
license: MIT
preferred-citation:
type: conference-paper
authors:
- given-names: Maximilian
family-names: Knespel
orcid: "https://orcid.org/0000-0001-9568-3075"
email: [email protected]
- family-names: "Brunst"
given-names: "Holger"
orcid: "https://orcid.org/0000-0003-2224-0630"
title: "Rapidgzip: Parallel Decompression and Seeking in Gzip Files Using Cache Prefetching"
year: 2023
isbn: 9798400701559
publisher:
name: "Association for Computing Machinery"
alias: "ACM"
city: "New York"
region: "NY"
country: "US"
url: "https://doi.org/10.1145/3588195.3592992"
doi: "10.1145/3588195.3592992"
abstract: >-
Gzip is a file compression format, which is ubiquitously used. Although a multitude of gzip implementations exist, only pugz can fully utilize current multi-core processor architectures for decompression. Yet, pugz cannot decompress arbitrary gzip files. It requires the decompressed stream to only contain byte values 9–126. In this work, we present a generalization of the parallelization scheme used by pugz that can be reliably applied to arbitrary gzip-compressed data without compromising performance. We show that the requirements on the file contents posed by pugz can be dropped by implementing an architecture based on a cache and a parallelized prefetcher. This architecture can safely handle faulty decompression results, which can appear when threads start decompressing in the middle of a gzip file by using trial and error. Using 128 cores, our implementation reaches 8.7 GB/s decompression bandwidth for gzip-compressed base64-encoded data, a speedup of 55 over the single-threaded GNU gzip, and 5.6 GB/s for the Silesia corpus, a speedup of 33 over GNU gzip.
collection-title: "Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing"
start: 295
end: 307
pages: 13
keywords:
- Gzip
- Decompression
- Parallel Algorithm
- Performance
- Random Access
conference:
name: "The 32nd International Symposium on High-Performance Parallel and Distributed Computing"
alias: "HPDC '23"
city: "Orlando"
date-start: 2023-06-20
date-end: 2023-06-23
region: "FL"
country: "US"
website: "https://www.hpdc.org/2023"
date-published: 2023-08-07
status: advance-online