The Common Index File Format (CIFF) is an inverted index exchange format defined as part of the Open-Source IR Replicability Challenge (OSIRRC) initiative. The primary idea is to allow indexes to be dumped from Lucene via Anserini and then ingested by other search engines. This repository contains the code needed to read CIFF data into a format that PISA can use for building (and then searching) indexes.
We currently provide Rust binaries for converting CIFF data to a PISA canonical index, and for converting a PISA canonical index back to CIFF. This means PISA can generate indexes that can then be consumed by other systems that support CIFF (and vice versa).
The package is available in the Arch User Repository. If you are on an Arch-based system, you can install it by running the following:
# Replace yay with the helper of your choice.
yay -S ciff-pisa
Note that the installation methods described below are not system-wide. For example, on Linux the tools usually end up in the $HOME/.cargo/bin directory. To use the tools from the command line, either use the absolute path or update your PATH variable to include the $HOME/.cargo/bin directory.
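The PATH update mentioned above can be sketched for a POSIX-compatible shell as follows. This is a minimal example that assumes the default cargo install location of $HOME/.cargo/bin; the change only lasts for the current session unless you add the export line to your shell's startup file (for example ~/.profile, which is itself an assumption about your setup).

```shell
# Prepend cargo's bin directory to PATH for the current shell session.
# Assumes the default install location; adjust it if your CARGO_HOME differs.
export PATH="$HOME/.cargo/bin:$PATH"

# Check whether the shell can now resolve the tool by name.
command -v ciff2pisa || echo "ciff2pisa not found on PATH"
```

If the last command prints a path, the tools can be invoked by name rather than by absolute path.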
The library and the tools are also available on crates.io, so you can install the binaries to your local cargo repository by running:
cargo install ciff
To build the binaries from source, run cargo build --release.
To convert a CIFF blob to a PISA canonical index:
./target/release/ciff2pisa
To convert a PISA canonical index to a CIFF blob:
./target/release/pisa2ciff
You can also install the binaries to your local cargo repository:
cargo install --path .
or if you are installing the same version again:
cargo install --path . --force
If you are interested in using the library components in your own Rust library, you can simply define it as a dependency in your Cargo.toml file:
[dependencies]
ciff = "0.1"
The API documentation is available on docs.rs.