BioMiner Indexd

BioMiner Indexd is a hash-based data indexing and tracking service providing globally unique identifiers.
It is similar to [Indexd](https://github.com/uc-cdis/indexd), but with more features (see below).

NOTE: NOT READY FOR PRODUCTION YET.

Features

  • Manage & retrieve files: index each file by UUID (e.g. biominer.fudan-pgx/b14563ac-dbc1-49e8-b484-3dad89de1a54) and record its repository locations, file name, MD5 value, DOI, repository links, version number, size, etc.

  • Track file location: provide a mechanism to register & track the locations of a file that has been released in multiple repositories (OSS, S3, GSA, NODE, SRA, ENA).

  • Manage multi-version files: index all versions of a file under a Base UUID (i.e., given the Base UUID, you can query every historical version of the file in the system). This supports cases where different versions of an analysis pipeline generate multiple versions of Level2/3 files.

  • Track file status: whether a file is in the index, has been deleted or updated, or can be downloaded.

  • Bulk get download links: query files by UUID/MD5 and get their download links from specified repositories. Works best together with biopoem.

  • More features...
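
As a minimal sketch of the identifier format shown in the first feature, the snippet below splits the example identifier into the part before the slash and the UUID after it, using plain POSIX parameter expansion. Reading the prefix as `biominer.` plus a registry ID is an assumption drawn from the example identifier and from the `fudan-pgx` value used in Quick Start; it is not a documented contract.

```shell
#!/bin/sh
# Example identifier taken from the feature list above.
guid="biominer.fudan-pgx/b14563ac-dbc1-49e8-b484-3dad89de1a54"

prefix="${guid%%/*}"   # everything before the first "/"
uuid="${guid#*/}"      # everything after the first "/"

# ASSUMPTION: the prefix has the form "biominer.<registry-id>".
registry_id="${prefix#biominer.}"

echo "prefix:      $prefix"
echo "registry id: $registry_id"
echo "uuid:        $uuid"
```

The variable names (`guid`, `prefix`, `registry_id`, `uuid`) are illustrative only and do not correspond to fields in the service.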

Quick Start

  • Get BioMiner Indexd (Download Latest Version)

  • Install PostgreSQL (Recommended version: 10.x)

  • Set Environment Variables

    export DATABASE_URL=postgres://user:password@localhost:5432/biominer_indexd
    # NOTE: BIOMIER_REGISTRY_ID can only be set once. To change it, you need to rebuild the database.
    export BIOMIER_REGISTRY_ID=fudan-pgx
  • Start BioMiner Indexd

    $ biominer-indexd --help
      Biominer Indexd 0.1.0
      Jingcheng Yang
      An Index Engine for Omics Data Files
    
      USAGE:
          biominer-indexd [FLAGS] [OPTIONS]
    
      FLAGS:
          -D, --debug      Activate debug mode short and long flags (-D, --debug) will be deduced from the field's name
          -h, --help       Prints help information
          -V, --version    Prints version information
    
      OPTIONS:
          -d, --database-url <database-url>    Database url, such as postgres:://user:pass@host:port/dbname. You can also set
                                              it with env var: DATABASE_URL
          -H, --host <host>                    127.0.0.1 or 0.0.0.0 [default: 127.0.0.1]  [possible values: 127.0.0.1,
                                              0.0.0.0]
          -p, --port <port>                    Which port [default: 3000]
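
Before starting the service, it can help to confirm that both environment variables from the steps above are set. The function below is a rough sketch of such a check; `check_indexd_env` is a hypothetical helper and not part of BioMiner Indexd. It only accepts the `postgres://` scheme, which is a deliberately narrow assumption based on the URLs used in this README.

```shell
#!/bin/sh
# check_indexd_env: hypothetical pre-flight check (not shipped with the tool).
# Returns 0 when DATABASE_URL and BIOMIER_REGISTRY_ID both look usable.
check_indexd_env() {
    case "${DATABASE_URL:-}" in
        postgres://*) ;;  # scheme matches the form used in this README
        *)
            echo "DATABASE_URL must start with postgres://" >&2
            return 1
            ;;
    esac

    if [ -z "${BIOMIER_REGISTRY_ID:-}" ]; then
        echo "BIOMIER_REGISTRY_ID is not set" >&2
        return 1
    fi

    return 0
}
```

One way to use it is `check_indexd_env && biominer-indexd --host 127.0.0.1 --port 3000`, where `--host` and `--port` are the options listed in the help output above.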

For Developers

  1. Install PostgreSQL Client

    # Ubuntu
    sudo apt-get install postgresql-client

    # macOS
    brew install postgresql

  2. Install sqlx-cli

    cargo install sqlx-cli

  3. Install Docker

    # Ubuntu
    sudo apt-get install docker.io

    # macOS
    brew install docker

  4. Test

    # Builds a testing database with Docker and runs `cargo test`.
    make test

  5. Build & Run

    export DATABASE_URL=postgres://postgres:password@localhost:5432/test_biominer_indexd
    cargo run -- --help

Build

  1. Build Frontend

    # All frontend files are written to the assets directory.
    cd studio && yarn build:embed && cd ..

  2. Build Indexd

    # For macOS
    cargo build --release

    # For Linux
    cargo build --release --target=x86_64-unknown-linux-musl

  3. [Optional] For BioMiner Service

    cp target/x86_64-unknown-linux-musl/release/biominer-indexd ../biominer/docker/packages/
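
The two build commands in step 2 differ only by platform, so a small wrapper can choose between them. The sketch below prints the matching command for a given OS name as reported by `uname -s`; `indexd_build_cmd` is a hypothetical helper, and the Linux branch assumes an x86_64 machine with the musl target installed.

```shell
#!/bin/sh
# indexd_build_cmd: hypothetical helper that prints the cargo command
# from step 2 matching the given OS name (as reported by `uname -s`).
indexd_build_cmd() {
    case "$1" in
        Linux)  echo "cargo build --release --target=x86_64-unknown-linux-musl" ;;
        Darwin) echo "cargo build --release" ;;
        *)
            echo "unsupported OS: $1" >&2
            return 1
            ;;
    esac
}

# Print (rather than run) the command for the current machine.
indexd_build_cmd "$(uname -s)"
```

Printing the command instead of running it keeps the sketch free of side effects; review the output before executing it.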

Contributing

Coming soon...

License

Copyright © 2022 Jingcheng Yang

Distributed under the terms of the GNU Affero General Public License v3.0.