Image Tiles to SQLite

Aggregate an image tile set into a single SQLite database for easier and faster access.

Querying several million small files is inefficient. This script creates a single SQLite database that holds the metadata and the image tiles as binary data, which speeds up file handling and tile querying.

This script can aggregate tile sets from the Gigapan Downloader out of the box, and the resulting database can be imported into HiGlass Server for viewing in HiGlass.

Installation

Prerequisites:

  • Python v3.6
git clone https://github.com/flekschas/image-tiles-to-sqlite && cd image-tiles-to-sqlite
mkvirtualenv -a $(pwd) -p python3 im2db  # Not necessary but recommended
pip install --upgrade -r ./requirements.txt

CLI

Image tiles to SQLite db

usage: im2db.py [-h] [-o OUTPUT] [-i INFO] [-t {jpg,png,gif}] [-v] dir

positional arguments:
  dir                   directory of image tiles to be converted

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        name of the sqlite database to be generated
  -i INFO, --info INFO  name of the tile set info file
  -t {jpg,png,gif}, --imtype {jpg,png,gif}
                        image tile data type
  -v, --verbose         increase output verbosity

Example:

./im2db.py test/54825
# -> 54825.imtiles

Tests:

This runs an end-to-end test on the test data (test/54825):

./run_test.sh

What's Going On?

Take a look at im2db.py; trust me, it's a short file. Under the hood, the script creates a SQLite database holding the following two tables:

  • tileset_info
  • tiles

tileset_info is an extension of clodius's metadata table and holds the following columns:

  • zoom_step [INT]: not used
  • max_length [INT]: not used
  • assembly [TEXT]: not used
  • chrom_names [TEXT]: not used
  • chrom_sizes [TEXT]: not used
  • tile_size [INT]: Size of the tiles in pixels
  • max_zoom [INT]: Max. zoom level.
  • max_size [INT]: Max. width, i.e., tile_size * 2^max_zoom.
  • width [INT]: Width of the image
  • height [INT]: Height of the image
  • dtype [TEXT]: Data type of the images. Either jpg, png, or gif.
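
The max_size relation above can be checked with a few lines of Python. This is only an illustrative sketch: the function name and the example values (256 px tiles, max_zoom of 5) are assumptions and are not taken from im2db.py.

```python
def max_size(tile_size: int, max_zoom: int) -> int:
    """Return tile_size * 2^max_zoom, the max. width in pixels."""
    return tile_size * 2 ** max_zoom


# Hypothetical tile set: 256 px tiles and a max. zoom level of 5.
print(max_size(256, 5))  # 256 * 32 = 8192
```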

tiles stores each tile's binary image data and position and consists of the following columns. The primary key is composed of z, y, and x.

  • z [INT]: Z position of the tile.
  • y [INT]: Y position of the tile.
  • x [INT]: X position of the tile.
  • image [BLOB]: The binary image data of a tile.
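
As a rough sketch of how such a table can be written and read with Python's built-in sqlite3 module: the column layout follows the list above, but the exact DDL, the helper names put_tile and get_tile, and the in-memory database are assumptions for illustration and may differ from what im2db.py does.

```python
import sqlite3

# In-memory stand-in for an .imtiles file (assumption: real files are
# opened with sqlite3.connect("<name>.imtiles") instead).
db = sqlite3.connect(":memory:")
db.execute(
    """
    CREATE TABLE tiles (
        z INT NOT NULL,
        y INT NOT NULL,
        x INT NOT NULL,
        image BLOB,
        PRIMARY KEY (z, y, x)
    )
    """
)


def put_tile(conn, z, y, x, data):
    """Store the raw bytes of one image tile at position (z, y, x)."""
    conn.execute(
        "INSERT INTO tiles (z, y, x, image) VALUES (?, ?, ?, ?)",
        (z, y, x, data),
    )


def get_tile(conn, z, y, x):
    """Return the raw bytes of the tile at (z, y, x), or None if missing."""
    row = conn.execute(
        "SELECT image FROM tiles WHERE z = ? AND y = ? AND x = ?",
        (z, y, x),
    ).fetchone()
    return None if row is None else bytes(row[0])


# Placeholder bytes instead of a real JPG/PNG/GIF file.
put_tile(db, 0, 0, 0, b"fake-image-bytes")
print(get_tile(db, 0, 0, 0))
```

Because z, y, and x form the primary key, a lookup by position is a single indexed query rather than a file system access per tile.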

Display in HiGlass

./manage.py ingest_tileset \
  --filename imtiles/<IMTILES-NAME>.imtiles \
  --filetype imtiles \
  --datatype <jpg,png,gif> \
  --coordSystem pixel \
  --coordSystem2 pixel \
  --uid <IMTILES-NAME> \
  --name '<IMTILES-NAME>' \
  --no-upload

Gigapan snapshots to BEDPE SQLite database

usage: snapshots2db.py [-h] [-o OUTPUT] [-i INFO] [-m MAX] [-p]
                       [--pre-fetch-file PRE_FETCH_FILE]
                       [--pre-fetch-zoom-from PRE_FETCH_ZOOM_FROM]
                       [--pre-fetch-zoom-to PRE_FETCH_ZOOM_TO]
                       [--pre_fetch_max_size PRE_FETCH_MAX_SIZE]
                       [--from-x FROM_X] [--to-x TO_X] [--from-y FROM_Y]
                       [--to-y TO_Y] [--xlim-rel] [--ylim-rel] [--limit-excl]
                       [-w] [-v]
                       file

positional arguments:
  file                  snapshots file to be converted

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        name of the sqlite database to be generated
  -i INFO, --info INFO  name of the tile set info file
  -m MAX, --max MAX     maximum number of annotations per tile
  -p, --pre-fetch       preload and store an image pyramind for every
                        annotation
  --pre-fetch-file PRE_FETCH_FILE
                        imtiles files to preload the annotations from
  --pre-fetch-zoom-from PRE_FETCH_ZOOM_FROM
                        initial zoom of for preloading (farthest zoomed out)
  --pre-fetch-zoom-to PRE_FETCH_ZOOM_TO
                        final zoom of for preloading (farthest zoomed in)
  --pre_fetch_max_size PRE_FETCH_MAX_SIZE
                        max size (in pixel) for preloading a snapshot
  --from-x FROM_X       only include tiles which end-x is greater than this
                        value
  --to-x TO_X           only include tiles which start-x is smaller than this
                        value
  --from-y FROM_Y       only include tiles which end-y is greater than this
                        value
  --to-y TO_Y           only include tiles which start-y is smaller than this
                        value
  --xlim-rel            x limits, defined via `--from-x` etc., are in
                        percentage relative to the full size
  --ylim-rel            y limits, defined via `--from-y` etc., are in
                        percentage relative to the full size
  --limit-excl          if limits are defined via `--from-x` etc. elements
                        have to be fully inside them
  -w, --overwrite       overwrite output if exist
  -v, --verbose         increase output verbosity
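
The limit options can be read as a simple range test per axis. The following sketch follows the help text above; the function name in_limits is hypothetical, the handling of `--limit-excl` (boundaries count as inside) is an assumed reading, and the relative limits (`--xlim-rel`, `--ylim-rel`) are left out.

```python
def in_limits(start, end, lim_from=None, lim_to=None, exclusive=False):
    """Decide whether the 1D range [start, end] passes the given limits.

    lim_from / lim_to mirror --from-x / --to-x (or the y variants); None
    means the limit is not set. exclusive mirrors --limit-excl.
    """
    if exclusive:
        # Assumed reading of --limit-excl: the range must lie fully inside.
        if lim_from is not None and start < lim_from:
            return False
        if lim_to is not None and end > lim_to:
            return False
        return True
    # Default: keep the range if its end is greater than the lower limit
    # and its start is smaller than the upper limit, i.e., it overlaps.
    if lim_from is not None and end <= lim_from:
        return False
    if lim_to is not None and start >= lim_to:
        return False
    return True


print(in_limits(10, 20, lim_from=15))                  # True
print(in_limits(10, 20, lim_from=15, exclusive=True))  # False
```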

What's Going On?

Take a look at snapshots2db.py. Under the hood, the script creates a SQLite database holding the following three tables:

  • tileset_info
  • intervals
  • position_index

tileset_info is an extension of clodius's metadata table and holds the following columns:

  • zoom_step [INT]: not used
  • max_length [INT]: not used
  • assembly [TEXT]: not used
  • chrom_names [TEXT]: not used
  • chrom_sizes [TEXT]: not used
  • tile_size [INT]: Size of the tiles in pixels
  • max_zoom [INT]: Max. zoom level.
  • max_size [INT]: Max. width, i.e., tile_size * 2^max_zoom.
  • width [INT]: Width of the image
  • height [INT]: Height of the image

intervals stores the snapshot annotations and their positions and consists of the following columns. The primary key is id.

  • id [INT]: Primary key
  • zoomLevel [INT]: Zoom level
  • importance [REAL]: Number of views
  • fromX [INT]: Start x position
  • toX [INT]: End x position
  • fromY [INT]: Start y position
  • toY [INT]: End y position
  • chrOffset [INT]: not used
  • uid [TEXT]: Random uuid
  • fields [TEXT]: Other fields; currently holding the snapshot description

position_index stores the x and y ranges of the annotations for position lookups and consists of the following columns. The primary key is id.

  • id [INT]: Primary key
  • rFromX [INT]: Start x position
  • rToX [INT]: End x position
  • rFromY [INT]: Start y position
  • rToY [INT]: End y position
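
To illustrate how these columns can be used, the sketch below selects the snapshots that overlap a viewport and ranks them by importance. It queries the intervals table directly and does not use position_index. The DDL, the helper names add_snapshot and snapshots_in_view, and the sample rows are assumptions for illustration and are not taken from snapshots2db.py.

```python
import sqlite3
import uuid

# In-memory stand-in for a .snapshots.db file.
db = sqlite3.connect(":memory:")
db.execute(
    """
    CREATE TABLE intervals (
        id INTEGER PRIMARY KEY,
        zoomLevel INT,
        importance REAL,
        fromX INT,
        toX INT,
        fromY INT,
        toY INT,
        chrOffset INT,
        uid TEXT,
        fields TEXT
    )
    """
)


def add_snapshot(conn, zoom, views, x0, x1, y0, y1, description):
    """Insert one snapshot; chrOffset is unused and set to 0."""
    conn.execute(
        "INSERT INTO intervals (zoomLevel, importance, fromX, toX, fromY,"
        " toY, chrOffset, uid, fields) VALUES (?, ?, ?, ?, ?, ?, 0, ?, ?)",
        (zoom, views, x0, x1, y0, y1, str(uuid.uuid4()), description),
    )


def snapshots_in_view(conn, x0, x1, y0, y1, limit=10):
    """Return descriptions of snapshots overlapping the viewport,
    most viewed first, capped at `limit` results."""
    rows = conn.execute(
        """
        SELECT fields FROM intervals
        WHERE toX > ? AND fromX < ? AND toY > ? AND fromY < ?
        ORDER BY importance DESC
        LIMIT ?
        """,
        (x0, x1, y0, y1, limit),
    )
    return [row[0] for row in rows]


# Made-up sample data.
add_snapshot(db, 0, 120.0, 100, 200, 100, 200, "harbor")
add_snapshot(db, 0, 15.0, 150, 400, 150, 400, "bridge")
add_snapshot(db, 0, 999.0, 5000, 5100, 5000, 5100, "far away")

print(snapshots_in_view(db, 0, 300, 0, 300))  # ['harbor', 'bridge']
```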

Display in HiGlass

./manage.py ingest_tileset \
  --filename imtiles/<IMTILES-NAME>.snapshots.db \
  --filetype 2dannodb \
  --datatype 2d-rectangle-domains \
  --coordSystem pixel \
  --coordSystem2 pixel \
  --uid <IMTILES-NAME>-snapshots \
  --name '<IMTILES-NAME> Snapshots' \
  --no-upload