Proposal for very large resolution images

Dirk Farin edited this page Sep 19, 2024 · 22 revisions

The HEIF grid image item is limited to fewer than 256x256 tiles, because the number of tiles per row and per column is stored in an 8-bit integer and the number of references in iref is limited to 65535. Moreover, it has significant overhead: each tile image needs its own entries in the iinf, ipma, iref, and iloc metadata, which sum to more than 3.3 MB for a 256x255 tile image. This overhead matters because the metadata has to be loaded completely before decoding of the image can start.

To support larger images, this proposal introduces a new image item type, 'tili', as an alternative to grid.

Use Cases

  • Store 2D images in a tiled memory layout, supporting random access to any chosen tile via byte range access.
  • Support 2D tiling in 3D data structures, such as single-wavelength 2D tiles of multicomponent images (e.g. hyperspectral images).
  • Support 2D tiling in 4D data structures, where the first two dimensions are the 2D image, the third dimension is the color (wavelength) component, and the fourth dimension is time. Examples are multispectral or hyperspectral video and image sequences.
  • Note: before we list others, a literature search on MPEG (OMAF, etc.) and other 3D/4D tiling schemes (Cesium 3D tiles, etc.) is warranted to ensure there isn’t unwanted duplication.

Requirements

The following requirements are defined for the tiled media content using the tili syntax:

  • support for arbitrarily large resolutions (over 1M x 1M pixels in a single image),
  • much less overhead than grid,
  • enable streaming the image content over the internet with small initial setup delays,
  • support tiled images in which some tiles are blank and not covered with image data,
  • saving tiles in arbitrary order to allow gradually growing files,
  • interleaved storage of multiple tiled images, e.g. for multi-resolution pyramids where storage of the lower resolution layer is interleaved with the higher-resolution layers,
  • ability to build multi-resolution pyramids with a mixture of grid, tili, and unci images to remain partially compatible with software that lacks tili support.

tili Image Item

An image item of type tili is an image stored as independently compressed image tiles. The compressed data of all tiles is concatenated, and a table of offset pointers to the start of the individual tiles is stored in front of the compressed data. This makes it possible to load both the image tiles and the pointers to the tiles on demand. A tili image item also enables storing volumetric or higher-dimensional data (e.g. hyperspectral images or time series) as a set of 2D image tiles.

Definition

  • Box type: 'tilC'
  • Container: ItemPropertyContainerBox
  • Property type: Descriptive item property
  • Mandatory (per item): Yes, for an image item of type 'tili'

The TiledImageConfigurationBox specifies the tile resolution and the compression codec used to store the image tiles in an image of type tili. For N-dimensional (N>2) images, it also specifies the resolution of these extra dimensions.

Syntax

aligned(8) class TiledImageConfigurationBox
extends ItemFullProperty('tilC', version=0, flags) {

  unsigned int(32) tile_width;
  unsigned int(32) tile_height;

  unsigned int(32) tile_compression_type;

  unsigned int(8) number_of_extra_dimensions;
  for (int i=0; i<number_of_extra_dimensions; i++) {
    unsigned int(32) dimension_size[i];
  }
}

Semantics

  • tile_width, tile_height are the width and height of a single tile. All tiles have the same size. Tiles at the right or bottom border may extend beyond the total image size.
  • tile_compression_type specifies the compression codec used for all the individual tile images. tile_compression_type is one of the possible four-character types of ordinary image items (e.g. hvc1 for H.265 compression or j2k1 for JPEG 2000).
  • number_of_extra_dimensions specifies the number of dimensions of the N-dimensional image as number_of_extra_dimensions = N - 2. A 2D image has number_of_extra_dimensions=0.
  • dimension_size[i] specifies the size of dimension i+2 of the N-dimensional image. The size of the first two dimensions is obtained from the mandatory ispe item property.
  • OffsetFieldLength = OFFS_LEN[flags & 0x03] defines the number of bits used to store the offset to the image data of a specific tile. OFFS_LEN[] = [ 32, 40, 48, 64 ]
  • SizeFieldLength = SIZE_LEN[(flags>>2) & 0x03] defines the number of bits used to store the length of the image data of a specific tile. SIZE_LEN[] = [ 0, 24, 32, 64 ]
  • (flags & 0x10), if set, is a hint to the decoder that the compressed tile image data is stored consecutively in sequential order.
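As an illustration of the syntax and flag semantics above, the following is a minimal Python sketch of a reader for the tilC payload. It assumes the usual ISOBMFF conventions (big-endian fields, an 8-bit version followed by 24-bit flags) and that `payload` starts right after the box size/type header. The names `TilC` and `parse_tilc` are made up for this example and are not part of the proposal.

```python
import struct
from dataclasses import dataclass

OFFS_LEN = [32, 40, 48, 64]  # bits, selected by flags & 0x03
SIZE_LEN = [0, 24, 32, 64]   # bits, selected by (flags >> 2) & 0x03


@dataclass
class TilC:  # hypothetical container for the parsed fields
    tile_width: int
    tile_height: int
    tile_compression_type: str
    dimension_size: list
    offset_field_length: int
    size_field_length: int
    sequential_hint: bool


def parse_tilc(payload: bytes) -> TilC:
    """Parse a tilC body (assumed to start at the version/flags word)."""
    # FullBox header: 8-bit version, 24-bit flags, big-endian.
    (version_and_flags,) = struct.unpack_from(">I", payload, 0)
    version = version_and_flags >> 24
    flags = version_and_flags & 0xFFFFFF
    if version != 0:
        raise ValueError(f"unsupported tilC version {version}")

    tile_width, tile_height = struct.unpack_from(">II", payload, 4)
    compression = payload[12:16].decode("ascii")
    num_extra = payload[16]
    dims = list(struct.unpack_from(f">{num_extra}I", payload, 17))

    return TilC(
        tile_width=tile_width,
        tile_height=tile_height,
        tile_compression_type=compression,
        dimension_size=dims,
        offset_field_length=OFFS_LEN[flags & 0x03],
        size_field_length=SIZE_LEN[(flags >> 2) & 0x03],
        sequential_hint=bool(flags & 0x10),
    )
```

For example, flags = 0x19 would select 40-bit offsets, 32-bit sizes, and set the sequential-order hint.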

tili Item Data

The item data consists of an offset pointer table TiledImageOffsetTable, followed by the compressed image data.

The number of tile offsets stored in the table (NumTiles) is computed by

  TileColumns = (ispe_width + tile_width -1)/tile_width;
  TileRows    = (ispe_height + tile_height -1)/tile_height;

  NumTiles = TileColumns * TileRows;
  for (i=0; i<number_of_extra_dimensions; i++) {
    NumTiles = NumTiles * dimension_size[i];
  }

ispe_width and ispe_height are the total image width and height as specified in the mandatory ispe item property.
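The computation above uses integer division that rounds up. A small Python sketch of the same arithmetic (the function name `num_tiles` is chosen for this example only):

```python
def num_tiles(ispe_width, ispe_height, tile_width, tile_height, dimension_size=()):
    """Return (TileColumns, TileRows, NumTiles) as defined above."""
    # Round up so that partially covered border tiles are counted.
    tile_columns = (ispe_width + tile_width - 1) // tile_width
    tile_rows = (ispe_height + tile_height - 1) // tile_height

    count = tile_columns * tile_rows
    for size in dimension_size:  # extra dimensions multiply the tile count
        count *= size
    return tile_columns, tile_rows, count
```

For a 1000x600 image with 256x256 tiles this gives 4 columns and 3 rows, i.e. 12 tiles. With two extra dimensions of sizes 2 and 3, the table would hold 72 entries.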

aligned(8) class TiledImageOffsetTable {

  for (int i=0; i < NumTiles ; i++) {
    unsigned int(OffsetFieldLength) tile_start_offset[i];
    unsigned int(SizeFieldLength) tile_size[i];         // note: not present if SizeFieldLength==0
  }
}
// ... followed by compressed tile data ...
  • tile_start_offset[i] points to the start of the compressed data of tile i. The position is given relative to the start of the TiledImageOffsetTable data. Note that this is not a file offset, but an offset into the item's data, which can potentially span several iloc extents. Two values are reserved: if a tile is not coded and the corresponding image area is undefined, tile_start_offset[i] shall be 0. If a tile is not coded, but the displayed image should be taken from a lower-resolution layer (in a pymd stack), tile_start_offset[i] shall be 1. (Note: this can be used for maps where large areas contain little detail, like water areas.)
  • tile_size[i] (if present) indicates the number of bytes of the coded tile bitstream.
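Because every table entry has the same length, a reader can fetch a single entry with one byte-range request instead of loading the whole table. The Python sketch below shows one way this could look. It assumes big-endian fields packed without padding (all field lengths are multiples of 8 bits); the function names are invented for this example.

```python
def entry_byte_range(i, offset_bits, size_bits):
    """Byte range [start, end) of table entry i, relative to the table start."""
    entry_bytes = (offset_bits + size_bits) // 8
    start = i * entry_bytes
    return start, start + entry_bytes


def read_entry(raw, offset_bits, size_bits):
    """Decode one entry; tile_size is None when SizeFieldLength is 0."""
    offset_bytes = offset_bits // 8
    tile_start_offset = int.from_bytes(raw[:offset_bytes], "big")
    tile_size = int.from_bytes(raw[offset_bytes:], "big") if size_bits else None
    return tile_start_offset, tile_size


def classify(tile_start_offset):
    """Interpret the two reserved offset values described above."""
    if tile_start_offset == 0:
        return "undefined"
    if tile_start_offset == 1:
        return "use lower-resolution layer"
    return "coded"
```

With 40-bit offsets and 24-bit sizes, each entry occupies 8 bytes, so entry i lives at bytes 8*i to 8*i+7 of the item data.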

The entries in the offset table are ordered in row-major sequence. That is, for a 2D image, they are indexed as [y][x], and a three-dimensional volumetric image (extra dimension z) would be indexed as [z][y][x].

The compressed tile data may be stored in the file in arbitrary order, i.e. the tile_start_offset[] values are not necessarily in increasing order.

If the tile_size[i] fields are not present, the decoder has to infer them from the tile_start_offset[] values. If the tiles are stored in sequential order ((flags & 0x10) == 0x10), tile_size[i] can be computed as tile_start_offset[i+1] - tile_start_offset[i], with the exception of the last tile, which extends until the end of the data. If the tiles are not stored in sequential order, the decoder first has to sort the tile start offsets before it can compute each size from the difference to the next tile start. Note that in this case the decoder cannot read the offset table on demand, because it needs all offsets for sorting. Thus, we advise storing the tile sizes if the tiles are not in sequential order and on-demand access to the tile offsets is desired. Multiple tiles are allowed to use the same tile_start_offset to reference the same image content. This case has to be taken into account when computing the tile sizes.
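A sketch of how a decoder might infer the sizes in the general (unsorted) case is given below. It is only one possible approach: it assumes that `data_length` is the total length of the item data measured from the start of the offset table, skips the reserved offsets 0 and 1, and collapses duplicate offsets before sorting. The name `infer_tile_sizes` is hypothetical.

```python
def infer_tile_sizes(offsets, data_length):
    """Return a size per tile, or None for tiles that are not coded."""
    # Unique start offsets of coded tiles, in storage order.
    starts = sorted({o for o in offsets if o > 1})

    # Each tile ends where the next one begins; the last ends at the data end.
    ends = starts[1:] + [data_length]
    end_of = dict(zip(starts, ends))

    return [end_of[o] - o if o > 1 else None for o in offsets]
```

Tiles that share a start offset get the same inferred size, since they refer to the same coded data.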

Decoding Process of a Single Tile

The tili item shall have associated properties that are implicitly assigned to each tile. E.g. a tili image with tile_compression_type=hvc1 shall have an associated hvcC box that describes the coded stream of each tile.

The ispe item property associated with the tili item defines the total size of the tili image, not the size of a tile. If this total image size is not an integer multiple of the tile size, the image data of the tiles at the right and bottom border is cropped to the total image size.
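The cropping rule can be written down as a short calculation. The Python sketch below computes the visible part of the tile in column tx and row ty; the function name is chosen for this example.

```python
def visible_tile_size(tx, ty, tile_width, tile_height, ispe_width, ispe_height):
    """Width and height of tile (tx, ty) after cropping to the image size."""
    # Interior tiles keep their full size; border tiles are clipped.
    width = min(tile_width, ispe_width - tx * tile_width)
    height = min(tile_height, ispe_height - ty * tile_height)
    return width, height
```

For a 1000x600 image with 256x256 tiles, the bottom-right tile (tx=3, ty=2) keeps only 232x88 pixels of its decoded 256x256 area.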

Decoding of a single tile shall be done equivalently to the following steps:

  • create a virtual image item of type tile_compression_type
  • assign an ispe item property of size (tile_width, tile_height) to the virtual image item
  • assign the mandatory item properties for an image item of type tile_compression_type from the tili item to the virtual image item
  • decode the virtual image item
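The first three steps above can be sketched in Python as follows. This is only an illustration: the dictionary representation of an item, the function name `make_virtual_item`, and the (deliberately partial) `MANDATORY_PROPERTIES` table are assumptions made for this example. The actual decoding step is left to whatever codec decoder the reader uses.

```python
# Partial, illustrative mapping from item type to its mandatory
# configuration property (hvc1 -> hvcC is the example given above).
MANDATORY_PROPERTIES = {
    "hvc1": ["hvcC"],
    "av01": ["av1C"],
}


def make_virtual_item(tile_compression_type, tile_width, tile_height, tili_properties):
    """Build the virtual image item for one tile from the tili item's properties."""
    required = MANDATORY_PROPERTIES[tile_compression_type]
    missing = [name for name in required if name not in tili_properties]
    if missing:
        raise ValueError(f"tili item lacks mandatory properties: {missing}")

    # The virtual item gets the tile size as its ispe, not the total image size.
    properties = {"ispe": (tile_width, tile_height)}
    for name in required:
        properties[name] = tili_properties[name]

    return {"item_type": tile_compression_type, "properties": properties}
```

The resulting virtual item can then be handed to the decoder for tile_compression_type together with the tile's compressed data.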

Notes

  • Even though the compressed tile data logically follows continuously after the metadata, we can still write the data interleaved into the file (e.g. intermixed with other tili resolution layers) by employing iloc extents.

  • The four different offset pointer sizes correspond to these maximum tili image file sizes:

    pointer length    maximum compressed image size
    32 bit            4 GB
    40 bit            1 TB
    48 bit            256 TB
    64 bit            16 EB
  • The four different tile size field lengths correspond to these maximum compressed tile sizes:

    size field length    maximum tile size
    0 bit                depending on pointer length
    24 bit               16 MB
    32 bit               4 GB
    64 bit               16 EB
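The limits in both tables follow directly from the field widths: an n-bit field can address 2^n bytes. The following Python sketch reproduces the numbers, using binary units (GiB, TiB, ...) for the values the tables abbreviate as GB, TB, and so on. The helper names are made up for this example.

```python
UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]


def max_addressable(bits):
    """Number of bytes addressable with an unsigned field of the given width."""
    return 2 ** bits


def human(n):
    """Format a power-of-two byte count with binary units."""
    unit = 0
    while n >= 1024 and unit < len(UNITS) - 1:
        n //= 1024
        unit += 1
    return f"{n} {UNITS[unit]}"


for bits in (24, 32, 40, 48, 64):
    print(f"{bits} bit: {human(max_addressable(bits))}")
```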

Example of a 3D volume time series (2 extra dimensions 'z' and 't'):

[Figure: multidim]

Tiles are indexed by [t][z][y][x]. The number in each tile denotes the sequence order in the offset table. Example: tile [1][1][1][0] is stored as entry 14.
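The table position of a tile can be computed with the usual row-major formula. The Python sketch below assumes that coordinates and sizes are listed with the slowest-varying dimension first, i.e. [t, z, y, x] and [T, Z, TileRows, TileColumns], and that entries are counted from 0. The example only works out to entry 14 for a layout of 2x2 tiles, 2 z-slices, and at least 2 time steps, so these sizes are assumed here. The function name `tile_index` is hypothetical.

```python
def tile_index(coords, sizes):
    """Row-major index of a tile; both lists are ordered slowest dimension first."""
    index = 0
    for coord, size in zip(coords, sizes):
        if not 0 <= coord < size:
            raise IndexError(f"coordinate {coord} out of range for size {size}")
        index = index * size + coord
    return index


# Tile [t][z][y][x] = [1][1][1][0] in an assumed 2 x 2 x 2 x 2 layout.
entry = tile_index([1, 1, 1, 0], [2, 2, 2, 2])
```

Here `entry` evaluates to ((1*2 + 1)*2 + 1)*2 + 0 = 14, matching the example.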

File structure

Simple file with single tili image

[Figure: file1]

File with two interleaved tili images

[Figure: file2]

tili, grid, and unci coexistence

When building a multi-resolution pymd pyramid, different image types can be used for each layer. For example, it would be possible to use grid images for the lower resolution layers so that these can be read with software that does not understand tili image types. Software support for tili is only needed for the high-resolution layers.

[Figure: pyramid]