Discuss multi-dim indexing / array order #7

Open
wants to merge 20 commits into
base: main

Conversation

minnerbe
Collaborator

This is a replica of #1 from @bogovicj. In order to keep things simple, we decided to keep contributions local, rather than forking this repository.

The two commits that exceed the scope of #1 only fix minor typos and inconsistencies.

ArrayOrder.md Outdated
or right) refers to rows vs columns for matrices in mathematics.

* **Define:** Arrays storing matrices in "row-major" give columns stride 1.
* **Define:** Arrays storing matrices in "column-major" give rows stride 1.
Collaborator Author

Is the use of rows and columns correct here? As far as I understand the concepts introduced here, "row-major" stores rows contiguous in memory, i.e., gives rows stride 1. I might be missing something, though.

Collaborator

Yea, this is backwards. We also may need to put a unit on 1.

In Julia, which is column-major, the strides are as follows, given in units of elements (here Float64, aka double).

julia> strides(zeros(5,6,7))
(1, 5, 30)

In Python (NumPy, row-major by default), the strides are given in bytes.

In [5]: np.zeros((5,6,7)).strides
Out[5]: (336, 56, 8)
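
The two outputs above differ both in order and in unit. As a rough sketch of how they relate, the following pure-Python helpers (hypothetical names `element_strides` and `byte_strides`, not part of Julia or NumPy) compute strides for a contiguous array of a given shape, assuming 8-byte elements for the byte case:

```python
def element_strides(shape, order="C"):
    """Strides in units of elements for a contiguous array.

    order="C": last index varies fastest (row-major).
    order="F": first index varies fastest (column-major).
    """
    strides = []
    step = 1
    # Visit dimensions from fastest-varying to slowest-varying.
    dims = shape if order == "F" else tuple(reversed(shape))
    for n in dims:
        strides.append(step)
        step *= n
    if order == "C":
        strides.reverse()
    return tuple(strides)


def byte_strides(shape, itemsize, order="C"):
    """Strides in bytes: element strides scaled by the element size."""
    return tuple(s * itemsize for s in element_strides(shape, order))


shape = (5, 6, 7)
f_elems = element_strides(shape, "F")  # matches the Julia output
c_elems = element_strides(shape, "C")
c_bytes = byte_strides(shape, 8, "C")  # matches the NumPy output (Float64 = 8 bytes)
print(f_elems, c_elems, c_bytes)
```

Dividing the NumPy byte strides by the item size gives element strides that can be compared directly with the Julia ones.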

Collaborator

@bogovicj bogovicj Dec 18, 2023

"row is contiguous in memory" is equivalent to "column index has stride one" (not intuitive, I know).

"Rows" and "columns" mean things for matrices only. But @mkitti, make a 2D example. Then you'll see strides (1,5), say. The first dimension indexes rows. Rows have stride 1 = column-major.

Collaborator Author

I see what you mean by this comment. I interpreted "gives columns stride 1" as "iterating within a column (i.e., changing the row index) has stride 1". In my view, my interpretation is consistent with your definition above:

the stride of a dimension is the (positive) step in the flat array that corresponds to the adjacent element along that dimension.

What about replacing "columns" and "rows" by "second" and "first" index, respectively?
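
The two readings discussed in this thread can be checked against a small 2D sketch. It assumes a 5x6 matrix with strides (1, 5) in units of elements, as in the 2D example suggested above; `flat_offset` is a hypothetical helper, not an API of any library mentioned here.

```python
def flat_offset(index, strides):
    """Position in the flat array of a multi-dimensional index."""
    return sum(i * s for i, s in zip(index, strides))


n_rows, n_cols = 5, 6
strides = (1, n_rows)  # first (row) index has stride 1, second (column) index has stride 5

# Fix the column index and vary the row index: walks down one column.
column_2 = [flat_offset((i, 2), strides) for i in range(n_rows)]

# Fix the row index and vary the column index: walks along one row.
row_2 = [flat_offset((2, j), strides) for j in range(n_cols)]

print(column_2)  # consecutive flat positions
print(row_2)     # flat positions that are n_rows apart
```

With these strides, the first index has stride 1 and the elements of a single column are contiguous in memory. Phrasing the definition in terms of "first" and "second" index avoids having to decide which of these two facts "rows have stride 1" refers to.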

ArrayOrder.md Outdated
## Multi-dimensional array indexing

Zarr stores multi-dimensional arrays into regularly sized chunks.
Chunks are themselves multi-dimensional arrays of a smaller size than
Collaborator

@d-v-b d-v-b Dec 18, 2023

...a smaller size than

Not necessarily: you can have a Zarr array with a single chunk.
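
A minimal sketch of the arithmetic behind this point, assuming a regular chunk grid; `chunk_grid_shape` is a hypothetical helper and not part of any Zarr implementation:

```python
from math import prod


def chunk_grid_shape(array_shape, chunk_shape):
    """Number of chunks along each dimension (ceiling division)."""
    return tuple((a + c - 1) // c for a, c in zip(array_shape, chunk_shape))


array_shape = (5, 6, 7)

# Chunk shape equal to the array shape: the whole array is one chunk.
single = chunk_grid_shape(array_shape, (5, 6, 7))

# Smaller chunks: several chunks, with partial chunks at the edges.
many = chunk_grid_shape(array_shape, (2, 3, 7))

print(single, prod(single))
print(many, prod(many))
```

When the chunk shape equals the array shape, the chunk is the same size as the complete array, so "of a smaller size than" does not hold in general.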

ArrayOrder.md Outdated
Zarr stores multi-dimensional arrays into regularly sized chunks.
Chunks are themselves multi-dimensional arrays of a smaller size than
the complete multidimensional array and are stored as a 1D array of
values, called a "flattened" array.
Collaborator

Aren't most (all?) n-dimensional arrays stored as 1D arrays of values? Talking about Zarr implies that this is a Zarr thing, rather than a storing-arrays-in-computers thing.

@minnerbe
Collaborator Author

Just a quick check-in: where are we on this issue @bogovicj, @d-v-b? Is there a need for a more focused discussion, e.g., over Zoom?

@bogovicj
Collaborator

I'll revisit this next week. Happy to chat over Zoom if it would be useful.


## Appendix

### Programming languages
Collaborator

It is not clear to me that these orderings can be applied to "languages" as a whole. Most languages do not have multidimensional indexing or data structures at their base level. Rather, it is often libraries built and used with those languages that implement multidimensional data structures and indexing.

For example, the C++ library Eigen defaults to column-major, F-order for storage:
https://eigen.tuxfamily.org/dox/group__TopicStorageOrders.html

What is usually discussed with respect to languages is the storage order of the data rather than just the indexing order. Here things are murky, since languages such as Java do not guarantee contiguous memory storage. NumPy supports storage in either F or C order.

My suggestion is to list libraries such as NumPy, Eigen, or imglib2 rather than languages here.

Collaborator

I might just scrap the whole table, since I'm less and less sure what the added value is. The scope of this may now be such that anyone that this article is useful for already knows what's in that table and more details of the type you're pointing out...

Collaborator

I think it might be useful for someone moving data from NumPy to imglib2 to see what the default order of those two libraries is.

array where `i` is the "first" index, and `k` is the "last" index. Here, we will consider only the non-negative integers as
valid indexes for arrays, though different contexts may use a different index set.

Multi-dimensional arrays are often stored as one-dimensional (1D), or "flat," arrays that are interpreted, or "reshaped," into
Collaborator

I think it's actually obligatory to store arrays in a 1D representation

Collaborator

By flat / 1D I mean that they're stored in contiguous memory, and that's not necessarily true of zarr arrays for example. While every chunk may be contiguous 1D, the "whole array" need not be.

Collaborator

nD arrays could be stored as an Iliffe vector. That's not particularly smart, but it is common when one does not have a supporting library.
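
To make the contrast concrete, here is a small sketch of the two layouts for a 3x4 array. Python lists of lists are used as a stand-in for an Iliffe vector (an outer vector of references to separately allocated rows); the helper `flat_get` and the row-major choice for the flat layout are assumptions made for illustration.

```python
n_rows, n_cols = 3, 4

# Iliffe-style layout: an outer list holding references to separate row lists.
# Nothing requires the rows to be adjacent in memory.
iliffe = [[10 * i + j for j in range(n_cols)] for i in range(n_rows)]

# Flat layout: a single 1D list plus index arithmetic (row-major here).
flat = [10 * i + j for i in range(n_rows) for j in range(n_cols)]


def flat_get(i, j):
    """Look up element (i, j) in the flat list via its stride-based offset."""
    return flat[i * n_cols + j]


# Both layouts hold the same logical array.
same = all(
    iliffe[i][j] == flat_get(i, j)
    for i in range(n_rows)
    for j in range(n_cols)
)
print(same)
```

Only the flat layout has strides in the sense discussed in this document; the Iliffe layout needs one pointer lookup per row instead.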


Two-dimensional images are often stored as arrays where two dimensions vary the horizontal and vertical positions of the
samples, and as a result these dimensions should be displayed horizontally and vertically, respectively. Most formats for
storing "natural" images store data such that the "horizontal axis" / rows have a smaller stride than the "vertical axis" /
Collaborator

what is a natural image?

Collaborator

@bogovicj bogovicj Jun 17, 2024

Tough to define precisely, but roughly "an image of physical objects that a human might see using the unaided eye." It's a pretty common term in computer vision.

Used here in contrast to "medical" or "microscopic" images, which are not "natural" images.

Collaborator

It might be useful here to clarify that "horizontal" and "vertical" are relative to the camera sensor, not the subject. Landscape and portrait images of the same subject should probably not be displayed based on the memory layout.

* rework stride and related definitions
* clearer recommendations re memory layout
* recommendations re dimension naming
4 participants