Discuss multi-dim indexing / array order #7
* matrix convention and example * image convention
* change top level title * add conclusion * typo fixes
ArrayOrder.md
Outdated
or right) refers to rows vs columns for matrices in mathematics.

* **Define:** Arrays storing matrices in "row-major" give columns stride 1.
* **Define:** Arrays storing matrices in "column-major" give rows stride 1.
Is the use of rows and columns correct here? As far as I understand the concepts introduced here, "row-major" stores rows contiguous in memory, i.e., gives rows stride 1. I might be missing something, though.
Yea, this is backwards. We also may need to put a unit on `1`.
In Julia, column-major, the strides are as follows, given in terms of the type (`Float64` aka `double`).
julia> strides(zeros(5,6,7))
(1, 5, 30)
In Python, the strides are given in terms of bytes.
In [5]: np.zeros((5,6,7)).strides
Out[5]: (336, 56, 8)
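The two outputs above differ both in unit (elements vs bytes) and in default layout (column-major vs row-major). A minimal NumPy sketch of how they relate, assuming 8-byte `float64` elements; `element_strides` is a hypothetical helper name introduced here for illustration, not a NumPy function:

```python
import numpy as np

def element_strides(a):
    """Strides of `a` in units of elements rather than bytes."""
    return tuple(s // a.itemsize for s in a.strides)

c_order = np.zeros((5, 6, 7))             # NumPy default: C order (row-major)
f_order = np.zeros((5, 6, 7), order="F")  # F order (column-major), as in Julia

print(c_order.strides)           # (336, 56, 8), in bytes
print(element_strides(c_order))  # (42, 7, 1), in elements
print(element_strides(f_order))  # (1, 5, 30), same as the Julia output
```

Dividing by `itemsize` is one way to put "a unit on 1": a stride of 1 element is a stride of 8 bytes for `float64`.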
"Row is contiguous in memory" is equivalent to "column index has stride one" (not intuitive, I know).
"Rows" and "columns" mean things for matrices only. But @mkitti, make a 2D example. Then you'll see strides `(1,5)`, say. The first dimension indexes rows. Rows have stride 1 = column major.
I see what you mean by this comment. I interpreted "gives columns stride 1" as "iterating within a column (i.e., changing the row index) has stride 1". In my view, my interpretation is consistent with your definition above:
the stride of a dimension is the (positive) step in the flat array that corresponds to the adjacent element along that dimension.
What about replacing "columns" and "rows" by "second" and "first" index, respectively?
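The quoted stride definition can be checked directly in terms of "first" and "second" index. The sketch below fills a flat array with its own positions, so the difference between adjacent elements along an axis is the step in the flat array. `flat_step` is a hypothetical helper written for this illustration only:

```python
import numpy as np

rows, cols = 5, 6
flat = np.arange(rows * cols)  # flat 1D storage holding 0, 1, ..., 29

# Column-major (F order): the first index has stride 1.
col_major = flat.reshape((rows, cols), order="F")
# Row-major (C order): the second index has stride 1.
row_major = flat.reshape((rows, cols), order="C")

def flat_step(a, axis):
    """Step in the flat array between adjacent elements along `axis`."""
    index = [0] * a.ndim
    start = a[tuple(index)]
    index[axis] = 1
    return int(a[tuple(index)] - start)

print([flat_step(col_major, ax) for ax in range(2)])  # [1, 5]
print([flat_step(row_major, ax) for ax in range(2)])  # [6, 1]
```

Phrased this way, "column-major" means the first index has stride 1 and "row-major" means the second index has stride 1, which avoids the ambiguity of "gives rows stride 1".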
ArrayOrder.md
Outdated
## Multi-dimensional array indexing

Zarr stores multi-dimensional arrays into regularly sized chunks.
Chunks are themselves multi-dimensional arrays of a smaller size than
"...a smaller size than"
Not necessarily. You can have a Zarr array with 1 chunk.
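To illustrate the single-chunk case, here is a small sketch of how many chunks a regular chunk grid produces. It uses plain Python rather than the Zarr API, and `chunk_grid_shape` is a hypothetical helper name:

```python
import math

def chunk_grid_shape(array_shape, chunk_shape):
    """Number of chunks along each dimension of a regular chunk grid."""
    return tuple(math.ceil(n / c) for n, c in zip(array_shape, chunk_shape))

print(chunk_grid_shape((100, 100), (10, 25)))    # (10, 4), i.e. 40 chunks
print(chunk_grid_shape((100, 100), (100, 100)))  # (1, 1), a single chunk
```

When the chunk shape equals the array shape, the one chunk is the same size as the whole array, so "smaller" would need to read "no larger".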
ArrayOrder.md
Outdated
Zarr stores multi-dimensional arrays into regularly sized chunks.
Chunks are themselves multi-dimensional arrays of a smaller size than
the complete multidimensional array and are stored as a 1D array of
values, called a "flattened" array.
aren't most (all?) n-dimensional arrays stored as 1D arrays of values? talking about zarr implies that this is a zarr thing, rather than a storing-arrays-in-computers thing.
I'll revisit this next week - happy to zoom chat if it will be useful

## Appendix

### Programming languages
It is not clear to me that these orderings can be applied to "languages" as a whole. Most languages do not have multidimensional indexing or data structures at their base level. Rather it is often libraries built and used with those languages that implement multidimensional data structures and indexing.
For example, the C++ library Eigen defaults to column-major, F-order for storage:
https://eigen.tuxfamily.org/dox/group__TopicStorageOrders.html
What is usually discussed with respect to languages is the storage order of the data rather than just the indexing order. Here things are murky since languages such as Java do not guarantee contiguous memory storage. NumPy supports storage in either F or C order.
My suggestion is to list libraries such as NumPy, Eigen, or imglib2 rather than languages here.
I might just scrap the whole table, since I'm less and less sure what the added value is. The scope of this may now be such that anyone that this article is useful for already knows what's in that table and more details of the type you're pointing out...
I think it might be useful for someone moving data from NumPy to imglib2 to see what the default order is of those two libraries.
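As a sketch of why the default order matters when moving data between libraries, the NumPy side can be inspected as follows. Only NumPy calls are used here; what a receiving library expects is an assumption to check against that library's documentation.

```python
import numpy as np

c_arr = np.arange(6).reshape(2, 3)  # C order by default
f_arr = np.asfortranarray(c_arr)    # same values, F-order storage (a copy)

print(c_arr.flags["C_CONTIGUOUS"], f_arr.flags["F_CONTIGUOUS"])  # True True
print(np.array_equal(c_arr, f_arr))                              # True

# Reading the elements in memory order shows that the layouts differ.
print(c_arr.ravel(order="K").tolist())  # [0, 1, 2, 3, 4, 5]
print(f_arr.ravel(order="K").tolist())  # [0, 3, 1, 4, 2, 5]

# Transposing reverses the axes without copying any data.
t = c_arr.T
print(t.shape, t.flags["F_CONTIGUOUS"], np.shares_memory(c_arr, t))  # (3, 2) True True
```

The last lines show that a C-order array of shape `(2, 3)` and an F-order array of shape `(3, 2)` can describe the same bytes, which is why handing a buffer to a library with the opposite default often amounts to reversing the axis order.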
array where `i` is the "first" index, and `k` is the "last" index. Here, we will consider only the non-negative integers as
valid indexes for arrays, though different contexts may use a different index set.

Multi-dimensional arrays are often stored as one-dimensional (1D), or "flat," arrays that are interpreted, or "reshaped," into
I think it's actually obligatory to store arrays in a 1D representation
By flat / 1D I mean that they're stored in contiguous memory, and that's not necessarily true of zarr arrays for example. While every chunk may be contiguous 1D, the "whole array" need not be.
nD arrays could be stored as an Iliffe vector. That's not particularly smart, but it is common when one does not have a supporting library.
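For comparison, a minimal sketch of the two storage schemes in plain Python. Nested lists act as an Iliffe vector, and `flat_get` is a hypothetical helper that applies a row-major index formula to flat storage:

```python
rows, cols = 2, 3

# Iliffe vector: an outer list of references to separately allocated rows.
iliffe = [[10 * i + j for j in range(cols)] for i in range(rows)]

# Flat storage: a single 1D list plus an index formula (row-major here).
flat = [10 * i + j for i in range(rows) for j in range(cols)]

def flat_get(storage, i, j, ncols):
    """Look up element (i, j) in row-major flat storage."""
    return storage[i * ncols + j]

print(iliffe[1][2], flat_get(flat, 1, 2, cols))  # 12 12
```

Both give the same element, but only the flat version has well-defined strides, since the rows of the Iliffe vector need not be adjacent in memory.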
Two-dimensional images are often stored as arrays where two dimensions vary the horizontal and vertical positions of the
samples, and as a result these dimensions should be displayed horizontally and vertically, respectively. Most formats for
storing "natural" images store data such that the "horizontal axis" / rows have a smaller stride than the "vertical axis" /
what is a natural image?
Tough to define precisely, but roughly "an image of physical objects that a human might see using the unaided eye". It's a pretty common term in computer vision.
Used here in contrast to "medical" or "microscopic" images, which are not "natural" images.
might be useful here to clarify that "horizontal" and "vertical" are relative to the camera sensor, not the subject. Landscape and portrait images of the same subject should probably not be displayed based on the memory layout.
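A small sketch of the layout being described, assuming an image indexed as `image[y, x]` and stored in C order. This convention is an assumption for illustration, not something the thread settles:

```python
import numpy as np

height, width = 4, 6  # a small landscape image
image = np.zeros((height, width), dtype=np.uint8)  # indexed as image[y, x]

step_x = image.strides[1] // image.itemsize  # one pixel to the right
step_y = image.strides[0] // image.itemsize  # one pixel down
print(step_x, step_y)  # 1 6

# Rotating to portrait gives a view with a different shape and strides,
# but the bytes in memory are unchanged.
portrait = np.rot90(image)
print(portrait.shape, np.shares_memory(image, portrait))  # (6, 4) True
```

Under this convention the horizontal position has the smaller stride. The rotation example shows that display orientation and memory layout are separate questions.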
* rework stride and related definitions * clearer recommendations re memory layout * recommendations re dimension naming
This is a replica of #1 from @bogovicj. In order to keep things simple, we decided to keep contributions local, rather than forking this repository.
The two commits that exceed the scope of #1 only fix minor typos and inconsistencies.