User guide documentation update #5

Open
wants to merge 6 commits into
base: master

davidhicks
Contributor

No description provided.

Add an example showing how to use hexadecimal notation when defining
enumerations (enums).
Add an example to show how enumeration values can be referred to in
switch-on/cases constructs by identifier instead of integer value.
This is a first attempt at describing how streams and substreams work
in Kaitai Struct. This documentation is based on advice provided at:
kaitai-io/kaitai_struct#145 (comment)
@GreyCat
Member

GreyCat commented Apr 11, 2017

Oh, bummer ;) I've just realized that I also wrote a streams/substreams section on April 9th, and just forgot to push it publicly...

@GreyCat
Member

GreyCat commented Apr 11, 2017

I'm so sorry... I'll try to think of a way to merge both these sections into one.

@davidhicks
Contributor Author

Two sets of documentation is better than zero! :) I'll also see about updating this branch to ensure it can be merged. Is the content of these proposed documentation updates accurate? Feedback appreciated on any inaccuracies or suggestions for rewording.

thus an exception occurs. The fact that the root stream still has
1001 bytes available to be requested from the input file does not
matter, as the `body` substream never has the opportunity to request
any more than the first 1000 bytes of the input file.
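To make the quoted behavior concrete, here is a minimal .ksy sketch; the meta id and the body_type name are hypothetical. Because body declares size: 1000, it gets its own 1000-byte substream, so a nested field asking for 1001 bytes should fail with an end-of-stream error however large the input file is.

```yaml
meta:
  id: substream_limit_example   # hypothetical id, for illustration only
seq:
  - id: body
    size: 1000                  # creates a 1000-byte substream for body_type
    type: body_type
types:
  body_type:
    seq:
      - id: payload
        size: 1001              # one byte more than the substream holds: EOF error at runtime
```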
Member

This is actually not a pitfall but legitimate behavior, and it is already well explained in the previous section.

The "pitfall" I was thinking about in this section is the following: when a new substream is created, all parse instances with positions act within that substream by default.

So, this one works as expected:

seq:
  - id: skipped
    size: 1000
  - id: indexing
    type: file_index_entry
    # but adding "size: 24" here would break the "file_body" instance,
    # although it looks legitimate at first glance
types:
  file_index_entry:
    seq:
      - id: file_name
        type: str
        size: 16
        encoding: ASCII # assumed; KS requires an encoding for str
      - id: file_pos
        type: u4
      - id: file_len
        type: u4
    instances:
      file_body:
        pos: file_pos
        size: file_len

To overcome that, one needs to use something like io: _root._io in file_body. Of course, documentation warrants a somewhat better example and explanation.
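A minimal sketch of that workaround applied to the example above, assuming the index entry is given an explicit size: 24 (16 + 4 + 4 bytes). The meta section, the little-endian byte order and the ASCII encoding are assumptions added only so the spec is complete. The io: _root._io line makes file_body seek within the root stream instead of within the 24-byte substream created for indexing.

```yaml
meta:
  id: file_index_example   # hypothetical id, for illustration only
  endian: le               # assumption: little-endian integers
seq:
  - id: skipped
    size: 1000
  - id: indexing
    type: file_index_entry
    size: 24               # now creates a 24-byte substream for the entry
types:
  file_index_entry:
    seq:
      - id: file_name
        type: str
        size: 16
        encoding: ASCII    # assumption: file names are ASCII
      - id: file_pos
        type: u4
      - id: file_len
        type: u4
    instances:
      file_body:
        io: _root._io      # seek in the root stream, not the 24-byte substream
        pos: file_pos
        size: file_len
```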

Contributor Author

Excellent. I didn't know about io: either, so that's a good one to document! Nice feature!

@@ -380,6 +380,21 @@ enums:
17: udp
----

Alternatively, hexadecimal notation can be used to define an enumeration:
Member

Totally OK, but I'd also note that this is a service provided by YAML, not something specific to KS.
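For illustration, a minimal enums sketch relying on YAML's hexadecimal integer notation. The enum name ip_protocol is an assumption; the values are the standard IP protocol numbers, with 0x11 being the 17: udp entry visible in the diff context above. The YAML parser should turn 0x11 into the integer 17 before Kaitai Struct ever sees it, which is why this is a YAML feature rather than a KS one.

```yaml
enums:
  ip_protocol:      # enum name assumed for illustration
    0x01: icmp      # same as 1
    0x06: tcp       # same as 6
    0x11: udp       # same as 17
```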

Contributor Author

I was thinking that a new section of the document could be created for general syntax and a very brief overview of YAML and what it provides. This example I provided may be better suited there.

Member

Some Construct features are really Python features, but I would advertise them just the same. The purpose of documentation is to show capabilities, not attribution. =) Just saying.

@@ -832,6 +832,39 @@ other value which was not listed explicitly.
_: rec_type_unknown
----

If an enumeration has already been defined, you can use references to
items in the enumeration instead of specifying integers a second time:
Member

Actually, if you defined key as enum, then you don't have much choice. You can't compare enums to integers without additional conversions.

Contributor Author

Hmm good point, I'll update the text accordingly
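A minimal sketch of what the reviewer describes. The enum values 1 to 3 and the single u4 field in each data_field_* type are hypothetical placeholders. Once rec_type is declared with enum:, the cases keys are written as enum_name::identifier rather than as plain integers.

```yaml
meta:
  id: enum_switch_example   # hypothetical id, for illustration only
  endian: le                # assumption: little-endian integers
seq:
  - id: rec_type
    type: u1
    enum: rec_type          # rec_type now holds an enum value, not an integer
  - id: body
    type:
      switch-on: rec_type
      cases:
        'rec_type::width': data_field_width    # enum identifiers instead of 1, 2, 3
        'rec_type::height': data_field_height
        'rec_type::depth': data_field_depth
enums:
  rec_type:                 # hypothetical values
    1: width
    2: height
    3: depth
types:
  data_field_width:
    seq:
      - id: amount
        type: u4
  data_field_height:
    seq:
      - id: amount
        type: u4
  data_field_depth:
    seq:
      - id: amount
        type: u4
```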

#...
data_field_depth:
seq:
#...
Member

The pedantic person in me cries over that misaligned #... ;)
And anyway, seq is totally optional, so maybe it's better to wrap it up as:

types:
  data_field_width: # ...
  data_field_height: # ...
  data_field_depth: # ...

for brevity.

Contributor Author

Thanks, agreed

and cannot request data be provided out of sequential order. A stream
knows the maximum amount of data available to be requested by the
parser and the actual amount of data which has already been
requested by the parser.
Member

This explanation is pretty abstract and somewhat misleading. A stream can be re-read as many times as needed, and it can be seeked: that's exactly how positional parse instances work; they use seek operations on a stream.

Contributor Author

I'll think of another way to explain streams then, especially with reference to how pos: works (seeking) and how io: can be used to designate which stream to use.
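A minimal sketch of the seeking behavior the reviewer mentions; the meta id, the byte order and the field names are hypothetical. magic is read sequentially, then two positional instances seek back to offset 0 and to 4 bytes before the end of the current stream. Reading the same bytes twice is fine, since pos: simply seeks the stream.

```yaml
meta:
  id: seek_example          # hypothetical id, for illustration only
  endian: le                # assumption: little-endian integers
seq:
  - id: magic
    size: 4                 # sequential read of bytes 0..3
instances:
  magic_again:
    pos: 0                  # seeks back to the start of the current stream
    size: 4                 # and re-reads the same 4 bytes
  trailer:
    pos: _io.size - 4       # seeks to 4 bytes before the end of the stream
    type: u4
```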

stream. The root stream will know the maximum amount of data available
to be requested by the parser as the file size of the input file which
is being parsed. Initially, the root stream will know that 0 bits of
data have been requested by the parser.
Member

Streams can be used on in-memory byte arrays too, not necessarily on files (which have file sizes). And actually, a stream does not "know" the full file size; it can query it on demand. The file size can change if the file is modified while KS parsing is in progress, so it's actually OK for _io.size to return different values at different points in time.

Contributor Author

That's a great point, probably one worth adding to the pitfalls section (or troubleshooting or similar) for the few people who may encounter the issue and not understand what is going on.
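A minimal sketch of querying the stream size from a spec instead of assuming a fixed file size; the meta id and field names are hypothetical. _io.size refers to whatever stream is being parsed (a file or an in-memory byte array), and size-eos: true reads up to its end.

```yaml
meta:
  id: stream_size_example   # hypothetical id, for illustration only
seq:
  - id: header
    size: 8
  - id: rest
    size-eos: true          # consumes everything up to the end of the current stream
instances:
  total_size:
    value: _io.size         # queried from the stream when first evaluated (instances are cached)
  rest_size:
    value: _io.size - 8
```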

@arekbulski
Member

This thread is stale, but could you resolve it before I work on any documentation?

@GreyCat
Member

GreyCat commented Jan 18, 2018

Yeah, I just need to find some time to finish it...

@arekbulski
Member

I will try to help you finish it, with the little understanding of how Kaitai works that I have.

@GreyCat
Member

GreyCat commented Jan 19, 2018

That would be most awesome %)

@arekbulski
Member

I will carefully review this PR and attempt to solve any existing issues.

3 participants