
Add PHP serialized value and phar archive format #173

Merged (7 commits), Jun 20, 2019


dgelessus (Contributor)

This adds specs for the following two formats, both from the PHP ecosystem:

  • serialization/php_serialized_value: PHP's serialized data format, as used by the standard serialize and unserialize functions.
  • archive/phar_without_stub: The phar (PHP Archive) format, used to package a PHP application or library into a single, self-contained, and optionally executable archive.

I'm submitting these as one PR, because the phar format loosely depends on the PHP serialization format (some optional fields in the phar format contain serialized PHP data).
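The serialized value format is a compact text encoding, so a small hand-written decoder gives a feel for it. The Python sketch below is only an illustration and not the Kaitai Struct spec added in this PR. The function names (`php_unserialize` and its helpers) are hypothetical. It keeps strings as raw bytes, because the length prefix counts bytes, and it deliberately skips objects, references and other less common types.

```python
def php_unserialize(data: bytes):
    """Decode a single PHP serialized value (subset: N, b, i, d, s, a)."""
    value, _ = _parse_value(data, 0)
    return value


def _read_until(data: bytes, pos: int, terminator: bytes):
    """Return the bytes from pos up to terminator and the position after it."""
    end = data.index(terminator, pos)  # raises ValueError if missing
    return data[pos:end], end + 1


def _expect(data: bytes, pos: int, literal: bytes) -> int:
    """Check that literal occurs at pos and return the position after it."""
    if data[pos:pos + len(literal)] != literal:
        raise ValueError(f"expected {literal!r} at offset {pos}")
    return pos + len(literal)


def _parse_value(data: bytes, pos: int):
    tag = data[pos:pos + 1]
    if tag == b"N":                      # null: N;
        return None, _expect(data, pos + 1, b";")
    pos = _expect(data, pos + 1, b":")
    if tag in (b"b", b"i", b"d"):        # b:1;  i:42;  d:1.5;
        raw, pos = _read_until(data, pos, b";")
        text = raw.decode("ascii")
        if tag == b"b":
            return text == "1", pos
        if tag == b"i":
            return int(text), pos
        return float(text), pos
    if tag == b"s":                      # s:<byte length>:"<bytes>";
        raw_len, pos = _read_until(data, pos, b":")
        pos = _expect(data, pos, b'"')
        end = pos + int(raw_len.decode("ascii"))
        return data[pos:end], _expect(data, end, b'";')
    if tag == b"a":                      # a:<count>:{<key><value>...}
        raw_count, pos = _read_until(data, pos, b":")
        pos = _expect(data, pos, b"{")
        result = {}
        for _ in range(int(raw_count.decode("ascii"))):
            key, pos = _parse_value(data, pos)
            value, pos = _parse_value(data, pos)
            result[key] = value
        return result, _expect(data, pos, b"}")
    raise ValueError(f"unsupported type tag {tag!r} at offset {pos - 2}")


print(php_unserialize(b'a:2:{i:0;s:5:"hello";s:1:"k";b:1;}'))  # {0: b'hello', b'k': True}
```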

```yaml
type: u4
doc: The unparsed flag bits.
instances:
  permissions:
```
KOLANICH (Contributor) commented Jun 16, 2019

Why not use a type made of bit fields?

dgelessus (Contributor, Author)

The official documentation specifies them as integer bit masks (see here and here). Translating hex masks to bit fields is already not obvious because of bit order issues. In addition, here the flags are stored as a little-endian integer, so you also have to deal with the bytes being swapped (compared to the hex masks, which are naturally written in big-endian order).
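A minimal Python sketch of the integer-and-mask approach: decode the flags as a little-endian `u4` first, then apply the mask, so the byte swapping is handled once by the integer read. The 9-bit permissions mask `0x1FF` is an assumption inferred from the "8 least significant bits ... 1 most significant bit" split described in this comment, and `permissions_from_flags` is a hypothetical name.

```python
# Assumed mask for the 9 permission bits (all of byte 0 + lowest bit of byte 1).
PERMISSIONS_MASK = 0x1FF


def permissions_from_flags(raw: bytes) -> int:
    """Decode a 4-byte little-endian flags field and extract the permission bits."""
    if len(raw) != 4:
        raise ValueError("flags field must be exactly 4 bytes")
    flags = int.from_bytes(raw, "little")  # byte swapping happens here, once
    return flags & PERMISSIONS_MASK


raw = bytes([0xED, 0x01, 0x00, 0x00])  # 0o755 (0x1ED) stored as a little-endian u4
print(oct(permissions_from_flags(raw)))  # 0o755

# Reading the same bytes as big-endian applies the mask to the wrong bytes:
print(int.from_bytes(raw, "big") & PERMISSIONS_MASK)  # 0
```

In KSY terms this corresponds to a plain `u4` field plus a value instance that masks it, which matches how the official documentation writes the masks.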

The permissions field is a particular problem, because it spans two bytes. In combination with the little-endian byte swapping, this means that the permissions field would need to be split into two bit fields (the 8 least significant bits, then 7 unused bits, then the 1 most significant bit), so you would need to reconstruct it using a value instance anyway. kaitai-io/kaitai_struct#155 is probably relevant here.
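To make the bit field alternative concrete, here is a hypothetical Python sketch that reads bits MSB-first within each byte (assumed here to mirror Kaitai Struct's default bit order at the time) and reassembles the permissions from the 8 + 7 + 1 split described above. It then cross-checks the result against the mask approach. `MsbFirstBitReader` is an illustrative helper, not Kaitai Struct's runtime API.

```python
class MsbFirstBitReader:
    """Hypothetical helper: reads unsigned bit fields MSB-first within each byte."""

    def __init__(self, data: bytes):
        self.data = data
        self.bit_pos = 0  # absolute bit offset from the start of data

    def read_bits(self, n: int) -> int:
        value = 0
        for _ in range(n):
            byte = self.data[self.bit_pos // 8]
            bit = (byte >> (7 - self.bit_pos % 8)) & 1  # most significant bit first
            value = (value << 1) | bit
            self.bit_pos += 1
        return value


def permissions_via_bit_fields(raw: bytes) -> int:
    """Rebuild the 9-bit permissions from three MSB-first bit fields."""
    reader = MsbFirstBitReader(raw)
    low8 = reader.read_bits(8)   # all of byte 0: permission bits 0-7
    reader.read_bits(7)          # top 7 bits of byte 1: unused here
    high1 = reader.read_bits(1)  # lowest bit of byte 1: permission bit 8
    return (high1 << 8) | low8   # the value instance a spec would still need


# Cross-check against the mask approach for every 9-bit value.
for perms in range(512):
    raw = perms.to_bytes(4, "little")
    assert permissions_via_bit_fields(raw) == int.from_bytes(raw, "little") & 0x1FF
print("bit field reconstruction matches the mask for all 512 values")
```

Even when it works, the bit field version still needs a computed value to glue the two pieces back together, which is the extra step the mask approach avoids.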

This explanation might not be completely accurate; I might be misremembering some things. A while ago I tried implementing this using bit fields and couldn't get it to work after many tries, so I changed it to the integer and bit mask combination, and it worked on the first try. Since the official format documentation and the two major implementations treat these fields as integers with bit masks, I think it makes sense for our spec to do the same, especially since the alternative with bit fields would be a pain to write and hard to understand.

KOLANICH (Contributor)

> Translating hex masks to bit fields is already not obvious because of bit order issues. In addition, here the flags are stored as a little-endian integer, so you also have to deal with the bytes being swapped (compared to the hex masks, which are naturally written in big-endian order).

I know. But the spec says:

> All values greater than 1 byte are stored in little-endian byte order, with the exception of the API version, which for historical reasons is stored as 3 nibbles in big-endian order.

This means we can assume they are little-endian and lay them out accordingly.
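The API version is the one exception to that rule. A hypothetical Python sketch of decoding it, assuming the field is 2 bytes long and the three version nibbles are the top three nibbles of a big-endian 16-bit value, with the fourth nibble unused. Both the length and the unused nibble are assumptions here, and `parse_api_version` is an illustrative name, not taken from either implementation.

```python
def parse_api_version(raw: bytes) -> tuple:
    """Split a 2-byte big-endian field into its top three nibbles (assumed layout)."""
    if len(raw) != 2:
        raise ValueError("API version field must be exactly 2 bytes")
    value = int.from_bytes(raw, "big")  # big-endian, unlike the rest of the format
    major = (value >> 12) & 0xF
    minor = (value >> 8) & 0xF
    release = (value >> 4) & 0xF
    # The lowest nibble is assumed to be unused.
    return major, minor, release


print(parse_api_version(bytes([0x11, 0x10])))  # (1, 1, 1)
```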

> Since the official format documentation and the two major implementations treat these fields as integers with bit masks, I think it makes sense for our spec to do the same, especially since the alternative with bit fields would be a pain to write and hard to understand.

Most specs do the same, probably to make them easier to implement in languages without bit fields. But KSC writes the code for us.

> kaitai-io/kaitai_struct#155 is probably relevant here.

It is; there is a link at the end of that issue. If you want to make a helper tool for swapping bytes in such types, it would be useful. In fact, it is already implemented: all you need to do is rip the relevant parts out of my tool and add a CLI or GUI. Or maybe translate it into Scala and implement the feature in KSC itself.

dgelessus (Contributor, Author)

I don't think using bit fields here makes sense until Kaitai Struct allows switching to LSB-first bit order. Processing little-endian data in MSB-first bit order is too confusing IMHO, and as far as I can tell doesn't offer any big advantage over the current integer mask approach.
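For contrast, a hypothetical Python sketch of LSB-first bit order. When bits are consumed least significant first and each new bit is more significant than the previous one, the bits of a little-endian integer come out in numeric order, so the permissions (assumed, as above, to be the low 9 bits) become a single bit field with no reassembly. `LsbFirstBitReader` is illustrative only, not part of the Kaitai Struct runtime.

```python
class LsbFirstBitReader:
    """Hypothetical helper: reads unsigned bit fields LSB-first within each byte."""

    def __init__(self, data: bytes):
        self.data = data
        self.bit_pos = 0  # absolute bit offset from the start of data

    def read_bits(self, n: int) -> int:
        value = 0
        for i in range(n):
            byte = self.data[self.bit_pos // 8]
            bit = (byte >> (self.bit_pos % 8)) & 1  # least significant bit first
            value |= bit << i                       # earlier bits are less significant
            self.bit_pos += 1
        return value


def permissions_lsb_first(raw: bytes) -> int:
    """With LSB-first order, the permissions are simply the first 9 bits."""
    return LsbFirstBitReader(raw).read_bits(9)


raw = bytes([0xED, 0x01, 0x00, 0x00])  # 0o755 stored as a little-endian u4
print(oct(permissions_lsb_first(raw)))  # 0o755
```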

The "metadata.bin" extension is only used in one special case;
otherwise there is no standard or commonly used file extension for
serialized PHP values.
Except for one doc-ref URL, since there's no reasonable way to wrap it.
GreyCat (Member) commented Jun 20, 2019

Looks good to me, merging in!

GreyCat merged commit 1278636 into kaitai-io:master on Jun 20, 2019
GreyCat (Member) commented Jun 20, 2019

Thanks, @dgelessus!
