Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

style_guide ideas by KOLANICH #7

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

KOLANICH
Copy link
Contributor

@KOLANICH KOLANICH commented Oct 18, 2017

@GreyCat
Copy link
Member

GreyCat commented Oct 20, 2017

Uhh, there's a lot of stuff at once, and it's actually hard to cherry-pick them, as they're a single monolithic commit. I'll try to comment in-line.

@@ -249,28 +248,36 @@ guesswork.

* For simple non-repeated fields, use a simple singular form —
i.e. `width`, `header`, `transaction_id`, `file`.
* For repeated files (i.e. with `repeat: something`), use plular form
* For a sequence (i.e. with `repeat: something`), use plular form
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

* Repeated fileds which cannot be packed into a sequence should
have `id`s containing a number in the end. Numbers may have a
visually-obvious structure, like "the first digit is major, the second
one is minor".
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pretty vague and can be interpreted in multiple ways. Probably an example would help. Right now, it's not clear (1) whether foo_1 or foo1 fit this, (2) how one starts the numbering and how it should progress, (3) what is "visually-obvious structure".

Copy link
Contributor Author

@KOLANICH KOLANICH Oct 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 I use foo0
2,3 at KSY author's disposal. There are differrent situations and we want to be flexible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Um, yet again, the whole point of having a standard is not to be flexible, but to be rigid and provide no-brains-needed solutions, which are also easy to code into machine checks.

Copy link
Contributor Author

@KOLANICH KOLANICH Oct 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the whole point of having a standard is not to be flexible, but to be rigid and provide no-brains-needed solutions, which are also easy to code into machine checks.

If we wanna some checks, IMHO it's better to extend KS language instead of introducing a convention making the names ugly.

These numbers are not for checks, but for developer's convenience.

"magic values"), use `magic` name, or, if there are several of them,
`magic1`, `magic2`, etc.
"magic values"), use `signature` or `magic` name, or, if there are
several of them, like `signature` or `magic0`, `magic1`, etc.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so, basically:

  1. You specify "signature or magic", and don't specify any way to choose between these two. Given that its essentially the same stuff, I see no point in separation this into two names, and adding headache for ksy developers to choose between two proposed alternatives.
  2. You want every numbered thing to start counting from 0. Any reasons to do so, i.e. any major examples of someone doing that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 at KSY author's disposal. I personally prefer "signature" because it clearly states that this field is used for identification.
2 Just taste. If we start arrays from index 0 in the most of languages why not to follow the same convention here?

particular, number of repetitions of some other structure), use either
`_count` suffix or `count_of_` (what sounds better in your opinion)
prefix and a plular form — i.e. `count_of_questions`, `blocks_count`,
`nodes_count`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I don't think it's a good idea. The standard must be clear and avoid ambiguilities whenever it's possible. There is no point to make ksy developer to choose between several possible alternatives, especially using very subjective means (i.e. "sounds better"). The whole point of having a hard standard is to make it possible to use automated machinery to process stuff, and it's very hard to program a converter from num_* to *_count or count_of_*.

Also, both sound kinda quirky. Stuff like "blocks count", "nodes count" is plain wrong from English point of view. It should be "block count", "node count", etc, see this SE entry for examples. "Count of X" sounds like a title of some person for me (i.e. count of Champagne, count of Monte Cristo, count of Barcelona).

Copy link
Contributor Author

@KOLANICH KOLANICH Oct 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no point to make ksy developer to choose between several possible alternatives, especially using very subjective means (i.e. "sounds better").

No point, but I guess we want to have readable names.

The whole point of having a hard standard is to make it possible to use automated machinery to process stuff, and it's very hard to program a converter from num_* to count or count_of.

2 regexes with 1 capturing group each + one logical operation to recognize and parse such kind of names.

Also, both sound kinda quirky. Stuff like "blocks count", "nodes count" is plain wrong from English point of view. It should be "block count", "node count", etc, see this SE entry for examples.

You are right. But you can also think about this as about count property of blocks with a _ instead of ., since . is not allowed.

"Count of X" sounds like a title of some person for me (i.e. count of Champagne, count of Monte Cristo, count of Barcelona).

IMHO it's completely OK since there isn't many counts, dukes and baronets in variable names.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're kind of contradicting yourself:

No point, but I guess we want to have readable names.
But you can also think about this as ...

Either you want "normal English" names, then you can't have "blocks count" and "count of blocks". If you allow some special naming, then "num_blocks" is about as readable as your versions. Hungarian apps notation recommends "nBlocks" for these purposes, and even that is understandable by majority of people.

Copy link
Contributor Author

@KOLANICH KOLANICH Oct 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I especially dislike num prefix because of its redundancy and ambiguity. It says that the field is number, but we see that it is a number from the type. It says nothing about the purpose of that number. cnt should be better here. And again, I'm against prefixes and unreadable tokens.

Hungarian notation

is evil.

entry), `len_blocks` (total length of whole `blocks` array, made of
(in bytes or some other fixed units), use `_size` suffix and name of
that data structure — i.e. `block_size` (length of a single `block`
entry), `blocks_size` (total length of whole `blocks` array, made of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, it boils down to:

  1. Prefix vs suffix
  2. Long of short spelling

In my opinion, "prefix" is better and definitely more widespread, i.e. as in majority of languages that use prefixes to differentiate parameters vs instance members, or Apps variety of Hungarian notation. Prefix does not try to sound like "proper English", and it avoids lots of ambiguilities and spelling problems (like that "count" stuff I've demonstrated above). For example, there are tricky nouns in English (like "water" or "money" or "news" or "advice" or "hair"), which make proper English phrasing difficult.

Last, but not least, I definitely prefer shorter form of spelling. Given that we have to make "one size fits all or most" identifiers, we're not going to have full-length Java-style identifiers like "NormalBucketDistributionProgressionOfLongIntegerConstantsForHadoopNodeFactory" — (1) there are languages that have certain ID length limits, (2) a vast majority people I've interviewed do not like such long identifiers very much, (3) it's actually hard to work with them without a IDE with expression autocompletion and stuff — and we don't have one (and I'm not very keen about having IDE as the only way to work with the language).

So, all and all, clear 3 character prefix looks like a good trade-off for me — it's non-ambiguous, it makes it very easy to detect whatever this attribute is about by always looking at the beginning, and it fits concise C-like style well.

Copy link
Contributor Author

@KOLANICH KOLANICH Oct 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, "prefix" is better and definitely more widespread, i.e. as in majority of languages that use prefixes to differentiate parameters vs instance members, or Apps variety of Hungarian notation.

It doesn't mean that we should do this too.

The good part in postfix is that since we read left-to-right all the properties of the same member look similar to each other. Especially when aligned. But with prefix they look too differrent. Also take in mind the argument with the point.

(1) there are languages that have certain ID length limits,

Valid.

(2) a vast majority people I've interviewed do not like such long identifiers very much

Neither do I. But In the case of short enough identifiers, which is the majority of them, I prefer to have them readable. It is only a guide, not a strict requirement.

(3) it's actually hard to work with them without a IDE with expression autocompletion and stuff

Valid. BTW some text editors, like Notepad++, have autocompletion which autocompletes any tokens.

`block` entries).
* Fields of unknown/undetermined purpose, i.e. unfinished reverse
engineering work SHOULD either NOT TO HAVE `id` or HAVE an
`id` matching the `/unkn(?:own)?(_\w+)?\d*/` regular expression.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is, again, not how style guide should be written. It's pointless to leave so much choices to do for ksy developer. It's better to decide on this once and for all, and it will be easy for everyone — to remember, to follow, to understand, to write automated tools for that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 Having an id for a property of unknown meaning is useful when doing reverse-engineering, especially if you have more than one such a property in the same struct.
2 But if a developer doesn't need that id, why to force him to have it?

* In the case of multiple enums with the same name it's usually
better to append the integer value rather than a sequence
number to its `id`.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had some problems understanding what you've meant here. Probably an example should help.

Copy link
Contributor Author

@KOLANICH KOLANICH Oct 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's a good idea. But we should describe it in more detail, with an example, and think of some other choices to be made here. For example, should it be unknown31 or unknown_31? unknown_31 or unknown_1f or unknown0x1f?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, should it be unknown31 or unknown_31? unknown_31 or unknown_1f or unknown0x1f?

No 0x, 0o and 0b prefixes, no underscore, the number part is the same as in the left part.

@arekbulski
Copy link
Member

I will carefully review this PR and attempt to solve any existing issues.

Co-Authored-By: Arkadiusz Bulski <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants