Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Study changes to regen/regen*.pl #198

Closed
jkeenan opened this issue Aug 17, 2020 · 12 comments
Closed

Study changes to regen/regen*.pl #198

jkeenan opened this issue Aug 17, 2020 · 12 comments
Assignees
Labels
study Study changes made by developers since 5.32.0

Comments

@jkeenan
Copy link
Collaborator

jkeenan commented Aug 17, 2020

In this ticket we will study our progress in pursuit of Objective 2, "strict-by-default", for the following files in the Perl core distribution:

regen.pl
regen/charset_translations.pl
regen/ebcdic.pl
regen/embed.pl
regen/embed_lib.pl
regen/feature.pl
regen/genpacksizetables.pl
regen/keywords.pl
regen/lib_cleanup.pl
regen/mg_vtable.pl
regen/miniperlmain.pl
regen/mk_PL_charclass.pl
regen/mk_invlists.pl
regen/opcode.pl

These files should be evaluated in light of the guidelines presented on this wiki page. Please post the summary of your findings in a single comment in this issue.

First-time contributors: Please post a comment to this ticket indicating you are working on it. That will enable us to assign the ticket to you. (GitHub won't let us do that until it already thinks you're part of the project.)

Thank you for helping to improve the Perl programming language.

@jkeenan jkeenan added the study Study changes made by developers since 5.32.0 label Aug 17, 2020
@oodler577
Copy link
Contributor

oodler577 commented Aug 17, 2020

@jkeenan, his looks interesting. Thank you.

@oodler577
Copy link
Contributor

oodler577 commented Aug 20, 2020

Here are my notes, tried to focus on what was relevant to know wrt "strict by default"

regen.pl

* tied to "system perl" (/usr/bin/perl)
* Requires perl 5.4
* `use strict` enabled (warnings not enabled)
* iterates over scripts in the implicit __DATA__ section (I didn't realize this can be defined implicitly)

regen/charset_translations.pl

* `use strict`, `use warnings` enabled
* some subroutines use parameter prototypes
* $CHARSET_TRANSLATIONS::UTF_EBCDIC_MAXBYTES is treated as a global _after_ strict/warnings 
* $CHARSET_TRANSLATIONS::UTF_EBCDIC_MAXBYTES = 14
* %I8_TO_NATIVE_UTF8 is treated as a file scoped global, used by subroutines
* closure starts at line 173 to provide data encapsulation for $charset and $intent
* warnings turned off for 'qw' in side of the subroutine defined inside of another closure
* * "no warnings 'qw'" can be removed and the use of 'qw' fixed by wrapping it in parens 

regen/ebcdic.pl

* requires '.' is in @INC
* `use strict`, `use warnings` enabled
* pinned to 5.16.0
* require './regen/regen_lib.pl';
* require './regen/charset_translations.pl';
* no warnings 'once';
* * $CHARSET_TRANSLATIONS::UTF_EBCDIC_MAXBYTES is treated as a global _after_ strict/warnings 

regen/embed.pl

* `use strict` enabled (warnings not enabled)
* pinned to perl 5.4

regen/embed_lib.pl

* perl -w affects $^W globally (no "use warnings")
* uses bare filehandles (not scoped lexically)
* subroutines present that rely on bare block scoping for processing opcodes
* relies on implicit variables (e.g., $_)
* uses perl's format feature

regen/feature.pl

* @INC updated in the BEGIN, requires './regen/regen_lib.pl' there also
* use's strict, but no warnings (not even invoking perl with -w)
* leverages current subroutine placeholder, __SUB__ in a globally scoped hash (%features)
* EOPM is a heredoc
* uses perl `format`
* generates feature.h
* defines some package scoped variables (using our)

regen/genpacksizetables.pl

* invokes perl with -w
* uses strict
* uses lexically scoped file handles

regen/keywords.pl

* uses strict
* invokes perl with -w
* generates keywords.h, keywords.c
* scopes variables using 'my'
* subs use variables scoped outside of their block

regen/lib_cleanup.pl

* shebang relies on PATH to locate perl
* perl invoked with -w
* uses strict
* scopes 2 variables with "our"
* other variables are scoped with "my"
* utilizes lexically scoped file handles
* uses block lables (e.g., FILE)

regen/mg_vtable.pl

* pinned to 5.04
* requires ./regen/regen_lib.pl in BEGIN
* variables scoped via "my"

regen/miniperlmain.pl

* /usr/bin/perl (no -w)
* uses strict
* lexically scoped filehandle

regen/mk_PL_charclass.pl

* perl relies on PATH
* perl invoked with -w
* pinned to perl 5.15
* uses strict
* uses warnings
* "require"s multiple .pl libraries at run time (not in BEGIN)

regen/mk_invlists.pl

* perl relies on PATH
* perl invoked with -w
* pinned to perl 5.15
* uses strict
* uses warnings
* "require"s multiple .pl libraries at run time (not in BEGIN)
* style nearly identical to regen/mk_PL_charclass.pl

regen/opcode.pl

* invokes /usr/bin/perl with -w
* requires ./regen/regen_lib.pl in BEGIN
* uses strict
* scopes variable swith "my"
* uses lexically scoped filehandles

@jkeenan
Copy link
Collaborator Author

jkeenan commented Aug 20, 2020

Here are my notes, tried to focus on what was relevant to know wrt "strict by default"

Wow! You really took the study part of the issue's subject seriously. This sets a very good example for other "code understanders" to come.

Two questions:

  1. In you remarks above, were you commenting on the versions of these files in Perl 5 blead or in the alpha-dev-02-strict branch in our perl-atoomic repository?
  2. Are there any code changes in these files in the alpha-dev-02-strict branch that you don't understand or that seem problematic to you?

Thank you very much.
Jim Keenan

@oodler577
Copy link
Contributor

I was not looking at the change branch, apparently - but I was on the alpha branch of @atoomic's remote:

git remote -v
origin  https://github.com/atoomic/perl.git (fetch)
origin  https://github.com/atoomic/perl.git (push)

I'll take a look at the changes in alpha-dev-02-strict and follow up. Thanks and sorry for any confusion.

@oodler577
Copy link
Contributor

oodler577 commented Aug 20, 2020

Here's the only diff in the files I studied between alpha and alpha-dev-02-strict:

diff --git a/regen/feature.pl b/regen/feature.pl
index ae9859812c..224167cf08 100755
--- a/regen/feature.pl
+++ b/regen/feature.pl
@@ -284,6 +284,10 @@ for (@HintedBundles) {
 }
 
 print $h <<'EOH';
+
+#define HINT_BUNDLE_V7  FEATURE_BUNDLE_527     << HINT_FEATURE_SHIFT
+#define HINT_BUNDLE_V5  FEATURE_BUNDLE_DEFAULT << HINT_FEATURE_SHIFT
+
 #define FEATURE_BUNDLE_CUSTOM  (HINT_FEATURE_MASK >> HINT_FEATURE_SHIFT)
 
 #define CURRENT_HINTS \

(no other files in this task are different)

@oodler577
Copy link
Contributor

oodler577 commented Aug 20, 2020

Are there any code changes in these files in the alpha-dev-02-strict branch that you don't understand or that seem problematic to you?

Not 100% sure why $CHARSET_TRANSLATIONS::UTF_EBCDIC_MAXBYTES is not scoped with my or our in regen/charset_translations.pl's "strict" environment, but I think thats my only current unknown.

I had to look up what no warnings 'qw' did and what $^H represented, but that was straightforward.

@jkeenan
Copy link
Collaborator Author

jkeenan commented Aug 20, 2020

Are there any code changes in these files in the alpha-dev-02-strict branch that you don't understand or that seem problematic to you?

Not 100% sure why $CHARSET_TRANSLATIONS::UTF_EBCDIC_MAXBYTES is not scoped with my or our in regen/charset_translations.pl's "strict" environment, but I think thats my only current unknown.

$CHARSET_TRANSLATIONS::UTF_EBCDIC_MAXBYTES is a fully-qualified variable, so we unambiguously know which package it exists in. In effect, it's an our variable in package CHARSET_TRANSLATIONS.

I had to look up what no warnings 'qw' did and what $^H represented, but that was straightforward.

@jkeenan
Copy link
Collaborator Author

jkeenan commented Aug 20, 2020

@oodler577, what your research has inadvertently called to my attention is an error in the algorithm I used to determine which files had changed between the time we started work on "perl 7" and the present. My formula incorrectly identified a number of files as having non-whitespace changes.

The formula I should have used to calculate the files needing study was this:

# current branch: alpha-dev-02-strict
$ git diff -w  --name-only alpha-00..HEAD -- regen/

English translation: Return only the names of those files in the regen/ subdirectory in the current working branch which have had non-leading-whitespace changes since tag alpha-00. That tag marks the point at which we forked Perl 5 blead and created the alpha branch in our own repository. It is exactly equivalent to the v5.32.0 tag in the Perl 5 repository and is 99.5% the same as production release perl-5.32.0.

When you run the command above, only 3 files are returned:

regen/feature.pl
regen/regen_lib.pl
regen/warnings.pl

Now, the fact that you studied more than just those three files is not wasted effort. It means that the number of people in the world has just grown by a large percentage! But what it does mean is:

  • I need to revise the list of files to be studied in all the other "study" tickets.
  • To complete work on this ticket, you only need to report any concerns that you still have on the three files listed above.

Thank you very much.
Jim Keenan

@jkeenan
Copy link
Collaborator Author

jkeenan commented Aug 20, 2020

@oodler577, what your research has inadvertently called to my attention is an error in the algorithm I used to determine which files had changed between the time we started work on "perl 7" and the present. My formula incorrectly identified a number of files as having non-whitespace changes.

The formula I should have used to calculate the files needing study was this:

# current branch: alpha-dev-02-strict
$ git diff -w  --name-only alpha-00..HEAD -- regen/

English translation: Return only the names of those files in the regen/ subdirectory in the current working branch which have had non-leading-whitespace changes since tag alpha-00. That tag marks the point at which we forked Perl 5 blead and created the alpha branch in our own repository. It is exactly equivalent to the v5.32.0 tag in the Perl 5 repository and is 99.5% the same as production release perl-5.32.0.

When you run the command above, only 3 files are returned:

regen/feature.pl
regen/regen_lib.pl
regen/warnings.pl

Now, the fact that you studied more than just those three files is not wasted effort. It means that the number of people in the world has just grown by a large percentage! But what it does mean is:

* I need to revise the list of files to be studied in all the _other_ "study" tickets.

* To complete work on this ticket, you only need to report any concerns that you still have on the three files listed above.

Thank you very much.
Jim Keenan

And on still more analysis, I realize that of the 3 files cited above, only 1 of them was originally included in this issue: regen/feature.pl. The other two were assigned to #199.

So if you think that regen/feature.pl is in good shape in the alpha-dev-02-strict branch, we can close this ticket.

I apologize for the confusion.

Thank you very much.
Jim Keenan

@oodler577
Copy link
Contributor

oodler577 commented Aug 20, 2020

This has been very fun and interesting so far. Thank you for the detailed explanation above.

update - I will look at regen/feature.pl quickly and opine

@oodler577
Copy link
Contributor

oodler577 commented Aug 20, 2020

@jkeenan - I looked over it once more. I also have a correction to my note, EOPM is a heredoc terminator. I called it a bareword filehandle. That was actually my main concern.

I also see format is being used. I looked around and don't believe there are any notable interactions with format and a strict environment. I don't know a whole lot about format but nothing gives me particular concern.

@jkeenan
Copy link
Collaborator Author

jkeenan commented Aug 20, 2020

@jkeenan - I looked over it once more. I also have a correction to my note, EOPM is a heredoc terminator. I called it a bareword filehandle. That was actually my main concern.

I also see format is being used. I looked around and don't believe there are any notable interactions with format and a strict environment. I don't know a whole lot about format but nothing gives me particular concern.

It turns out that formats can be problematic when we have a program running with strict on by default -- though I don't think this is the case with your files.

That's because a format is an entry in a typeglob, so it's global. Moreover, formats, like subroutines, can be declared anywhere in a file. So there are cases where (a) we have test files that have accumulated hundreds of different unit tests over 33 years; (b) we have a format definition at, say, line 800 of a file; but (c) we have a reference to the format entry in the corresponding typeglob at, say, line 200 of a file. If, during debugging, I comment out lines 500 to 1000 of that file, I'm commenting out the format definition and causing an exception at line 200. So seeing a format declaration in the middle of a big test file is a red flag (though refactoring such files is not the point of this project right now).

In any event, I'm satisfied with the work you've done on this ticket and am closing it now. At your earliest convenience, I'd encourage you to take on another ticket, perhaps one for the t/comp/ directory (#201 or #202) or in the 't/io/' directory (#203, #204 or #205).

Thank you very much.
Jim Keenan

@jkeenan jkeenan closed this as completed Aug 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
study Study changes made by developers since 5.32.0
Projects
None yet
Development

No branches or pull requests

2 participants