Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ble/complete/cand/yield doesn't appear to document it's output variables #351

Open
bkerin opened this issue Aug 5, 2023 · 11 comments
Open

Comments

@bkerin
Copy link
Contributor

bkerin commented Aug 5, 2023

GNU bash, version 5.2.15(1)-release (x86_64-pc-linux-gnu) [Linux Mint 20.3]ble.sh, version 0.4.0-devel4+0906fd9 (noarch) [git 2.25.1, GNU Make 4.3, GNU Awk 5.0.1, API: 2.0 (GNU MPFR 4.0.2, GNU MP 6.2.0)]
bash-completion, version 2.10 (hash:b6da7f7edfc1c3c5196c86c81fb05fce96f83c03, 74550 bytes) (noarch)
locale: LANG=en_US.UTF-8
terminal: TERM=xterm-256color wcwidth=12.1-west/15.0-2+ri, vte:6003 (65;6003;1)

function ble/complete/cand/yield doesn't appear to document it's output variables, in particular cand_cand, cand_word, and cand_pack.

Btw, is there some way to generated interface documentation from the input/output declarations that appear in the comments before functions?

@akinomyoga
Copy link
Owner

function ble/complete/cand/yield doesn't appear to document it's output variables, in particular cand_cand, cand_word, and cand_pack.

Yes, but it's not only ble/complete/cand/yield. These lists are not complete for many functions. The idea of the lists there are roughly the variables that the caller needs to care about, i.e., the variables that the caller needs to prepare before calling the function or the variables that the caller can reference after calling the function. Actually, cand_cand, cand_word, and cand_pack are internal variables that the completion framework of ble.sh uses but not the variables that the completion functions are supposed to reference.

But this rule is not strict, and I think some of the functions list the variables that are not supposed to be referenced by the callers. If you would like to make the lists complete, I could accept a PR, but I require the PR to perform it all at once for all the functions in ble.sh. I wouldn't merge a PR for halfway refactoring.

Btw, is there some way to generated interface documentation from the input/output declarations that appear in the comments before functions?

Currently, no. I wanted to write it someday, but I'm not urged to do it because the documentation is not intended for users but for developers, and developers can open the script file to look around it. Anyway, I guess it is not too difficult to write a converter or apply an existing tool after some adjustments.

@bkerin
Copy link
Contributor Author

bkerin commented Aug 30, 2023

The documentation that exists was useful for sure. I could help with this but I'm not exactly sure what is desired? Are you looking for a different PR, or fixes for the individual functions? I think most of the time I won't know easily which variables are not supposed to be referenced by callers ( I guess these are variables shared only by functions in the module, but not intended for use by outside clients?).

@akinomyoga
Copy link
Owner

I could help with this but I'm not exactly sure what is desired?

If you start to ask the above kind of questions (of which variables should be included for each function, etc.), probably nothing is desired. It is only welcome when you can make a high-quality PR, which requires a small amount of effort from me to review and request changes. Otherwise, I need to answer many questions, and I also need to review carefully and require many changes. It requires even more effort than I directly perform it by myself. The reason that I don't do it soon is that I don't have time to do everything, and the priority for this is lower than other tasks. I don't think it's worth taking time for now.

Are you looking for a different PR, or fixes for the individual functions?

As I have written, a PR performing it all at once.

@bkerin
Copy link
Contributor Author

bkerin commented Sep 11, 2023

I don't want to waste your time but I don't waste mine either so I want to understand for sure if what I could do is desired. Do you want:

A. A list of all globals assigned or read by each function that aren't mentioned in the header comments

or

B. Something like B but with additional filtering or info added to reflect the relevance of the variable to clients of the interface or something

I could do A if it would be useful to you, but B I can't as I don't understand the context well enough.

@akinomyoga
Copy link
Owner

akinomyoga commented Sep 11, 2023

Technically, something like A is fine, but before preparing the list, we'd like to list up things that should be cared about. There are already a few that currently pop up in my mind:

  • The global variables (i.e., the variables defined at the top level of shell scopes) should be excluded by default. Only the previous-scope variables (i.e., the variables that are declared to be local somewhere in the call tree and accessed from a child call) should be included. The global variables can be tested by their name being _* or bleopt_* (excluding _ble_local_* and __ble_*).
  • The previous-scope variables that are referenced or modified in the child functions should also be included. For example, when function A calls another function B, function B modifies variable x, and the change to variable x is visible from the caller of function A, function A should also list variable x as a variable that is modified by function A.
  • The variables that are trivial or expected (by the form of the function names) should be excluded. For example, the functions suffixed by .draw affect DRAW_BUFF (as described in Naming convention for functions), but that is trivial considering the role of the function and should not request to list all of such trivial variables. Another example is the completion source of the name ble/complete/source:*, which is expected to modify cand_count, etc. Other typical ones are ble/complete/action:*/*, ble/complete/menu-style:*/*, ble/prompt/backslash:*, etc. The variables by the pattern of the function name should be described under the description of the family of the function instead of repeating everything in individual function descriptions.

It is probably better to consider the automatic generation of such a list, i.e., parsing the function and listing up the variable assignments, declarations, and other ways of modifying and accessing variables (read, printf -v, unset -v, test -v). One way is to reuse the Bash source code and write a C function to perform that.

Another thing is that we could add the information of whether the function can output anything to the standard streams or not, which is specified by @stdout, etc.

@bkerin
Copy link
Contributor Author

bkerin commented Sep 14, 2023

Technically, something like A is fine, but before preparing the list, we'd like to list up things that should be cared about. There are already a few that currently pop up in my mind:

  • The global variables (i.e., the variables defined at the top level of shell scopes) should be excluded by default. Only the previous-scope variables (i.e., the variables that are declared to be local somewhere in the call tree and accessed from a child call) should be included. The global variables can be tested by their name being _* or bleopt_* (excluding _ble_local_* and __ble_*).

Ok (though as newcomer to the code I might like to see these mentioned by the tool if not in the header comments, though of course this could be an option).

  • The previous-scope variables that are referenced or modified in the child functions should also be included. For example, when function A calls another function B, function B modifies variable x, and the change to variable x is visible from the caller of function A, function A should also list variable x as a variable that is modified by function A.

Yes and this really need automation.

  • The variables that are trivial or expected (by the form of the function names) should be excluded. For example, the functions suffixed by .draw affect DRAW_BUFF (as described in Naming convention for functions), but that is trivial considering the role of the function and should not request to list all of such trivial variables. Another example is the completion source of the name ble/complete/source:*, which is expected to modify cand_count, etc. Other typical ones are ble/complete/action:*/*, ble/complete/menu-style:*/*, ble/prompt/backslash:*, etc. The variables by the pattern of the function name should be described under the description of the family of the function instead of repeating everything in individual function descriptions.

It is probably better to consider the automatic generation of such a list, i.e., parsing the function and listing up the variable assignments, declarations, and other ways of modifying and accessing variables (read, printf -v, unset -v, test -v). One way is to reuse the Bash source code and write a C function to perform that.

Yes I was thinking automation though not using full parse. But maybe when all the mechanisms you mention here are considered that would end up being the way to go. I'll try to take a shot at this sometime soon.

Another thing is that we could add the information of whether the function can output anything to the standard streams or not, which is specified by @stdout, etc.

That would be nice but I guess there are even more things that you'd need to be aware of. It's painful when tools like this end up lying to you so I might save this for v2 :)

@akinomyoga
Copy link
Owner

One way is to reuse the Bash source code and write a C function to perform that.

Yes I was thinking automation though not using full parse. But maybe when all the mechanisms you mention here are considered that would end up being the way to go. I'll try to take a shot at this sometime soon.

I believe it would be helpful to look at the implementation of named_function_string (print_cmd.c in the Bash source).

https://git.savannah.gnu.org/cgit/bash.git/tree/print_cmd.c?h=devel#n1347

@bkerin
Copy link
Contributor Author

bkerin commented Sep 18, 2023

One way is to reuse the Bash source code and write a C function to perform that.

Yes I was thinking automation though not using full parse. But maybe when all the mechanisms you mention here are considered that would end up being the way to go. I'll try to take a shot at this sometime soon.

I believe it would be helpful to look at the implementation of named_function_string (print_cmd.c in the Bash source).

https://git.savannah.gnu.org/cgit/bash.git/tree/print_cmd.c?h=devel#n1347

I took a look at this function and ran it in a loadable. I have to admit I'm a little unclear about how it could best be used. It appears to regurgitate the function definition. It shows some special handling for redirects which I guess could be useful for determining if output streams are affected, and optionally removes quoted escapes. I guess it could be used together with a starting point to recursively retrieve function definitions and possibly simplify parsing of entire source files, but since the header comments in those files need to be inspected anyway I'm not sure I'd use it for that. Could you clarify briefly what you think would make this most useful?

@akinomyoga
Copy link
Owner

akinomyoga commented Sep 18, 2023

Could you clarify briefly what you think would make this most useful?

I don't mean the function named_function_string can be called for the present purpose. We need to write a new C function that extracts the list of commands of a specified shell function anyway. In writing such a C function, I thought it would be useful to look at the implementation of named_function_string (but not using the function) since the function implementation illustrates all the ways to access Bash's internal representation for the entire function body.

@bkerin
Copy link
Contributor Author

bkerin commented Sep 19, 2023

Could you clarify briefly what you think would make this most useful?

I don't mean the function named_function_string can be called for the present purpose. We need to write a new C function that extracts the list of commands of a specified shell function anyway. In writing such a C function, I thought it would be useful to look at the implementation of named_function_string (but not using the function) since the function implementation illustrates all the ways to access Bash's internal representation for the entire function body.

Ok. I was just going to parse everything a line at a time with regexes but maybe there are too many cases where that would fail somehow. named_function_string() calls make_command_string_internal() which looks like it ultimately puts all variable assignments into a struct simple_com which also contains lots of things besides assignment stored in a simple WORD_LIST and REDIRECT (list). So there would still be some parsing to do to pick assignments back out, but the situation is probably safer since everything in the other cases in the switch in make_command_string_internal() is handled already.

So the overall plan could be:

  1. Write a function that inspects all the simple_com in each function's GROUP_COM and records assignments, and recursively on it's callees (at least the ones in ble.sh).
  2. Run it on some entry function(s) (which?).
  3. Compare with header comments in ble.sh source (so still parse files but only the header comments and associated function name).

Is this about what you have in mind?

@akinomyoga
Copy link
Owner

akinomyoga commented Sep 19, 2023

Is this about what you have in mind?

Yes, exactly.

2. Run it on some entry function(s) (which?).

I think there are many ways to achieve the final goal, but I was thinking that you can make it a loadable builtin and call the builtin on each function. The builtin can output the analysis result to stdout or store the result in shell variables. I think the recursive calls do not need to be processed at this stage; we can instead list the names of shell functions called from the specified function. Then we can perform the final resolution using any other language. But in another way, it is also possible to process everything in C.

edit: The reason that I thought we can first output the analysis results for each function (without the resolution of function calls) is that I anticipate that we need to adjust the analyzed data for functions doing something non-trivial. We can manually modify the output of the first stage and then feed it to the second resolution stage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants