-
Notifications
You must be signed in to change notification settings - Fork 164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Function argument fixes #198
base: staging
Are you sure you want to change the base?
Conversation
It's impossible for a node in a user-defined function to reference arguments beyond those which are defined, so don't store them in the arg lists. Evaluate them for side-effects and discard. This change also plugs a tempcell leak in args[i] for i >= ndef. ===============> cat leak.sh #/bin/sh # Monitor awk's memory consumption while repeatedly calling a # user-defined function with a couple of extra arguments. ${1:-./a.out} ' function f(x) { x } BEGIN { mem = "ps -p $PPID -o rsz=" system(mem) while (++i <= 100000) { f(i, i, i) if (i % 10000 == 0) system(mem) } } ' 2>/dev/null ===============> paste <(./leak.sh ./master.out) <(./leak.sh ./a.out) 2432 2304 3456 2304 4608 2304 5632 2304 6656 2304 7680 2304 8832 2304 9856 2304 11008 2304 12160 2304 13312 2304
When checking if a function call exceeds the NARGS maximum argument count, arguments for defined parameters were counted twice, reducing the call capacity in half. ===============> cat nargs.awk # NARGS is currently 50 function args25(p1, p2, p3, p4, p5, p6, p7, p8, p9, p10, p11, p12, p13, p14, p15, p16, p17, p18, p19, p20, p21, p22, p23, p24, p25) { print "25 arguments" } function args26(p1, p2, p3, p4, p5, p6, p7, p8, p9, p10, p11, p12, p13, p14, p15, p16, p17, p18, p19, p20, p21, p22, p23, p24, p25, p26) { print "26 arguments" } function args50(p1, p2, p3, p4, p5, p6, p7, p8, p9, p10, p11, p12, p13, p14, p15, p16, p17, p18, p19, p20, p21, p22, p23, p24, p25, p26, p27, p28, p29, p30, p31, p32, p33, p34, p35, p36, p37, p38, p39, p40, p41, p42, p43, p44, p45, p46, p47, p48, p49, p50) { print "50 arguments" } function args51(p1, p2, p3, p4, p5, p6, p7, p8, p9, p10, p11, p12, p13, p14, p15, p16, p17, p18, p19, p20, p21, p22, p23, p24, p25, p26, p27, p28, p29, p30, p31, p32, p33, p34, p35, p36, p37, p38, p39, p40, p41, p42, p43, p44, p45, p46, p47, p48, p49, p50, p51) { print "51 arguments" } BEGIN { args25(a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15, a16, a17, a18, a19, a20, a21, a22, a23, a24, a25) args26(a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15, a16, a17, a18, a19, a20, a21, a22, a23, a24, a25, a26) args50(a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15, a16, a17, a18, a19, a20, a21, a22, a23, a24, a25, a26, a27, a28, a29, a30, a31, a32, a33, a34, a35, a36, a37, a38, a39, a40, a41, a42, a43, a44, a45, a46, a47, a48, a49, a50) # Call args50 with 51 arguments. The 51st does not correspond # to a defined parameter, so it doesn't count against NARGS # (since excess args are not stored in the argument lists). args50(a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15, a16, a17, a18, a19, a20, a21, a22, a23, a24, a25, a26, a27, a28, a29, a30, a31, a32, a33, a34, a35, a36, a37, a38, a39, a40, a41, a42, a43, a44, a45, a46, a47, a48, a49, a50, a51) args51(a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15, a16, a17, a18, a19, a20, a21, a22, a23, a24, a25, a26, a27, a28, a29, a30, a31, a32, a33, a34, a35, a36, a37, a38, a39, a40, a41, a42, a43, a44, a45, a46, a47, a48, a49, a50, a51) } ===============>$ ./master.out -f nargs.awk 25 arguments ./master.out: function args26 has 52 arguments, limit 50 source line number 38 ===============>$ ./a.out -f nargs.awk 25 arguments 26 arguments 50 arguments ./a.out: function args50 called with 51 args, uses only 50 source line number 53 50 arguments ./a.out: function args51 has 51 defined arguments, limit 50 source line number 59
There are at least two reasons for why the propagation of the conversion of an uninitialized argument into an array cannot wait until the function has finished. The first is that otherwise any arguments in the active frame that share the origin of the converted argument will persist for the life of the function as independent copies of the uninitialized value instead of referring to the new array. The second is that the exit statement's longjmp entirely skips the function epilogue (where the conversion had been handled), opening the door to incorrect values in END blocks. Instead of waiting for the epilogue, toarray immediately initiates a scan of the stack frames to update those arguments that have the same origin as the converted argument. The argument list is now strictly of a single cell subtype, CCOPY. It had been allowed to contain a mix of subtypes (CCOPY and CVAR). This restriction simplifies the logic of the epilogue significantly. The oargs list is no more. CCOPY cells use cnext to record their origin (oargs recorded the parent, but the origin may be a more distant ancestor). Passing by reference had been done by setting the argument to the address of the array's cell. This is not allowed under the new regime. Instead it uses a pointer to the array's symbol table. Which means that the table must be immobilzed or the stack frames must be scanned upon relocation. Enter clearsymtab to empty arrays without moving their symtabs. This commit also fixes several other defects, among them a symbol table leak (in the epilogue, unaware of each other, undetected siblings would assign their private symbol tables to the same parent), the use-after-free discovered by @millert in onetrueawk#157 (apologies for the delay in getting this out), and the following use-after-free from a dangling name pointer (manifests only when debugging is enabled): ----->$ cat dangling-nval.sh awk=${1:-nawk} valgrind --leak-check=full $awk -d ' function f(x) { g() print x } function g() { delete a["hi"] } BEGIN { a["hi"] = "world" f(a["hi"]) }' ----->$ For the moment, function arguments are anonymous. This change forced a small modification to a test in T.misc.
run.c
Outdated
* | ||
* For now, args are anonymous. Debugging can use frame/arg indices. | ||
*/ | ||
y->nval = NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit of a concern since nval
is printed by the debug and warning code. On Linux and BSD this is safe, since printf(3) will print "(null)" but on other systems it will cause a NULL dereference. So either we need to wrap nval
with the NN macro everywhere it is printed or set it to some non-NULL value. Setting to x->nval
(or a copy of it) is not really correct but perhaps is better than NULL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The debugging statements need to be able to cope with anonymous cells. The temp cells used for expressions (including function return values) are anonymous. The jump cells are as well.
A quick grep found 13 instances of nval in DPRINTF statements, 10 wrapped, 3 unwrapped. I will wrap them all.
Also, parsing a function leaks the previous function definition's arglist. Compare 'function f(a) {}'
with 'function f(a) {} function g() {}'
. I will see about preserving the argument names while also eliminating the leak.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not just DPRINTF calls, also SYNTAX, WARNING and FATAL. Some of those may not be reachable but the ones in funnyvar() are.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is a lot to digest and review here. miguel, please keep nval NN wrapping as a separate PR. thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@plan9, I will keep the NN wrapping separate. I hope to add a commit to this branch later today (perhaps tomorrow) to provide meaningful names for the args.
Use the new struct Function in much the same way as struct Array (to make an object's internals reachable via Cell->sval). While recording argument names, plug the leak of arglist nodes. In the following program, f's arglist is leaked at the time of g's definition: function f(a,b,c) {} function g() {} The leak happens when varlist assigns to arglist in awkgram.y. Note that though the parsing and defining of a function are now limited only by available memory, invocations are still limited to NARGS (an int).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the commit message:
Note that though the parsing and defining of a function are now
limited only by available memory
That's incorrect. yylval.i limits it to INT_MAX.
I recommend against using that last commit (b5e23fd). There are a few things about it that require tidying up. Also, what I had thought of as an unrelated branch of work on temp cells may intersect with this. I would like a little time to explore that. The anonymous arg cells should be okay for now. There are already anonymous cells (CTEMP mostly) floating around when compile_time == RUNNING. If printing a NULL nval causes a crash on a non-BSD/Linux system, it would have served to uncover a print statement that needs fixing. As @plan9 mentioned, there's already plenty to digest here. If we're in agreement, should I force push to my branch a reset to the previous HEAD or just leave things as they are? (I know how to use git, but github not so much). Take care, |
@mpinjr I think it is fine to go ahead with the anonymous arg cells. I can work on a PR to add missing NN wrapping if you'd like. |
@millert Thanks, but it's possible that it would be a waste of time. Here's what I have in mind... As you know, it's not safe to call execute if there exists an unhandled cell from an earlier execute. To avoid use-after-free (and other) bugs, whatever is needed from the first execute's cell must be obtained before the next execute begins. It occurred to me that if this discipline were applied consistently, we would only need one CTEMP cell. That cell could be statically allocated with access to it guarded by a boolean. An attempt to acquire it while busy would be a fatal error. (Like the function calling work, this is one of the ideas I was exploring last year.) So, if function arguments were named and if there were only a single tempcell, the implementation would be left with just 9 anonymous cells (defined near the top of run.c). Those could be named and then wrapping nval would be undesirable (as it would prevent sensitive platforms from denouncing the existence of an anonymous cell that should not be). |
Fix argument counting error which cuts the argument capacity in half.
Fix conversion of uninitialized function arguments into arrays.
Plug some leaks and eliminate some uses-after-free.
Hope everyone is doing well,
Miguel