Token spans for non-AST nodes #61
Comments
I haven't tried it myself but you might be interested in https://github.com/Instagram/LibCST |
For attribute lookup, the For brackets, that's a different issue. Haven't looked into it, but it reminds me of a somewhat similar needs involving parentheses: #11 |
It's not so much "making it simpler" as avoiding hardcoding all of these rules -- they should mostly be just as simple, but they start piling up. For example, alias nodes are similarly simple -- as it's constrained to the form It'd be nice to centralize all of these simple rules inside asttokens, although I can still write them by hand in my library-that-uses-asttokens if need be. (This is also true of use case 2 / brackets.)
They're definitely related. A motivating use case for this span is to e.g. replace If the resolution for #11 is to include parens, then the |
Sorry, I forgot to reply to this. I'd be open to (additionally) using LibCST, but I really like how asttokens is built on the existing AST, and would prefer something like LibCST that layers itself on top of the AST. (Which is kind of what I view asttokens as, though it's missing many desirable things such as this FR, or knowledge of parenthesis-expressions.) |
So you mean a helper like `get_span(attrNode, "attr")` or
`get_span(tupleNode, "elts")`? That sounds good to me, and seems helpful to
include into the asttokens library (rather than a separate one on top of
it).
To handle empty spans, perhaps better to return (startTok, endTok), where
the endTok is AFTER the last token? That would allow covering an empty
token span, while still exposing a relevant position in the text.
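To make the half-open idea concrete, here is a minimal sketch using only the standard `tokenize` module rather than asttokens itself. The helpers `significant_tokens`, `elts_span` and `replace_elts` are hypothetical names invented for this illustration, and the sketch assumes the input is exactly one single-line list literal.

```python
import io
import tokenize

# Token types that carry no source text of interest here.
_SKIP = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER,
         tokenize.INDENT, tokenize.DEDENT, tokenize.COMMENT}


def significant_tokens(source):
    """Tokenize `source`, dropping purely structural tokens."""
    readline = io.StringIO(source).readline
    return [t for t in tokenize.generate_tokens(readline) if t.type not in _SKIP]


def elts_span(source):
    """Return (start_tok, end_tok) for the contents of a list literal.

    Hypothetical helper: `source` must be exactly one list literal.
    The span is half-open: end_tok is the token AFTER the last content token.
    """
    toks = significant_tokens(source)
    if toks[0].string != "[" or toks[-1].string != "]":
        raise ValueError("expected a single list literal")
    return toks[1], toks[-1]


def replace_elts(source, new_text):
    """Replace the list contents, using only the two boundary tokens."""
    start_tok, end_tok = elts_span(source)
    # Single-line input assumed, so column offsets index `source` directly.
    return source[:start_tok.start[1]] + new_text + source[end_tok.start[1]:]


print(replace_elts("[]", "x"))
print(replace_elts("[a, b]", "c"))
```

For the empty list both boundary tokens are the closing `]`, so the span is empty but still pins down where new text should go.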
…On Tue, Sep 15, 2020 at 3:48 PM Devin Jeanpierre ***@***.***> wrote:
I haven't tried it myself but you might be interested in
https://github.com/Instagram/LibCST
Sorry, I forgot to reply to this.
I'd be open to (additionally) using LibCST, but I *really* like how
asttokens is built on the existing AST, and would prefer something like
LibCST that layers itself on top of the AST. (Which is kind of what I view
asttokens as, though it's missing many desirable things such as this FR, or
knowledge of parenthesis-expressions.)
|
(Replying out of order)
Oh, actually, yeah, that should work perfectly! My only concern is that this is totally different from
Yes, exactly right! Only catch is that it can't be a top-level function, I think it needs to either be a method, or accept an ASTTokens object as a parameter, so that it can call next_token(). For example, the implementation for return (self.next_token(container_node.first_token), container_node.last_token) If this plan looks good to everyone, I can volunteer to implement this for several easy cases and send a PR implementing those -- at the least |
This sounds fine to me. I did a little thinking to clarify the goals for this. Such rules (e.g. for getting parts of Attribute or List nodes) only make sense in a particular context, and would usually occur along with other hard-coded logic specific to this context. One needs to deal with a specific kind of node to make use of But there are other benefits:
There are some related needs for which it would be useful to have a method with similar usage. E.g. imagine that Interface-wise, separate methods may make more sense (e.g. |
If by generic usage you mean usage that is not aware of the specific AST object type, that is still possible. For example, if you were writing some tool that dumped the AST and information about token spans for debugging, you might write something like this:

    import ast

    def dump_ast(astt, node):
        # Print the node itself and the character range it covers.
        print(ast.dump(node))
        print(f"{node.first_token.startpos}-{node.last_token.endpos}")
        for field, child in ast.iter_fields(node):
            # get_span is the method proposed in this issue; it may return None.
            child_span = astt.get_span(node, field)
            if child_span is not None:
                print(f"  {field}: {child_span[0].startpos}-{child_span[1].startpos}")
            else:
                print(f"  {field} (no span)")

But yeah, there isn't much use for this if you don't know what the type is -- the use case is generally that you want to do some specific transform. Without that, there is not as much use for token data, and it could really be arbitrary or missing.
I really want that There is a corner case where an AST node has a list of strings, instead of a list of nodes, where you might want to know the span of one of those substrings -- for example,
TBH I worry that there'd be far too many as things get added over time. If this is the way to go, I'd suggest a separate module and accepting the It also starts getting worrisome if two different AST objects share attributes with the same name but different semantics (idk if that happens already or not). One could imagine monkeypatching them onto the AST nodes, much like
This has the property of being naturally grouped and avoiding magic string constants. Explosion of possibilities here, and I don't mean to bikeshed too much. As long as we're clear on the basic approach, I can start implementing, and am happy to work out the relatively superficial detail of exactly how the API looks concurrently. |
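As one illustration of how per-node rules could be grouped without colliding when two node types share a field name, here is a rough sketch of a registry keyed on `(node type, field)`. Everything in it is hypothetical (`span_rule`, `get_span`, `_SPAN_RULES` are invented names), it is not how asttokens is implemented, and it returns `(line, column)` positions from the standard `ast` module (Python 3.8+) rather than tokens, just to stay self-contained.

```python
import ast

# Hypothetical registry: (node type, field name) -> function computing a span.
_SPAN_RULES = {}


def span_rule(node_type, field):
    """Decorator registering a span rule for one field of one node type."""
    def register(fn):
        _SPAN_RULES[(node_type, field)] = fn
        return fn
    return register


@span_rule(ast.List, "elts")
def _list_elts(node):
    # A List node's position covers its brackets, so step inside them.
    # Half-open span: the end position is where the closing "]" starts.
    start = (node.lineno, node.col_offset + 1)
    end = (node.end_lineno, node.end_col_offset - 1)
    return start, end


def get_span(node, field):
    """Return ((line, col), (line, col)) for a field, or None if no rule exists."""
    rule = _SPAN_RULES.get((type(node), field))
    return rule(node) if rule is not None else None


empty = ast.parse("[]").body[0].value
filled = ast.parse("x = [a, b]").body[0].value
print(get_span(empty, "elts"))
print(get_span(filled, "elts"))
print(get_span(filled, "ctx"))
```

Keying on the node type as well as the field name means a rule for `List.elts` never fires for `Tuple.elts`, whose parentheses are not part of the node's own position.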
Looks like I can't edit the issue, but feel free to assign to me. |
I assigned to you. You make a good point about Thanks for the analysis of issues and possibilities. They all make sense, and I agree that it seems fine to start with something and iterate. Thanks for tackling this! :) |
The main pain point I have with asttokens is when there's a non-AST object inside of the AST that semantically has a token span, but which asttokens does not expose (to the best of my knowledge). Two examples:

1. If you match `foo.bar`, what is the token span for `bar`? Unfortunately, the `bar` is a string object in the AST (`Attribute.attr`) and gets no `first_token`/`last_token` attributes. (Similar issues exist for `alias` nodes, see "Strange behavior with tokens of import nodes" #27, which describes an attempted workaround, and for other parts of the AST.) This makes it harder to replace one attribute with another, or change one import to another, things like that, without replacing the entire `expr`/`stmt`.
2. If you match `[]`, and want to insert a new element to the list literal, you need to manually do the math, as there is no declared token span for the `elts` attribute, which is just an empty list. (For nonempty lists, you can use the token spans of the members of the list.)

Feature request: I would like to define a method on `ASTTokens`, something like `get_span(node: AST, attr: str) -> Optional[(Token, Token)]`, which returns the `first_token`/`last_token` of an attribute, instead of the node itself, as a best-effort attempt. These would be either written down explicitly in `mark_tokens` off to the side (as e.g. a different attribute), or deduced on the fly.

Also, I would like some way to mark these inclusive/exclusive -- for example, in the empty list case, the token span for `elts` should be identical to the token span for the container, but exclusive of the `[` and `]` etc. Maybe an `inclusive` attribute on Token, or maybe just an extra pair of `bool` return values for `get_span`, something like this.

Context links: (1) is the main motivator for this limitation in my asttokens-using tool, and (2) will become a problem in ssbr/refex#6 (support for globs, like `[$x...]`, which includes the empty list).

(If this sounds good, I can volunteer to do at least enough to make this work for the empty list case and probably the function call case, which I will need, and maybe some trivial easy ones like `Attribute.attr`. I don't want to try to solve it for everything though. :))