-
Notifications
You must be signed in to change notification settings - Fork 628
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
parsers,lisp: speed up token lookup #3741
Open
dtikhonov
wants to merge
3
commits into
universal-ctags:master
Choose a base branch
from
dtikhonov:speedup-lisp-token
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from 1 commit
Commits
Show all changes
3 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is helpful for maintenance.
Could you write more about this?
I wonder how to generate the table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@masatake I put a comment into the code. I think that adding
gperf
to the required toolchain is too much of a pain. Do you?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Give me time to answer. I have some ideas about this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the build process of ctags, we use two code generators: packcc and optlib2c.
packcc is developed at an external project. With "git subtree," the
source code of packcc is integrated into our source tree. On UNIX-alike
systems, packcc is built before building ctags. packcc is run for
generating .c and .h files. If I understand
correctly, on Windows+msvc , packcc is not built. So parsers requiring
packcc are not built on Windows+msvc. The *.peg files under peg/ are
input for packcc. Generated .c and .h are not in our git repo.
optlib2c is a code generator. It is in our source tree (misc/optlib2c)
and we maintain it as part of ctags. It is written in Perl.
optlib2c CAN generate .c files from optlib/*.ctags files.
We put both .ctags files and generated .c files into our git repo.
So, unlike .peg based parsers, .ctags parsers are built on any platform
including Windows+msvc.
I decided to put .c files derived from .ctags into our git repo because
I know optlib2c generates the same C code for the same .ctags file.
I decided not to put .c and .h files derived from .peg into our git repo because
I should not expect packcc generates the same code for the same .peg file; packcc
is developed at the external project.
Even .ctags parser, there is a limitation. As far as not modifying a .ctags
parser, one can build the .ctags parser even on Windows+msvc. However,
one cannot modify .ctags because the build script for Windows+msvc has no
rule to run optlib2c.
Conclusion:
optlib2c based parsers (.ctags parsers or optlib parsers) can be built
on Windows+msvc unless modified. If modified, it is hard to
build the parser on the platform.
packcc based parsers (.peg parsers) cannot be built on Windows+msvc.
We can enhance the build scripts on Windows+msvc. However, it is
not a topic here.
What should we do about gperf?
In my understanding, gperf is a rather stable program.
So we can expect it generates the same C code for the same input.
So, like optlib2c, we can put generated code into our git source tree.
That means Windows+msvc users can build the gperf based parser.
However, unlike optlib2c and packcc, we don't put the source code
of gperf into our source tree. One wanting to modify a gperf based parser
must install gperf in one's workspace.
So what we have to do:
TODO1. we must choose a version of gperf all developers and contributors use
to ensure the expectation.
TODO2. we must provide a container file (Dockerfile, Contaienrfile),
or a container image to run the version of gperf.
TODO3. we must deploy a detector on our CI environments for comparing
the code generated by the version of gperf and code submitted by
a contributor. If a pull request includes code generated by an unexpected version of gperf installed in the contributor's workspace, merging
should be prevented automatically.
Above TODO items are about our build-infrastructure.
TODO4. A generated code and manually written code must be separated at the file level.
TODO5. An input for gperf (.gperf file) must be in our git repo.
lisp_gperf.c and lisp_def.gperf are the new source code for ctags.
TODO6. source.mak must be extended to accept the new files.
TODO7. The way to update the win32 build script must be provided.
Achieving these items may be possible but challenging.
Thank you for reading, and sorry for my broken English.
I burned out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, @masatake, this is well detailed. It is a lot of work: which is why I integrated the generated code by hand.
I think this is too much work for small payoff. How important is it to be able to add keywords easily? Do we think that new Emacs Lisp keywords will be added with some frequency? If not, I think this PR is good enough. We can take the lazy approach: if new keywords are to be added, we can update the mechanism as you outlined above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just another critical question. Give me time to answer again. I have too many words about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry to be late to answer.
Conceptually, keywords will be added frequently.
Let me explain.
I have planned to unify parsers with S expression-based syntaxes like Lisp, EmacsLisp, Scheme, and Clojure for years:
What I have to do may be:
If the way to add a keyword is not sophisticated, it will be an issue in realizing the plan.
This is not only the reason. Developing in-language DSL is a viral technique among Lisp programmers. They may want to make tags for their own
def-something
.See https://docs.ctags.io/en/latest/running-multi-parsers.html?highlight=agent#subparser-tagging-definitions-of-higher-upper-level-language.
Universal Ctags provides a concept, subparser, to satisfy these needs.
If a programmer knows C language, one can add a subparser specialized to one's DSL.
A subparser is a set of callback routines called from a base parser.
Moose parser is an example of subparser. It is called from Perl parser.
The concept of subparser is not enough for real demands.
Only a few people want to write C code for extracting tags from one's DSL input.
Only a few people like me want to pay the cost.
The greatest and the most loved feature of ctags may be --regex-=....
It will be nice if ctags provides a way to define a subparser for a Lisp-based parser from the command line like --Sexp-=...
In this case, a user may define a keyword from the command line.
In these grand stories, it is YES for the question "keywords will be added with some frequency?".
However, what I wrote is not more than a plan. It has not been realized for ... 7 years.
The current development should be limited to plans.
I can only ask you to make a way to add a keyword (1) easy, (2) well-documented, or (3) integrated into our build system.
Thank you for reading.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the explanation. I understand what you are trying to accomplish. I will give this some thought.