Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove ispell pipe mode (enchant-2 -a) #393

Open
rrthomas opened this issue Aug 7, 2024 · 19 comments
Open

Remove ispell pipe mode (enchant-2 -a) #393

rrthomas opened this issue Aug 7, 2024 · 19 comments
Milestone

Comments

@rrthomas
Copy link
Owner

rrthomas commented Aug 7, 2024

Emacs is the only user, and with the jinx package, no longer needs it.

@astoff
Copy link

astoff commented Aug 11, 2024

I'm afraid this is a bit of a misunderstanding. Jinx is just one of the Emacs spell-checking packages, with its own advantages and disadvantages. Note also that any spellchecking package that doesn't rely on the -a interface necessarily has the disadvantage of requiring to link Emacs against some library or compile an external module.

@rrthomas
Copy link
Owner Author

@astoff Thanks for your feedback. It's not a misunderstanding: I wrote the Enchant support for ispell.el and use it daily.

I agree that there are advantages to not needing to compile against an external library, but I also believe these are largely outweighed in the case of Enchant: the pipe method is slow and limits access to Enchant functionality. It also causes me a maintenance burden.

@rrthomas
Copy link
Owner Author

I have filed minad/jinx#199 and started using Jinx myself to explore how far we can go on the Emacs side. Certainly Jinx was easy to set up and use.

@rrthomas rrthomas added this to the v3 milestone Aug 12, 2024
@rrthomas
Copy link
Owner Author

rrthomas commented Oct 9, 2024

It is worth considering the interface of nuspell(1) to copy for a simplified enchant(1).

@astoff
Copy link

astoff commented Oct 9, 2024

What use do you have in mind? IMHO ispell -a is the only that that anyone might care about, and the nuspell command (kinda like but not quite the same) is pretty useless.

@rrthomas
Copy link
Owner Author

rrthomas commented Oct 9, 2024

Mostly diagnostics, also simple batch-mode checking. As you say, not that useful. But I don't think there's much use for pipe-based checking any more.

@astoff
Copy link

astoff commented Oct 9, 2024

Just for the record, I wrote an Emacs package last year that does pipe-based checking and in my book it works very well (in fact, it served as inspiration for Jinx).

Also, apart from figuring out the async communication (which as always was tricky), it is a simpler package than Jinx, and doesn't require compiling a module.

Also, as a design decision, my package doesn't dabble into tokenization questions, which is language dependent, and leaves it entirely to the spellchecker. Because of #316, enchant doesn't work well with my package, so I've been using hunspell instead. (Was that issue solved? I can't reproduce it right now). If I was using enchant, though, I'd protest vehemently :-).

@rrthomas
Copy link
Owner Author

rrthomas commented Oct 9, 2024

Also, as a design decision, my package doesn't dabble into tokenization questions

This is interesting, because I'm not clear that Enchant should be doing this either. While it is reasonable to think that Enchant could do it when fed plain text, I don't think that makes sense for most applications (which aren't editing plain texts), which already have to know where the words are.

@astoff
Copy link

astoff commented Oct 9, 2024

I'm not sure either, but the rules for what constitutes a word are language-dependent (see e.g. /usr/share/hunspell/*.aff) and I would argue that belongs to the spell checker, not to the application using it. There are lots of tricky details, such as when surrounding punctuation is part of the word.

Certainly there is a part of the story that belongs to the application. For example, restrict to comments in source code or filter out markup commands.

@rrthomas
Copy link
Owner Author

rrthomas commented Oct 9, 2024

The problem is that the application will typically also be involved in deciding what is the text that has to be checked, according to non-textual markup. It has to work out all this stuff anyway, e.g. for layout purposes.

@astoff
Copy link

astoff commented Oct 9, 2024

Sure, and that's true even when there's no visual markup, like I said in the second paragraph of my previous message.

But deciding what constitutes a word (or in general any information encoded in the dictionary aff file) is a different story.

@rrthomas
Copy link
Owner Author

rrthomas commented Oct 9, 2024

By "all this stuff", I meant that applications requiring spell-checking will already have to work out where word boundaries are, for layout purposes. So it's not a different story, it's exactly the same story, I'm claiming.

@astoff
Copy link

astoff commented Oct 9, 2024

How is the application supposed to know if a hypen is a word boundary (well-known) or part of a fixed expression (vis-à-vis)? Splitting on whitespace or something like that isn't exactly the same story as splitting words.

@minad
Copy link

minad commented Oct 16, 2024

I think what is needed is some API where the application can ask Enchant about language-dependent tokenization rules. There is already enchant_dict_get_extra_word_characters but this is probably too limited. The bulk of the tokenization work is still better done on the side of the application, since it knows about the relevant context sensitive information, the markup format, comment syntax, etc.

Regarding the pipe mode, I think using it for a just in time Emacs spell checker was the wrong decision, @astoff. The performance is poor, otherwise I wouldn't have made Jinx. In contrast, interacting with the Enchant API is completely painless, except for the module compilation process. But then I think Elisp should provide an FFI at some point which would completely alleviate this pain.

@astoff
Copy link

astoff commented Oct 16, 2024

The performance is poor

Well, I still can't reproduce the poor performance you got, but also I think it would be very sad if there were any fundamental reason for it to be that way. It would mean LSP cannot work smoothly either.

In contrast, interacting with the Enchant API is completely painless, except for the module compilation process.

Sure, I have no doubt about that ;-)

I think what is needed is some API where the application can ask Enchant about language-dependent tokenization rules.

I think that's how most APIs (e.g. Hunspell) work, actually. It seems to me it's not general enough (for example in English you need to expresses that ' is part of a word when inside, but not when around a word). To me it would seem to make more sense if the API allowed the application just split on whitespace. So you can ask if "well-knownx" is correct and the answer would not be yes or no, but rather that there's a typo form characters 5 to 11.

@minad
Copy link

minad commented Oct 17, 2024

Well, I still can't reproduce the poor performance you got, but also I think it would be very sad if there were any fundamental reason for it to be that way. It would mean LSP cannot work smoothly either.

You probably didn't sufficiently profile your package, since jit-spell appeared prominently in the profiles in my test, which is already completely unacceptable for an always-on minor mode, even if not noticeable. Given all these complaints about slow Emacs, the recipe for that are expensive mode lines and slow minor modes, which are always on. I run EXWM and everything runs inside Emacs, so I may be more affected. Also I assume that the effect becomes more noticeable on slower systems. Now contrast this with Jinx, which doesn't add much cost at all to the profile. IPC is simply more expensive than a single C function call plus some FFI marshaling. Why are we even arguing this?

Regarding LSP - LSP in Emacs had massive problems due to the IPC and JSON parsing, this is for sure. See the lsp/eglot-booster projects and also the new JSON parser in Emacs 30. In any case, for LSP we get some benefit by using the IPC protocol due to the various servers and simply because the protocol already exists in this form and cannot be replaced easily. If LSP servers could run inside the host process without the expensive communication one could save some latency, but the downsides would be more significant given the complexity of LSP. In the case of spell checking, the API essentially boils down to a single function to check a word plus a little bit of setup. There the trade-offs are different and Enchant already provides a nice abstraction to embed the spell checker inside the host program with very little hassle. So this is just an apples to oranges comparison with different answers.

Just to be clear regarding this discussion - I am neutral. I have need for the pipe mode but I had assumed that the pipe mode would also be useful for other software, which would speak in favor of keeping it. Also there is jit-spell. Then I've learned that @rrthomas only added the pipe mode for ispell.el integration, so it is up to him to consider removing it again if the maintenance effort is not justifiable.

@astoff
Copy link

astoff commented Oct 17, 2024

This is not a matter of profiling. The issue you observe I can't reproduce, and that's the problem. I did profile quite a bit and say on an Org mode buffer I always I see something like this, which says all is fine:

        1328  71% + redisplay_internal (C function)
         194  10%   Automatic GC
         135   7% + command-execute
         112   6% + org-appear--post-cmd
          40   2% + jit-spell--process-filter
          33   1% + timer-event-handler
           9   0% + winner-save-old-configurations
           2   0% + internal-echo-keystrokes-prefix
           2   0%   internal-timer-start-idle
           2   0%   undo-auto--add-boundary
           1   0%   org-appear--pre-cmd
           1   0%   tooltip-hide
           0   0%   ...

Also I assume that the effect becomes more noticeable on slower systems.

In the absence of further information I would assume a slow system is slower overall. I would be very interested in getting to the bottom of this, because it must be due to some setting or build configuration. But obviously I have no access to your computer to further investigate, neither did I get any similar bug report.

[I also have no complaints regarding LSP — or rather, I do notice some latency but it doesn't noticeably slows down my Emacs, which is in line with my jit-spell experience.]

Concerning the ispell -a interface, I just came here to say that Emacs still uses it. Not with the intent to advocate for anything, but just to let @rrthomas take an informed decision (it turn out it was unnecessary to mention anything, since he was aware of the situation already.)

@minad
Copy link

minad commented Oct 17, 2024

@astoff

This is not a matter of profiling. The issue you observe I can't reproduce, and that's the problem. I did profile quite a bit and say on an Org mode buffer I always I see something like this, which says all is fine:

The profile you showed already demonstrates the problem. It is clearly not fine. If you do the same with Jinx it won't appear with 2% in the profile. 2% is a lot for the process filter only to send a few words back and forth, and it will even get up when scrolling, and the backend process cost is not even included yet. Anyway, I am not interested in discussing this further given your ignorant position. There is simply nothing wrong on my system, nothing to diagnose, nothing to get to the bottom of, except that I don't have the most powerful machine - still no latency or performance issues in my current setup without jit-spell. Look, you know that I am experienced enough given my other packages, long time programming and Emacs development experience, so you could also trust me on this, instead of repeatedly claiming that there is no problem, despite the profile on your machine already showing some impact. I gave your package a fair and long try. In the beginning it worked well, but after short time of usage I found that latency went up unacceptably. I tried to make suggestions regarding how the efficiency could be improved via some rate limiting and queuing to reduce the pressure on the inefficient Emacs process filters, which you were not interested in. Why should I trade this given that Jinx can just do without any increase in latency? Jinx started as an experiment and it immediately solved the problem, otherwise I wouldn't have pursued it further.

[I also have no complaints regarding LSP — or rather, I do notice some latency but it doesn't noticeably slows down my Emacs, which is in line with my jit-spell experience.]

When reading around one will see reports about LSP performance and latency issues. It depends on the server you are using, and I assume that you only use good ones which keep the amount of data over IPC to a minimum. Of course you also have the experience to diagnose given that you've even authored your own server.

@astoff
Copy link

astoff commented Oct 17, 2024

Man, you really went ballistic in that message. I don't quite see what you think you need to defend. It's obvious that a module is way faster than IPC, and Jinx is probably a great package.

We should indeed stop discussing this, but I do want to correct your point that I am

repeatedly claiming that there is no problem

To quote myself, “the issue you observe I can't reproduce, and that's the problem”. Rephrasing more straightforwardly, the problem is that I can't reproduce the issue you observe. Please don't read in that assertion more than is written. I'm just saying that unfortunately I couldn't work with the info you provided me, despite the effort of both parts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants