Transitioning continuously between dictation and commands #623
Comments
That sounds useful. However, things are about to change with how we do CCR merging.
To clarify: so this would be a CCR grammar that's not merged into the global grammar?
I think what @alexboche is talking about is, essentially, being able to mix dictation and CCR commands. AFAIK, this is strictly not possible with Dragon, at least not with the way that Dragonfly grammars currently work. The only way to do it would be if Caster were to not use Dragon's command mode at all, capture ALL dictation output from Dragon, and then parse it to figure out which words are commands and which are dictation. This is how Nils Klarlund's ShortTalk worked. It can work, but it comes with a lot of challenges, both technical and behavioral (for instance, your command words need to be completely foreign, not just obscure: Ross, lease, sauce, dunce, etc.). As much as I would love to be able to mix commands in with dictation, I feel like the amount of work it would require just isn't worth it.
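The ShortTalk-style post-processing described above can be sketched in a few lines: every utterance is captured as raw dictation and scanned for reserved command words. This is a hypothetical illustration; the command vocabulary and event names are invented for the example, not ShortTalk's or Caster's actual implementation.

```python
# Hypothetical sketch of ShortTalk-style post-processing: every utterance is
# captured as raw dictation, then scanned for reserved command words.
# The COMMANDS mapping and the handler names are made up for illustration.

COMMANDS = {
    "lease": "move-left-one-word",   # deliberately foreign-sounding words,
    "sauce": "move-right-one-word",  # as in Nils Klarlund's ShortTalk
}

def parse_utterance(words):
    """Split a recognized word list into (kind, payload) events."""
    events, buffer = [], []
    for word in words:
        if word in COMMANDS:
            if buffer:                      # flush pending dictation first
                events.append(("dictate", " ".join(buffer)))
                buffer = []
            events.append(("command", COMMANDS[word]))
        else:
            buffer.append(word)
    if buffer:
        events.append(("dictate", " ".join(buffer)))
    return events
```

For example, `parse_utterance("mary had a lease little lamb".split())` yields dictation, then the left-movement command, then more dictation, all from one utterance.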
I was thinking of a separate CCR grammar that is not merged into the main global CCR grammar. The separate CCR grammar could be global or context-specific (definitely something that could be enabled and disabled). This separate CCR grammar would just have a small number of commands in it, ideally with words that are not in the Dragon vocabulary. The commands would basically be text-navigation commands, e.g. arrow keys (with modifiers), and maybe the clipboard-based text-navigation commands. It might be possible to use a mimic command to get the native Dragon text-manipulation commands to work CCR with respect to the words preceding them (not the words following them). Might not be worth the effort; just something to think about. If you feel like this is a no-go, feel free to close the issue. Alternatively, it might be good to make some more commands like Caster's "format " to make it easier to get the proper spacing before and after, get punctuation and capitalization, etc. E.g. perhaps commands like "sentence ", "spay " (puts a space before the dictation), or "comma list " (puts commas between all the words).
@alexboche Thanks for clarifying the scenario with CCR. @chilimangoes Interesting. I wonder if @quintijn has some knowledge of Natlink that could be leveraged to make this happen in DNS without postprocessing free dictation?
Could you include a demo grammar implementation with a few other commands mixed in?
I may be misunderstanding, but isn't this what the
Ending the dictation and transitioning back to commands, e.g.
@LexiconCode I seem to remember reading somewhere that this is a fundamental limitation of the functionality that is exposed to Natlink by Dragon, although I could be wrong. @mrob95 the way I understood the proposal was, he would like to be able to say something like "Mary had a little lamb comma who's fleece was blue queue lease white as snow" as a single utterance, and without the initial "format" keyword that makes Dragon recognize it as a command, and have the following output:
Followed by selecting the word blue and replacing it with "white as snow" to get the following:
After thinking about this overnight, I might be warming to the idea, if it's not too buggy. Dragon would essentially be in permanent command mode at that point, since you would have a catch-all rule. This might even need to be the default way we process dictation, once some of the open-source speech backends get to the point where they're serious alternatives to Dragon. However, in order to more intelligently handle spacing and capitalization when doing standard dictation, we would want to use the new accessibility-based text manipulation in Dragonfly rather than the blind output Caster currently does. Although, TBH, I think this last bit is the right direction anyway.
That is the future and will drastically simplify the process of manipulating existing text. This pull request gives people the chance to utilize it with Caster. However, I closed it because there wasn't a simple way to integrate the dependencies.
@alexboche I have decoupled the new CCRMerger from the rules it produces. So shortly, there will be nothing to stop you from making another CCRMerger instance and making more "global" type rules. |
So here's a (Edit: new and simpler) sample grammar provided by David Zurow. This will work if you just put it right into a new file in Caster (or anywhere in the macro system, I think). This would probably not be merged with other grammars.
I'm going to be testing this out in the wild to see how it goes. A first issue is that when this grammar is active, all dictation is passed through it, which I think means that it does not have lower priority than other commands (unless we tweak the priorities ourselves). This contrasts with the usual situation in Dragon, where commands are prioritized over dictation. We should probably figure out how to manually adjust the priorities of different grammars, because e.g. we might want to turn down the priority of this grammar so that commands do not accidentally get interpreted as dictation; this doesn't seem to be much of a problem so far, but I will have to monitor it. A second issue is spacing (and perhaps other formatting). With this approach, because Dragon's dictation is passing through a Text action, some of the formatting in the Dragon vocabulary seems to sometimes, but not always, get lost in translation. I don't quite understand how this works yet; here are two ways that this spacing issue manifests. A third issue is that the very useful command "scratch that", and commands built on top of it such as "make that ", don't work with this Text-action approach. Perhaps we could create our own version of "scratch that".
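A homegrown "scratch that" could work by remembering what each utterance typed, so the last utterance can be erased with backspaces. A minimal sketch, assuming we can hook the point where the Text action emits its output; the class and method names here are invented, not Caster API:

```python
# Sketch of a homegrown "scratch that": remember what each utterance typed,
# so it can be undone by sending that many backspaces. Illustrative only.

class DictationHistory:
    def __init__(self):
        self._utterances = []

    def record(self, text):
        """Call this after the Text action emits an utterance's text."""
        self._utterances.append(text)

    def scratch(self):
        """Return the number of backspaces needed to undo the last utterance
        (0 if there is nothing left to scratch)."""
        if not self._utterances:
            return 0
        return len(self._utterances.pop())
```

A "scratch that" command would then call `scratch()` and send that many backspace keystrokes; "make that" could follow up by emitting replacement text.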
Setting priorities for grammars/rules could be possible for Kaldi & WSR. It wouldn't be hard for Kaldi; for the WSR backend, however, it would require a large reworking. I don't know about natlink. |
I think setting priorities of command grammars in Dragon/Natlink is not possible. I am also not in favour of it. I think duplicate recognitions should be avoided. |
The spacing/capitalisation issue, as normally handled by Dragon in "supported windows", is simulated in the natlink module nsformat.py. This behaviour is also used for utterances in the same "unsupported" window. The trick is that the state of the previous utterance is kept for the next. User grammars can use this same mechanism.
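A toy illustration of that stateful mechanism: the state returned for one utterance feeds into the next, so capitalization and spacing carry across utterance boundaries. This mimics the idea behind nsformat, not its actual interface; the token spelling and the two-flag state tuple are simplifications.

```python
# Toy model of stateful dictation formatting: the state produced by one
# utterance (capitalize next word? suppress leading space?) is passed back
# in for the next utterance. Not nsformat's real API, just the concept.

def format_words(words, state=None):
    cap_next, no_space = state if state else (True, True)
    out = []
    for word in words:
        if word == ".\\period":          # Dragon-style spoken-form token
            out.append(".")
            cap_next, no_space = True, False
            continue
        text = word.capitalize() if cap_next else word
        out.append(text if no_space else " " + text)
        cap_next, no_space = False, False
    return "".join(out), (cap_next, no_space)
```

Called once, `format_words` capitalizes the sentence start and places spaces; called again with the returned state, the next utterance continues the same sentence correctly.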
(Edited in response to daanzu's helpful suggestion.)
Full code is here |
I'm not discouraging anyone from working with grammar prioritization weights. But as an alternative to modifying grammar priorities, which may not be compatible with all current or future backends, consider the following: include it as a command (great for short dictation), or as a dictation mode which can be enabled and disabled as CCR, toggling off other grammars selectively (great for long-form dictation). Based on the running backend, we can then enable enhanced functionality, such as dictation based on grammar prioritization, where supported.
@quintijn @alexboche Thanks for the info about natlink formatting and the gist with the code. Regarding setting priorities for grammars/rules, it would affect not only conflicting/ambiguous rules with duplicate specs, but also ones with differing specs. In the latter case, it would allow you to combat instances where certain phrases are recognized more often than they should be, but you don't want to change their spec or disable them entirely. It simply makes the engine less likely to "take that path" during recognition. |
@alexboche I think you may want |
@alexboche May I assume you got the nsformat thing to work? In the voicecode project (yes, the latest update is from 2006), grammars were completely based on catching dictation. If users really want to delve into this, it could be worth the trouble of reviving that code. Extensive select-and-say grammars were also designed; I worked on it at the time. The results were spectacular! But... it is quite a challenge to dive into, though it is still all there!! The implementation was for emacs. Go to sourceforge.net, search for voicecode, and download the latest release.
@quintijn No, I did not get the nsformat thing to work. nsformat is returning the text with spaces between virtually every letter, three spaces between words, and no space between utterances. I was expecting the value returned by nsformat.formatWords() to be the properly formatted dictation string. I may explore the voicecode thing (thank you for the tip), but if we can get this working without it, that would probably be easier.
@alexboche do you pass in the LIST of words that were recognized by the rule? ["a", "list", "of", "words"] instead of "a list of words" |
Quintijn is correct, thank you. nsformat.formatWords() must be given a list of words, not a string of the words joined together. The list of words can be accessed by using
Full working code here: |
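The letter-spacing symptom reported above is exactly what happens when a string is passed where a word list is expected: Python iterates a string character by character. A quick self-contained demonstration (`join_words` is a stand-in for any routine that expects a word list, not nsformat itself):

```python
# Why passing a string where a word LIST is expected produces the reported
# symptom: iterating a string yields individual characters, so any routine
# that joins "words" with spaces ends up spacing out every letter, and the
# literal space characters become three spaces between words.

def join_words(words):
    # stand-in for a formatter that expects ["a", "list", "of", "words"]
    return " ".join(words)

print(join_words(["a", "list"]))   # correct input: a list of words
print(join_words("a list"))        # buggy input: a single joined string
```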
Further discussion of formatting, and potentially the entire topic of this issue, will henceforth take place in the dragonfly issue here
Even though you can transition from commands to dictation by using a command like
"format <dictation>"
, there is currently no way to transition from dictation to commands without pausing. It would be nice if you didn't have to pause. Allowing too many commands to be active while dictating would cause misfires, but it would be good to have the ability to say a few commands that are frequently used while dictating. An example of such a command would be moving to the left/right n words. Such commands would probably need to have names that are not similar to any commonly used words. One way to do this is to filter all dictation through a CCR command using a Text action like
"<dictation>": Text("%(dictation)s")
. (There is probably a better way, but I have tested this briefly and it works.) LexiconCode mentioned that it is possible to make separate CCR grammars that are not merged with each other. So there could be a separate CCR grammar that has the dictation command just mentioned and a small number of other commands for moving the cursor and so on. (I have not attempted to test this part.)
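For anyone unfamiliar with the "%(dictation)s" spec in the Text action above: it is plain Python %-style named substitution, where the recognized words are supplied under the extra's name. A minimal pure-Python demonstration, with no dragonfly required (the variable names are illustrative):

```python
# The Text action's spec is filled in with %-formatting against a dict of
# "extras"; here the "dictation" key stands in for the words captured by
# the <dictation> element, so the spec types back exactly what was spoken.

spec = "%(dictation)s"
extras = {"dictation": "hello world"}
result = spec % extras
print(result)
```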