One of the most common things we have to do is introduce new words into our monodix: tag them by hand, then insert them into the dix. Given the large number of words, it becomes hard to keep track of which word is at which stage. Starting from the same source, one word can get stuck at the tagging stage because the user finds it too difficult to tag and needs to check with someone else, while other words are just one step away from inclusion in the dix. A simple GUI that facilitates this workflow would be a good way to reduce the clumsiness and increase productivity.
I know, I know, people have walked this path over and over again, and still there is no GUI tool to streamline the workflow of creating and updating the monodix. In the initial design (2009), we had a MySQL/PHP-backed system, but it was not well designed, as I did not foresee the kinds of problems we would be facing. The database editor that comes with MySQL was our primary way of tagging things. It was clumsy, but so is creating tons of separate files and manually editing them one by one. Back then the data was inserted into the database, initially tagged there, then exported by a script that converted it to spelling format. The spelling format was then converted to dix format. The problem with spelling format is that it does not handle enclitics, which are a major feature of the bn-en pair, so spelling format is basically unusable for us.
Here are some ideas for the components for the Graphical Editor:
IMHO, MySQL is really overkill for this. Using a small embedded database engine like SQLite or Apache Derby could be part of a better design.
Using spelling format as the intermediate step when converting from the database to dix is not an option here, as it does not support enclitics. However, the expand format, that is, the output of lt-expand, could be a good candidate.
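To make the expand-format idea a bit more concrete, here is a minimal sketch of reading such lines into a structure a database could store. It assumes the usual `surface:analysis` line shape, with `:>:` and `:<:` marking direction-restricted entries, and it assumes that `+` joins the parts of a multi-part (for example enclitic) analysis. The names `ExpandedEntry`, `parse_expand_line` and `split_analysis` are made up for illustration, and the sample words are placeholders, not taken from the bn-en dictionary.

```python
import re
from typing import List, NamedTuple, Tuple


class ExpandedEntry(NamedTuple):
    surface: str    # inflected form as it appears in text
    direction: str  # "both", "LR" or "RL"
    analysis: str   # lemma plus tags, e.g. "wolf<n><pl>"


def parse_expand_line(line: str) -> ExpandedEntry:
    """Parse one line in the assumed expand format."""
    line = line.rstrip("\n")
    # Direction-restricted entries use a three-character separator (assumed).
    for sep, direction in ((":>:", "LR"), (":<:", "RL")):
        if sep in line:
            surface, analysis = line.split(sep, 1)
            return ExpandedEntry(surface, direction, analysis)
    surface, analysis = line.split(":", 1)
    return ExpandedEntry(surface, "both", analysis)


def split_analysis(analysis: str) -> List[Tuple[str, List[str]]]:
    """Split an analysis into (lemma, tags) parts.

    Assumption: '+' separates the host word from attached enclitics.
    """
    parts = []
    for chunk in analysis.split("+"):
        lemma = chunk.split("<", 1)[0]
        tags = re.findall(r"<([^<>]+)>", chunk)
        parts.append((lemma, tags))
    return parts


if __name__ == "__main__":
    entry = parse_expand_line("wolves:wolf<n><pl>")
    print(entry.surface, entry.direction, split_analysis(entry.analysis))
```

Keeping each analysis as a list of parts, rather than a single lemma and tag string, is what would let an enclitic-bearing form survive the round trip that spelling format cannot handle.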
The GUI could feature a concept of stages. A typical workflow can be broken down into several stages; let's look at this example:
Say we have 100 new words that we need to add to the monodix, and all of them need to be tagged.
We import them into the GUI; now all of them are in stage 1.
We successfully categorize most of them as nouns, pronouns, etc., but there are 20 words that we cannot decide on right now. What do we do?
We move the 80 successfully categorized words to stage 2, while the other 20 remain in stage 1. When we import new words, these 20 will be shown in the list again, or the user can manually go back and review the words in stage 1.
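The stage bookkeeping in this example could be sketched with SQLite roughly as follows. This is only an illustration under assumptions: the `words` table, its columns, and the function names are hypothetical, and a real tool would need more fields (notes, paradigm, who tagged it, and so on).

```python
import sqlite3
from typing import Iterable, List


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the word store; every new word starts in stage 1."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS words (
               lemma TEXT PRIMARY KEY,
               pos   TEXT,
               stage INTEGER NOT NULL DEFAULT 1
           )"""
    )
    return conn


def import_words(conn: sqlite3.Connection, lemmas: Iterable[str]) -> None:
    """Add new words; words already in the store keep their current stage."""
    conn.executemany(
        "INSERT OR IGNORE INTO words (lemma) VALUES (?)",
        [(lemma,) for lemma in lemmas],
    )
    conn.commit()


def categorize(conn: sqlite3.Connection, lemma: str, pos: str) -> None:
    """Record the POS for a stage-1 word and move it to stage 2."""
    conn.execute(
        "UPDATE words SET pos = ?, stage = 2 WHERE lemma = ? AND stage = 1",
        (pos, lemma),
    )
    conn.commit()


def words_in_stage(conn: sqlite3.Connection, stage: int) -> List[str]:
    """List the words currently waiting in the given stage."""
    rows = conn.execute(
        "SELECT lemma FROM words WHERE stage = ? ORDER BY lemma", (stage,)
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = open_store()
    import_words(conn, ["alpha", "beta", "gamma"])
    categorize(conn, "alpha", "n")
    categorize(conn, "beta", "prn")
    print(words_in_stage(conn, 1))  # the undecided words stay here
    print(words_in_stage(conn, 2))
```

Because undecided words simply stay in stage 1, they show up again the next time that stage is listed, which matches the behaviour described above without any extra tracking.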
At stage 2, based on the POS categorization of each word, the user will be given the option to work on its various inflections. One easy feature could be this: the user is offered all the existing paradigms as templates, the outputs (all the inflections) are generated in real time, and the user can see the output and edit it right there.
If the user uses the template (paradigm) unchanged, then that paradigm will be associated with the word; otherwise, the new word becomes its own paradigm.
Also, if the user mistakenly selects a template and edits it, resulting in a paradigm that does the same work as an existing paradigm, the program must be able to detect the duplicate. @ftyers has a script called paradigm_chopper.py for that, which we could recycle.
Depending on the word and POS, there could be a stage 3 or more.
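As a rough illustration of the stage 2 ideas (previewing inflections from a template and spotting an edited template that duplicates an existing paradigm), here is a small sketch. It models a paradigm simply as a set of (surface suffix, tag string) pairs, which is a simplification of what a dix paradigm can express. It is not how paradigm_chopper.py works; the names `expand` and `find_duplicate` and the sample paradigm are hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Ending = Tuple[str, str]        # (surface suffix, tag string), e.g. ("s", "<n><pl>")
Paradigm = FrozenSet[Ending]    # order of endings does not matter


def expand(stem: str, lemma: str, paradigm: Paradigm) -> List[str]:
    """Preview every inflected form as 'surface:lemma<tags>' lines."""
    return sorted(f"{stem}{suffix}:{lemma}{tags}" for suffix, tags in paradigm)


def find_duplicate(candidate: Paradigm,
                   existing: Dict[str, Paradigm]) -> Optional[str]:
    """Return the name of an existing paradigm with exactly the same endings."""
    for name, paradigm in existing.items():
        if paradigm == candidate:
            return name
    return None


if __name__ == "__main__":
    # Placeholder paradigm for illustration only.
    existing = {"house__n": frozenset({("", "<n><sg>"), ("s", "<n><pl>")})}

    # The user picked a template, edited it, and ended up with the same endings.
    edited = frozenset([("s", "<n><pl>"), ("", "<n><sg>")])

    print(expand("cat", "cat", edited))
    print(find_duplicate(edited, existing))
```

Comparing paradigms as unordered sets means that reordering endings in the editor never creates a spurious "new" paradigm; only a real change in the endings does.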
@ftyers These are some basic ideas that are coming to my mind; feel free to extend them. And if you come across a tool that already does what I described, I could take a look. Personally, I find the tagging work boring and monotonous, and I want to make it easy for anyone who wishes to contribute to a language pair without going through much hassle. I have talked to a couple of friends who said they were very interested in contributing, but as they are not programmers, I know they won't be able to contribute unless I can provide them with an easy-to-use solution.