
Planning a GUI Tool for Handling the Monodix #1

Open
azmfaridee opened this issue May 26, 2012 · 1 comment

@azmfaridee
Owner

One of the most common things we have to do is introduce new words into our monodix: we tag them by hand, then insert them into the dix. Given the large number of words, it becomes hard to keep track of which word is at which stage. Starting from the same source, one word can get stuck at the tagging stage because the user finds it too difficult to tag and needs to check it with someone else, while other words are just one step away from inclusion in the dix. A simple GUI that facilitates this workflow would reduce the clumsiness and increase productivity.

I know, I know, people have walked this path over and over again, and there is still no dedicated GUI tool to streamline the workflow of creating and updating the monodix. In the initial design (2009), we had a MySQL/PHP-backed system, but it was not well designed, as I did not have the foresight to see which kinds of problems we would be facing. The database editor that comes with MySQL was our primary way of tagging things. It was clumsy, but so is creating tons of separate files and editing them manually one by one. Back then, the data was inserted into the database, initially tagged there, and then exported by a script that converted it to the spelling format. The spelling format was then converted to dix format. The problem with the spelling format is that it does not handle enclitics, which are a major feature of the bn-en pair; thus, the spelling format is basically unusable for us.

Here are some ideas for the components for the Graphical Editor:

  • IMHO, MySQL is really overkill for this. Using a small database engine like SQLite or Apache Derby could be part of a better design.
  • Using the spelling format as an intermediate format when converting from the database to dix is not an option here, as it does not support enclitics. However, the expanded format, that is, the output of lt-expand, could be a good candidate.
  • The GUI could feature a concept of stages. A typical workflow can be broken down into several stages; let's look at this example:
    • Say we have 100 words that we need to add to the monodix; all of them are new and need to be tagged.
    • We import them into the GUI; now all of them are in stage 1.
    • We successfully categorize them into nouns, pronouns, etc., but there are 20 words that we cannot decide on right now. What do we do?
    • We promote the 80 successfully categorized words to stage 2, while the other 20 remain in stage 1. When we come across new words, these 20 will be shown in the list again, or the user can manually go back and check the words in stage 1.
    • At stage 2, according to the POS categorization of the words, the user will be given the option to work on the various inflections of each word. One easy feature could be: the user is offered all the existing paradigms as templates, and the outputs (all the inflections) are generated in real time, so the user can see the output and edit it right there.
    • If the user uses the template (paradigm) unchanged, then that paradigm will be associated with the word; otherwise, the new word becomes its own paradigm.
    • Also, if the user mistakenly selects a template and edits it, resulting in a paradigm that does the same work as an existing paradigm, the program must be able to detect the duplicate paradigms. @ftyers has a script called paradigm_chopper.py for that, which we could recycle.
    • Depending on the word and POS, there could be a stage 3 or more.
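
To make the stage idea concrete, here is a minimal sketch of how stage tracking could sit on SQLite, using Python's built-in `sqlite3` module. It assumes a single `words` table; the table, its columns and every function name (`import_words`, `promote_ready`, `words_in_stage`, ...) are hypothetical and not part of any existing Apertium tool. Words only move forward once the data their next stage needs is filled in, so undecided words simply stay behind.

```python
import sqlite3

# Hypothetical schema: one row per candidate word; `stage` records where it is
# in the workflow (1 = imported, waiting for POS; 2 = waiting for a paradigm).
SCHEMA = """
CREATE TABLE IF NOT EXISTS words (
    lemma    TEXT PRIMARY KEY,
    stage    INTEGER NOT NULL DEFAULT 1,
    pos      TEXT,   -- set during stage 1
    paradigm TEXT,   -- set during stage 2
    note     TEXT    -- e.g. "ask someone else about this one"
)
"""

# Which column must be filled in before a word may leave a given stage.
REQUIRED_FOR_PROMOTION = {1: "pos", 2: "paradigm"}


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def import_words(conn: sqlite3.Connection, lemmas: list[str]) -> None:
    """Add new words at stage 1; words already in the database keep their stage."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO words (lemma) VALUES (?)", [(w,) for w in lemmas]
        )


def set_pos(conn: sqlite3.Connection, lemma: str, pos: str) -> None:
    with conn:
        conn.execute("UPDATE words SET pos = ? WHERE lemma = ?", (pos, lemma))


def promote_ready(conn: sqlite3.Connection, from_stage: int) -> int:
    """Move every word at `from_stage` whose required column is filled in to the
    next stage; undecided words stay put. Returns the number of words moved."""
    column = REQUIRED_FOR_PROMOTION[from_stage]  # fixed name, never user input
    with conn:
        cur = conn.execute(
            "UPDATE words SET stage = stage + 1 "
            f"WHERE stage = ? AND {column} IS NOT NULL",
            (from_stage,),
        )
    return cur.rowcount


def words_in_stage(conn: sqlite3.Connection, stage: int) -> list[str]:
    rows = conn.execute(
        "SELECT lemma FROM words WHERE stage = ? ORDER BY lemma", (stage,)
    )
    return [lemma for (lemma,) in rows]
```

In the 100-word example, after setting a POS for 80 words, `promote_ready(conn, 1)` would move exactly those 80 to stage 2, and `words_in_stage(conn, 1)` would list the 20 undecided ones next to any newly imported words. A single-file database like this needs no server, which is the main argument for SQLite over MySQL here.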
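
The stage-2 idea of using existing paradigms as templates with real-time output could work roughly like the sketch below. It assumes a much simplified in-memory paradigm (the part of the lemma that varies, plus a list of surface suffix and tag-string pairs) and prints lines loosely modelled on the `surface:lemma<tags>` lines that lt-expand produces; it ignores direction restrictions, multiwords and enclitics. The `Paradigm` class, `expand` and `previews` are hypothetical names, and the English example paradigms are only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paradigm:
    """Hypothetical in-memory paradigm: the part of the lemma that changes
    (e.g. "y" in bab/y__n) plus (surface suffix, tag string) pairs."""
    name: str
    lemma_suffix: str
    entries: tuple[tuple[str, str], ...]

    def fits(self, lemma: str) -> bool:
        return lemma.endswith(self.lemma_suffix)


def expand(lemma: str, par: Paradigm) -> list[str]:
    """Generate every form of `lemma` as a 'surface:lemma<tags>' line."""
    if not par.fits(lemma):
        raise ValueError(f"{lemma!r} does not end in {par.lemma_suffix!r}")
    stem = lemma[: len(lemma) - len(par.lemma_suffix)]
    return [f"{stem}{suffix}:{lemma}{tags}" for suffix, tags in par.entries]


def previews(lemma: str, paradigms: list[Paradigm]) -> dict[str, list[str]]:
    """What a stage-2 screen might show: one preview per applicable template."""
    return {p.name: expand(lemma, p) for p in paradigms if p.fits(lemma)}


BOY = Paradigm("boy__n", "", (("", "<n><sg>"), ("s", "<n><pl>")))
BABY = Paradigm("bab/y__n", "y", (("y", "<n><sg>"), ("ies", "<n><pl>")))

if __name__ == "__main__":
    for name, lines in previews("lady", [BOY, BABY]).items():
        print(name, lines)
```

For "lady", both templates apply: `boy__n` previews the wrong plural "ladys", while `bab/y__n` gives "ladies", so the user can pick the right template at a glance or edit the generated forms directly.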
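
Deciding whether an edited template becomes a new paradigm, and spotting duplicate paradigms, could be approached as sketched below. This is not paradigm_chopper.py (whose internals are not described here); it is a separate, simpler exact-match check under the same hypothetical simplified paradigm shape, with hypothetical function names. Two paradigms count as duplicates when they have the same varying lemma suffix and the same set of (suffix, tags) pairs, regardless of entry order.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Paradigm:
    # Hypothetical, simplified shape of a monodix paradigm.
    name: str
    lemma_suffix: str
    entries: tuple[tuple[str, str], ...]


def signature(par: Paradigm) -> tuple[str, frozenset]:
    """Order-independent key: same lemma suffix + same set of (suffix, tags)
    pairs means the two paradigms generate exactly the same forms."""
    return (par.lemma_suffix, frozenset(par.entries))


def find_duplicates(paradigms: list[Paradigm]) -> list[list[str]]:
    """Group the names of paradigms that share a signature."""
    groups: dict[tuple, list[str]] = defaultdict(list)
    for p in paradigms:
        groups[signature(p)].append(p.name)
    return [names for names in groups.values() if len(names) > 1]


def assign_paradigm(
    lemma: str,
    pos: str,
    lemma_suffix: str,
    edited_entries: list[tuple[str, str]],
    existing: list[Paradigm],
) -> tuple[Paradigm, bool]:
    """Reuse an existing paradigm if the user's (possibly edited) entries match
    one exactly; otherwise register a new paradigm named after the word.
    Returns (paradigm, is_new)."""
    stem = lemma[: len(lemma) - len(lemma_suffix)]
    name = f"{stem}/{lemma_suffix}__{pos}" if lemma_suffix else f"{lemma}__{pos}"
    candidate = Paradigm(name, lemma_suffix, tuple(edited_entries))
    for p in existing:
        if signature(p) == signature(candidate):
            return p, False
    existing.append(candidate)
    return candidate, True
```

The new-paradigm names loosely follow the `stem/suffix__pos` style seen in monodix files. Exact matching is the design limitation: two paradigms that split the stem differently (say `bab/y__n` versus `ba/by__n`) generate the same forms but get different signatures, so a fuller check would compare expanded output instead.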

@ftyers These are some basic ideas that are coming to my mind; feel free to extend them. And if you come across a tool that already does what I described, I could take a look. Personally, I find the tagging work boring and monotonous, and I want to make it easy for anyone who wishes to contribute to the language pair without going through much hassle. I have talked to a couple of friends of mine; they said they were very interested in contributing, but as they are not programmers, I know they won't be able to contribute unless I can provide them with an easy-to-use solution.

@ghost assigned ragib06 on May 26, 2012
@azmfaridee
Owner Author

@ftyers suggested checking this out: http://wiki.apertium.org/wiki/User:Dtr5
