Forced decoding support for partially labelled sequences? #96

Open
Pantamis opened this issue Jan 19, 2018 · 2 comments

Comments

@Pantamis

First, thank you for this wonderful library!

I think CRFsuite is one of the few libraries that can learn different kinds of features depending on the label during training (that is, update different weights depending on the label).

I am trying to use CRFs on unusual language data. In particular, some labels are so specific that I can get them with a simple regex.
This means I have access to the true labels for parts of my sequences even at prediction time.
Wapiti supports what it calls 'forced decoding': https://wapiti.limsi.fr/manual.html#forced
The principle is to improve decoding by using the known true labels: Viterbi is run conditioned on both the inputs and the known labels.
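
To make the idea concrete, here is a minimal sketch in plain Python. It is not CRFsuite or Wapiti code: the function names, the regex rule, the label set and all scores are made up for illustration. It assumes the model has already produced log-scale state scores per position and a label-to-label transition table. Positions whose label is known restrict the Viterbi recursion to that single label, which can also change the labels chosen for neighbouring positions.

```python
import re

NEG_INF = float("-inf")


def known_labels_from_regex(tokens, rules):
    """Return {position: label} for tokens fully matched by a (pattern, label) rule."""
    known = {}
    for t, tok in enumerate(tokens):
        for pattern, label in rules:
            if re.fullmatch(pattern, tok):
                known[t] = label
                break
    return known


def forced_viterbi(state, trans, labels, known=None):
    """Viterbi over log-scores, restricted to the known label where one is given.

    state[t][y]: score of label index y at position t
    trans[i][j]: score of moving from label index i to label index j
    known:       {position: label name} for positions with an observed label
    """
    known = known or {}
    T, L = len(state), len(labels)
    if T == 0:
        return []
    index = {name: i for i, name in enumerate(labels)}

    def allowed(t):
        # Only the observed label is a candidate at a forced position.
        return [index[known[t]]] if t in known else list(range(L))

    score = [[NEG_INF] * L for _ in range(T)]
    back = [[0] * L for _ in range(T)]
    for y in allowed(0):
        score[0][y] = state[0][y]
    for t in range(1, T):
        for y in allowed(t):
            # Disallowed previous labels carry -inf and can never be selected.
            prev = max(range(L), key=lambda p: score[t - 1][p] + trans[p][y])
            score[t][y] = score[t - 1][prev] + trans[prev][y] + state[t][y]
            back[t][y] = prev

    # Follow the back-pointers from the best final label.
    y = max(range(L), key=lambda k: score[T - 1][k])
    path = [y]
    for t in range(T - 1, 0, -1):
        y = back[t][y]
        path.append(y)
    path.reverse()
    return [labels[i] for i in path]


# Hypothetical example: the model alone prefers "O" everywhere.
labels = ["O", "ID", "VER"]
tokens = ["pkg", "AB-1234", "v2"]
rules = [(r"[A-Z]{2}-\d{4}", "ID")]            # made-up rule for an ID-like token
state = [[1.0, 0.0, 0.0], [2.0, 0.5, 0.0], [1.0, 0.0, 0.8]]
trans = [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]  # ID -> VER is favoured

known = known_labels_from_regex(tokens, rules)
print(forced_viterbi(state, trans, labels))          # ['O', 'O', 'O']
print(forced_viterbi(state, trans, labels, known))   # ['O', 'ID', 'VER']
```

In this toy input, forcing position 1 to `ID` also flips position 2 from `O` to `VER` through the transition score, which is the benefit expected from forced decoding.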

I think it could be a really powerful combination for this library: together with the label-dependent feature selection during training that I described above, it would make it possible to include rule-based priors on the label sequence in the CRF model.

I wish I could contribute, but C is not my cup of tea. Could such a feature be considered for your library in the future?

Thank you again for this nice work!

@usptact

usptact commented Jan 19, 2018

You might be interested in https://github.com/Oneplus/partial-crfsuite

@Pantamis
Author

Pantamis commented Jan 23, 2018

Thank you very much for your answer.

This library also looks very nice, but I think it is not what I was talking about (even though such a feature is very interesting!). It uses partially labeled sequences for learning, that is, training sequences for which you don't have all the labels. What I would like is to use the labels that may be known at test time to improve the prediction for a given sequence.

Forced decoding is implemented in Wapiti by removing some features before the Viterbi decoding: https://github.com/Jekub/Wapiti/blob/569fbe5040583086f8d26667f6b793dc641536b0/src/decoder.c#L183

From what I understand of the CRFsuite code, the model should store the feature set somewhere (but I have not yet found out where or how). Temporarily removing the features for labels other than the observed one during forced decoding should not be too hard.
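
A rough sketch of that idea, under stated assumptions: instead of touching the stored features, the per-position scores of every label other than the observed one are set to minus infinity before decoding, so an unmodified decoder can only pick the observed label there. This is plain Python with hypothetical helper names and made-up scores. It does not use CRFsuite's or Wapiti's actual data structures, and the brute-force search only stands in for a real Viterbi decoder on a tiny input.

```python
from itertools import product

NEG_INF = float("-inf")


def mask_state_scores(state, labels, known):
    """Return a copy of state[t][y] with every non-observed label set to -inf
    at each position listed in known ({position: label name})."""
    index = {name: i for i, name in enumerate(labels)}
    masked = [row[:] for row in state]      # leave the model's scores untouched
    for t, name in known.items():
        keep = index[name]
        for y in range(len(labels)):
            if y != keep:
                masked[t][y] = NEG_INF
    return masked


def best_path_bruteforce(state, trans, labels):
    """Exhaustive argmax over all label paths; only suitable for tiny inputs."""
    T, L = len(state), len(labels)

    def total(path):
        s = sum(state[t][y] for t, y in enumerate(path))
        return s + sum(trans[a][b] for a, b in zip(path, path[1:]))

    best = max(product(range(L), repeat=T), key=total)
    return [labels[y] for y in best]


# Hypothetical two-label example where the model prefers "O" at both positions.
labels = ["O", "NUM"]
state = [[0.2, 0.0], [0.9, 0.1]]
trans = [[0.0, 0.0], [0.0, 0.0]]

masked = mask_state_scores(state, labels, {1: "NUM"})
print(best_path_bruteforce(state, trans, labels))    # ['O', 'O']
print(best_path_bruteforce(masked, trans, labels))   # ['O', 'NUM']
```

Because the masking works on a copy, it is temporary by construction: the model's scores are unchanged once decoding is done.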

It would be great to have this feature in CRFsuite!

Maybe I will do it myself one day.
