Our chunker data is derived from the GENIA treebank corpus. However, this corpus contains full nested constituent structures rather than just chunks, so we use an algorithm to derive chunks from the treebank. There are currently two such algorithms in the jcore-base version of the OpenNLP chunker. I think the newer one works better than the old one, but it is still not perfect.
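To illustrate what such a conversion does, here is a minimal Python sketch of one simple heuristic: every lowest-level phrase (a node whose children are all part-of-speech preterminals) becomes one chunk, and every other token is tagged `O`. This is not either of the jcore-base algorithms. The tuple-based tree representation, the `tree_to_bio` function and the toy sentence are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple, Union

# Hypothetical minimal tree representation (not the GENIA file format):
# a preterminal is (pos_tag, word), a phrase node is (label, [children]).
Tree = Tuple[str, Union[str, list]]


def is_preterminal(node: Tree) -> bool:
    """A node is a preterminal if its second element is the word itself."""
    return isinstance(node[1], str)


def tree_to_bio(node: Tree) -> List[Tuple[str, str]]:
    """Flatten a constituency tree into (word, BIO chunk tag) pairs.

    Simplified heuristic: a phrase whose children are all preterminals
    becomes one chunk; every token outside such a phrase is tagged O.
    """
    label, children = node
    if is_preterminal(node):
        # Token not covered by any lowest-level phrase.
        return [(children, "O")]
    if all(is_preterminal(child) for child in children):
        words = [child[1] for child in children]
        return [(w, ("B-" if i == 0 else "I-") + label) for i, w in enumerate(words)]
    # Mixed phrase: recurse into the children and concatenate their tags.
    result: List[Tuple[str, str]] = []
    for child in children:
        result.extend(tree_to_bio(child))
    return result


if __name__ == "__main__":
    toy = ("S", [("NP", [("NN", "IL-2"), ("NN", "gene")]),
                 ("VP", [("VBZ", "requires"), ("NP", [("NN", "NF-kappaB")])])])
    print(tree_to_bio(toy))
    # [('IL-2', 'B-NP'), ('gene', 'I-NP'), ('requires', 'O'), ('NF-kappaB', 'B-NP')]
```

On this toy tree the verb *requires* ends up tagged `O` because its VP also contains an NP. This is one example of why a purely structural conversion cannot fully reproduce chunk annotations.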
I have now found the following data on our internal file system: `/archives/alumni_homes/tomanek/coling/corpora/Genia/chunks/genia_new.chunks.gz`
This appears to be the GENIA conversion originally used within the JULIE Lab. We should run cross-evaluations on both corpora to see whether there are tagging differences, and also do a plain comparison of the two annotations. Perhaps the old data is better.
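For the plain comparison, one option is chunk-level agreement: convert both BIO tag sequences into (start, end, type) spans and count exact matches, treating one version as the reference. The sketch below assumes both conversions share identical tokenization and are already aligned sentence by sentence, which may not hold. The format of `genia_new.chunks.gz` is not known here, so reading it is left out, and the function names are hypothetical. The cross-evaluation itself (training on one corpus and testing on the other) is not covered.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start index, exclusive end index, chunk type)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect chunk spans from a BIO tag sequence."""
    spans: Set[Span] = set()
    start, ctype = None, None
    # A trailing "O" sentinel closes a chunk that runs to the end of the sentence.
    for i, tag in enumerate(list(tags) + ["O"]):
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != ctype)
        if tag == "O" or starts_new:
            if start is not None:
                spans.add((start, i, ctype))
                start, ctype = None, None
            if starts_new:
                start, ctype = i, tag[2:]
    return spans


def chunk_agreement(reference: List[str], other: List[str]) -> Tuple[float, float, float]:
    """Precision, recall and F1 of `other` against `reference` on exact chunk matches."""
    ref, oth = bio_to_spans(reference), bio_to_spans(other)
    matched = len(ref & oth)
    precision = matched / len(oth) if oth else 0.0
    recall = matched / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    old = ["B-NP", "I-NP", "B-VP", "B-NP"]
    new = ["B-NP", "I-NP", "O", "B-NP"]
    print(chunk_agreement(old, new))
```

Exact span matching is strict (a chunk that differs by one token counts as a full miss); per-token tag accuracy could be reported alongside it to show how close the two versions are.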