Synthetic handwritten Groningen Meaning Bank (GMB) dataset

A dataset of synthetically generated handwritten pages, intended for research on full-page text recognition and named entity recognition. The pages were generated with the tool at https://github.com/manucarbonell/handwritten-document-synthesizer, using data taken from the Groningen Meaning Bank (https://gmb.let.rug.nl/).

This dataset was developed for the paper below. If you use it in your research, please cite both the source of the data, the Groningen Meaning Bank, and the paper:

Manuel Carbonell, Alicia Fornés, Mauricio Villegas, and Josep Lladós. "A
neural model for text localization, transcription and named entity
recognition in full pages." Pattern Recognition Letters 136 (2020): 219-227.

Use nw-page-editor to visualize the XML files. For a nicer visualization of the annotated entities, load the CSS file included in this repository by running the following command, where data is the directory to open:

nw-page-editor --css code/nw-page-editor-entities.css data
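For reading the transcriptions programmatically rather than in the viewer, the following is a minimal sketch. It assumes the XML files follow the PAGE XML layout that nw-page-editor works with, where each TextLine element carries its transcription in a TextEquiv/Unicode child. The function name `read_text_lines`, the embedded sample document, and its ids are hypothetical and only serve to illustrate; how this dataset encodes the entity labels is not covered here.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def read_text_lines(xml_source):
    """Return (line_id, text) pairs for every TextLine in a PAGE XML string."""
    root = ET.fromstring(xml_source)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        text = ""
        # Look only at the line's direct TextEquiv child, so that
        # transcriptions of nested Word elements are not picked up instead.
        for child in elem:
            if local_name(child.tag) == "TextEquiv":
                for sub in child:
                    if local_name(sub.tag) == "Unicode":
                        text = sub.text or ""
        lines.append((elem.get("id"), text))
    return lines


# Hypothetical miniature document, assumed to mirror the PAGE XML structure.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="hypothetical.png" imageWidth="100" imageHeight="100">
    <TextRegion id="r1">
      <TextLine id="r1_l1">
        <Coords points="0,0 10,0 10,10 0,10"/>
        <TextEquiv><Unicode>An example line</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>
"""

for line_id, text in read_text_lines(SAMPLE):
    print(line_id, text)
```

To apply the same function to a file from the dataset, read the file contents and pass them in place of `SAMPLE`. Matching on local tag names keeps the sketch independent of the exact PAGE schema version declared in the files.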