Add sequence_tagging example #302
base: master
Conversation
Codecov Report
@@ Coverage Diff @@
## master #302 +/- ##
=======================================
Coverage 79.76% 79.76%
=======================================
Files 133 133
Lines 11122 11122
=======================================
Hits 8872 8872
Misses 2250 2250
Continue to review full report at Codecov.
import texar.torch as tx

# pylint: disable=redefined-outer-name, unused-variable
Why are these necessary? I can imagine `unused-variable` being required for the constants, but why `redefined-outer-name`? It's also better practice to add a corresponding `pylint: enable` comment after the point where the suppression is no longer required.
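A minimal sketch of the suggested scoping, assuming the suppression is only needed for the module-level constants (the constant values and which lines need the suppression are assumptions here):

```python
# Scope the pylint suppression to just the block that needs it, then
# re-enable the check so the rest of the module is linted normally.

# pylint: disable=unused-variable
EMBEDD_DIM = 100  # example value; the real config may differ
CHAR_DIM = 30
# pylint: enable=unused-variable


def embedding_dims():
    # Code after the `enable` comment is checked normally again.
    return EMBEDD_DIM, CHAR_DIM
```

This keeps the suppression from silently masking genuinely unused variables elsewhere in the file.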
DIGIT_RE = re.compile(r"\d")
def create_vocabs(train_path, dev_path, test_path, normalize_digits=True, |
Please add type annotations for functions.
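For instance, a hypothetical annotated signature for `create_vocabs` (the concrete return types depend on the vocab structures the example actually builds, so the `Vocab` alias below is an assumption):

```python
from typing import Dict, Optional, Tuple

# Assumed shape: vocabularies map token strings to integer ids.
Vocab = Dict[str, int]


def create_vocabs(train_path: str, dev_path: str, test_path: str,
                  normalize_digits: bool = True,
                  glove_dict: Optional[dict] = None
                  ) -> Tuple[Tuple[Vocab, Vocab, Vocab],
                             Tuple[dict, dict]]:
    # Body elided; only the annotation style is illustrated here.
    ...
```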
# Prepares/loads data
if config.load_glove:
    print('loading GloVe embedding...')
    glove_dict = load_glove(embedding_path, EMBEDD_DIM)
else:
    glove_dict = None

(word_vocab, char_vocab, ner_vocab), (i2w, i2n) = create_vocabs(
    train_path, dev_path, test_path, glove_dict=glove_dict)

data_train = read_data(train_path, word_vocab, char_vocab, ner_vocab)
data_dev = read_data(dev_path, word_vocab, char_vocab, ner_vocab)
data_test = read_data(test_path, word_vocab, char_vocab, ner_vocab)

scale = np.sqrt(3.0 / EMBEDD_DIM)
word_vecs = np.random.uniform(
    -scale, scale, [len(word_vocab), EMBEDD_DIM]).astype(np.float32)
if config.load_glove:
    word_vecs = construct_init_word_vecs(word_vocab, word_vecs, glove_dict)

scale = np.sqrt(3.0 / CHAR_DIM)
char_vecs = np.random.uniform(
    -scale, scale, [len(char_vocab), CHAR_DIM]).astype(np.float32)
Consider moving these into `main` as well.
    return word_vecs


class CoNLLReader:
Can we modify this to use `tx.data.Dataset`? This would eliminate the need for separate `read_data` and `iterate_batch` methods.
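A framework-free sketch of the idea: fold reading and batching into one dataset object exposing `__len__`/`__getitem__`, which is the shape a texar/torch-style dataset expects. The class and parameter names here are hypothetical, not the texar API:

```python
class CoNLLDataset:
    """Hypothetical dataset wrapping pre-parsed CoNLL instances.

    Replaces the separate read-then-iterate pattern: parse once in
    __init__, then let the framework's loader handle batching.
    """

    def __init__(self, instances):
        # `instances` would come from parsing the CoNLL file once.
        self._instances = list(instances)

    def __len__(self):
        return len(self._instances)

    def __getitem__(self, index):
        return self._instances[index]
```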
    ner_tags, ner_ids)


class NERInstance:
This and `Sentence` below can be changed to `NamedTuple`s, which also support custom methods.
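A sketch of the `NamedTuple` version; the field names are assumptions based on the surrounding code:

```python
from typing import List, NamedTuple


class NERInstance(NamedTuple):
    # Hypothetical fields; the real class may carry ids and masks too.
    words: List[str]
    ner_tags: List[str]

    def length(self) -> int:
        # NamedTuple subclasses support regular methods like this one.
        return len(self.words)
```

Instances stay immutable and tuple-like, so they unpack and compare cheaply while still offering named access and methods.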
        yield wid_inputs, cid_inputs, nid_inputs, masks, lengths


def load_glove(filename, emb_dim, normalize_digits=True):
I thought we had a `load_glove` method inside `tx.data`? What are the differences here?
        self.dense_2 = nn.Linear(in_features=config.tag_space,
                                 out_features=len(ner_vocab))

    def forward(self, inputs, chars, targets, masks, seq_lengths, mode):
Please add type annotations here as well.
    def start(self, file_path):
        self.__source_file = open(file_path, 'w', encoding='utf-8')

    def close(self):
        self.__source_file.close()
We could change these to `__enter__` and `__exit__` and use the `with writer.open(path) as f` context-manager pattern. It's also fine to keep it as is if you're more comfortable with this.
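A minimal sketch of the suggested pattern; the class name is hypothetical, and only the open/close handling mirrors the example's writer:

```python
class SourceWriter:
    """Hypothetical context-manager replacement for start()/close()."""

    def __init__(self, file_path):
        self._file_path = file_path
        self._source_file = None

    def __enter__(self):
        self._source_file = open(self._file_path, 'w', encoding='utf-8')
        return self._source_file

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs on both normal exit and exceptions, so the file is
        # always closed; returning False propagates any exception.
        self._source_file.close()
        return False
```

Usage then becomes `with SourceWriter(path) as f: f.write(...)`, and callers can no longer forget the matching `close()`.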
Adapted from `sequence_tagging` in `texar-tf`.