All data filtered with "no label" for specter+MFR #65

neubig · 2021-05-16T11:50:27Z

Hello,

Reviewer assignment with specter+MFR worked the last time I ran it, but now I'm getting the following error. It seems that all data is being filtered out ("Remove 2099 empty items with no label"), and the software then dies on the empty dataset. Notably, this doesn't happen when I use ELMo. Here are the JSON config files for reference:

I'll also try to debug this myself, but I was wondering if you could give any guidance?

MFR:
Featurizing publications...
217330 tokens before filtering <null>
Finish loading 2099 sentences. While truncating 0 long sentences
Finish loading 2099 sentences. While truncating 0 long sentences
2148 tokens before filtering <null>
2148 tokens before filtering <null>
Remove 2099 empty items with no label
Loading data
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/affinity/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/ubuntu/anaconda3/envs/affinity/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/work/openreview-expertise/expertise/run.py", line 145, in <module>
    mfr_publications_path=None, skip_specter=config['model_params'].get('skip_specter', False))
  File "/home/ubuntu/work/openreview-expertise/expertise/models/multifacet_recommender/ensemble.py", line 60, in embed_publications
    self.mfr_predictor.embed_publications(mfr_publications_path)
  File "/home/ubuntu/work/openreview-expertise/expertise/models/multifacet_recommender/multifacet_recommender.py", line 667, in embed_publications
    1, copy_training=True, load_val=False)
  File "/home/ubuntu/work/openreview-expertise/expertise/models/multifacet_recommender/mfr_src/utils.py", line 463, in load_corpus
    dataloader_train_arr, max_sent_len_train = create_data_loader_split(f_in, train_bsz, device, split_num, copy_training)
  File "/home/ubuntu/work/openreview-expertise/expertise/models/multifacet_recommender/mfr_src/utils.py", line 245, in create_data_loader_split
    dataloader_arr = [torch.utils.data.DataLoader(dataset_arr[i], batch_size = bsz, shuffle = True, pin_memory=not use_cuda, drop_last=False) for i in range(split_num)]
  File "/home/ubuntu/work/openreview-expertise/expertise/models/multifacet_recommender/mfr_src/utils.py", line 245, in <listcomp>
    dataloader_arr = [torch.utils.data.DataLoader(dataset_arr[i], batch_size = bsz, shuffle = True, pin_memory=not use_cuda, drop_last=False) for i in range(split_num)]
  File "/home/ubuntu/anaconda3/envs/affinity/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 213, in __init__
    sampler = RandomSampler(dataset)
  File "/home/ubuntu/anaconda3/envs/affinity/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 94, in __init__
    "value, but got num_samples={}".format(self.num_samples))
ValueError: num_samples should be a positive integer value, but got num_samples=0


neubig · 2021-05-16T13:44:26Z

OK, I found the issue! I was running AC assignment in the same directory where I had previously run reviewer assignment. The user dictionary from the reviewer run was still cached in the mfr directory, so the model was reading in the reviewer names instead of the AC names. Deleting the mfr directory fixed the issue.

It might be nice to add a sanity check to the code to make sure that all the reviewers are actually found in the user dictionary and throw a comprehensible error (e.g. suggesting deleting the mfr directory or something else) if not.
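
A minimal sketch of what such a check could look like, assuming the cached user dictionary can be viewed as a mapping from user id to index. The function name `check_users_in_dictionary` and its parameters are hypothetical and do not come from the openreview-expertise code base:

```python
from typing import Iterable, Mapping


def check_users_in_dictionary(user_ids: Iterable[str],
                              user_dictionary: Mapping[str, int],
                              cache_dir: str) -> None:
    """Raise a readable error if expected users are absent from the cached dictionary.

    All names here are illustrative assumptions, not the project's actual API.
    """
    expected = set(user_ids)
    missing = sorted(expected - set(user_dictionary))
    if missing:
        # Show only a few ids so the message stays readable for large groups.
        preview = ", ".join(missing[:5])
        raise ValueError(
            f"{len(missing)} of {len(expected)} users are missing from the cached "
            f"user dictionary (e.g. {preview}). The cache in '{cache_dir}' may come "
            f"from a different group; try deleting it and rerunning."
        )
```

Calling something like this before the data loaders are built would surface the stale cache directly, instead of the `num_samples=0` error from `RandomSampler`.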

melisabok · 2021-05-16T13:48:44Z

Good catch. I always create a different directory per group id. Let's keep the issue open and add the sanity check.
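
One way to keep a separate directory per group id, as a rough sketch only. The helper `work_dir_for_group` and the example group ids are hypothetical, and it assumes group ids contain `/` separators that need flattening:

```python
from pathlib import Path


def work_dir_for_group(base_dir: str, group_id: str) -> Path:
    """Return a dedicated working directory for one group, creating it if needed."""
    # Flatten '/' in the group id so each group maps to a single folder name.
    safe_name = group_id.strip("/").replace("/", "_")
    path = Path(base_dir) / safe_name
    path.mkdir(parents=True, exist_ok=True)
    return path
```

With this, a reviewer run and an AC run would write their cached files (including the mfr directory) to different locations, so one run cannot pick up the other's user dictionary.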

melisabok assigned purujitgoyal Jun 14, 2021
