All data filtered with "no label" for specter+MFR #65

Open
neubig opened this issue May 16, 2021 · 2 comments

neubig commented May 16, 2021

Hello,

I was able to use reviewer assignment with specter+MFR successfully the last time I ran it, but now I'm getting the error below. It seems that all of the data is being filtered out ("Remove 2099 empty items with no label"), which causes the software to die on the empty dataset. Notably, this doesn't happen when I use ELMo. Here are the JSON config files for reference:

I'll also try to debug this myself, but I was wondering if you could give any guidance?

MFR:
Featurizing publications...
217330 tokens before filtering <null>
Finish loading 2099 sentences. While truncating 0 long sentences
Finish loading 2099 sentences. While truncating 0 long sentences
2148 tokens before filtering <null>
2148 tokens before filtering <null>
Remove 2099 empty items with no label
Loading data
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/affinity/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/ubuntu/anaconda3/envs/affinity/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/work/openreview-expertise/expertise/run.py", line 145, in <module>
    mfr_publications_path=None, skip_specter=config['model_params'].get('skip_specter', False))
  File "/home/ubuntu/work/openreview-expertise/expertise/models/multifacet_recommender/ensemble.py", line 60, in embed_publications
    self.mfr_predictor.embed_publications(mfr_publications_path)
  File "/home/ubuntu/work/openreview-expertise/expertise/models/multifacet_recommender/multifacet_recommender.py", line 667, in embed_publications
    1, copy_training=True, load_val=False)
  File "/home/ubuntu/work/openreview-expertise/expertise/models/multifacet_recommender/mfr_src/utils.py", line 463, in load_corpus
    dataloader_train_arr, max_sent_len_train = create_data_loader_split(f_in, train_bsz, device, split_num, copy_training)
  File "/home/ubuntu/work/openreview-expertise/expertise/models/multifacet_recommender/mfr_src/utils.py", line 245, in create_data_loader_split
    dataloader_arr = [torch.utils.data.DataLoader(dataset_arr[i], batch_size = bsz, shuffle = True, pin_memory=not use_cuda, drop_last=False) for i in range(split_num)]
  File "/home/ubuntu/work/openreview-expertise/expertise/models/multifacet_recommender/mfr_src/utils.py", line 245, in <listcomp>
    dataloader_arr = [torch.utils.data.DataLoader(dataset_arr[i], batch_size = bsz, shuffle = True, pin_memory=not use_cuda, drop_last=False) for i in range(split_num)]
  File "/home/ubuntu/anaconda3/envs/affinity/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 213, in __init__
    sampler = RandomSampler(dataset)
  File "/home/ubuntu/anaconda3/envs/affinity/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 94, in __init__
    "value, but got num_samples={}".format(self.num_samples))
ValueError: num_samples should be a positive integer value, but got num_samples=0

neubig commented May 16, 2021

OK, I found the issue! I was running assignment for ACs in the same directory in which I had previously run reviewer assignment. The user dictionary from the reviewer assignment was still cached in the mfr directory, so the model was reading in the reviewer names instead of the AC names. Deleting the mfr directory fixed the issue.

It might be nice to add a sanity check to the code to make sure that all the reviewers are actually found in the user dictionary and throw a comprehensible error (e.g. suggesting deleting the mfr directory or something else) if not.
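
A minimal sketch of what such a check could look like, assuming the cached user dictionary can be loaded as a mapping keyed by user id. The function name, the exception class, and the arguments are all hypothetical and are not part of openreview-expertise; the real code may store the dictionary differently.

```python
from typing import Iterable, Mapping


class UserDictionaryMismatchError(ValueError):
    """Raised when requested users are missing from the cached user dictionary."""


def check_users_in_dictionary(user_ids: Iterable[str],
                              user_dictionary: Mapping[str, int],
                              cache_dir: str) -> None:
    """Fail early with a readable message if any user is absent from the dictionary.

    Hypothetical helper: `user_dictionary` is assumed to map user ids to indices,
    and `cache_dir` is the directory the dictionary was loaded from (e.g. "mfr").
    """
    requested = set(user_ids)
    missing = sorted(requested - set(user_dictionary))
    if not missing:
        return

    # Show only a few ids so the message stays readable for large groups.
    preview = ", ".join(missing[:5])
    raise UserDictionaryMismatchError(
        f"{len(missing)} of {len(requested)} users were not found in the user "
        f"dictionary cached in '{cache_dir}' (e.g. {preview}). The cache may "
        f"come from an earlier run for a different group; try deleting "
        f"'{cache_dir}' or using a separate directory, then rerun."
    )
```

Calling something like this before the data loaders are built would replace the `num_samples=0` error with a message that points at the stale cache.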

@melisabok
Member

Good catch. I always create a different directory per group id. Let's keep the issue open and add the sanity check.
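
One way to automate the per-group directory convention is sketched below. It assumes the group id is available as a string; the helper name and the example group ids in the usage note are hypothetical.

```python
import re
from pathlib import Path


def work_dir_for_group(base_dir: str, group_id: str) -> Path:
    """Return (and create) a working directory that is unique to one group id.

    Hypothetical helper: characters that are unsafe in file names are replaced
    with underscores, so each group gets its own cache location. Two ids that
    differ only in such characters would map to the same directory.
    """
    safe_name = re.sub(r"[^A-Za-z0-9._-]+", "_", group_id).strip("_")
    if not safe_name:
        raise ValueError(f"Cannot derive a directory name from group id {group_id!r}")

    work_dir = Path(base_dir) / safe_name
    work_dir.mkdir(parents=True, exist_ok=True)
    return work_dir
```

With this, `work_dir_for_group("runs", "Conf/2021/Reviewers")` and `work_dir_for_group("runs", "Conf/2021/Area_Chairs")` resolve to different directories, so a cached user dictionary from one run is never picked up by the other.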
