Using a clean Python 3.7 environment on Ubuntu, with interpret-text installed via pip, I am hitting an error when I try to walk through the 'Interpreting Classical Text Classification models' notebook; I have made no changes to the code. When attempting to fit the model, on the line `classifier, best_params = explainer.fit(X_train, y_train)`, I get the following error:
```
/anaconda/envs/interpret/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:489: UserWarning: The parameter 'token_pattern' will not be used since 'tokenizer' is not None'
  warnings.warn("The parameter 'token_pattern' will not be used"

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-11-47f4fc43855d> in <module>
----> 1 classifier, best_params = explainer.fit(X_train, y_train)

/anaconda/envs/interpret/lib/python3.7/site-packages/interpret_text/experimental/classical.py in fit(self, X_str, y_train)
     92         :rtype: list
     93         """
---> 94         X_train = self._encode(X_str)
     95         if self.is_trained is False:
     96             if self.model is None:

/anaconda/envs/interpret/lib/python3.7/site-packages/interpret_text/experimental/classical.py in _encode(self, X_str)
     61         :rtype: array_like (ndarray, pandas dataframe). Same rows as X_str
     62         """
---> 63         X_vec, _ = self.preprocessor.encode_features(X_str)
     64         return X_vec
     65 

/anaconda/envs/interpret/lib/python3.7/site-packages/interpret_text/experimental/common/utils_classical.py in encode_features(self, X_str, needs_fit, keep_ids)
    129         # needs_fit will be set to true if encoder is not already trained
    130         if needs_fit is True:
--> 131             self.vectorizer.fit(X_str)
    132         if isinstance(X_str, str):
    133             X_str = [X_str]

/anaconda/envs/interpret/lib/python3.7/site-packages/sklearn/feature_extraction/text.py in fit(self, raw_documents, y)
   1167         """
   1168         self._warn_for_unused_params()
-> 1169         self.fit_transform(raw_documents)
   1170         return self
   1171 

/anaconda/envs/interpret/lib/python3.7/site-packages/sklearn/feature_extraction/text.py in fit_transform(self, raw_documents, y)
   1201 
   1202         vocabulary, X = self._count_vocab(raw_documents,
-> 1203                                           self.fixed_vocabulary_)
   1204 
   1205         if self.binary:

/anaconda/envs/interpret/lib/python3.7/site-packages/sklearn/feature_extraction/text.py in _count_vocab(self, raw_documents, fixed_vocab)
   1131             vocabulary = dict(vocabulary)
   1132             if not vocabulary:
-> 1133                 raise ValueError("empty vocabulary; perhaps the documents only"
   1134                                  " contain stop words")
   1135 

ValueError: empty vocabulary; perhaps the documents only contain stop words
```
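For what it's worth, the same `ValueError` can be reproduced with scikit-learn alone whenever the custom tokenizer yields no tokens — a minimal sketch, assuming that is what the spacy-based tokenizer inside interpret-text is doing here:

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the quick brown fox jumps over the lazy dog"]

# A tokenizer that produces no tokens stands in for the (hypothetically
# broken) spacy-based tokenizer: CountVectorizer then builds an empty
# vocabulary and raises the same error seen in the traceback above.
vectorizer = CountVectorizer(tokenizer=lambda doc: [])

try:
    vectorizer.fit(docs)
except ValueError as err:
    print(err)  # "empty vocabulary; perhaps the documents only contain stop words"
```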
Am I missing something obvious here?
I had a similar issue; installing an older version of the spacy package from PyPI (2.3.7) fixed it. It looks like the tokenizer code needs to be updated for the latest spacy.
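In case it helps anyone else, the workaround amounts to pinning spacy before running the notebook (version number taken from the comment above; whether newer 2.x releases also work is untested):

```shell
# Pin spacy to the older 2.x release that the interpret-text tokenizer
# was written against, replacing any spacy 3.x already installed.
pip install "spacy==2.3.7"
```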