Using a clean Python 3.7 environment on Ubuntu, with interpret-text installed via pip, I am hitting an error when I try to walk through the 'Interpreting Classical Text Classification models' notebook; I have made no changes to the code. When attempting to fit the model, on the line `classifier, best_params = explainer.fit(X_train, y_train)`, I get the following error:
```
/anaconda/envs/interpret/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:489: UserWarning: The parameter 'token_pattern' will not be used since 'tokenizer' is not None'
  warnings.warn("The parameter 'token_pattern' will not be used"

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-11-47f4fc43855d> in <module>
----> 1 classifier, best_params = explainer.fit(X_train, y_train)

/anaconda/envs/interpret/lib/python3.7/site-packages/interpret_text/experimental/classical.py in fit(self, X_str, y_train)
     92         :rtype: list
     93         """
---> 94         X_train = self._encode(X_str)
     95         if self.is_trained is False:
     96             if self.model is None:

/anaconda/envs/interpret/lib/python3.7/site-packages/interpret_text/experimental/classical.py in _encode(self, X_str)
     61         :rtype: array_like (ndarray, pandas dataframe). Same rows as X_str
     62         """
---> 63         X_vec, _ = self.preprocessor.encode_features(X_str)
     64         return X_vec
     65 

/anaconda/envs/interpret/lib/python3.7/site-packages/interpret_text/experimental/common/utils_classical.py in encode_features(self, X_str, needs_fit, keep_ids)
    129         # needs_fit will be set to true if encoder is not already trained
    130         if needs_fit is True:
--> 131             self.vectorizer.fit(X_str)
    132         if isinstance(X_str, str):
    133             X_str = [X_str]

/anaconda/envs/interpret/lib/python3.7/site-packages/sklearn/feature_extraction/text.py in fit(self, raw_documents, y)
   1167         """
   1168         self._warn_for_unused_params()
-> 1169         self.fit_transform(raw_documents)
   1170         return self
   1171 

/anaconda/envs/interpret/lib/python3.7/site-packages/sklearn/feature_extraction/text.py in fit_transform(self, raw_documents, y)
   1201 
   1202         vocabulary, X = self._count_vocab(raw_documents,
-> 1203                                           self.fixed_vocabulary_)
   1204 
   1205         if self.binary:

/anaconda/envs/interpret/lib/python3.7/site-packages/sklearn/feature_extraction/text.py in _count_vocab(self, raw_documents, fixed_vocab)
   1131             vocabulary = dict(vocabulary)
   1132             if not vocabulary:
-> 1133                 raise ValueError("empty vocabulary; perhaps the documents only"
   1134                                  " contain stop words")
   1135 

ValueError: empty vocabulary; perhaps the documents only contain stop words
```
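For what it's worth, the same `ValueError` can be reproduced with scikit-learn alone whenever the custom tokenizer yields no tokens — a minimal sketch, assuming that is what the spacy-based tokenizer inside interpret-text is doing here:

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the quick brown fox jumps over the lazy dog"]

# A tokenizer that produces no tokens stands in for the (hypothetically
# broken) spacy-based tokenizer: CountVectorizer then builds an empty
# vocabulary and raises the same error seen in the traceback above.
vectorizer = CountVectorizer(tokenizer=lambda doc: [])

try:
    vectorizer.fit(docs)
except ValueError as err:
    print(err)  # "empty vocabulary; perhaps the documents only contain stop words"
```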
Am I missing something obvious here?
I had a similar issue; installing an older version of the spacy package from PyPI (2.3.7) fixed it. It looks like the tokenizer code needs to be updated for the latest spacy.
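In case it helps anyone else, the workaround amounts to pinning spacy before running the notebook (version number taken from the comment above; whether newer 2.x releases also work is untested):

```shell
# Pin spacy to the older 2.x release that the interpret-text tokenizer
# was written against, replacing any spacy 3.x already installed.
pip install "spacy==2.3.7"
```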