meaning of min_freq parameter #4

Closed
marctorsoc opened this issue Oct 26, 2018 · 4 comments

Comments

@marctorsoc

This question is not really related to your R code, but maybe you know the answer so let's try :)

I was wondering what the min_freq parameter means. The documentation says it's a float, so I was always convinced it was a number in the range [0,1] (a percentage), but then I see you use 5.0.

Is that then the absolute frequency of a feature? (e.g. the number of times a feature appears in the training data)

Is it a requirement for the entire training set or per document?

Thanks!

@jwijffels
Contributor

It's the number of times a feature (e.g. a word / a bigram / a pos tag / a suffix / ...) should occur before it will be included in the model.
I think it's a float because all these hyperparameters in CRFsuite are stored as floats, although an integer would be more intuitive.
Look at crfsuite::crf_options(method = "lbfgs")$params to get a list of the hyperparameters of the model which you need to tune in order to get good results.
I was working yesterday on showing how to tune a model using caret functionalities. You can find example code here.
It's my understanding that this applies at the level of the entire training set. If you want to be 100% sure, inspect the model itself, as in stats <- summary(model, "modeldetails.txt"), and look at the modeldetails.txt file. A feature that occurs fewer times than the value you set in feature.minfreq should not appear there.
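
To make the corpus-level reading concrete, here is a minimal Python sketch of a frequency cut-off applied over the whole training set. It is not CRFsuite's code: the function name filter_features, the toy data, the plain occurrence counting, and the ">=" comparison are all assumptions for illustration, so the exact boundary behaviour of the real library may differ.

```python
from collections import Counter


def filter_features(documents, min_freq):
    """Keep features whose total count over ALL documents is >= min_freq.

    Hypothetical helper: counting plain occurrences and using ">=" are
    assumptions, not a statement about CRFsuite internals.
    """
    counts = Counter(
        feature
        for document in documents
        for token_features in document
        for feature in token_features
    )
    return {feature for feature, n in counts.items() if n >= min_freq}


# Toy training set: two documents, each a list of tokens,
# each token a list of feature strings.
train = [
    [["word=the", "pos=DET"], ["word=cat", "pos=NOUN"]],
    [["word=the", "pos=DET"], ["word=dog", "pos=NOUN"]],
]

kept = filter_features(train, min_freq=2.0)
print(sorted(kept))
```

In this toy data, pos=NOUN occurs only once in each document but twice overall, so it survives a threshold of 2.0 under the corpus-level reading, while word=cat and word=dog are dropped.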

@marctorsoc
Author

Thank you very much, awesome answer!

@marctorsoc
Author

Then I understand that the values 0 and 1 for min_freq are equivalent, right?

@jwijffels
Contributor
