At an abstract level, a classifier is a module that classifies a given piece of text into a list of topics, ordered by decreasing confidence. Once the bot receives the list of classified topics, it tries to map the topic with the highest confidence to a skill.
From the perspective of the bot, a classifier is an asynchronous black box. Hence, any implementation of the classifier abstract interface can be used with the bot.
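As a rough illustration of the mapping described above, the following sketch picks the highest-confidence classification and looks up a handler for it. This is not talkify's internal code: pickSkill and the skillsByTopic lookup table are hypothetical names introduced here, and the classification objects are assumed to carry the label and value attributes described later in this document.

```javascript
// Hypothetical helper, not part of talkify: selects the skill whose topic
// matches the most confident classification.
function pickSkill(classifications, skillsByTopic) {
    if (!Array.isArray(classifications) || classifications.length === 0) {
        return undefined;
    }
    // Do not rely on the input order; scan for the highest confidence value.
    var best = classifications[0];
    for (var i = 1; i < classifications.length; i++) {
        if (classifications[i].value > best.value) {
            best = classifications[i];
        }
    }
    return skillsByTopic[best.label];
}

// Hypothetical skills, keyed by topic.
var skills = {
    greeting: function() { return 'Hello!'; },
    farewell: function() { return 'Goodbye!'; }
};

var skill = pickSkill([
    { label: 'greeting', value: 0.8 },
    { label: 'farewell', value: 0.2 }
], skills);

console.log(skill()); // prints "Hello!"
```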
The bot core depends on the talkify-natural-classifier library. It provides two different classifier implementations:
- Logistic Regression (LogisticRegressionClassifier)
- Naive Bayes (NaiveBayesClassifier)
Both classifiers are built on top of NaturalNode/Natural, an NLP library for Node.js.
A custom classifier needs to extend the talkify-classifier interface. This interface defines three methods:
- trainDocument
- getClassifications
- initialize
The easiest way is to define an object that implements those three methods and pass it as the classifier configuration option when instantiating the bot core. Here's a quick snippet:
var myClassifier = {
    trainDocument: function(trainingData, callback) {
        ...
    },
    getClassifications: function(input, callback) {
        ...
    },
    initialize: function(callback) {
        ...
    }
};

var options = {
    classifier: myClassifier
};

var Bot = require('talkify').Bot;
var myBot = new Bot(options);
Let's go through each of the methods that a classifier needs to support in order to work with the Bot.
Most classifiers, in one way or another, need to be trained. If the classifier is a micro-classifier (like the one that comes built in with the Bot) then it will need to be trained every time the Bot is initialized. This is because the training data stays within the Bot's memory. This is not too much of a concern because the built-in classifier is usually very quick to train.
Some complex classifiers, like the IBM Watson Classifier, need not be trained every time the Bot is initialized, as the training data resides on IBM's cloud servers. In this case, while you should still implement the trainDocument method, its implementation could be kept empty.
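For a classifier whose training data lives elsewhere, an "empty" trainDocument might look like the sketch below. It assumes, as with the other methods described in this document, that the Bot still expects the callback to be invoked. The name remoteClassifier is hypothetical, and the other two methods are stubbed only so that the object is complete.

```javascript
// Hypothetical classifier backed by a remote, pre-trained service.
var remoteClassifier = {
    trainDocument: function(trainingData, callback) {
        // Nothing to train locally; report success straight away.
        return callback(undefined, true);
    },
    getClassifications: function(text, callback) {
        // Stub: a real implementation would query the remote service here.
        return callback(undefined, [ { label: undefined, value: 1 } ]);
    },
    initialize: function(callback) {
        return callback();
    }
};

remoteClassifier.trainDocument({ topic: 'greeting', text: 'hello' }, function(err, result) {
    console.log(err, result); // prints "undefined true"
});
```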
The implementation of this method must accept two parameters: trainingData and callback. The trainingData argument can be either a single TrainingDocument object or an array of TrainingDocument objects. The implementation of this method in the NaturalClassifier function of talkify-natural-classifier may serve as a useful reference.
A TrainingDocument object, whether it arrives on its own or as an element of an array within trainingData, has two publicly accessible attributes: topic and text. So when you are within your trainDocument function, you should be able to do this:
var myClassifier = {
    trainDocument: function(trainingData, callback) {
        if (Array.isArray(trainingData)) {
            for (var i = 0; i < trainingData.length; i++) {
                var topic = trainingData[i].topic;
                var text = trainingData[i].text;
                console.log('TrainingData[%s], topic: %s, text: %s', i, topic, text);
            }
            return callback(undefined, true);
        }

        var topic = trainingData.topic;
        var text = trainingData.text;
        return callback(undefined, true);
    },
    ...
};
This example handles both cases: one where trainingData is a single object and one where it is an array. In either case, each object you receive has the two aforementioned attributes.
When you are done, make sure you call the callback to let the bot know. The callback must be called with two parameters, namely error and result. The error parameter should be an Error object if something went wrong, or the literal undefined otherwise. The result parameter should contain the status of the operation. In most cases, this is simply true.
The two callback(undefined, true) calls in the example above show the callback being invoked in a success scenario. In case of a failure:
...
var err = new Error('Uh oh. Something went wrong.');
callback(err);
...
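Putting the success and failure paths together, a trainDocument implementation might validate its input and report problems through the callback rather than throwing. This is only a sketch: the validation rule (topic and text must both be strings) and the name validatingClassifier are assumptions made for illustration, not requirements stated by talkify.

```javascript
// Hypothetical classifier that only demonstrates callback usage.
var validatingClassifier = {
    trainDocument: function(trainingData, callback) {
        // Treat a single document and an array of documents uniformly.
        var documents = Array.isArray(trainingData) ? trainingData : [trainingData];
        for (var i = 0; i < documents.length; i++) {
            var doc = documents[i];
            // Assumed rule for this sketch: both attributes must be strings.
            if (!doc || typeof doc.topic !== 'string' || typeof doc.text !== 'string') {
                return callback(new Error('Invalid training document at index ' + i));
            }
        }
        return callback(undefined, true);
    }
};

validatingClassifier.trainDocument({ topic: 'greeting' }, function(err, result) {
    console.log(err instanceof Error, result); // prints "true undefined"
});
```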
In many cases, you might want to buffer your training data so that you can process the entire training set at once. This can be achieved by leveraging the initialize method. At the end of every call to trainDocument, the Bot will call the initialize method.
This means that you can buffer your training data when the trainDocument method is called and, when the time is right, process it in the initialize method.
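The buffering approach described above could be sketched as follows. The names createBufferingClassifier, buffer and trainedTopics are hypothetical, and the "processing" step here merely groups the buffered texts by topic. A real classifier would do its actual training at that point.

```javascript
// Hypothetical factory: each classifier instance keeps its own buffer.
function createBufferingClassifier() {
    var buffer = [];
    var trainedTopics = {};
    return {
        trainedTopics: trainedTopics,
        trainDocument: function(trainingData, callback) {
            // Only collect the documents here; defer the expensive work.
            // concat() accepts both a single document and an array of them.
            buffer = buffer.concat(trainingData);
            return callback(undefined, true);
        },
        initialize: function(callback) {
            var error;
            try {
                // Process everything buffered so far in one pass.
                for (var i = 0; i < buffer.length; i++) {
                    var doc = buffer[i];
                    trainedTopics[doc.topic] = trainedTopics[doc.topic] || [];
                    trainedTopics[doc.topic].push(doc.text);
                }
                buffer = []; // avoid re-processing on the next initialize call
            } catch (err) {
                error = err;
            }
            return callback(error);
        }
    };
}

var bufferingClassifier = createBufferingClassifier();
bufferingClassifier.trainDocument([
    { topic: 'greeting', text: 'hello' },
    { topic: 'greeting', text: 'hi there' }
], function() {});
bufferingClassifier.trainDocument({ topic: 'farewell', text: 'bye' }, function() {});
bufferingClassifier.initialize(function(err) {
    console.log(Object.keys(bufferingClassifier.trainedTopics).join(', ')); // prints "greeting, farewell"
});
```

Clearing the buffer after processing matters if initialize is called more than once; otherwise the same documents would be processed repeatedly.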
The initialize method gives your classifier an opportunity to finish any remaining processing after training. This could be a network or a database call. If you choose to buffer your training data, the initialize method is a good place to complete the remainder of your processing.
The initialize method must accept one parameter, namely callback. Whatever you do, make sure you call the callback when you are done processing.
When calling this callback, you only need to pass an argument when there is an error. Here's a quick example of a success scenario:
var myClassifier = {
    initialize: function(callback) {
        ...
        return callback();
    },
    ...
};
If something goes wrong, you can simply call:
...
var err = new Error('Whoops, something went wrong.');
callback(err);
The getClassifications method is one of the simplest-looking methods, yet it is also one of the most important: it is how the Bot receives classifications for a piece of text, which it then uses to execute skills in order to respond to queries.
An implementation of this method must accept two parameters, namely text and callback.
The text parameter will always be a string, as it is the input text received from the end user that needs to be classified.
The callback parameter will be a function that accepts two arguments, namely error and a classifications[] array. Call this function when you have finished classifying the string.
When calling the callback in an error scenario, make sure that the first parameter is not null. Ideally, this should be an instance of Error, like so:
...
var err = new Error('Not very good.');
callback(err);
On success, however, you should set this parameter to undefined and pass a non-empty array as the classifications[] parameter. Here's an example snippet:
var myClassifier = {
    getClassifications: function(text, callback) {
        ...
        return callback(undefined, [ {label: 'MyTopic', value: 0.5} ]);
    },
    ...
};
The classifications[] array must contain objects with at least two attributes, namely label and value. The label attribute must be a string, while the value attribute must be a number. In the above example, the array has one classification object with the label MyTopic and the value 0.5. This means that the classifier has classified the given text as belonging to the topic MyTopic with a confidence of 0.5, i.e. it is 50% confident in the result. Note that the first parameter is explicitly set to undefined because the classification was successful.
As of now, the objects in the array must be ordered by decreasing value, i.e. highest first. The value field must be between 0 and 1.
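If your underlying model returns scores in arbitrary order or outside the expected range, a small helper can bring them into the shape described above before you hand them to the callback. This is a sketch only: normalizeClassifications is a hypothetical helper, and clamping out-of-range values (rather than, say, rescaling them) is a design choice made here for simplicity.

```javascript
// Hypothetical helper: returns a new array with values clamped to [0, 1]
// and sorted so that the highest confidence comes first.
function normalizeClassifications(classifications) {
    return classifications
        .map(function(c) {
            // Clamp the confidence into the [0, 1] range.
            return { label: c.label, value: Math.min(1, Math.max(0, c.value)) };
        })
        .sort(function(a, b) {
            // Highest confidence first.
            return b.value - a.value;
        });
}

var normalized = normalizeClassifications([
    { label: 'farewell', value: 0.25 },
    { label: 'greeting', value: 1.5 },
    { label: 'smalltalk', value: -0.1 }
]);

console.log(normalized.map(function(c) { return c.label + ':' + c.value; }).join(' '));
// prints "greeting:1 farewell:0.25 smalltalk:0"
```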
In cases where a classification could not be determined, the classifier must return an array containing an object whose label is undefined and whose value is set to an arbitrary value (usually 1, as the classifier is fully confident that the answer is indeterminate). Here's an example:
var myClassifier = {
    getClassifications: function(text, callback) {
        ...
        return callback(undefined, [ {label: undefined, value: 1} ]);
    },
    ...
};
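To tie the success, error and indeterminate cases together, here is a toy getClassifications implementation. It is deliberately simplistic and is not how talkify's built-in classifiers work: the keywordsByTopic table, the keywordClassifier name and the scoring rule (the fraction of a topic's keywords found in the text) are all assumptions made for illustration.

```javascript
// Hypothetical keyword table used only by this sketch.
var keywordsByTopic = {
    greeting: ['hello', 'hi'],
    farewell: ['bye', 'goodbye']
};

var keywordClassifier = {
    getClassifications: function(text, callback) {
        if (typeof text !== 'string') {
            return callback(new Error('Expected text to be a string.'));
        }
        var words = text.toLowerCase().split(/\W+/);
        var classifications = [];
        Object.keys(keywordsByTopic).forEach(function(topic) {
            var keywords = keywordsByTopic[topic];
            var hits = keywords.filter(function(keyword) {
                return words.indexOf(keyword) !== -1;
            }).length;
            if (hits > 0) {
                // Score: fraction of the topic's keywords present in the text.
                classifications.push({ label: topic, value: hits / keywords.length });
            }
        });
        if (classifications.length === 0) {
            // Nothing matched: report an indeterminate result.
            return callback(undefined, [ { label: undefined, value: 1 } ]);
        }
        // Highest confidence first.
        classifications.sort(function(a, b) { return b.value - a.value; });
        return callback(undefined, classifications);
    }
};

keywordClassifier.getClassifications('Hello, hi and goodbye', function(err, classifications) {
    var labels = classifications.map(function(c) { return c.label; });
    console.log(labels.join(', ')); // prints "greeting, farewell"
});
```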
A custom classifier instance can be passed as part of the configuration object to the Bot during its instantiation. Here's a quick example:
var myClassifier = {...};

var options = {
    classifier: myClassifier
};

var Bot = require('talkify').Bot;
var myBot = new Bot(options);
Notice how the options object contains an attribute called classifier with the value set to the myClassifier variable.
A single bot instance can accept multiple classifiers. This way you can have multiple classifiers, each specialised in a single topic. This generally results in more accurate classification over a broad range of topics.
In order to add multiple classifiers, you must assign an array containing all your classifier objects to the classifier options attribute.
var myClassifier1 = {...};
var myClassifier2 = {...};

var options = {
    classifier: [myClassifier1, myClassifier2]
};

var Bot = require('talkify').Bot;
var myBot = new Bot(options);
This will configure the myBot object to use two classifiers: myClassifier1 and myClassifier2.
For readability, you can optionally use the classifiers options attribute instead of the classifier attribute when specifying multiple classifiers.
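As a small sketch of the specialised-classifier idea, the following builds two single-topic classifiers and places them in the classifiers options attribute. The createTopicClassifier factory and its keyword-matching rule are hypothetical and exist only for illustration. The Bot instantiation is left commented out so that the snippet runs without talkify installed.

```javascript
// Hypothetical factory: builds a classifier that only knows one topic.
function createTopicClassifier(topic, keyword) {
    return {
        trainDocument: function(trainingData, callback) {
            return callback(undefined, true);
        },
        initialize: function(callback) {
            return callback();
        },
        getClassifications: function(text, callback) {
            var matched = text.toLowerCase().indexOf(keyword) !== -1;
            return callback(undefined, [
                matched ? { label: topic, value: 1 } : { label: undefined, value: 1 }
            ]);
        }
    };
}

var options = {
    classifiers: [
        createTopicClassifier('weather', 'rain'),
        createTopicClassifier('music', 'song')
    ]
};

// var Bot = require('talkify').Bot;
// var myBot = new Bot(options);

options.classifiers[1].getClassifications('Play a song', function(err, classifications) {
    console.log(classifications[0].label); // prints "music"
});
```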