Provides helper functions for configuring, storing, and processing information related to natural language processing of a statement and the retrieval of variable data from that statement.
These functions can be incorporated into a server to determine which class should handle a statement and which parameters to pass to the function that handles statements for that class.
In your npm project:
Either run:
npm install hubot-ibmcloud-cognitive-lib --save
or add the following line to your package.json's dependencies:
"hubot-ibmcloud-cognitive-lib": "*"
In general, Watson's Natural Language Classifier maps a statement to a class that best matches it. The classifier is seeded with classes and various statements that can be associated with each class.
For the NLC support, a class represents a command (such as weather). That is, there is a one-to-one correlation between command handlers and NLC classes.
Once the class has been determined, the parameter values needed by that class are pulled from the statement. For instance, if the statement is "I want the weather for Chicago", then the location (Chicago) is pulled from the statement.
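The one-to-one mapping between classes and command handlers can be pictured with a small dispatch table. This is only a sketch: the `handlers` map, the `dispatch` function, and the handler bodies are hypothetical and are not part of this library; the class name and parameters are assumed to come from the classification and parameter extraction steps described above.

```javascript
// Hypothetical handler table: one handler per NLC class (one-to-one).
const handlers = new Map([
  ['weather', (params) => `Looking up the weather for ${params.location}`],
  ['help', () => 'Here is what I can do...']
]);

// Dispatch a classified statement to the handler registered for its class.
// `className` and `params` stand in for the classifier result and the
// extracted parameter values; both are assumed inputs in this sketch.
function dispatch(className, params) {
  const handler = handlers.get(className);
  if (!handler) {
    return `No handler registered for class "${className}"`;
  }
  return handler(params);
}

console.log(dispatch('weather', { location: 'Chicago' }));
```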
This library uses PouchDB to store learning data for the Watson NLC. PouchDB is an offline-first database and will periodically synchronize with a remote Cloudant database (if configured through environment variables). Writing to a local database keeps interaction with the consumer of this package smooth, as there is no need for an immediate network hop.
HUBOT_WATSON_NLC_URL
HUBOT_WATSON_NLC_USERNAME
HUBOT_WATSON_NLC_PASSWORD
HUBOT_WATSON_NLC_CLASSIFIER (optional, defaults to default-hubot-classifier)
HUBOT_WATSON_NLC_AUTO_APPROVE (optional, defaults to false)
HUBOT_CLOUDANT_ENDPOINT (optional, defaults to null)
HUBOT_CLOUDANT_KEY (optional, defaults to null)
HUBOT_CLOUDANT_PASSWORD (optional, defaults to null)
HUBOT_CLOUDANT_DB (defaults to nlc)
HUBOT_DB_DIRECTORY (defaults to 'databases')
SYNC_INTERVAL (defaults to 30 minutes; this value is set in milliseconds)
CONFIDENCE_THRESHOLD_HIGH (defaults to 0.8)
CONFIDENCE_THRESHOLD_LOW (defaults to 0.05)
HUBOT_DB_TEST (defaults to false, but should be set to null if running in the same shell as the tests, which override this setting)
Note: Usage of HUBOT_WATSON_NLC_AUTO_APPROVE could have negative effects. Auto-approving the classified statements if they include keywords/entities could cause incorrect classifications for other command usages in the future.
NLC defaults to using ./databases as the folder for the configuration data; this folder is created if it doesn't exist.
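As an illustration of how the defaults listed above fit together, the following sketch resolves the settings from an environment object. The `loadConfig` helper and its property names are hypothetical and are not this library's API; treating the string "true" as the way to enable auto-approval is also an assumption.

```javascript
// Hypothetical helper (not part of the library): resolve settings from an
// environment-like object, falling back to the documented defaults.
function loadConfig(env) {
  // Parse a numeric variable; use the fallback when unset or not a number.
  const num = (value, fallback) => {
    if (value === undefined || value === '') return fallback;
    const parsed = Number(value);
    return Number.isFinite(parsed) ? parsed : fallback;
  };
  return {
    nlcUrl: env.HUBOT_WATSON_NLC_URL,
    nlcUsername: env.HUBOT_WATSON_NLC_USERNAME,
    nlcPassword: env.HUBOT_WATSON_NLC_PASSWORD,
    classifier: env.HUBOT_WATSON_NLC_CLASSIFIER || 'default-hubot-classifier',
    // Assumption: the string "true" enables auto-approval.
    autoApprove: env.HUBOT_WATSON_NLC_AUTO_APPROVE === 'true',
    cloudantEndpoint: env.HUBOT_CLOUDANT_ENDPOINT || null,
    cloudantKey: env.HUBOT_CLOUDANT_KEY || null,
    cloudantPassword: env.HUBOT_CLOUDANT_PASSWORD || null,
    cloudantDb: env.HUBOT_CLOUDANT_DB || 'nlc',
    dbDirectory: env.HUBOT_DB_DIRECTORY || 'databases',
    syncIntervalMs: num(env.SYNC_INTERVAL, 30 * 60 * 1000),
    confidenceHigh: num(env.CONFIDENCE_THRESHOLD_HIGH, 0.8),
    confidenceLow: num(env.CONFIDENCE_THRESHOLD_LOW, 0.05)
  };
}

const config = loadConfig(process.env);
console.log(config.dbDirectory, config.syncIntervalMs);
```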
To create a class with a set of suggested natural language statements, an emit target, and parameter value definitions, create a JSON file such as the following sample:
{
  "name": "AName",
  "version": "version",
  "classes": [
    {
      "class": "some-class",
      "description": "Optional description for this class",
      "emittarget": "some-class-targetid",
      "texts": [
        "How can you help me?",
        "What can you do for me?",
        "In what topics are you trained?",
        "help"
      ],
      "parameters": [
        {
          "name": "parameter-name",
          "title": "readable parameter name",
          "type": "type-of-parameter",
          "values": [ "value1", "value2" ],
          "prompt": "OK. What is the parameter value you want me to use?"
        }
      ]
    }
  ],
  "parameter.values": [
    {
      "name": "parameter-name",
      "values": [ "value1", "value2" ]
    }
  ]
}
'name' is required and uniquely identifies this JSON file from other packages.
'version' is the version of this file.
'class' is the name of the class you want the classifier to use. Class identifiers should be unique within the file; if a class is not unique, it will be rejected by the database.
'description' is optional. It provides a description for the class. If a value is not provided, it defaults to the class name.
'texts' is an array of suggested natural language statements to use to invoke the command. These statements are used to seed the natural language classifier.
'emittarget' is optional. It provides a target event name; when not specified, it defaults to the 'class' value. If this package is incorporated within a bot, this emittarget could be used to communicate between a common NLC handler and a specific command handler.
'parameters' is optional and there is one entry for each desired parameter.
Within the parameters object, 'name' is a required field and gives the name for this parameter.
Within the parameters object, 'title' is an optional field and gives the displayable name for this parameter.
Within the parameters object, 'type' is a required field and gives a predefined type. This should be one of entity, keyword, number, repourl, repouser, reponame, or city, but this restriction is not enforced.
Within the parameters object, 'values' is an optional list of values to associate with the parameter name (applies to entity and keyword types).
Within the parameters object, 'prompt' is an optional field. If a value for the parameter cannot be determined through normal processing, this prompt could be used to ask the user for the specific parameter value.
Within the parameters object, 'entityfunction' is an optional field (applies to entity types). It specifies the name of a registered entity function that is invoked to obtain the latest, most complete set of entity values for the parameter. This allows a fuzzy match on entity values.
The entity function is registered at runtime. The name it is registered with is namespaced. Assume n1 is the root 'name' configured in the JSON file. If the value of the 'entityfunction' field is f1, then the entity function is registered with the name n1_f1. To register the entity function at runtime, add a statement similar to the following:
nlcconfig.setGlobalEntityFunction('n1_f1', function(robot, res, parameterName, parameters) {
  return new Promise(function(resolve, reject) { // Always return a Promise
    // ... obtain parameterValue for parameterName
    resolve(parameterValue);
  });
});
Within the parameters object, 'required' is an optional field indicating whether the value for the parameter is required. The default is true.
'parameter.values' is optional and is used to specify global hard-coded values for a specified parameter name. All commands using that parameter name will use the common set of hard-coded values.
Within the parameter.values object, 'name' is a required field and gives the name for this parameter.
Within the parameter.values object, 'values' is a required list of values to associate with the parameter name (applies to entity and keyword types).
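The required fields described above can be checked before loading a file. The following is a minimal sketch, assuming the file has already been parsed into an object; `validateNlcDefinition` is a hypothetical helper, not a function exposed by this package, and it only checks the fields that this document states are required plus class uniqueness.

```javascript
// Hypothetical validator for an NLC definition object (already parsed from JSON).
// Returns a list of problems; an empty list means all checks passed.
function validateNlcDefinition(def) {
  const problems = [];
  if (!def.name) problems.push('root "name" is required');

  // Class identifiers should be unique within the file.
  const seen = new Set();
  for (const cls of def.classes || []) {
    if (seen.has(cls.class)) problems.push(`duplicate class "${cls.class}"`);
    seen.add(cls.class);
    // Each parameter needs a name and a type.
    for (const param of cls.parameters || []) {
      if (!param.name) problems.push(`class "${cls.class}": parameter "name" is required`);
      if (!param.type) problems.push(`class "${cls.class}": parameter "type" is required`);
    }
  }

  // Each parameter.values entry needs a name and a list of values.
  for (const entry of def['parameter.values'] || []) {
    if (!entry.name) problems.push('parameter.values entry: "name" is required');
    if (!Array.isArray(entry.values)) {
      problems.push(`parameter.values "${entry.name}": "values" must be a list`);
    }
  }
  return problems;
}

console.log(validateNlcDefinition({ name: 'AName', classes: [{ class: 'some-class' }] }));
```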
It is also possible to set global parameter values at runtime. If this is going to be done, then a parameter.values object should be configured in the JSON file with an empty list.
The parameter.values name is namespaced: the 'name' at the root of the JSON structure (denoted n1) is joined with the 'name' in the parameter.values object (denoted n2) as n1_n2. To set the global parameter values at runtime, add a statement similar to the following:
nlcconfig.updateGlobalParameterValues('n1_n2', ['aVal', 'anotherVal']);
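The namespaced names can also be derived from the definition file rather than typed by hand. This is a small sketch under that assumption; `namespacedName` and `globalParameterKeys` are hypothetical helpers, not functions provided by this package.

```javascript
// Hypothetical helper: build the namespaced key "<root name>_<local name>".
function namespacedName(rootName, localName) {
  return `${rootName}_${localName}`;
}

// List the namespaced keys for every parameter.values entry in a definition.
function globalParameterKeys(def) {
  return (def['parameter.values'] || []).map((entry) => namespacedName(def.name, entry.name));
}

const definition = {
  name: 'n1',
  'parameter.values': [{ name: 'n2', values: [] }]
};
console.log(globalParameterKeys(definition)); // prints [ 'n1_n2' ]
```

Each resulting key could then be passed as the first argument to nlcconfig.updateGlobalParameterValues.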
Here's an explanation of the various parameter types:
entity: If there are hard-coded values associated with the parameter name, then the statement is searched for each of the hard-coded values. If one is found, that is the parameter value. If there are no hard-coded values or none were found in the statement, then the 'poc' package is used to parse the statement. It is then analyzed for a single NN or NNP value. If one is found, the associated text is the parameter value. If none are found or multiples are found (ambiguous), then no parameter value is set.
keyword: If there are hard-coded values associated with the parameter name (should be), then the statement is searched for each of the hard-coded values. If one is found, that is the parameter value. If none are found, then no parameter value is set.
number: The 'poc' package is used to parse the statement. It is then analyzed for a single CD value. If one is found, the associated text is the parameter value. If none are found or multiples are found (ambiguous), then no parameter value is set.
repourl: A url regular expression is used to pull a url from the statement. If a match is found, that is the parameter value.
repouser: A repo regular expression (repouser/reponame) is used to pull a repo username from the statement. If a match is found, that is the parameter value. If a match is not found, then no parameter value is set.
reponame: A repo regular expression (repouser/reponame) is used to pull a repo name from the statement. If a match is found, that is the parameter value. If a match is not found, then no parameter value is set.
city: Watson Alchemy is used to parse the statement into entities. If a single City entity is found, then the associated text is the parameter value. If one is not found or multiples are found (ambiguous), then no parameter value is set.
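To make the keyword and repouser/reponame descriptions concrete, here is a rough sketch of the two simplest strategies. It is not the library's implementation: the function names are hypothetical, the case-insensitive substring match is an assumption, and `REPO_PATTERN` is an illustrative regular expression rather than the one the package actually uses.

```javascript
// Keyword-style lookup: return the first hard-coded value found in the statement.
// Case-insensitive substring matching is an assumption of this sketch.
function extractKeyword(statement, values) {
  const lower = statement.toLowerCase();
  const found = values.find((value) => lower.includes(value.toLowerCase()));
  return found === undefined ? null : found;
}

// Hypothetical pattern for "repouser/reponame"; not the library's actual expression.
const REPO_PATTERN = /\b([A-Za-z0-9-]+)\/([A-Za-z0-9._-]+)\b/;

// Pull the repo user and repo name from the statement, or return null
// when no match is found (no parameter value is set).
function extractRepo(statement) {
  const match = REPO_PATTERN.exec(statement);
  return match ? { repouser: match[1], reponame: match[2] } : null;
}

console.log(extractKeyword('I want the weather for Chicago', ['Boston', 'Chicago']));
console.log(extractRepo('show open issues for octocat/hello-world'));
```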
The data included in this file will be put into a PouchDB database (named 'nlc' by default) and marked as private; it will not be synchronized even if synchronization is enabled. Feedback data synchronized (if enabled) from a remote Cloudant/CouchDB server will be stored in the same database.
Run initDb /path/to/NLC.json to create the training data. If you are a developer on this library, you will first have to run npm link.
The training data has now been created and will be used when training the classifier. The class definition for obtaining parameter values from the statement and specifying the emit target have also now been created and will be available for processing the statement through the natural language path.
Following is a description of the various functions exposed by this package:
These are functions used to read/write configuration data.
- nlcconfig.getAllClasses()
  - Retrieve seeded text information. It is an array of arrays of statement and class-name.
- nlcconfig.getClassEmitTarget(className)
  - Retrieve all configuration information for a given class. It is an object that parallels the JSON definition of the class (without the texts).
  - Note: If global parameter values have been set for a parameter name, then they will be included in the list of values for any parameter with that name.
- nlcconfig.updateGlobalParameterValues(name, values)
  - Update the global parameter values for the given name. Note that the name is namespaced. It should be the value of the root 'name' field in the JSON file concatenated with '_' and the parameter name. After this is invoked, all parameters with the same name will contain the specified values.
- nlcconfig.setGlobalEntityFunction(name, function)
  - Register an entity function for the given name. Note that the name is namespaced. It should be the value of the root 'name' field in the JSON file concatenated with '_' and the entity function name.
These are functions used to process a statement against the current NLC classifier and determine the best class to handle it. They can also be used to trigger training to create a new NLC classifier.
- NLCManager.train()
  - Create a new classifier and start training it with the latest information stored in the database.
- NLCManager.monitorTraining(classifier_id)
  - Monitor the training of the given classifier. A promise is returned and is resolved when the training is complete.
- NLCManager.classifierStatus(classifier_id)
  - Retrieve the current status of the classifier (Training or Available).
- NLCManager.classify(text)
  - Returns the Watson NLC classification information for the given statement. This includes information such as the top className and an array of potential className matches along with a confidence level and score.
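One plausible way a caller might combine a classification result with the CONFIDENCE_THRESHOLD_HIGH and CONFIDENCE_THRESHOLD_LOW settings is sketched below. This document does not specify how the thresholds are applied, so the three outcomes are assumptions, as are the `interpret` function and the result shape (`top_class`, `classes`, `class_name`, `confidence`).

```javascript
// Thresholds matching the documented defaults for
// CONFIDENCE_THRESHOLD_HIGH and CONFIDENCE_THRESHOLD_LOW.
const HIGH = 0.8;
const LOW = 0.05;

// Decide what to do with a classification result.
// Assumed result shape: { top_class: string, classes: [{ class_name, confidence }] }.
function interpret(result, high = HIGH, low = LOW) {
  const top = result.classes.find((c) => c.class_name === result.top_class);
  const confidence = top ? top.confidence : 0;
  // Confident match: run the command for the top class.
  if (confidence >= high) return { action: 'run', className: result.top_class };
  // Uncertain match: ask the user to confirm before running.
  if (confidence >= low) return { action: 'confirm', className: result.top_class };
  // Too uncertain: treat the statement as not understood.
  return { action: 'unknown', className: null };
}

const sample = {
  top_class: 'weather',
  classes: [
    { class_name: 'weather', confidence: 0.93 },
    { class_name: 'help', confidence: 0.04 }
  ]
};
console.log(interpret(sample).action); // prints "run"
```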
See LICENSE.txt for license information.
Please check out our Contribution Guidelines for detailed information on how you can lend a hand.