The JPMML-Evaluator library is de facto the reference implementation of the PMML specification for the Java platform.
The primary objective is to provide full compliance with 4.X versions of the PMML specification (released since 2009). The secondary objective is to provide maximum working compliance with 3.X versions of the PMML specification (released between 2004 and 2007). This means that some PMML features whose scope is limited to 3.X versions (e.g. features removed or deprecated in 4.X versions) may not be supported.
The JPMML-Evaluator library is hardwired to perform thorough "sanity" checking. Model evaluator classes will throw an exception when an invalid and/or unsupported PMML feature is encountered.
- Pre-processing of active fields (aka independent variables) according to the `DataDictionary` and `MiningSchema` elements:
  - Strict data type system:
    - Except for `dateDaysSince[0]` and `dateTimeSecondsSince[0]` data types.
  - Strict operational type system.
  - Treatment of outlier, missing and/or invalid values.
- Model evaluation.
- Post-processing of target fields (aka dependent variables) according to the `Targets` element:
  - Rescaling and/or casting regression results.
  - Replacing a missing regression result with the default value.
  - Replacing a missing classification result with the map of prior probabilities.
- Calculation of auxiliary output fields according to the `Output` element:
  - Over 20 different result feature types:
    - Except for `confidenceIntervalLower`, `confidenceIntervalUpper`, `standardError` and `standardDeviation` result features.
- Transformations:
  - Except for the `Lag` element.
- PMML built-in functions:
  - Except for `erf`, `normalCDF`, `normalIDF`, `normalPDF`, `stdNormalCDF`, `stdNormalIDF` and `stdNormalPDF` functions.
- PMML user-defined functions.
- Java user-defined functions.
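To make the pre-processing step more concrete, here is a minimal plain-Java sketch of how a PMML consumer might prepare a single continuous input value. It assumes the PMML `MiningField` attributes `invalidValueTreatment` (`returnInvalid`, `asIs`, `asMissing`), `outliers` (`asIs`, `asMissingValues`, `asExtremeValues`) with `lowValue`/`highValue`, and `missingValueReplacement`, plus a valid range such as a `DataDictionary` `Interval` element may declare. The class, enum and method names are hypothetical, the order of the three steps is a simplifying assumption, and this is not the JPMML-Evaluator API.

```java
import java.util.OptionalDouble;

// Hypothetical, simplified model of PMML input value preparation for one continuous field.
// Not the JPMML-Evaluator API; the ordering of the three steps is an assumption.
final class ContinuousFieldPreparer {

	// Mirrors the MiningField@invalidValueTreatment values returnInvalid, asIs, asMissing
	enum InvalidTreatment { RETURN_INVALID, AS_IS, AS_MISSING }

	// Mirrors the MiningField@outliers values asIs, asMissingValues, asExtremeValues
	enum OutlierTreatment { AS_IS, AS_MISSING_VALUES, AS_EXTREME_VALUES }

	private final double validMin, validMax;  // valid range, e.g. from a DataDictionary Interval
	private final double lowValue, highValue; // MiningField@lowValue and MiningField@highValue
	private final InvalidTreatment invalidTreatment;
	private final OutlierTreatment outlierTreatment;
	private final Double missingValueReplacement; // null if not declared

	ContinuousFieldPreparer(double validMin, double validMax, double lowValue, double highValue,
			InvalidTreatment invalidTreatment, OutlierTreatment outlierTreatment, Double missingValueReplacement) {
		this.validMin = validMin;
		this.validMax = validMax;
		this.lowValue = lowValue;
		this.highValue = highValue;
		this.invalidTreatment = invalidTreatment;
		this.outlierTreatment = outlierTreatment;
		this.missingValueReplacement = missingValueReplacement;
	}

	// Returns the prepared value, or OptionalDouble.empty() if the value is still missing
	OptionalDouble prepare(Double raw) {
		Double value = raw;

		// 1. Invalid values: outside of the valid range
		if (value != null && (value < validMin || value > validMax)) {
			if (invalidTreatment == InvalidTreatment.RETURN_INVALID) {
				throw new IllegalArgumentException("Invalid value: " + value);
			} else if (invalidTreatment == InvalidTreatment.AS_MISSING) {
				value = null;
			} // AS_IS: keep the value unchanged
		}

		// 2. Outliers: values outside of [lowValue, highValue]
		if (value != null && (value < lowValue || value > highValue)) {
			if (outlierTreatment == OutlierTreatment.AS_MISSING_VALUES) {
				value = null;
			} else if (outlierTreatment == OutlierTreatment.AS_EXTREME_VALUES) {
				value = Math.max(lowValue, Math.min(highValue, value));
			} // AS_IS: keep the value unchanged
		}

		// 3. Missing values: use the replacement value, if one is declared
		if (value == null) {
			value = missingValueReplacement;
		}
		return (value != null) ? OptionalDouble.of(value) : OptionalDouble.empty();
	}
}
```

With a valid range of [0, 100], outlier bounds of [10, 90], `AS_MISSING` and `AS_EXTREME_VALUES` treatments and a replacement value of 50, the input 95 is clamped to 90, while the invalid input 150 and a missing input both end up as 50.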
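The target post-processing rules can be sketched in the same spirit. This hedged sketch assumes the PMML `Target` attributes `rescaleFactor`, `rescaleConstant` and `castInteger` (`round`, `ceiling`, `floor`), applied as `value * rescaleFactor + rescaleConstant` followed by the cast. It also assumes that the default value and the prior probabilities are returned unchanged when the model produced no result, and it ignores the `min` and `max` attributes. All names are hypothetical; this is not the JPMML-Evaluator API.

```java
import java.util.Map;

// Hypothetical sketch of Targets-based post-processing; not the JPMML-Evaluator API.
// The Target@min and Target@max attributes are ignored for brevity.
final class TargetPostProcessor {

	// Mirrors the Target@castInteger values round, ceiling, floor (NONE = attribute absent)
	enum CastInteger { NONE, ROUND, CEILING, FLOOR }

	// Regression: rescale and cast the result; a missing result is replaced with the default value
	static Double processRegression(Double result, double rescaleFactor, double rescaleConstant,
			CastInteger castInteger, Double defaultValue) {
		if (result == null) {
			return defaultValue; // assumption: the default value is returned as is
		}
		double value = result * rescaleFactor + rescaleConstant;
		switch (castInteger) {
			case ROUND:
				return (double) Math.round(value);
			case CEILING:
				return Math.ceil(value);
			case FLOOR:
				return Math.floor(value);
			default:
				return value;
		}
	}

	// Classification: a missing result is replaced with the map of prior probabilities
	static Map<String, Double> processClassification(Map<String, Double> probabilities,
			Map<String, Double> priorProbabilities) {
		return (probabilities != null) ? probabilities : priorProbabilities;
	}
}
```

For example, a raw regression result of 1.5 with `rescaleFactor` 2 and `rescaleConstant` 0.25 becomes 3.25, which `round`, `ceiling` and `floor` cast to 3, 4 and 3, respectively.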
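As an example of a supported transformation, the PMML `NormContinuous` element maps a continuous value through a piecewise linear function that is defined by a list of `LinearNorm` points (`orig`, `norm`). Within the segment from point i to point i + 1, the result is `norm[i] + (x - orig[i]) * (norm[i + 1] - norm[i]) / (orig[i + 1] - orig[i])`. The sketch below extrapolates beyond the first and last points using the outermost segments, which corresponds to the `asIs` outlier treatment; the `asMissingValues` and `asExtremeValues` treatments are left out. The class name is hypothetical.

```java
// Hypothetical sketch of the PMML NormContinuous transformation; not the JPMML-Evaluator API.
final class NormContinuousSketch {

	// orig[i] -> norm[i] are the LinearNorm points, sorted by strictly increasing orig.
	// Values beyond the first/last point are extrapolated (outliers="asIs").
	static double normalize(double x, double[] orig, double[] norm) {
		if (orig.length != norm.length || orig.length < 2) {
			throw new IllegalArgumentException("Expected at least two LinearNorm points");
		}
		// Select the segment [orig[i], orig[i + 1]] that contains x, or the outermost one
		int i = 0;
		while (i < orig.length - 2 && x > orig[i + 1]) {
			i++;
		}
		// Linear interpolation (or extrapolation) along the selected segment
		double slope = (norm[i + 1] - norm[i]) / (orig[i + 1] - orig[i]);
		return norm[i] + (x - orig[i]) * slope;
	}
}
```

With the points (0, 0), (10, 1) and (20, 3), the input 5 maps to 0.5, the input 15 maps to 2, and the out-of-range input 25 is extrapolated to 4.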
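Built-in and user-defined functions are both invoked by name from PMML `Apply` elements. The following hypothetical registry (not the JPMML-Evaluator API) illustrates the idea under that assumption: a few built-in functions are registered up front, user-defined functions can be added under their own names, and an unknown name such as `erf` fails fast with an exception instead of silently producing a wrong result.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of name-based function resolution for PMML Apply elements.
// Names are illustrative; this is not the JPMML-Evaluator API.
final class FunctionRegistrySketch {

	private final Map<String, Function<List<Double>, Double>> functions = new HashMap<>();

	FunctionRegistrySketch() {
		// A few PMML built-in functions
		functions.put("+", args -> args.get(0) + args.get(1));
		functions.put("*", args -> args.get(0) * args.get(1));
		functions.put("exp", args -> Math.exp(args.get(0)));
	}

	// Registers a user-defined function under the given name
	void register(String name, Function<List<Double>, Double> function) {
		functions.put(name, function);
	}

	// Unknown names are reported eagerly, in the spirit of the library's "sanity" checking
	double apply(String name, List<Double> args) {
		Function<List<Double>, Double> function = functions.get(name);
		if (function == null) {
			throw new UnsupportedOperationException("Unsupported function: " + name);
		}
		return function.apply(args);
	}
}
```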
Supported model types:
- Association rules (association).
- Cluster model (clustering):
  - Except for the `distributionBased` value of the `modelClass` attribute.
- General regression (regression, Cox regression, classification).
- k-Nearest neighbors (k-NN) (regression, classification, clustering).
- Naive Bayes (classification).
- Neural network (regression, classification).
- Regression (regression, classification).
- Rule set (classification).
- Scorecard (regression).
- Tree model (regression, classification):
  - Except for `aggregateNodes` and `weightedConfidence` values of the `missingValueStrategy` attribute.
- Support Vector Machine (SVM) (regression, classification):
  - Except for the `Coefficients` value of the `svmRepresentation` attribute.
- Ensemble model (ensembles of all of the above model types) (regression, classification, clustering):
  - Except for the `VariableWeight` element.
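For a concrete picture of what one of the supported model types computes, here is a hedged sketch of a classification-type regression model with softmax normalization (in PMML terms, a `RegressionModel` with `normalizationMethod="softmax"`). Each target category has its own regression table, consisting of an intercept plus one coefficient per numeric predictor, and the linear scores y_j are turned into probabilities as `exp(y_j) / sum_k exp(y_k)`. Only numeric predictors with exponent 1 are handled, and the names and data layout are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a classification-type RegressionModel with softmax normalization.
// Not the JPMML-Evaluator API; only NumericPredictor terms with exponent 1 are modelled.
final class SoftmaxRegressionSketch {

	// For every target category: intercepts.get(category) + sum_k coefficients.get(category)[k] * x[k]
	static Map<String, Double> score(double[] x, Map<String, Double> intercepts,
			Map<String, double[]> coefficients) {
		Map<String, Double> linear = new LinkedHashMap<>();
		double max = Double.NEGATIVE_INFINITY;
		for (Map.Entry<String, Double> table : intercepts.entrySet()) {
			double[] beta = coefficients.get(table.getKey());
			double y = table.getValue();
			for (int k = 0; k < x.length; k++) {
				y += beta[k] * x[k];
			}
			linear.put(table.getKey(), y);
			max = Math.max(max, y);
		}

		// Softmax; subtracting the maximum leaves the result unchanged and only guards against overflow
		Map<String, Double> probabilities = new LinkedHashMap<>();
		double sum = 0.0;
		for (Map.Entry<String, Double> entry : linear.entrySet()) {
			double e = Math.exp(entry.getValue() - max);
			probabilities.put(entry.getKey(), e);
			sum += e;
		}
		for (Map.Entry<String, Double> entry : probabilities.entrySet()) {
			entry.setValue(entry.getValue() / sum);
		}
		return probabilities;
	}
}
```

With zero intercepts, a coefficient of 1 for "yes" and 0 for "no", and the input x = ln 3, the linear scores are ln 3 and 0, so the probabilities are 3/4 and 1/4.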
Not yet supported model types:
- Anomaly detection model. The purpose and benefits of wrapping an ordinary regression model into an `AnomalyDetectionModel` element are unclear. See http://mantis.dmg.org/view.php?id=165.
- Baseline model
- Bayesian network
- Gaussian process
- Sequence rules
- Text model
- Time series model
Not supported PMML features:
- Model composition. Model composition specifies a mechanism for embedding defeatured regression and decision tree models (represented by the `Regression` and `DecisionTree` elements, respectively) into other models. Model composition was deprecated in PMML schema version 4.1.
- The `ClusteringModel/CenterFields` element. This element was removed in PMML schema version 3.2. PMML producers should move the list of `DerivedField` child elements to the `ClusteringModel/LocalTransformations` element, and reference them using a list of `ClusteringField` elements instead.
- The `MiningModel/Segmentation/LocalTransformations` element. This element was deprecated in PMML schema version 4.1. PMML producers should move the list of `DerivedField` child elements to the `MiningModel/LocalTransformations` element instead.
- The `TableLocator` element. The `TableLocator` element specifies a mechanism for incorporating data from external data sources (e.g. CSV files, databases). In PMML schema version 4.4, the `TableLocator` element is only a placeholder.
The class model object can be inspected for unsupported PMML elements and attributes using the visitor class `org.jpmml.evaluator.visitors.UnsupportedMarkupInspector`. This visitor collects all unsupported markup as instances of `org.jpmml.model.UnsupportedMarkupException`.
The class model object is safe for evaluation using the JPMML-Evaluator library if the collection of exceptions is empty:
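```java
import java.util.List;

import org.dmg.pmml.PMML;
import org.jpmml.evaluator.visitors.UnsupportedMarkupInspector;
import org.jpmml.model.UnsupportedMarkupException;

public boolean isFullySupported(PMML pmml){
	UnsupportedMarkupInspector inspector = new UnsupportedMarkupInspector();

	// Traverse the specified class model object, collecting all unsupported markup
	pmml.accept(inspector);

	List<UnsupportedMarkupException> exceptions = inspector.getExceptions();

	// The class model object is safe for evaluation if nothing was collected
	return exceptions.isEmpty();
}
```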
The visitor class traverses the class model object completely. In contrast, model evaluator classes traverse the class model object only partially, because every "evaluation path" is a function of the specified input data record. It follows that the collection of exceptions represents the worst-case scenario: evaluation using the JPMML-Evaluator library may succeed even if this collection is not empty.
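The difference between a complete traversal and a record-driven one can be illustrated with a toy decision tree in plain Java (a hypothetical stand-in, not the JPMML class model). One leaf is flagged as "unsupported" to play the role of unsupported markup. A complete traversal, like the inspector's, always reports it, whereas a record-driven evaluation only fails for input records whose path actually reaches that leaf. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model, not the JPMML class model: a node flagged as "unsupported"
// stands in for unsupported PMML markup.
final class EvaluationPathSketch {

	static final class Node {
		final double threshold;    // split nodes: go left if x < threshold
		final Node left, right;    // null for leaves
		final Double score;        // non-null for supported leaves
		final boolean unsupported;

		private Node(double threshold, Node left, Node right, Double score, boolean unsupported) {
			this.threshold = threshold;
			this.left = left;
			this.right = right;
			this.score = score;
			this.unsupported = unsupported;
		}

		static Node split(double threshold, Node left, Node right) {
			return new Node(threshold, left, right, null, false);
		}

		static Node leaf(double score) {
			return new Node(Double.NaN, null, null, score, false);
		}

		static Node unsupportedLeaf() {
			return new Node(Double.NaN, null, null, null, true);
		}
	}

	// Complete traversal: collects every unsupported node, regardless of input data
	static List<Node> inspect(Node root) {
		List<Node> found = new ArrayList<>();
		collect(root, found);
		return found;
	}

	private static void collect(Node node, List<Node> found) {
		if (node == null) {
			return;
		}
		if (node.unsupported) {
			found.add(node);
		}
		collect(node.left, found);
		collect(node.right, found);
	}

	// Record-driven evaluation: visits only the nodes on the path selected by x
	static double evaluate(Node node, double x) {
		if (node.unsupported) {
			throw new UnsupportedOperationException("Unsupported markup on the evaluation path");
		}
		if (node.score != null) {
			return node.score;
		}
		return evaluate(x < node.threshold ? node.left : node.right, x);
	}
}
```

For a tree that splits at 10 into a regular leaf and an unsupported leaf, the complete traversal reports one problem, the input 5 evaluates successfully, and only the input 15 fails.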