spark.ml

Introduction

This is the Spark ML client for ModelDB.

This library is responsible for recording Spark ML machine learning operations, such as estimator.fit(dataframe), in ModelDB.

Usage

To build the JAR, first make sure you have installed sbt (see dependencies). Then, from the spark.ml directory, run:

./build_client.sh

This will create the JAR target/scala-2.11/ml.jar.
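You can then make the client available to your Spark application, for example by passing target/scala-2.11/ml.jar to spark-shell or spark-submit with the --jars option.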

Samples

This project includes sample programs that demonstrate usage of the library.

First, import the classes you need, for example:

import edu.mit.csail.db.ml.modeldb.client.{ModelDbSyncer, NewProject, SyncableMetrics}

ModelDbSyncer is the class responsible for syncing machine learning operations to ModelDB.

Next, create a ModelDbSyncer and register it as the global syncer:

ModelDbSyncer.setSyncer(
    new ModelDbSyncer(projectConfig = NewProject(
        "pipeline",                                 // project name
        "harihar",                                  // project author
        "this example creates and runs a pipeline"  // project description
    ))
)

Now, when you want to log an operation to ModelDB, append Sync to the method name. For example:

myModel.transformSync(myDataFrame)
myEstimator.fitSync(myDataFrame)
myDataFrame.randomSplitSync(Array(0.7, 0.3))
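As a fuller sketch, the snippet below strings these calls together with a standard Spark ML estimator. It assumes the syncer has been set up as above, that myDataFrame is a DataFrame with "label" and "features" columns, and that the Sync methods are brought into scope by an implicit-conversion import (the ModelDbSyncer._ import shown here is an assumption and may differ in your version of the client).

import org.apache.spark.ml.classification.LogisticRegression
import edu.mit.csail.db.ml.modeldb.client.ModelDbSyncer._  // assumed import that provides the Sync methods

// Split the data, logging the split to ModelDB.
val Array(trainingData, testData) = myDataFrame.randomSplitSync(Array(0.7, 0.3))

// Fit an estimator, logging the fit event and the resulting model.
val lr = new LogisticRegression()
    .setLabelCol("label")
    .setFeaturesCol("features")
val model = lr.fitSync(trainingData)

// Apply the model, logging the transform event.
val predictions = model.transformSync(testData)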

You can also take advantage of ModelDB-specific operations. For example, to tag an object with a description, you can do:

myModel.tag("Some tag")
myDataFrame.tag("Some tag")
myEstimator.tag("Some tag")

You can also create annotations, which are short messages associated with Spark ML objects:

ModelDbSyncer.annotate("It seems", myDataFrame, "has a lot of missing entries")

To evaluate a model, you can do:

val metrics = SyncableMetrics.ComputeMulticlassMetrics(
    model,                 // the trained model being evaluated
    transformedDataFrame,  // DataFrame of predictions, e.g. the output of transformSync
    labelColName,          // name of the label column
    predictionColName      // name of the prediction column
)