Serializing with MLeap
Serializing and deserializing with MLeap is straightforward. You can serialize either to a directory on the file system or to a zip file that is easy to ship around.
Create a Simple MLeap Pipeline
import ml.combust.bundle.BundleFile
import ml.combust.bundle.serializer.SerializationFormat
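import ml.combust.mleap.core.feature.{OneHotEncoderModel, StringIndexerModel, VectorAssemblerModel}
import ml.combust.mleap.core.regression.LinearRegressionModel
// NodeShape, ScalarShape and TensorShape are used below; their package is assumed
// from recent MLeap releases, so adjust it if your version differs
import ml.combust.mleap.core.types.{NodeShape, ScalarShape, TensorShape}
import ml.combust.mleap.runtime.transformer.{Pipeline, PipelineModel}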
import ml.combust.mleap.runtime.transformer.feature.{OneHotEncoder, StringIndexer, VectorAssembler}
import ml.combust.mleap.runtime.transformer.regression.LinearRegression
import org.apache.spark.ml.linalg.Vectors
import ml.combust.mleap.runtime.MleapSupport._
import resource._
// Create a sample pipeline that we will serialize
// And then deserialize using various formats
val stringIndexer = StringIndexer(
  shape = NodeShape.scalar(inputCol = "a_string", outputCol = "a_string_index"),
  model = StringIndexerModel(Seq("Hello, MLeap!", "Another row")))
val oneHotEncoder = OneHotEncoder(
  shape = NodeShape.vector(1, 2, inputCol = "a_string_index", outputCol = "a_string_oh"),
  model = OneHotEncoderModel(2, dropLast = false))
val featureAssembler = VectorAssembler(
  shape = NodeShape().withInput("input0", "a_string_oh").
    withInput("input1", "a_double").withStandardOutput("features"),
  model = VectorAssemblerModel(Seq(TensorShape(2), ScalarShape())))
val linearRegression = LinearRegression(
  shape = NodeShape.regression(3),
  model = LinearRegressionModel(Vectors.dense(2.0, 3.0, 6.0), 23.5))
val pipeline = Pipeline(
  shape = NodeShape(),
  model = PipelineModel(Seq(stringIndexer, oneHotEncoder, featureAssembler, linearRegression)))
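To make it concrete what the pipeline above computes, here is a hand-written sketch of the same arithmetic in plain Scala. It does not use MLeap at all. The object name PipelineSketch and its methods are hypothetical, and the way unknown labels are handled is an assumption of the sketch, not a statement about MLeap's behavior.

```scala
object PipelineSketch {
  // Labels in the same order as the StringIndexerModel in the example pipeline.
  val labels: Seq[String] = Seq("Hello, MLeap!", "Another row")
  // Coefficients and intercept copied from the LinearRegressionModel in the example.
  val coefficients: Seq[Double] = Seq(2.0, 3.0, 6.0)
  val intercept: Double = 23.5

  // String indexing: the index is the position of the label in the label list.
  // Rejecting unknown labels is a choice made for this sketch only.
  def index(s: String): Int = {
    val i = labels.indexOf(s)
    require(i >= 0, s"unknown label: $s")
    i
  }

  // One-hot encoding with one slot per label (dropLast = false).
  def oneHot(i: Int): Seq[Double] =
    Seq.tabulate(labels.size)(j => if (j == i) 1.0 else 0.0)

  // Assemble [one-hot slots, a_double] and apply the linear model.
  def predict(aString: String, aDouble: Double): Double = {
    val features = oneHot(index(aString)) :+ aDouble
    features.zip(coefficients).map { case (x, w) => x * w }.sum + intercept
  }
}
```

For a_string = "Hello, MLeap!" and a_double = 1.0, the assembled features are (1.0, 0.0, 1.0), so the prediction is 2.0 + 0.0 + 6.0 + 23.5 = 31.5.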
Serialize to Zip File
To serialize to a zip file, make sure the URI begins with jar:file and ends with .zip. For example: jar:file:/tmp/mleap-bundle.zip
JSON Format
for(bundle <- managed(BundleFile("jar:file:/tmp/mleap-examples/simple-json.zip"))) {
  pipeline.writeBundle.format(SerializationFormat.Json).save(bundle)
}
Protobuf Format
for(bundle <- managed(BundleFile("jar:file:/tmp/mleap-examples/simple-protobuf.zip"))) {
  pipeline.writeBundle.format(SerializationFormat.Protobuf).save(bundle)
}
Serialize to Directory
To serialize to a directory, make sure the URI begins with file. For example: file:/tmp/mleap-bundle-dir
JSON Format
for(bundle <- managed(BundleFile("file:/tmp/mleap-examples/simple-json-dir"))) {
  pipeline.writeBundle.format(SerializationFormat.Json).save(bundle)
}
Protobuf Format
for(bundle <- managed(BundleFile("file:/tmp/mleap-examples/simple-protobuf-dir"))) {
  pipeline.writeBundle.format(SerializationFormat.Protobuf).save(bundle)
}
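Both URI conventions (jar:file ... .zip for zip bundles, file for directory bundles) can be derived from an ordinary file path. The following is a minimal sketch using only java.io.File. BundleUris, zipUri and dirUri are hypothetical names introduced for illustration and are not part of MLeap.

```scala
import java.io.File

object BundleUris {
  // Builds a "jar:file:..." URI string for a zip bundle.
  // The path must end with .zip, matching the convention described in the text.
  def zipUri(path: String): String = {
    require(path.endsWith(".zip"), s"zip bundle path must end with .zip: $path")
    "jar:" + new File(path).toURI.toString
  }

  // Builds a "file:..." URI string for a directory bundle.
  // Note: File.toURI appends a trailing slash if the directory already exists.
  def dirUri(path: String): String = {
    require(!path.endsWith(".zip"), s"directory bundle path should not end with .zip: $path")
    new File(path).toURI.toString
  }
}
```

The resulting strings can be passed wherever the examples use a literal URI, for instance BundleFile(BundleUris.zipUri("/tmp/mleap-examples/simple-json.zip")).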
Deserializing
Deserializing is just as easy as serializing. You don't need to know beforehand which format the MLeap Bundle was serialized in; you only need to know where the bundle is.
Zip Bundle
// Deserialize a zip bundle
// Use Scala ARM to make sure resources are managed properly
val zipBundle = (for(bundle <- managed(BundleFile("jar:file:/tmp/mleap-examples/simple-json.zip"))) yield {
  bundle.loadMleapBundle().get
}).opt.get
Directory Bundle
// Deserialize a directory bundle
// Use Scala ARM to make sure resources are managed properly
val dirBundle = (for(bundle <- managed(BundleFile("file:/tmp/mleap-examples/simple-json-dir"))) yield {
  bundle.loadMleapBundle().get
}).opt.get
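The examples rely on Scala ARM's managed to close the bundle file once loading is done, and on .opt.get, which throws if anything went wrong. As a rough illustration of the same open, use, close lifecycle without throwing, here is a sketch using scala.util.Using from the standard library (this assumes Scala 2.13 or later). FakeBundle is a hypothetical stand-in for a bundle file and is not an MLeap class. Whether BundleFile itself can be passed to Using is not something this sketch claims.

```scala
import scala.util.{Try, Using}

// Hypothetical stand-in for a bundle file; it only records whether it was closed.
final class FakeBundle(val uri: String) extends AutoCloseable {
  var closed: Boolean = false
  // Stand-in for a load call that returns a Try, as in the examples above.
  def load(): Try[String] = Try(s"pipeline loaded from $uri")
  override def close(): Unit = closed = true
}

object ManagedLoadSketch {
  // Opens the resource, loads from it, and closes it even if loading fails.
  // The outer Try comes from Using, the inner one from load(); flatten merges them.
  def load(bundle: FakeBundle): Try[String] =
    Using(bundle)(_.load()).flatten
}
```

Returning a Try leaves the caller free to pattern match on Success and Failure instead of calling get, which keeps the original error available when loading fails.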