Getting Started with Spark
The MLeap Spark integration serializes Spark-trained ML pipelines to MLeap Bundles. MLeap also provides several extensions to Spark, including enhanced one-hot encoding, one-vs-rest models, and unary/binary math transformations.
Adding MLeap Spark to Your Project
MLeap Spark and its snapshots are hosted on Maven Central, so they can be pulled in from a Maven build file or SBT. MLeap is currently compiled for Scala 2.12. We try to keep Scala compatibility in line with Spark.
Using SBT
libraryDependencies += "ml.combust.mleap" %% "mleap-spark" % "0.21.0"
To use MLeap extensions to Spark:
libraryDependencies += "ml.combust.mleap" %% "mleap-spark-extension" % "0.21.0"
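Because both artifacts share a version number, one way to keep them in sync is to factor the version into a single value. The following is a minimal build.sbt sketch, not a required layout: the name `mleapVersion` and the exact `scalaVersion` patch release are assumptions chosen for illustration.

```scala
// build.sbt (sketch)

// Assumption: any Scala 2.12.x release; the patch version here is illustrative.
scalaVersion := "2.12.18"

// Hypothetical helper value so both MLeap artifacts stay on the same version.
val mleapVersion = "0.21.0"

libraryDependencies ++= Seq(
  // %% appends the Scala binary version, so this resolves to mleap-spark_2.12
  "ml.combust.mleap" %% "mleap-spark" % mleapVersion,
  // Only needed if you use the MLeap extensions to Spark
  "ml.combust.mleap" %% "mleap-spark-extension" % mleapVersion
)
```

The `%%` operator is why the SBT coordinates omit the `_2.12` suffix that appears in the Maven artifact IDs: SBT derives the suffix from `scalaVersion`.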
Using Maven
<dependency>
    <groupId>ml.combust.mleap</groupId>
    <artifactId>mleap-spark_2.12</artifactId>
    <version>0.21.0</version>
</dependency>
To use MLeap extensions to Spark:
<dependency>
    <groupId>ml.combust.mleap</groupId>
    <artifactId>mleap-spark-extension_2.12</artifactId>
    <version>0.21.0</version>
</dependency>
- See the build instructions to build MLeap from source.
- See core concepts for an overview of ML pipelines.
- See the Spark documentation to learn how to train ML pipelines in Spark.
- See the demo notebooks for how to use MLeap with PySpark to serialize your pipelines to Bundle.ML and score them with MLeap.
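
With the `mleap-spark` dependency on the classpath, serializing a fitted pipeline from Scala might look like the sketch below. It is an illustration under stated assumptions rather than a definitive recipe: the toy data, the column names, the object name `SerializeSketch`, and the output path `/tmp/mleap-sketch-pipeline.zip` are all made up for this example, and it assumes Spark itself is already a dependency of your project.

```scala
import ml.combust.bundle.BundleFile
import ml.combust.mleap.spark.SparkSupport._
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.bundle.SparkBundleContext
import org.apache.spark.ml.feature.{Binarizer, StringIndexer}
import org.apache.spark.sql.SparkSession

object SerializeSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only
    val spark = SparkSession.builder()
      .appName("mleap-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Toy data; column names are assumptions for this example
    val df = Seq(("a", 0.2), ("b", 0.8), ("a", 0.6)).toDF("label", "score")

    // Train a small two-stage pipeline in Spark
    val model = new Pipeline().setStages(Array(
      new StringIndexer().setInputCol("label").setOutputCol("label_index"),
      new Binarizer().setThreshold(0.5).setInputCol("score").setOutputCol("score_bin")
    )).fit(df)

    // The bundle context carries a transformed dataset so MLeap can read the schema
    val context = SparkBundleContext().withDataset(model.transform(df))

    // Hypothetical output location; the "jar:file:" prefix writes a zip bundle
    val bundle = BundleFile("jar:file:/tmp/mleap-sketch-pipeline.zip")
    try {
      // save returns a Try; .get rethrows any serialization failure
      model.writeBundle.save(bundle)(context).get
    } finally {
      bundle.close()
    }

    spark.stop()
  }
}
```

The explicit `try`/`finally` is used here only to keep the sketch free of extra resource-management libraries. If a file already exists at the target path, the save may fail, so remove it or pick a fresh path before re-running.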