Getting Started with Scikit-Learn
MLeap Scikit-Learn integration serializes ML pipelines trained with Scikit-Learn to MLeap Bundles. MLeap also provides several extensions to Scikit-Learn, including MLeap extension transformers.
MLeap Scikit integration works by adding Bundle.ML serialization to Transformers, Pipelines and Feature Unions. Note that because the core execution engine is written in Scala and modeled after Spark transformers, the only supported transformers are those available in Spark and in libraries that extend Spark. For a full list of supported Scikit transformers, see the supported transformers page; if you need support for custom transformers, see the custom transformers section.
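The idea of adding serialization to existing classes can be illustrated with a small, self-contained sketch. It shows only the general pattern of attaching a method to a class that is already defined. The `Scaler` class, the `serialize_to_dict` function and the dict layout are hypothetical stand-ins for illustration; they are not MLeap or Scikit-Learn APIs, and the output is not the Bundle.ML format.

```python
import json


class Scaler:
    """Hypothetical stand-in for a third-party transformer class."""

    def __init__(self, mean, scale):
        self.mean = mean
        self.scale = scale


def serialize_to_dict(self, name):
    """Describe the transformer as a plain dict (hypothetical layout)."""
    return {
        "name": name,
        "op": type(self).__name__.lower(),
        "attributes": {"mean": self.mean, "scale": self.scale},
    }


# Attaching the function to the class makes it available as a method on
# every instance, without editing the class definition itself.
setattr(Scaler, "serialize_to_dict", serialize_to_dict)

scaler = Scaler(mean=[0.5, 2.0], scale=[1.0, 4.0])
payload = json.dumps(scaler.serialize_to_dict("my_scaler"), sort_keys=True)
print(payload)
```

Under this pattern, importing an extension module is enough to make the new method available, which is why the imports in the next section matter even though their names are never used directly.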
Adding MLeap Scikit to Your Project
To add MLeap to your Scikit project, install it with pip:
pip install mleap
Then, in your Python environment, import the MLeap extensions for any Scikit transformers you plan to serialize:
# Extends Bundle.ML Serialization for Pipelines
import mleap.sklearn.pipeline
# Extends Bundle.ML Serialization for Feature Unions
import mleap.sklearn.feature_union
# Extends Bundle.ML Serialization for Base Transformers (e.g. LabelEncoder, StandardScaler)
import mleap.sklearn.preprocessing.data
# Extends Bundle.ML Serialization for Linear Regression Models
import mleap.sklearn.base
# Extends Bundle.ML Serialization for Logistic Regression
import mleap.sklearn.logistic
# Extends Bundle.ML Serialization for Random Forest Regressor
from mleap.sklearn.ensemble import forest
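After installing, it can be useful to confirm that the extension modules listed above can be located in the current environment. The following is a minimal sketch using only the Python standard library; `check_modules` and `MLEAP_EXTENSIONS` are hypothetical names introduced here, and the last entry assumes `forest` is a submodule of `mleap.sklearn.ensemble`.

```python
import importlib.util

# Module names taken from the import list above.
# Assumption: `forest` is a submodule of mleap.sklearn.ensemble.
MLEAP_EXTENSIONS = [
    "mleap.sklearn.pipeline",
    "mleap.sklearn.feature_union",
    "mleap.sklearn.preprocessing.data",
    "mleap.sklearn.base",
    "mleap.sklearn.logistic",
    "mleap.sklearn.ensemble.forest",
]


def check_modules(names):
    """Return {module name: bool} for whether each module can be located."""
    found = {}
    for name in names:
        try:
            found[name] = importlib.util.find_spec(name) is not None
        except Exception:
            # find_spec imports parent packages, so a missing or broken
            # parent (e.g. mleap itself) is reported as not found.
            found[name] = False
    return found


if __name__ == "__main__":
    for name, ok in check_modules(MLEAP_EXTENSIONS).items():
        print(f"{'ok' if ok else 'missing':8}{name}")
```

Any module reported as missing usually points to an incomplete installation or a different package layout in the installed MLeap version.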
For more information on how to use MLeap extensions to Scikit:
- See core concepts for an overview of ML pipelines.
- Detailed guide to MLeap and Scikit-Learn
- See Scikit-learn documentation to learn how to train ML pipelines in Python.
- See Scikit-learn documentation on how to use Feature Unions with pipelines
- See the demo notebook on how to use Scikit and MLeap to serialize your pipeline to Bundle.ML.
- Learn how to transform a DataFrame using MLeap.