Data Types
Spark, MLeap, Scikit-learn, and TensorFlow support many data types. Fortunately, because all of these technologies build on well-known mathematical data structures, their data types are largely cross-compatible.
Data frames store the data types of their columns in a schema object. This schema can be consulted to determine which operations are available for which columns and how transformations should be handled.
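The following minimal Python sketch shows how a schema can be consulted before a transformation runs. It is not MLeap's or Spark's schema API. `ColumnSpec`, `FrameSchema`, `numeric_columns`, and `require_type` are hypothetical names, and data types are represented as plain strings for brevity.

```python
from dataclasses import dataclass
from typing import List, Set

# Hypothetical type names, loosely mirroring the numeric types in the table of supported data types.
NUMERIC_TYPES = {"byte", "short", "integer", "long", "float", "double"}


@dataclass(frozen=True)
class ColumnSpec:
    name: str
    data_type: str  # e.g. "double", "string", "tensor"


@dataclass(frozen=True)
class FrameSchema:
    columns: List[ColumnSpec]

    def type_of(self, column: str) -> str:
        """Look up a column's data type; raise KeyError if the column is absent."""
        for spec in self.columns:
            if spec.name == column:
                return spec.data_type
        raise KeyError(column)

    def numeric_columns(self) -> List[str]:
        """Columns that a numeric transformation (e.g. scaling) could accept."""
        return [spec.name for spec in self.columns if spec.data_type in NUMERIC_TYPES]


def require_type(schema: FrameSchema, column: str, allowed: Set[str]) -> None:
    """Fail fast, before any rows are processed, if a column has an unsupported type."""
    actual = schema.type_of(column)
    if actual not in allowed:
        raise TypeError(f"column {column!r} has type {actual}, expected one of {sorted(allowed)}")


schema = FrameSchema([
    ColumnSpec("age", "integer"),
    ColumnSpec("income", "double"),
    ColumnSpec("city", "string"),
])
print(schema.numeric_columns())  # ['age', 'income']
require_type(schema, "income", NUMERIC_TYPES)  # passes silently
```

Checking the schema up front lets a pipeline reject an incompatible column immediately, rather than failing partway through a transformation.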
Supported Data Types
| Data Type | Notes |
|---|---|
| Byte | 8-bit integer values, supported by all platforms; MLeap and Spark support only signed variants |
| Short | 16-bit integer values, supported by all platforms; MLeap and Spark support only signed variants |
| Integer | 32-bit integer values, supported by all platforms; MLeap and Spark support only signed variants |
| Long | 64-bit integer values, supported by all platforms; MLeap and Spark support only signed variants |
| Float | 32-bit floating-point values, supported by all platforms |
| Double | 64-bit floating-point values, supported by all platforms |
| Boolean | 8-bit value representing true or false; can be packed into a single bit if needed |
| String | A sequence of characters, either null-terminated or length-prefixed depending on the platform |
| Array | A sequence of elements of any one of the basic types above |
| Tensor | Supported by MLeap and TensorFlow; provides n-dimensional storage for one of the basic types above |
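The bit widths in the table determine what each type can hold. The Python sketch below uses only the standard library and is not how MLeap, Spark, or TensorFlow implement these types. It computes the two's-complement range of each signed integer type, shows one way to pack Booleans into a single bit each, and models a tensor as a shape plus a flat, row-major sequence of values of one basic type. The names `signed_range`, `pack_booleans`, `unpack_booleans`, and `FlatTensor` are hypothetical.

```python
import struct
from dataclasses import dataclass
from math import prod

# struct codes with standard sizes ("<" prefix) for the signed integer types.
SIGNED_INTS = {"Byte": "b", "Short": "h", "Integer": "i", "Long": "q"}


def signed_range(bits: int) -> tuple:
    """Smallest and largest value of a two's-complement signed integer."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def pack_booleans(values: list) -> bytes:
    """Pack booleans one bit per value, least significant bit first."""
    out = bytearray((len(values) + 7) // 8)
    for i, value in enumerate(values):
        if value:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def unpack_booleans(data: bytes, count: int) -> list:
    """Inverse of pack_booleans; count is needed because the last byte may be padded."""
    return [bool((data[i // 8] >> (i % 8)) & 1) for i in range(count)]


@dataclass(frozen=True)
class FlatTensor:
    """n-dimensional values of a single basic type, stored flat in row-major order."""
    shape: tuple
    values: tuple

    def __post_init__(self):
        if prod(self.shape) != len(self.values):
            raise ValueError("shape does not match the number of values")

    def get(self, *index: int):
        """Map an n-dimensional index to its offset in the flat storage."""
        if len(index) != len(self.shape):
            raise IndexError("expected one index per dimension")
        offset = 0
        for i, dim in zip(index, self.shape):
            if not 0 <= i < dim:
                raise IndexError(index)
            offset = offset * dim + i
        return self.values[offset]


for name, code in SIGNED_INTS.items():
    bits = struct.calcsize("<" + code) * 8
    low, high = signed_range(bits)
    print(f"{name:<8}{bits:>3}-bit  {low} .. {high}")

flags = [True, False, True, True, False, False, False, False, True]
packed = pack_booleans(flags)
print(len(packed))  # 2
assert unpack_booleans(packed, len(flags)) == flags

matrix = FlatTensor(shape=(2, 3), values=(1.0, 2.0, 3.0, 4.0, 5.0, 6.0))
print(matrix.get(1, 2))  # 6.0
```

Storing a tensor as one flat buffer plus a shape keeps every element the same basic type and makes the n-dimensional index a simple offset calculation; nine Booleans, for example, fit in two packed bytes instead of nine.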