Returns the global average of the score returned by the calculate method.
Evaluation information
Query
Predicted result
Actual result
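For illustration, a minimal sketch of an accuracy-style metric extending AverageMetric is shown below. The Query, PredictedResult, and ActualResult classes are hypothetical placeholders, not part of this package.

import org.apache.predictionio.controller.{AverageMetric, EmptyEvaluationInfo}

// Hypothetical engine classes used only in this sketch.
case class Query(features: Array[Double])
case class PredictedResult(label: Double)
case class ActualResult(label: Double)

// Scores each (query, predicted, actual) tuple with 1.0 or 0.0; the metric
// result is the global average of these scores, i.e. the accuracy.
case class Accuracy()
  extends AverageMetric[EmptyEvaluationInfo, Query, PredictedResult, ActualResult] {
  def calculate(query: Query, predicted: PredictedResult, actual: ActualResult): Double =
    if (predicted.label == actual.label) 1.0 else 0.0
}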
If your query class cannot be automatically serialized/deserialized to/from JSON, implement a trait by extending this trait, and overriding the querySerializer member with your custom JSON4S serializer.
Algorithm and serving classes using your query class would only need to mix in the trait to enable the custom serializer.
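A hedged sketch of such a trait is shown below. The Query class and its JSON field names are hypothetical, and it is assumed that querySerializer holds a JSON4S Formats value, as in the default implementation.

import org.json4s._
import org.json4s.JsonDSL._
import org.apache.predictionio.controller.CustomQuerySerializer

// Hypothetical query class whose JSON representation differs from its fields.
case class Query(user: String, num: Int)

// Hypothetical JSON4S serializer mapping {"uid": ..., "n": ...} to Query.
class QueryJsonSerializer extends CustomSerializer[Query](_ => (
  {
    case jv: JValue =>
      implicit val formats = DefaultFormats
      Query((jv \ "uid").extract[String], (jv \ "n").extract[Int])
  },
  {
    case q: Query =>
      ("uid" -> q.user) ~ ("n" -> q.num)
  }
))

// Mix this trait into algorithm and serving classes that use Query.
// Assumption: querySerializer is a JSON4S Formats value, so a custom
// serializer can be appended to DefaultFormats.
trait MyQuerySerializer extends CustomQuerySerializer {
  @transient override lazy val querySerializer = DefaultFormats + new QueryJsonSerializer
}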
Defines a deployment that contains an Engine
Empty actual result.
Empty algorithm parameters.
Empty data parameters.
Empty data source parameters.
Empty evaluation info.
Empty metrics parameters.
Empty model.
A concrete implementation of Params representing empty parameters.
Empty preparator parameters.
Empty prepared data.
Empty serving parameters.
Empty training data.
This class chains up the entire data process. PredictionIO uses this information to create workflows and deployments. In Scala, you should implement an object that extends the EngineFactory trait similar to the following example.
object ItemRankEngine extends EngineFactory {
  def apply() = {
    new Engine(
      classOf[ItemRankDataSource],
      classOf[ItemRankPreparator],
      Map(
        "knn" -> classOf[KNNAlgorithm],
        "rand" -> classOf[RandomAlgorithm],
        "mahoutItemBased" -> classOf[MahoutItemBasedAlgorithm]),
      classOf[ItemRankServing])
  }
}
Training data class.
Evaluation info class.
Prepared data class.
Input query class.
Output prediction class.
Actual value class.
If you intend to let PredictionIO create workflows and deploy serving automatically, you will need to implement an object that extends this class and returns an Engine.
This class serves as a logical grouping of all of an engine's required parameters.
Defines an engine parameters generator.
Implementations of this trait can be supplied to "pio eval" as the second command line argument.
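As an illustration, a minimal sketch of such a generator follows. The parameter classes are hypothetical, and it is assumed that the generator populates the engineParamsList member, as is commonly done in PredictionIO evaluation templates.

import org.apache.predictionio.controller.{EngineParams, EngineParamsGenerator, Params}

// Hypothetical parameter classes used only in this sketch.
case class MyDataSourceParams(appName: String, evalK: Int) extends Params
case class MyAlgoParams(regularization: Double) extends Params

// Supplied to "pio eval" as the second command line argument.
object MyEngineParamsList extends EngineParamsGenerator {
  private[this] val baseEP = EngineParams(
    dataSourceParams = MyDataSourceParams(appName = "MyApp", evalK = 5))

  // Sweep over a few regularization values; each EngineParams set is evaluated.
  engineParamsList = Seq(0.01, 0.1, 1.0).map { reg =>
    baseEP.copy(algorithmParamsList = Seq(("myalgo", MyAlgoParams(reg))))
  }
}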
Defines an evaluation that contains an engine and a metric.
Implementations of this trait can be supplied to "pio eval" as the first argument.
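For example, a minimal sketch of an Evaluation object might look like the following, reusing the hypothetical Accuracy metric from the AverageMetric sketch above together with a hypothetical MyEngine factory; it is assumed that the evaluation is defined by assigning the engineMetric member, as in PredictionIO evaluation templates.

import org.apache.predictionio.controller.Evaluation

// Supplied to "pio eval" as the first command line argument, e.g.
//   pio eval com.example.MyAccuracyEvaluation com.example.MyEngineParamsList
object MyAccuracyEvaluation extends Evaluation {
  // Pair the engine under evaluation with the metric used to score it.
  // MyEngine() and Accuracy() are hypothetical placeholders.
  engineMetric = (MyEngine(), new Accuracy())
}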
:: Experimental :: FastEvalEngine is a subclass of Engine that exploits the immutability of controllers to optimize the evaluation process
:: Experimental :: Workflow based on FastEvalEngine
A helper concrete implementation of org.apache.predictionio.core.BasePreparator that passes training data through without any special preparation. This can be used in place of both PPreparator and LPreparator.
Training data class.
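For example, an engine factory can plug IdentityPreparator in directly when no preparation step is needed. This is a hedged sketch: MyDataSource, MyAlgorithm, and MyServing are hypothetical classes, and it assumes the IdentityPreparator companion object accepts the data source class, as described further below.

import org.apache.predictionio.controller.{Engine, EngineFactory, IdentityPreparator}

object MyEngine extends EngineFactory {
  def apply() = {
    new Engine(
      classOf[MyDataSource],
      // Pass training data straight through to the algorithm.
      IdentityPreparator(classOf[MyDataSource]),
      Map("myalgo" -> classOf[MyAlgorithm]),
      classOf[MyServing])
  }
}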
Base class of a local algorithm.
A local algorithm runs locally within a single machine and produces a model that can fit within a single machine.
If your input query class requires custom JSON4S serialization, the most idiomatic way is to implement a trait that extends CustomQuerySerializer, and mix that into your algorithm class, instead of overriding querySerializer directly.
Prepared data class.
Trained model class.
Input query class.
Output prediction class.
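A minimal sketch of a local algorithm follows; the prepared data, model, query, and prediction classes are hypothetical placeholders, and the train/predict signatures shown are assumptions based on the description above.

import org.apache.predictionio.controller.LAlgorithm

// Hypothetical classes used only in this sketch.
case class MyPreparedData(ratings: Seq[(String, Double)])   // (item, rating)
case class MyQuery(item: String)
case class MyPrediction(score: Double)
class MyModel(val itemMeans: Map[String, Double]) extends Serializable

class MeanLAlgorithm
  extends LAlgorithm[MyPreparedData, MyModel, MyQuery, MyPrediction] {

  // Both training and the resulting model stay on a single machine.
  def train(pd: MyPreparedData): MyModel = {
    val means = pd.ratings
      .groupBy(_._1)
      .map { case (item, rs) => (item, rs.map(_._2).sum / rs.size) }
    new MyModel(means)
  }

  def predict(model: MyModel, query: MyQuery): MyPrediction =
    MyPrediction(model.itemMeans.getOrElse(query.item, 0.0))
}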
A concrete implementation of LServing that returns the average of all algorithms' predictions, where the prediction class is expected to be Double.
Base class of a local data source.
A local data source runs locally within a single machine and returns data that can fit within a single machine.
Training data class.
Evaluation Info class.
Input query class.
Actual value class.
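A minimal sketch of a local data source follows; the training data and query classes are hypothetical, and it is assumed that readTraining(), taking no arguments, is the method to implement.

import org.apache.predictionio.controller.{EmptyActualResult, EmptyEvaluationInfo, LDataSource}

// Hypothetical classes used only in this sketch.
case class MyTrainingData(ratings: Seq[(String, String, Double)])  // (user, item, rating)
case class MyQuery(user: String)

class MyLocalDataSource
  extends LDataSource[MyTrainingData, EmptyEvaluationInfo, MyQuery, EmptyActualResult] {

  // Everything is read into the driver's memory; suitable only for data that
  // fits on a single machine.
  def readTraining(): MyTrainingData =
    MyTrainingData(Seq(("u1", "i1", 4.0), ("u1", "i2", 1.0), ("u2", "i1", 5.0)))
}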
A concrete implementation of LServing returning the first algorithm's prediction result directly without any modification.
DEPRECATED. Use IdentityPreparator instead.
Training data class.
Base class of a local preparator.
A local preparator runs locally within a single machine and produces prepared data that can fit within a single machine.
Training data class.
Prepared data class.
Base class of serving.
Input query class.
Output prediction class.
This trait is a convenience helper for persisting your model to the local filesystem. This trait and LocalFileSystemPersistentModelLoader contain concrete implementations and need not be further implemented.
The underlying implementation is Utils.save.
class MyModel extends LocalFileSystemPersistentModel[MyParams] {
  ...
}

object MyModel extends LocalFileSystemPersistentModelLoader[MyParams, MyModel] {
  ...
}
Algorithm parameters class.
Implement an object that extends this trait for PredictionIO to support loading a persisted model from local filesystem during serving deployment.
The underlying implementation is Utils.load.
Algorithm parameters class.
Model class.
Base class of a Metric.
Evaluation information
Query
Predicted result
Actual result
Metric result
:: DeveloperApi :: Do not use this directly. Use MetricEvaluator$ instead. This is an implementation of org.apache.predictionio.core.BaseEvaluator that evaluates prediction performance based on metric scores.
Evaluation information type
Query class
Predicted result class
Actual result class
Metric result class
Contains all results of a MetricEvaluator
Type of the primary metric score
The best score among all iterations
The set of engine parameters that yielded the best score
The index of iteration that yielded the best score
Brief description of the primary metric score
Brief descriptions of other metric scores
All sets of engine parameters and corresponding metric scores
An optional output path where scores are saved
Case class storing a primary score, and other scores
Type of the primary metric score
Primary metric score
Other scores this metric might have
Returns the global average of the non-None score returned by the calculate method.
Evaluation information
Query
Predicted result
Actual result
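For example, a precision-style metric can return None to exclude queries it does not care about from the average. The engine classes below are hypothetical placeholders.

import org.apache.predictionio.controller.{EmptyEvaluationInfo, OptionAverageMetric}

// Hypothetical engine classes used only in this sketch.
case class Query(features: Array[Double])
case class PredictedResult(label: Double)
case class ActualResult(label: Double)

// Precision for one class: only queries predicted as `label` contribute to the
// average; all other queries are skipped by returning None.
case class Precision(label: Double)
  extends OptionAverageMetric[EmptyEvaluationInfo, Query, PredictedResult, ActualResult] {
  def calculate(query: Query, predicted: PredictedResult, actual: ActualResult): Option[Double] =
    if (predicted.label == label) Some(if (actual.label == label) 1.0 else 0.0)
    else None
}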
Returns the global standard deviation of the non-None score returned by the calculate method.
This method uses the org.apache.spark.util.StatCounter library; a one-pass method is used for the calculation.
Evaluation information
Query
Predicted result
Actual result
Base class of a parallel-to-local algorithm.
A parallel-to-local algorithm can be run in parallel on a cluster and produces a model that can fit within a single machine.
If your input query class requires custom JSON4S serialization, the most idiomatic way is to implement a trait that extends CustomQuerySerializer, and mix that into your algorithm class, instead of overriding querySerializer directly.
Prepared data class.
Trained model class.
Input query class.
Output prediction class.
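A minimal sketch of a parallel-to-local algorithm follows; the prepared data, model, query, and prediction classes are hypothetical, and the train/predict signatures shown are assumptions based on the description above.

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.predictionio.controller.P2LAlgorithm

// Hypothetical classes used only in this sketch.
case class MyPreparedData(ratings: RDD[(String, Double)])   // (item, rating)
case class MyQuery(item: String)
case class MyPrediction(score: Double)
class MyModel(val itemMeans: Map[String, Double]) extends Serializable

class MeanP2LAlgorithm
  extends P2LAlgorithm[MyPreparedData, MyModel, MyQuery, MyPrediction] {

  // Training runs on the cluster; the model is collected to a single machine.
  def train(sc: SparkContext, pd: MyPreparedData): MyModel = {
    val means = pd.ratings
      .mapValues(r => (r, 1L))
      .reduceByKey { case ((s1, c1), (s2, c2)) => (s1 + s2, c1 + c2) }
      .mapValues { case (sum, count) => sum / count }
      .collectAsMap()
    new MyModel(means.toMap)
  }

  def predict(model: MyModel, query: MyQuery): MyPrediction =
    MyPrediction(model.itemMeans.getOrElse(query.item, 0.0))
}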
Base class of a parallel algorithm.
A parallel algorithm can be run in parallel on a cluster and produces a model that can also be distributed across a cluster.
If your input query class requires custom JSON4S serialization, the most idiomatic way is to implement a trait that extends CustomQuerySerializer, and mix that into your algorithm class, instead of overriding querySerializer directly.
To provide the evaluation feature, one must override and implement the batchPredict method. Otherwise, an exception will be thrown when pio eval is used.
Prepared data class.
Trained model class.
Input query class.
Output prediction class.
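A minimal sketch of a parallel algorithm whose model is an RDD follows, including a batchPredict implementation for evaluation. All classes are hypothetical, and the batchPredict signature (an RDD of indexed queries in, an RDD of indexed predictions out) is an assumption based on the description above.

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.predictionio.controller.PAlgorithm

// Hypothetical classes used only in this sketch.
case class MyPreparedData(ratings: RDD[(String, Double)])   // (item, rating)
case class MyQuery(item: String)
case class MyPrediction(score: Double)
case class MyModel(itemMeans: RDD[(String, Double)])        // the model stays distributed

class MeanPAlgorithm
  extends PAlgorithm[MyPreparedData, MyModel, MyQuery, MyPrediction] {

  def train(sc: SparkContext, pd: MyPreparedData): MyModel =
    MyModel(pd.ratings.groupByKey().mapValues(rs => rs.sum / rs.size))

  // Single-query prediction against the distributed model.
  def predict(model: MyModel, query: MyQuery): MyPrediction = {
    val scores = model.itemMeans.lookup(query.item)
    MyPrediction(if (scores.nonEmpty) scores.head else 0.0)
  }

  // Needed for "pio eval": score many queries in one pass over the model.
  override def batchPredict(
      model: MyModel, queries: RDD[(Long, MyQuery)]): RDD[(Long, MyPrediction)] = {
    val byItem = queries.map { case (index, q) => (q.item, index) }
    byItem.leftOuterJoin(model.itemMeans).map {
      case (_, (index, score)) => (index, MyPrediction(score.getOrElse(0.0)))
    }
  }
}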
Base class of a parallel data source.
A parallel data source runs locally within a single machine, or in parallel on a cluster, to return data that is distributed across a cluster.
Training data class.
Evaluation Info class.
Input query class.
Actual value class.
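A minimal sketch of a parallel data source follows; the training data and query classes are hypothetical, and it is assumed that readTraining(sc) is the method to implement.

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.predictionio.controller.{EmptyActualResult, EmptyEvaluationInfo, PDataSource}

// Hypothetical classes used only in this sketch.
case class MyTrainingData(ratings: RDD[(String, String, Double)])  // (user, item, rating)
case class MyQuery(user: String)

class MyParallelDataSource
  extends PDataSource[MyTrainingData, EmptyEvaluationInfo, MyQuery, EmptyActualResult] {

  def readTraining(sc: SparkContext): MyTrainingData = {
    // A real data source would read from the event store; an in-memory
    // collection is parallelized here as a placeholder.
    MyTrainingData(sc.parallelize(Seq(("u1", "i1", 4.0), ("u2", "i1", 5.0))))
  }
}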
DEPRECATED. Use IdentityPreparator instead.
Training data class.
Base class of a parallel preparator.
A parallel preparator can be run in parallel on a cluster and produces prepared data that is distributed across a cluster.
Training data class.
Prepared data class.
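A minimal sketch of a parallel preparator follows, using the same hypothetical training data shape as the parallel data source sketch above; it is assumed that prepare(sc, trainingData) is the method to implement.

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.predictionio.controller.PPreparator

// Hypothetical classes used only in this sketch.
case class MyTrainingData(ratings: RDD[(String, String, Double)])  // (user, item, rating)
case class MyPreparedData(ratings: RDD[(String, String, Double)])

class MyPreparator extends PPreparator[MyTrainingData, MyPreparedData] {

  def prepare(sc: SparkContext, trainingData: MyTrainingData): MyPreparedData = {
    // A trivial example of preparation: drop non-positive ratings.
    MyPreparedData(trainingData.ratings.filter { case (_, _, r) => r > 0.0 })
  }
}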
Base trait for all kinds of parameters that will be passed to constructors of different controller classes.
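Such parameters are typically defined as simple case classes, for example (hypothetical class and fields):

import org.apache.predictionio.controller.Params

// Passed by PredictionIO to the corresponding controller's constructor.
case class MyAlgorithmParams(rank: Int, numIterations: Int, lambda: Double) extends Params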
Mix in and implement this trait if your model cannot be persisted by PredictionIO automatically. A companion object extending IPersistentModelLoader is required for PredictionIO to load the persisted model automatically during deployment.
Note that models generated by PAlgorithm cannot, by nature, be persisted automatically and must implement these traits if model persistence is desired.
class MyModel extends PersistentModel[MyParams] {
  def save(id: String, params: MyParams, sc: SparkContext): Boolean = {
    ...
  }
}

object MyModel extends PersistentModelLoader[MyParams, MyModel] {
  def apply(id: String, params: MyParams, sc: Option[SparkContext]): MyModel = {
    ...
  }
}
In Java, all you need to do is to implement this interface, and add a static method with 3 arguments of type String, Params, and SparkContext.
public class MyModel implements PersistentModel<MyParams>, Serializable {
  ...
  public boolean save(String id, MyParams params, SparkContext sc) {
    ...
  }

  public static MyModel load(String id, Params params, SparkContext sc) {
    ...
  }
  ...
}
Algorithm parameters class.
Implement an object that extends this trait for PredictionIO to support loading a persisted model during serving deployment.
Algorithm parameters class.
Model class.
Trait for a metric that returns a score based on Query, PredictedResult, and ActualResult.
Query class
Predicted result class
Actual result class
Metric result class
Extend a data class with this trait if you want PredictionIO to automatically perform a sanity check on your data classes during training. This is very useful when you need to debug your engine.
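For example, a training data class can mix in this trait and assert basic properties. This is a hedged sketch: the class and its fields are hypothetical, and it assumes sanityCheck() is the method invoked during training.

import org.apache.spark.rdd.RDD
import org.apache.predictionio.controller.SanityCheck

// Hypothetical training data class used only in this sketch.
case class MyTrainingData(ratings: RDD[(String, String, Double)]) extends SanityCheck {

  // Invoked during training when this trait is mixed in.
  override def sanityCheck(): Unit = {
    require(ratings.count() > 0, "ratings cannot be empty")
    require(ratings.filter { case (_, _, r) => r < 0.0 }.count() == 0,
      "ratings must be non-negative")
  }
}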
Base class of several helper types that represent emptiness
SimpleEngine has only one algorithm, and uses the default preparator and serving layer. The current default preparator is IdentityPreparator and the default serving is FirstServing.
Training data class.
Evaluation info class.
Input query class.
Output prediction class.
Actual value class.
This shorthand class serves the SimpleEngine class.
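A hedged usage sketch, assuming SimpleEngine's constructor takes the data source class and the single algorithm class (MyDataSource and MyAlgorithm are hypothetical):

import org.apache.predictionio.controller.{EngineFactory, SimpleEngine}

object MySimpleEngine extends EngineFactory {
  // The default preparator (IdentityPreparator) and serving (FirstServing) are used.
  def apply() = new SimpleEngine(classOf[MyDataSource], classOf[MyAlgorithm])
}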
Returns the global standard deviation of the score returned by the calculate method.
This method uses the org.apache.spark.util.StatCounter library; a one-pass method is used for the calculation.
Evaluation information
Query
Predicted result
Actual result
Returns the sum of the score returned by the calculate method.
Evaluation information
Query
Predicted result
Actual result
Result class; the output of the calculate function must be Numeric.
Returns zero. Useful as a placeholder during evaluation development when not all components are implemented.
Evaluation information
Query
Predicted result
Actual result
DEPRECATED. Use EngineFactory instead.
(Since version 0.9.2) Use EngineFactory instead.
DEPRECATED. Use LocalFileSystemPersistentModel instead.
(Since version 0.9.2) Use LocalFileSystemPersistentModel instead.
DEPRECATED. Use LocalFileSystemPersistentModelLoader instead.
(Since version 0.9.2) Use LocalFileSystemPersistentModelLoader instead.
DEPRECATED. Use PersistentModel instead.
(Since version 0.9.2) Use PersistentModel instead.
DEPRECATED. Use PersistentModelLoader instead.
(Since version 0.9.2) Use PersistentModelLoader instead.
Mix in this trait for queries that contain prId (PredictedResultId). This is useful when your engine expects queries to also be associated with prId keys when the feedback loop is enabled.
(Since version 0.9.2) To be removed in future releases.
DEPRECATED. Use CustomQuerySerializer instead.
(Since version 0.9.2) Use CustomQuerySerializer instead.
This object contains concrete implementation for some methods of the Engine class.
Companion object for creating EngineParams instances.
:: Experimental :: Workflow based on FastEvalEngine
Companion object of IdentityPreparator that conveniently returns an instance of the class of IdentityPreparator for use with EngineFactory.
A concrete implementation of LServing that returns the average of all algorithms' predictions, where the prediction class is expected to be Double.
A concrete implementation of LServing returning the first algorithm's prediction result directly without any modification.
DEPRECATED. Use IdentityPreparator instead.
Companion object of MetricEvaluator
DEPRECATED. Use IdentityPreparator instead.
Controller utilities.
Companion object of ZeroMetric
Provides building blocks for writing a complete prediction engine consisting of DataSource, Preparator, Algorithm, Serving, and Evaluation.
Start Building an Engine
The starting point of a prediction engine is the Engine class.
The DASE Paradigm
The building blocks together form the DASE paradigm. Learn more about DASE here.
Types of Building Blocks
Depending on the problem you are solving, you would need to pick appropriate flavors of building blocks.
Engines
There are 3 typical engine configurations:
1. PDataSource -> PPreparator -> P2LAlgorithm -> LServing
2. PDataSource -> PPreparator -> PAlgorithm -> LServing
3. LDataSource -> LPreparator -> LAlgorithm -> LServing
In both configurations 1 and 2, data is sourced and prepared in a parallelized fashion, with the data represented as RDDs.
The difference between configurations 1 and 2 comes at the algorithm stage. In configuration 1, the algorithm operates on potentially large data as RDDs in the Spark cluster, and eventually outputs a model that is small enough to fit in a single machine.
On the other hand, configuration 2 outputs a model that is potentially too large to fit in a single machine, and must reside in the Spark cluster as RDD(s).
With configuration 1 (P2LAlgorithm), PredictionIO will automatically try to persist the model to local disk or HDFS if the model is serializable.
With configuration 2 (PAlgorithm), PredictionIO will not automatically try to persist the model, unless the model implements the PersistentModel trait.
In special circumstances where both the data and the model are small, configuration 3 may be used. Beware that RDDs cannot be used with configuration 3.
Data Source
PDataSource is probably the most commonly used data source base class, with the ability to process RDD-based data. LDataSource cannot handle RDD-based data; use it only when you have a special requirement.
Preparator
With PDataSource, you must pick PPreparator. The same applies to LDataSource and LPreparator.
Algorithm
The workhorse of the engine comes in 3 different flavors.
P2LAlgorithm
Produces a model that is small enough to fit in a single machine from PDataSource and PPreparator. The model cannot contain any RDD. If the produced model is serializable, PredictionIO will try to automatically persist it. In addition, P2LAlgorithm.batchPredict is already implemented for evaluation purposes.
PAlgorithm
Produces a model that could contain RDDs from PDataSource and PPreparator. PredictionIO will not try to persist it automatically unless the model implements PersistentModel. PAlgorithm.batchPredict must be implemented for Evaluation.
LAlgorithm
Produces a model that is small enough to fit in a single machine from LDataSource and LPreparator. The model cannot contain any RDD. If the produced model is serializable, PredictionIO will try to automatically persist it. In addition, LAlgorithm.batchPredict is already implemented for evaluation purposes.
Serving
The serving component comes in only one flavor: LServing. At the serving stage, it is assumed that the result being served is already at a human-consumable size.
Model Persistence
PredictionIO tries its best to persist trained models automatically. Please refer to LAlgorithm.makePersistentModel, P2LAlgorithm.makePersistentModel, and PAlgorithm.makePersistentModel for descriptions on different strategies.