Implement this method to produce a prediction from a query and trained model.
Trained model produced by train.
An input query.
A prediction.
Implement this method to produce a model from prepared data.
Prepared data for model training.
Trained model.
To support evaluation, override and implement this method to generate many predictions in batch. Otherwise, an exception is thrown when pio eval is used.
The default implementation throws an exception.
Trained model produced by train.
An RDD of index-query tuples. The index is used to keep track of predicted results with corresponding queries.
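The role of the index in the index-query tuples can be illustrated with a minimal sketch. This is not the PredictionIO API: it assumes a plain Scala Seq as a stand-in for Spark's RDD so that it runs without a cluster, and MeanModel, Query, and Prediction are hypothetical types invented for the illustration.

```scala
// Stand-in sketch: Seq replaces Spark's RDD so the example runs without a cluster.
object BatchPredictSketch {
  // Hypothetical model, query, and prediction types (not part of PredictionIO).
  final case class MeanModel(mean: Double)
  final case class Query(offset: Double)
  final case class Prediction(value: Double)

  // Mirrors train: prepared data in, trained model out.
  def train(pd: Seq[Double]): MeanModel =
    MeanModel(if (pd.isEmpty) 0.0 else pd.sum / pd.size)

  // Mirrors predict: one query against a trained model.
  def predict(m: MeanModel, q: Query): Prediction =
    Prediction(m.mean + q.offset)

  // Mirrors batchPredict: the index travels with each query, so every
  // prediction can be matched back to the query that produced it.
  def batchPredict(m: MeanModel, qs: Seq[(Long, Query)]): Seq[(Long, Prediction)] =
    qs.map { case (ix, q) => (ix, predict(m, q)) }
}
```

With a real pair RDD, the same index-preserving mapping would typically be expressed with mapValues, provided the model can be used inside a Spark closure.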
:: DeveloperApi :: Engine developers should not use this directly. This is called by the evaluation workflow to perform batch prediction.
Spark context
Model
Batch of queries
Batch of predicted results
:: DeveloperApi :: Serializer for Java query classes using Gson
:: DeveloperApi :: Engine developers should not use this directly (read on to see how parallel algorithm models are persisted).
In general, parallel models may contain multiple RDDs. It is not easy to infer and persist them programmatically because these RDDs may be huge. To persist such a model, engine developers need to mix the PersistentModel trait into the model class and implement PersistentModel.save. If save returns true, an org.apache.predictionio.workflow.PersistentModelManifest is returned so that, during deployment, PredictionIO uses PersistentModelLoader to retrieve the model. Otherwise, Unit is returned and the model is re-trained on the fly.
Spark context
Model ID
Algorithm parameters that trained this model
Model
The model itself for automatic persistence, an instance of org.apache.predictionio.workflow.PersistentModelManifest for manual persistence, or Unit for re-training on deployment
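The decision described above for parallel models (a manifest when the model saved itself, Unit otherwise) can be sketched as follows. This is a simplified illustration, not PredictionIO's implementation: PersistentModelLike and ManifestStub are hypothetical stand-ins for PersistentModel and PersistentModelManifest, and the save signature is reduced to a model ID only.

```scala
object PersistenceSketch {
  // Hypothetical stand-in for the PersistentModel trait; the real save
  // method takes more arguments than the model ID used here.
  trait PersistentModelLike { def save(id: String): Boolean }

  // Hypothetical stand-in for PersistentModelManifest.
  final case class ManifestStub(className: String)

  // Returns a manifest when the model reports a successful manual save.
  // Otherwise returns Unit, which signals re-training on deployment.
  def makePersistent(modelId: String, model: Any): Any = model match {
    case m: PersistentModelLike if m.save(modelId) =>
      ManifestStub(m.getClass.getName)
    case _ => ()
  }
}
```

A model that does not mix in the trait, or whose save returns false, falls through to the Unit case in this sketch.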
:: DeveloperApi :: Engine developers should not use this directly. Called by serving to perform a single prediction.
Predicted result
:: DeveloperApi :: Obtains the type signature of query for this algorithm
Type signature of query
:: DeveloperApi :: Serializer for Scala query classes using org.apache.predictionio.controller.Utils.json4sDefaultFormats
:: DeveloperApi :: Engine developers should not use this directly. This is called by the workflow to train a model.
Spark context
Prepared data
Trained model
Base class of a parallel algorithm.
A parallel algorithm can be run in parallel on a cluster and produces a model that can also be distributed across a cluster.
If your input query class requires custom JSON4S serialization, the most idiomatic way is to implement a trait that extends CustomQuerySerializer, and mix that into your algorithm class, instead of overriding querySerializer directly.
To support evaluation, override and implement the batchPredict method. Otherwise, an exception is thrown when pio eval is used.
Prepared data class.
Trained model class.
Input query class.
Output prediction class.