A PredictionIO engine is instantiated by a set of parameters. These parameters define which algorithm is to be used, as well as supply the parameters for the algorithm itself. This naturally raises the question of how to choose the best set of parameters. The evaluation module streamlines the process of tuning the engine to the best parameter set and deploying it.
Quick Start
We demonstrate the evaluation with the classification template. The classification template uses a Naive Bayes algorithm that has a smoothing parameter. We evaluate the prediction quality against different values of this parameter to find the best value, and then deploy the engine with it.
Edit the AppId
Edit MyClassification/src/main/scala/Evaluation.scala to specify the appId you used to import the data.
```scala
object EngineParamsList extends EngineParamsGenerator {
  ...
  private[this] val baseEP = EngineParams(
    dataSourceParams = DataSourceParams(appId = <YOUR_APP_ID>, evalK = Some(5)))
  ...
}
```
Build and run the evaluation
To run an evaluation, use the command pio eval. It takes two mandatory parameters: 1. the Evaluation object, which tells PredictionIO the engine and metric we use for the evaluation; and 2. the EngineParamsGenerator, which contains a list of engine params to test against. The following command kickstarts the evaluation workflow for the classification template.
```
$ pio build
...
$ pio eval org.example.classification.AccuracyEvaluation org.example.classification.EngineParamsList
```
You will see the following output:
```
...
[INFO] [CoreWorkflow$] runEvaluation started
...
[INFO] [MetricEvaluator] Iteration 0
[INFO] [MetricEvaluator] EngineParams: {"dataSourceParams":{"":{"appId":19,"evalK":5}},"preparatorParams":{"":{}},"algorithmParamsList":[{"naive":{"lambda":10.0}}],"servingParams":{"":{}}}
[INFO] [MetricEvaluator] Result: MetricScores(0.9281045751633987,List())
[INFO] [MetricEvaluator] Iteration 1
[INFO] [MetricEvaluator] EngineParams: {"dataSourceParams":{"":{"appId":19,"evalK":5}},"preparatorParams":{"":{}},"algorithmParamsList":[{"naive":{"lambda":100.0}}],"servingParams":{"":{}}}
[INFO] [MetricEvaluator] Result: MetricScores(0.9150326797385621,List())
[INFO] [MetricEvaluator] Iteration 2
[INFO] [MetricEvaluator] EngineParams: {"dataSourceParams":{"":{"appId":19,"evalK":5}},"preparatorParams":{"":{}},"algorithmParamsList":[{"naive":{"lambda":1000.0}}],"servingParams":{"":{}}}
[INFO] [MetricEvaluator] Result: MetricScores(0.4444444444444444,List())
[INFO] [MetricEvaluator] Writing best variant params to disk...
[INFO] [CoreWorkflow$] Updating evaluation instance with result: MetricEvaluatorResult:
  # engine params evaluated: 3
Optimal Engine Params: {
  "dataSourceParams":{
    "":{
      "appId":19,
      "evalK":5
    }
  },
  "preparatorParams":{
    "":{

    }
  },
  "algorithmParamsList":[
    {
      "naive":{
        "lambda":10.0
      }
    }
  ],
  "servingParams":{
    "":{

    }
  }
}
Metrics:
  org.example.classification.Accuracy: 0.9281045751633987
The best variant params can be found in best.json
[INFO] [CoreWorkflow$] runEvaluation completed
```
The console prints the evaluation metric score for each set of engine params, and finally pretty-prints the optimal engine params. Among the three sets of engine params we evaluated, lambda = 10.0 yields the highest accuracy score of ~0.9281.
Deploy the best engine parameter
The evaluation module also writes the best engine params to disk at best.json. We can train and deploy this specific engine variant using the extra parameter -v. For example:
```
$ pio train -v best.json
...
[INFO] [CoreWorkflow$] Training completed successfully.
$ pio deploy -v best.json
...
[INFO] [HttpListener] Bound to localhost/127.0.0.1:8000
[INFO] [MasterActor] Bind successful. Ready to serve.
```
At this point, we have successfully deployed the best engine variant we found through the evaluation process.
Detailed Explanation
An engine often depends on a number of parameters. For example, the Naive Bayes classification algorithm has a smoothing parameter that makes the model more adaptive to unseen data. In contrast to the model parameters that are learnt by the machine learning algorithm, this smoothing parameter is specified up front and governs how the algorithm learns. Such parameters are therefore usually called hyperparameters.
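To make the effect of the smoothing parameter concrete, here is a small, self-contained sketch of additive (Laplace) smoothing. It is illustrative only and not code from the template; the counts and the swept lambda values are made up.

```scala
// Illustrative sketch of additive (Laplace) smoothing; not part of the template.
// A larger lambda shifts probability mass toward unseen feature values, so a
// value never observed together with a class still gets a non-zero probability.
object SmoothingSketch extends App {
  val countForValue = 0.0      // hypothetical: this value never co-occurred with the class
  val totalForClass = 50.0     // hypothetical: total feature observations for the class
  val numDistinctValues = 4.0  // hypothetical: size of the feature's value set

  def smoothedProb(lambda: Double): Double =
    (countForValue + lambda) / (totalForClass + lambda * numDistinctValues)

  Seq(0.0, 1.0, 10.0, 100.0).foreach { lambda =>
    println(f"lambda = $lambda%7.1f  =>  P(value | class) = ${smoothedProb(lambda)}%.4f")
  }
}
```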
In PredictionIO, we always take a holistic view of an engine. An engine comprises a set of DASE controllers, as well as the necessary parameters for the controllers themselves. In the evaluation, we attempt to find the best hyperparameters for an engine, which we call engine params. With engine params we can deploy a complete engine.
This section demonstrates how to use PredictionIO's evaluation module to select the optimal engine params while ensuring the model doesn't overfit.
The Evaluation Design
The PredictionIO evaluation module helps you find the best engine params for an engine.
Given a set of engine params, we instantiate an engine and evaluate it with existing data. The data is split into two sets: a training set and a validation set. The training set is used to train the engine, which is deployed using the same steps described in earlier sections. We then query the engine with queries constructed from the validation set and compare the predicted results in the responses with the actual results contained in the validation set. A metric defines how this comparison is scored, and the goal is to maximize the metric score.
This process is repeated many times with a series of engine params. At the end, PredictionIO returns the best engine params.
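The following sketch captures one round of this process in plain Scala. The types and the evaluate function are hypothetical stand-ins, not PredictionIO APIs; in practice the MetricEvaluator performs this work when you run pio eval.

```scala
// Conceptual sketch of a single evaluation round; not PredictionIO API.
// Train on the training split, predict on the validation split, and average a
// per-point metric. PredictionIO's MetricEvaluator does this for you.
object EvaluationRoundSketch extends App {
  // Hypothetical stand-ins for the template's Query / PredictedResult / ActualResult.
  final case class Query(features: Array[Double])
  final case class PredictedResult(label: Double)
  final case class ActualResult(label: Double)

  // Here a "trained model" is simply a function from Query to PredictedResult.
  type Model = Query => PredictedResult

  // Per-point metric: 1.0 for a correct prediction, 0.0 otherwise (accuracy).
  def metric(p: PredictedResult, a: ActualResult): Double =
    if (p.label == a.label) 1.0 else 0.0

  // Score of one engine variant = average metric over the validation set.
  def evaluate(model: Model, validation: Seq[(Query, ActualResult)]): Double = {
    val scores = validation.map { case (q, a) => metric(model(q), a) }
    scores.sum / scores.size
  }

  // Toy usage: a "model" that always predicts label 1.0, scored on three points.
  val alwaysOne: Model = _ => PredictedResult(1.0)
  val validation = Seq(
    (Query(Array(1.0)), ActualResult(1.0)),
    (Query(Array(2.0)), ActualResult(0.0)),
    (Query(Array(3.0)), ActualResult(1.0)))
  println(evaluate(alwaysOne, validation)) // 0.666...
}
```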
We demonstrate the evaluation with the classification template.
Evaluation Data Generation
In evaluation data generation, the goal is to generate a sequence of (training, validation) data tuples. A common way is to use a k-fold generation process: the data set is split into k folds, and we generate k tuples of training and validation sets. For each tuple, the training set takes k - 1 of the folds and the validation set takes the remaining fold.
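As a standalone illustration of this idea (plain Scala collections rather than Spark RDDs; the data values and k are arbitrary), each element's index modulo k decides which fold it falls into:

```scala
// Minimal k-fold sketch on a plain Scala collection; the template applies the
// same index-modulo idea to Spark RDDs (see readEval below).
object KFoldSketch extends App {
  val data = (1 to 10).toList  // arbitrary stand-in data points
  val k = 5
  val indexed = data.zipWithIndex

  // For fold idx, the validation set is every element whose index % k == idx,
  // and the training set is everything else (the other k - 1 folds).
  val folds: Seq[(List[Int], List[Int])] = (0 until k).map { idx =>
    val training   = indexed.filter { case (_, i) => i % k != idx }.map(_._1)
    val validation = indexed.filter { case (_, i) => i % k == idx }.map(_._1)
    (training, validation)
  }

  folds.zipWithIndex.foreach { case ((training, validation), idx) =>
    println(s"fold $idx: training = $training, validation = $validation")
  }
}
```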
To enable evaluation data generation, we need to define the actual result and implement the method for generating the (training, validation) data tuples.
Actual Result
In MyClassification/src/main/scala/Engine.scala, the ActualResult class defines the actual result:
```scala
class ActualResult(
  val label: Double
) extends Serializable
```
This class is used to store the actual label of the data (in contrast to PredictedResult, which is the output of the engine).
Implement Data Generation Method in DataSource
In MyClassification/src/main/scala/DataSource.scala, the method readEval reads and selects data from the datastore and returns a sequence of (training, validation) data.
```scala
 1  class DataSource(val dsp: DataSourceParams)
 2    extends PDataSource[TrainingData,
 3        EmptyEvaluationInfo, Query, ActualResult] {
 4
 5    ...
 6
 7    override
 8    def readEval(sc: SparkContext)
 9    : Seq[(TrainingData, EmptyEvaluationInfo, RDD[(Query, ActualResult)])] = {
10      require(!dsp.evalK.isEmpty, "DataSourceParams.evalK must not be None")
11      // The following code reads the data from data store. It is equivalent to
12      // the readTraining method. We copy-and-paste the exact code here for
13      // illustration purpose, a recommended approach is to factor out this logic
14      // into a helper function and have both readTraining and readEval call the
15      // helper.
16      val eventsDb = Storage.getPEvents()
17      val labeledPoints: RDD[LabeledPoint] = eventsDb.aggregateProperties(
18        appId = dsp.appId,
19        entityType = "user",
20        // only keep entities with these required properties defined
21        required = Some(List("plan", "attr0", "attr1", "attr2")))(sc)
22        // aggregateProperties() returns RDD pair of
23        // entity ID and its aggregated properties
24        .map { case (entityId, properties) =>
25          try {
26            LabeledPoint(properties.get[Double]("plan"),
27              Vectors.dense(Array(
28                properties.get[Double]("attr0"),
29                properties.get[Double]("attr1"),
30                properties.get[Double]("attr2")
31              ))
32            )
33          } catch {
34            case e: Exception => {
35              logger.error(s"Failed to get properties ${properties} of" +
36                s" ${entityId}. Exception: ${e}.")
37              throw e
38            }
39          }
40        }.cache()
41      // End of reading from data store
42
43      // K-fold splitting
44      val evalK = dsp.evalK.get
45      val indexedPoints: RDD[(LabeledPoint, Long)] = labeledPoints.zipWithIndex
46
47      (0 until evalK).map { idx =>
48        val trainingPoints = indexedPoints.filter(_._2 % evalK != idx).map(_._1)
49        val testingPoints = indexedPoints.filter(_._2 % evalK == idx).map(_._1)
50
51        (
52          new TrainingData(trainingPoints),
53          new EmptyEvaluationInfo(),
54          testingPoints.map {
55            p => (new Query(p.features.toArray), new ActualResult(p.label))
56          }
57        )
58      }
59    }
60  }
```
The readEval method returns a sequence of (TrainingData, EvaluationInfo, RDD[(Query, ActualResult)]) tuples. TrainingData is the same class we use for deployment, RDD[(Query, ActualResult)] is the validation set, and EvaluationInfo can be used to hold some global evaluation data; it is not used in the current example.
Lines 11 to 41 contain the logic for reading and transforming data from the datastore; it is equivalent to the existing readTraining method. After line 41, the variable labeledPoints contains the complete dataset from which we generate the (training, validation) sequence. Lines 43 to 57 contain the k-fold logic. Line 45 gives each data point a unique id, and whether a point belongs to the training set or the validation set depends on its id modulo k (lines 48 to 49). For each point in the validation set, we construct the Query and ActualResult (line 55) used to validate the engine.
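As the comment in readEval suggests, the datastore-reading logic can be factored into a helper shared by readTraining and readEval. A possible shape of that refactoring is sketched below; it is not part of the template, the class name is hypothetical, the helper body is elided, and it assumes the same imports and companion classes (DataSourceParams, TrainingData, Query, ActualResult, ...) as the template's DataSource.scala.

```scala
// Sketch of the refactoring suggested by the comment in readEval; not part of
// the template. The reading logic (lines 16 to 40 above) moves into one helper.
class RefactoredDataSource(val dsp: DataSourceParams)
  extends PDataSource[TrainingData, EmptyEvaluationInfo, Query, ActualResult] {

  // Shared read-and-transform logic; the body would be lines 16 to 40 above.
  private def readLabeledPoints(sc: SparkContext): RDD[LabeledPoint] = ???

  override def readTraining(sc: SparkContext): TrainingData =
    new TrainingData(readLabeledPoints(sc))

  override def readEval(sc: SparkContext)
  : Seq[(TrainingData, EmptyEvaluationInfo, RDD[(Query, ActualResult)])] = {
    require(!dsp.evalK.isEmpty, "DataSourceParams.evalK must not be None")
    val evalK = dsp.evalK.get
    val indexedPoints = readLabeledPoints(sc).zipWithIndex

    (0 until evalK).map { idx =>
      val trainingPoints = indexedPoints.filter(_._2 % evalK != idx).map(_._1)
      val testingPoints = indexedPoints.filter(_._2 % evalK == idx).map(_._1)
      (new TrainingData(trainingPoints),
        new EmptyEvaluationInfo(),
        testingPoints.map { p =>
          (new Query(p.features.toArray), new ActualResult(p.label)) })
    }
  }
}
```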
Evaluation Metrics
We define a Metric that gives a score to engine params: the higher the score, the better the engine params are. In this template, we use the accuracy score, which measures the proportion of correct predictions among all data points.
In MyClassification/src/main/scala/Evaluation.scala, the class Accuracy implements the accuracy score. It extends a helper base class AverageMetric, which calculates the average score over all (Query, PredictedResult, ActualResult) tuples.
```scala
case class Accuracy()
  extends AverageMetric[EmptyEvaluationInfo, Query, PredictedResult, ActualResult] {
  def calculate(query: Query, predicted: PredictedResult, actual: ActualResult)
  : Double = (if (predicted.label == actual.label) 1.0 else 0.0)
}
```
Then, we implement an Evaluation object to define the engine and metric used in this evaluation.
```scala
object AccuracyEvaluation extends Evaluation {
  engineMetric = (ClassificationEngine(), new Accuracy())
}
```
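The same pattern accommodates other metrics. As an illustration (not part of the template; the class name and the choice of weights are hypothetical), here is a cost-sensitive variant that penalizes mistakes on data points whose actual label is 1.0 more heavily, paired with its own Evaluation object:

```scala
// Illustrative only: a cost-sensitive variant of Accuracy. Correct predictions
// score 1.0; mistakes on points whose actual label is 1.0 (a hypothetical
// "important" class) score -1.0; other mistakes score 0.0.
case class CostSensitiveAccuracy()
  extends AverageMetric[EmptyEvaluationInfo, Query, PredictedResult, ActualResult] {
  def calculate(query: Query, predicted: PredictedResult, actual: ActualResult)
  : Double =
    if (predicted.label == actual.label) 1.0
    else if (actual.label == 1.0) -1.0
    else 0.0
}

// Pair the metric with the engine in its own Evaluation object, as above.
object CostSensitiveEvaluation extends Evaluation {
  engineMetric = (ClassificationEngine(), new CostSensitiveAccuracy())
}
```

Running pio eval with such an object in place of AccuracyEvaluation would then select engine params under that score instead of plain accuracy.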
Parameters Generation
The last component is the list of engine params we want to evaluate. In this guide, we use the simplest method: specifying an explicit list of engine params to be evaluated.
In MyClassification/src/main/scala/Evaluation.scala, the object EngineParamsList specifies the engine params list to be used.
```scala
 1  object EngineParamsList extends EngineParamsGenerator {
 2    // Define list of EngineParams used in Evaluation
 3
 4    // First, we define the base engine params. It specifies the appId from which
 5    // the data is read, and a evalK parameter is used to define the
 6    // cross-validation.
 7    private[this] val baseEP = EngineParams(
 8      dataSourceParams = DataSourceParams(appId = 18, evalK = Some(5)))
 9
10    // Second, we specify the engine params list by explicitly listing all
11    // algorithm parameters. In this case, we evaluate 3 engine params, each with
12    // a different algorithm params value.
13    engineParamsList = Seq(
14      baseEP.copy(algorithmParamsList = Seq(("naive", AlgorithmParams(10.0)))),
15      baseEP.copy(algorithmParamsList = Seq(("naive", AlgorithmParams(100.0)))),
16      baseEP.copy(algorithmParamsList = Seq(("naive", AlgorithmParams(1000.0)))))
17  }
```
A good practice is to first define a base engine params object that contains the common parameters used in all evaluations (lines 7 to 8). From the base params, we construct the list of engine params we want to evaluate by adding or replacing controller parameters. Lines 13 to 16 generate three engine params, each with a different smoothing parameter.
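Since engine params are plain values, the list can also be generated programmatically instead of spelled out by hand. For example (a sketch, not part of the template; the object name and the swept lambda values are arbitrary):

```scala
// Sketch: sweep the smoothing parameter programmatically. Uses the same
// EngineParams / DataSourceParams / AlgorithmParams as the listing above;
// the object name and lambda values are arbitrary.
object SweptEngineParamsList extends EngineParamsGenerator {
  private[this] val baseEP = EngineParams(
    dataSourceParams = DataSourceParams(appId = 18, evalK = Some(5)))

  engineParamsList = Seq(1.0, 10.0, 100.0, 1000.0).map { lambda =>
    baseEP.copy(algorithmParamsList = Seq(("naive", AlgorithmParams(lambda))))
  }
}
```

Passing such a generator to pio eval in place of EngineParamsList would evaluate all four variants.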
Running the Evaluation
It remains to run the evaluation. To recap the quick start section above: the pio eval command kick-starts the evaluation, and the result can be seen on the console.
```
$ pio build
...
$ pio eval org.example.classification.AccuracyEvaluation org.example.classification.EngineParamsList
```
You will see the following output:
```
...
[INFO] [CoreWorkflow$] runEvaluation started
...
[INFO] [MetricEvaluator] Iteration 0
[INFO] [MetricEvaluator] EngineParams: {"dataSourceParams":{"":{"appId":19,"evalK":5}},"preparatorParams":{"":{}},"algorithmParamsList":[{"naive":{"lambda":10.0}}],"servingParams":{"":{}}}
[INFO] [MetricEvaluator] Result: MetricScores(0.9281045751633987,List())
[INFO] [MetricEvaluator] Iteration 1
[INFO] [MetricEvaluator] EngineParams: {"dataSourceParams":{"":{"appId":19,"evalK":5}},"preparatorParams":{"":{}},"algorithmParamsList":[{"naive":{"lambda":100.0}}],"servingParams":{"":{}}}
[INFO] [MetricEvaluator] Result: MetricScores(0.9150326797385621,List())
[INFO] [MetricEvaluator] Iteration 2
[INFO] [MetricEvaluator] EngineParams: {"dataSourceParams":{"":{"appId":19,"evalK":5}},"preparatorParams":{"":{}},"algorithmParamsList":[{"naive":{"lambda":1000.0}}],"servingParams":{"":{}}}
[INFO] [MetricEvaluator] Result: MetricScores(0.4444444444444444,List())
[INFO] [MetricEvaluator] Writing best variant params to disk...
[INFO] [CoreWorkflow$] Updating evaluation instance with result: MetricEvaluatorResult:
  # engine params evaluated: 3
Optimal Engine Params: {
  "dataSourceParams":{
    "":{
      "appId":19,
      "evalK":5
    }
  },
  "preparatorParams":{
    "":{

    }
  },
  "algorithmParamsList":[
    {
      "naive":{
        "lambda":10.0
      }
    }
  ],
  "servingParams":{
    "":{

    }
  }
}
Metrics:
  org.example.classification.Accuracy: 0.9281045751633987
The best variant params can be found in best.json
[INFO] [CoreWorkflow$] runEvaluation completed
```
Notes
- We deliberately do not mention the test set in this hyperparameter tuning guide. In machine learning literature, the test set is a separate piece of data that is used to evaluate the final engine params produced by the evaluation process. This guarantees that no information in the training / validation set leaks into the engine params and yields a biased outcome. With PredictionIO, there are multiple ways of conducting robust tuning; we will cover this topic in the coming sections.