You can modify the default DataSource to read custom properties or a different entity type.
This page explains how to add user-defined properties to the items returned by your engine. We add the properties "title", "date", and "imdbUrl" to the entity type "item".
You can find the complete modified source code here.
Note: you also need to import events that set these properties on each item.
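As a rough illustration of the note above, the sketch below builds the JSON body of a `$set` event that carries the three new properties for one item. The object `SetItemEvent` and its helpers are hypothetical names introduced only for this example, and the hand-rolled JSON quoting covers only backslashes and double quotes; a real importer would normally use an SDK or a JSON library.

```scala
object SetItemEvent {
  // Quote a string as a JSON string literal.
  // Assumption: only backslash and double quote need escaping in the inputs.
  def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  // Build the body of a "$set" event for entity type "item" that carries
  // the title, date, imdbUrl and categories properties.
  def setItemEvent(
      entityId: String,
      title: String,
      date: String,
      imdbUrl: String,
      categories: Seq[String]): String = {
    val properties = Seq(
      "title" -> quote(title),
      "date" -> quote(date),
      "imdbUrl" -> quote(imdbUrl),
      "categories" -> categories.map(quote).mkString("[", ",", "]")
    ).map { case (k, v) => quote(k) + ":" + v }.mkString("{", ",", "}")

    Seq(
      quote("event") + ":" + quote("$set"),
      quote("entityType") + ":" + quote("item"),
      quote("entityId") + ":" + quote(entityId),
      quote("properties") + ":" + properties
    ).mkString("{", ",", "}")
  }
}
```

Calling `SetItemEvent.setItemEvent("i1", "title for movie i1", "1947", "http://imdb.com/fake-url/i1", Seq("c1"))` yields one event body; every item the engine trains on needs such an event, because the DataSource reads "title", "date", and "imdbUrl" as required properties.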
Modification
DataSource.scala
- Modify the Item parameters.
- Modify how the Item object is created from the entity properties.
```scala
// MODIFIED
case class Item(
  title: String,
  date: String,
  imdbUrl: String,
  categories: Option[List[String]])

...

override
def readTraining(sc: SparkContext): TrainingData = {
  ...
  // create a RDD of (entityID, Item)
  val itemsRDD: RDD[(String, Item)] = PEventStore.aggregateProperties(
    appName = dsp.appName,
    entityType = "item"
  )(sc).map { case (entityId, properties) =>
    val item = try {
      // Assume categories is optional property of item.
      // MODIFIED
      Item(
        title = properties.get[String]("title"),
        date = properties.get[String]("date"),
        imdbUrl = properties.get[String]("imdbUrl"),
        categories = properties.getOpt[List[String]]("categories"))
    } catch {
      case e: Exception => {
        logger.error(s"Failed to get properties ${properties} of" +
          s" item ${entityId}. Exception: ${e}.")
        throw e
      }
    }
    (entityId, item)
  }.cache()
  ...
}
```
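The difference between the required properties (read with `get`) and the optional one (read with `getOpt`) can be sketched without Spark or the event store. In the example below, a plain `Map[String, Any]` stands in for the engine's property map, and `ItemFromProperties`, `toItem`, and `required` are hypothetical names used only here; it is an approximation of the behaviour, not the library's implementation.

```scala
object ItemFromProperties {
  case class Item(
    title: String,
    date: String,
    imdbUrl: String,
    categories: Option[List[String]])

  // Assumption: a plain Map stands in for the aggregated entity properties.
  def toItem(props: Map[String, Any]): Item = {
    // Required property: fail loudly when it is absent or not a String.
    def required(name: String): String = props.get(name) match {
      case Some(s: String) => s
      case _ =>
        throw new NoSuchElementException(
          s"required property '$name' is missing or not a string")
    }

    // Optional property: None when absent instead of an exception.
    val categories: Option[List[String]] =
      props.get("categories").collect { case l: List[_] => l.map(_.toString) }

    Item(required("title"), required("date"), required("imdbUrl"), categories)
  }
}
```

With this shape, an item that has no "categories" still produces an `Item`, while an item that lacks "title", "date", or "imdbUrl" raises an exception, which mirrors why the modified DataSource logs and rethrows on such items.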
Engine.scala
Modify the ItemScore parameters too.
```scala
// MODIFIED
case class ItemScore(
  item: String,
  title: String,
  date: String,
  imdbUrl: String,
  score: Double
) extends Serializable
```
ALSAlgorithm.scala
Modify how to create the ItemScore object using the properties.
```scala
def predict(model: ALSModel, query: Query): PredictedResult = {
  ...

  val itemScores = topScores.map { case (i, s) =>
    // MODIFIED
    val it = model.items(i)
    ItemScore(
      item = model.itemIntStringMap(i),
      title = it.title,
      date = it.date,
      imdbUrl = it.imdbUrl,
      score = s
    )
  }

  ...
}
```
Calling model.items(i) returns the corresponding Item object, so you can access the properties you added to Item in DataSource.scala.
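The lookup-and-copy step in `predict` can be tried in isolation. The sketch below is a simplified stand-in: ordinary `Map`s replace the model's `items` and `itemIntStringMap` fields, and `ItemScoreSketch` and `toItemScores` are hypothetical names that are not part of the engine template.

```scala
object ItemScoreSketch {
  case class Item(
    title: String,
    date: String,
    imdbUrl: String,
    categories: Option[List[String]])

  case class ItemScore(
    item: String,
    title: String,
    date: String,
    imdbUrl: String,
    score: Double)

  // Assumption: plain Maps stand in for model.items and model.itemIntStringMap.
  def toItemScores(
      topScores: Seq[(Int, Double)],
      items: Map[Int, Item],
      itemIntStringMap: Map[Int, String]): Seq[ItemScore] =
    topScores.map { case (i, s) =>
      // Look up the Item by its integer index and copy its properties
      // into the result next to the string ID and the score.
      val it = items(i)
      ItemScore(
        item = itemIntStringMap(i),
        title = it.title,
        date = it.date,
        imdbUrl = it.imdbUrl,
        score = s)
    }
}
```

Each integer index is resolved twice: once to the item's string ID and once to its `Item`, whose fields are then copied into the `ItemScore`.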
Test the Result
Then we can build/train/deploy the engine and test the result:
The query
```
$ curl -H "Content-Type: application/json" \
-d '{ "items": ["i1"], "num": 4 }' \
http://localhost:8000/queries.json
```
will return the result
```json
{
  "itemScores":[
    {"item":"i3","title":"title for movie i3","date":"1947","imdbUrl":"http://imdb.com/fake-url/i3","score":0.5865418718902017},
    {"item":"i44","title":"title for movie i44","date":"1941","imdbUrl":"http://imdb.com/fake-url/i44","score":0.5740199916714374},
    {"item":"i37","title":"title for movie i37","date":"1940","imdbUrl":"http://imdb.com/fake-url/i37","score":0.5576820095310056},
    {"item":"i6","title":"title for movie i6","date":"1947","imdbUrl":"http://imdb.com/fake-url/i6","score":0.45856345689769473}
  ]
}
```
That's it! Your engine now returns the title, date, and imdbUrl of each item along with its score.