There are two types of user preferences:
- explicit preference (also referred to as "explicit feedback"), such as a "rating" that a user gives to an item.
- implicit preference (also referred to as "implicit feedback"), such as "view" and "buy" history.
MLlib ALS provides the setImplicitPrefs() function to specify whether to use implicit preference. The ALS algorithm takes RDD[Rating] as training data input. The Rating class is defined in the Spark MLlib library as:
```scala
case class Rating(user: Int, product: Int, rating: Double)
```
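Because the MLlib Rating uses Int IDs while event data typically carries string entity IDs, the string IDs have to be mapped to integers before training. The following is a minimal sketch of such a mapping using plain Scala collections. The object name IdIndex and the sample IDs are hypothetical, and the template's own mapping code may differ.

```scala
object IdIndex {
  // Assign each distinct string ID a dense Int index, in first-seen order.
  def build(ids: Seq[String]): Map[String, Int] =
    ids.distinct.zipWithIndex.toMap
}

object IdIndexExample {
  // Hypothetical user IDs as they might appear in events (with repeats).
  val userIds: Seq[String] = Seq("u1", "u2", "u1", "u3")

  // Each distinct user gets one Int that can be passed to Rating(user = ...).
  val userIndex: Map[String, Int] = IdIndex.build(userIds)
}
```

The reverse map (Int back to String) is needed at query time to translate predicted product indices into item IDs.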
By default, the recommendation template sets setImplicitPrefs() to false, which expects explicit rating values that users have given to items.
To handle implicit preference, you can set setImplicitPrefs() to true. In this case, the "rating" value input to ALS is used to calculate the confidence level that the user likes the item. A higher "rating" means a stronger indication that the user likes the item.
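As a rough illustration of how a "rating" can become a confidence level, the sketch below follows the formulation in the implicit-feedback paper by Hu, Koren, and Volinsky, which the MLlib documentation cites as the basis for its implicit mode. It assumes non-negative ratings. The object and function names are hypothetical, and MLlib's internal computation may differ in detail.

```scala
object ImplicitConfidence {
  // Binary preference: 1 if the user interacted with the item at all, else 0.
  def preference(rating: Double): Double =
    if (rating > 0) 1.0 else 0.0

  // Confidence grows linearly with the observed "rating" (e.g. view count).
  // alpha corresponds to the value passed to setAlpha().
  def confidence(rating: Double, alpha: Double): Double =
    1.0 + alpha * rating
}
```

With alpha = 1.0, an item viewed 3 times gets confidence 4.0, while an unobserved item keeps the baseline confidence of 1.0 with preference 0.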
The following provides an example of using implicit preference. You can find the complete modified source code here.
Training with view events
For example, the more times a user has viewed an item, the higher the confidence that the user likes it. We can aggregate the number of views and use this count as the "rating" value.
First, we can modify DataSource.scala to aggregate the number of views by the same user on the same item:
```scala
def getRatings(sc: SparkContext): RDD[Rating] = {

  val eventsRDD: RDD[Event] = PEventStore.find(
    appName = dsp.appName,
    entityType = Some("user"),
    eventNames = Some(List("view")), // MODIFIED
    // targetEntityType is optional field of an event.
    targetEntityType = Some(Some("item")))(sc)

  val ratingsRDD: RDD[Rating] = eventsRDD.map { event =>
    try {
      val ratingValue: Double = event.event match {
        case "view" => 1.0 // MODIFIED
        case _ => throw new Exception(s"Unexpected event ${event} is read.")
      }
      // MODIFIED
      // key is (user id, item id)
      // value is the rating value, which is 1.
      ((event.entityId, event.targetEntityId.get), ratingValue)
    } catch {
      case e: Exception => {
        logger.error(s"Cannot convert ${event} to Rating. Exception: ${e}.")
        throw e
      }
    }
  }
  // MODIFIED
  // sum all values for the same user id and item id key
  .reduceByKey { case (a, b) => a + b }
  .map { case ((uid, iid), r) =>
    Rating(uid, iid, r)
  }.cache()

  ratingsRDD
}

override
def readTraining(sc: SparkContext): TrainingData = {
  new TrainingData(getRatings(sc))
}
```
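To see what this aggregation produces, here is a small sketch of the same "emit 1.0 per view, then sum per (user, item) key" logic using plain Scala collections instead of Spark. The object name ViewCounts and the sample events are hypothetical and serve only as an illustration.

```scala
object ViewCounts {
  // Hypothetical sample data: one (userId, itemId) pair per "view" event.
  val views: Seq[(String, String)] = Seq(
    ("u1", "i1"), ("u1", "i1"), ("u1", "i2"), ("u2", "i1"), ("u1", "i1"))

  // Mirrors map + reduceByKey: emit 1.0 per view, then sum the values per key.
  def aggregate(events: Seq[(String, String)]): Map[(String, String), Double] =
    events
      .map(key => (key, 1.0))
      .groupBy { case (key, _) => key }
      .map { case (key, pairs) => key -> pairs.map(_._2).sum }

  // Summed view counts, keyed by (userId, itemId).
  val counts: Map[(String, String), Double] = aggregate(views)
}
```

Here user u1 viewed item i1 three times, so that pair ends up with a "rating" of 3.0, while the other two pairs each get 1.0.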
Then, we can modify ALSAlgorithm.scala to set setImplicitPrefs to true:
```scala
class ALSAlgorithm(val ap: ALSAlgorithmParams)
  extends PAlgorithm[PreparedData, ALSModel, Query, PredictedResult] {

  ...

  def train(sc: SparkContext, data: PreparedData): ALSModel = {

    ...

    // If you only have one type of implicit event (Eg. "view" event only),
    // set implicitPrefs to true
    // MODIFIED
    val implicitPrefs = true
    val als = new ALS()
    als.setUserBlocks(-1)
    als.setProductBlocks(-1)
    als.setRank(ap.rank)
    als.setIterations(ap.numIterations)
    als.setLambda(ap.lambda)
    als.setImplicitPrefs(implicitPrefs)
    als.setAlpha(1.0)
    als.setSeed(seed)
    als.setCheckpointInterval(10)
    val m = als.run(mllibRatings)

    new ALSModel(
      rank = m.rank,
      userFeatures = m.userFeatures,
      productFeatures = m.productFeatures,
      userStringIntMap = userStringIntMap,
      itemStringIntMap = itemStringIntMap)
  }

  ...

}
```
Now the recommendation engine can train a model with implicit preference events.