Let's say you want to supply a backList for each query to exclude some items from recommendation (For example, in the browsing session, the user just added some items to shopping cart, or you have a list of items you want to filter out, you may want to supply blackList in Query). This how-to will demonstrate how you can do it.
You can find the complete modified source code here.
Note that you may also use E-Commerce Recommendation Template which supports this feature by default.
If you are looking for filtering out items based on the specific user-to-item events logged by EventServer (eg. filter all items which the user has "buy" events on), you can use the E-Commerce Recommendation Template. Please refer to the algorithm parameters "unseenOnly" and "seenEvents" of the E-Commerce Recommenation Template.
Add Query Parameter
First of all we need to specify query parameter to send items ids that the user has already seen. Lets modify case class Query
in MyRecommendation/src/main/scala/Engine.scala:
1 2 3 4 5 | case class Query( user: String, num: Int, blackList: Set[String] // ADDED ) |
Filter the Data
Then we need to change the code that computes recommendation score to filter out the seen items. Lets modify class MyRecommendation/src/main/scala/ALSModel.scala. Just add the following two methods to that class.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 | import com.github.fommil.netlib.BLAS.{getInstance => blas} // ADDED ... // ADDED def recommendProductsWithFilter(user: Int, num: Int, productIdFilter: Set[Int]) = { val filteredProductFeatures = productFeatures .filter { case (id, _) => !productIdFilter.contains(id) } // (*) recommend(userFeatures.lookup(user).head, filteredProductFeatures, num) .map(t => Rating(user, t._1, t._2)) } // ADDED private def recommend( recommendToFeatures: Array[Double], recommendableFeatures: RDD[(Int, Array[Double])], num: Int): Array[(Int, Double)] = { val scored = recommendableFeatures.map { case (id, features) => (id, blas.ddot(features.length, recommendToFeatures, 1, features, 1)) } scored.top(num)(Ordering.by(_._2)) } ... |
Please make attention that method recommend
is the copy of method org.apache.spark.mllib.recommendation.MatrixFactorizationModel#recommend
. We can't reuse this because it’s private. Method recommendProductsWithFilter
is the almost full copy of org.apache.spark.mllib.recommendation.MatrixFactorizationModel#recommendProducts
method. The difference only is the line with commentary ‘(*)’ where we apply filtering.
Put It All Together
Next we need to invoke our new method with filtering when we query recommendations. Lets modify method predict
in MyRecommendation/src/main/scala/ALSAlgorithm.scala:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | def predict(model: ALSModel, query: Query): PredictedResult = { // Convert String ID to Int index for Mllib model.userStringIntMap.get(query.user).map { userInt => // create inverse view of itemStringIntMap val itemIntStringMap = model.itemStringIntMap.inverse // recommendProductsWithFilter() returns Array[MLlibRating], which uses item Int // index. Convert it to String ID for returning PredictedResult val blackList = query.blackList.flatMap(model.itemStringIntMap.get) // ADDED val itemScores = model .recommendProductsWithFilter(userInt, query.num, blackList) // MODIFIED .map (r => ItemScore(itemIntStringMap(r.product), r.rating)) PredictedResult(itemScores) }.getOrElse{ logger.info(s"No prediction for unknown user ${query.user}.") PredictedResult(Array.empty) } } |
Test the Result
Then we can build/train/deploy the engine and test the result:
The query
1 2 3 4 | curl \ -H "Content-Type: application/json" \ -d '{ "user": "1", "num": 4 }' \ http://localhost:8000/queries.json |
will return the result
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | { "itemScores": [{ "item": "32", "score": 13.405593705856901 }, { "item": "90", "score": 10.980439687813178 }, { "item": "75", "score": 10.748973860065737 }, { "item": "1", "score": 9.769636099226231 }] } |
Lets say that the user has seen the 32
item.
1 2 3 4 | curl \ -H "Content-Type: application/json" \ -d '{ "user": "1", "num": 4, "blackList": ["32"] }' \ http://localhost:8000/queries.json |
will return the result
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | { "itemScores": [{ "item": "90", "score": 10.980439687813178 }, { "item": "75", "score": 10.748973860065737 }, { "item": "1", "score": 9.769636099226231 }, { "item": "49", "score": 8.653951817512265 }] } |
without item 32
.