This example demonstrates how to recommend users instead of items.
Instead of using user-to-item events to find similar items, it uses user-to-user events to find similar users that a user may also want to follow, like, and so on, depending on which events are used in training and how they are used. By default, "follow" events are used.
You can find the complete modified source code here.
Modification
Engine.scala
In Query, change items to users and remove categories. Change the ItemScore case class to SimilarUserScore. In PredictedResult, change Array[ItemScore] to Array[SimilarUserScore].
    case class Query(
      users: List[String],
      num: Int,
      whiteList: Option[Set[String]],
      blackList: Option[Set[String]]
    )

    case class PredictedResult(
      similarUserScores: Array[SimilarUserScore]
    ){
      override def toString: String = similarUserScores.mkString(",")
    }

    case class SimilarUserScore(
      user: String,
      score: Double
    )
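To make the shape of these classes concrete, here is a minimal standalone sketch that constructs a Query and a PredictedResult outside the engine. The wrapper object QueryExample and the sample scores are assumptions for illustration only. Passing None for whiteList and blackList mirrors a request that simply omits those fields.

```scala
object QueryExample {
  // Same shapes as the modified classes in Engine.scala.
  case class Query(
    users: List[String],
    num: Int,
    whiteList: Option[Set[String]],
    blackList: Option[Set[String]]
  )

  case class SimilarUserScore(user: String, score: Double)

  case class PredictedResult(similarUserScores: Array[SimilarUserScore]) {
    override def toString: String = similarUserScores.mkString(",")
  }

  def main(args: Array[String]): Unit = {
    // No whitelist or blacklist: both optional fields are None.
    val query = Query(users = List("u1"), num = 4, whiteList = None, blackList = None)

    // Made-up scores, only to show the result shape.
    val result = PredictedResult(
      Array(SimilarUserScore("u3", 0.75), SimilarUserScore("u10", 0.5)))

    println(query)  // Query(List(u1),4,None,None)
    println(result) // SimilarUserScore(u3,0.75),SimilarUserScore(u10,0.5)
  }
}
```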
DataSource.scala
In DataSource, change the ViewEvent case class to FollowEvent. Remove the Item case class.
Change

    case class ViewEvent(user: String, item: String, t: Long)

to

    // MODIFIED
    case class FollowEvent(user: String, followedUser: String, t: Long)

Modify the TrainingData class to use followEvents:
    class TrainingData(
      val users: RDD[(String, User)],
      val followEvents: RDD[FollowEvent] // MODIFIED
    ) extends Serializable {
      override def toString = {
        s"users: [${users.count()} (${users.take(2).toList}...)]" +
        // MODIFIED
        s"followEvents: [${followEvents.count()}] (${followEvents.take(2).toList}...)"
      }
    }
Modify the readTraining() function of DataSource to read "follow" events (commented with "// MODIFIED"). Remove the RDD of (entityID, Item):
    override def readTraining(sc: SparkContext): TrainingData = {

      // create an RDD of (entityID, User)
      val usersRDD: RDD[(String, User)] = ...

      // MODIFIED
      // get all "user" "follow" "followedUser" events
      val followEventsRDD: RDD[FollowEvent] = PEventStore.find(
        appName = dsp.appName,
        entityType = Some("user"),
        eventNames = Some(List("follow")),
        // targetEntityType is an optional field of an event.
        targetEntityType = Some(Some("user")))(sc)
        // PEventStore.find() returns RDD[Event]
        .map { event =>
          val followEvent = try {
            event.event match {
              case "follow" => FollowEvent(
                user = event.entityId,
                followedUser = event.targetEntityId.get,
                t = event.eventTime.getMillis)
              case _ => throw new Exception(s"Unexpected event $event is read.")
            }
          } catch {
            case e: Exception => {
              logger.error(s"Cannot convert $event to FollowEvent." +
                s" Exception: $e.")
              throw e
            }
          }
          followEvent
        }.cache()

      new TrainingData(
        users = usersRDD,
        followEvents = followEventsRDD // MODIFIED
      )
    }
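The conversion step inside the map above can be tried without Spark or an event store. The sketch below is one possible variant, not the template's code: SimpleEvent is a hypothetical, simplified stand-in for the event store's Event type, and the function returns an Either rather than throwing, so a missing targetEntityId is reported as a readable error instead of failing on .get.

```scala
object FollowEventSketch {
  // Hypothetical stand-in for the event store's Event type (assumption).
  case class SimpleEvent(
    event: String,
    entityId: String,
    targetEntityId: Option[String],
    eventTimeMillis: Long
  )

  case class FollowEvent(user: String, followedUser: String, t: Long)

  // Convert one raw event; Left carries the reason when conversion fails.
  def toFollowEvent(e: SimpleEvent): Either[String, FollowEvent] =
    (e.event, e.targetEntityId) match {
      case ("follow", Some(target)) =>
        Right(FollowEvent(e.entityId, target, e.eventTimeMillis))
      case ("follow", None) =>
        Left(s"follow event for ${e.entityId} has no targetEntityId")
      case (other, _) =>
        Left(s"Unexpected event name: $other")
    }

  def main(args: Array[String]): Unit = {
    val events = List(
      SimpleEvent("follow", "u1", Some("u2"), 1000L),
      SimpleEvent("follow", "u1", None, 2000L),
      SimpleEvent("view", "u1", Some("i1"), 3000L)
    )
    events.map(toFollowEvent).foreach(println)
  }
}
```

Whether to fail fast, as the template does, or to collect errors like this is a design choice; failing fast surfaces malformed data at training time.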
Preparator.scala
Modify Preparator to pass followEvents to the algorithm as PreparedData.
Modify Preparator's prepare() method:
    ...

    def prepare(sc: SparkContext, trainingData: TrainingData): PreparedData = {
      new PreparedData(
        users = trainingData.users,
        followEvents = trainingData.followEvents) // MODIFIED
    }

Modify the PreparedData class:

    class PreparedData(
      val users: RDD[(String, User)],
      val followEvents: RDD[FollowEvent] // MODIFIED
    ) extends Serializable

ALSAlgorithm.scala
Modify the ALSModel class to store similar-user data instead of item data. Modify the train() method to train on follow events, and modify the predict() method to return similar users.
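As a rough illustration of what a similar-user prediction step can look like, the sketch below ranks users by cosine similarity between feature vectors. It is not the template's actual ALSAlgorithm code: it assumes the per-user feature vectors are already available as an in-memory Map, and the names cosine, similarUsers and SimilarUserSketch are hypothetical. The whiteList and blackList parameters follow the Query fields shown earlier.

```scala
object SimilarUserSketch {
  case class SimilarUserScore(user: String, score: Double)

  // Cosine similarity of two equal-length vectors; 0.0 if either is all zeros.
  def cosine(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same length")
    val dot = a.zip(b).map { case (x, y) => x * y }.sum
    val normA = math.sqrt(a.map(x => x * x).sum)
    val normB = math.sqrt(b.map(x => x * x).sum)
    if (normA == 0.0 || normB == 0.0) 0.0 else dot / (normA * normB)
  }

  // Score every candidate by its summed similarity to the query users,
  // apply the optional white/black lists, and keep the top `num`.
  def similarUsers(
    features: Map[String, Array[Double]], // assumed to come from training
    queryUsers: List[String],
    num: Int,
    whiteList: Option[Set[String]] = None,
    blackList: Option[Set[String]] = None
  ): Array[SimilarUserScore] = {
    val queryVectors = queryUsers.flatMap(features.get)
    val querySet = queryUsers.toSet
    features.toSeq
      .filter { case (user, _) =>
        !querySet.contains(user) &&
        whiteList.forall(_.contains(user)) &&
        blackList.forall(!_.contains(user))
      }
      .map { case (user, vec) =>
        SimilarUserScore(user, queryVectors.map(cosine(_, vec)).sum)
      }
      .filter(_.score > 0)
      .sortBy(-_.score)
      .take(num)
      .toArray
  }

  def main(args: Array[String]): Unit = {
    // Toy two-dimensional vectors, made up for this example.
    val features = Map(
      "u1" -> Array(1.0, 0.0),
      "u2" -> Array(1.0, 0.0),
      "u3" -> Array(0.0, 1.0),
      "u4" -> Array(1.0, 1.0)
    )
    similarUsers(features, List("u1"), num = 2).foreach(println)
  }
}
```

With these toy vectors, u2 ranks first (same direction as u1), u4 second, and u3 is dropped because its similarity to u1 is zero.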
Test the Result
Then we can build/train/deploy the engine and test the result:
The query
    $ curl -H "Content-Type: application/json" \
    -d '{ "users": ["u1"], "num": 4 }' \
    http://localhost:8000/queries.json

will return the result
    {
      "similarUserScores":[
        {"user":"u3","score":0.7574200014043541},
        {"user":"u10","score":0.6484507108863744},
        {"user":"u43","score":0.64741489488357},
        {"user":"u29","score":0.5767264820728124}
      ]
    }

That's it! Now your engine can recommend users.