package org.apache.predictionio

PredictionIO Scala API

Value Members

  1. package controller

    Provides building blocks for writing a complete prediction engine consisting of DataSource, Preparator, Algorithm, Serving, and Evaluation.

    Start Building an Engine

    The starting point of a prediction engine is the Engine class.

    The DASE Paradigm

    The building blocks together form the DASE paradigm: Data source and preparator, Algorithm, Serving, and Evaluation. Learn more about DASE in the PredictionIO documentation.

    Types of Building Blocks

    Depending on the problem you are solving, pick the appropriate flavor of each building block.

    Engines

    There are 3 typical engine configurations:

    1. PDataSource, PPreparator, P2LAlgorithm, LServing
    2. PDataSource, PPreparator, PAlgorithm, LServing
    3. LDataSource, LPreparator, LAlgorithm, LServing

    In both configurations 1 and 2, data is sourced and prepared in a parallelized fashion, with the data represented as RDDs.

    The difference between configurations 1 and 2 comes at the algorithm stage. In configuration 1, the algorithm operates on potentially large data as RDDs in the Spark cluster, and eventually outputs a model that is small enough to fit on a single machine.

    Configuration 2, on the other hand, outputs a model that is potentially too large to fit on a single machine and must reside in the Spark cluster as one or more RDDs.

    With configuration 1 (P2LAlgorithm), PredictionIO will automatically try to persist the model to local disk or HDFS if the model is serializable.

    With configuration 2 (PAlgorithm), PredictionIO will not automatically try to persist the model, unless the model implements the PersistentModel trait.

    In special circumstances where both the data and the model are small, configuration 3 may be used. Beware that RDDs cannot be used with configuration 3.
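
    As a rough illustration of configuration 1, the sketch below wires the four components into an engine. The types Query, Prediction, TrainingData, and MeanModel, the toy data, and the algorithm name "mean" are hypothetical and exist only for this example; it assumes PredictionIO and Spark are on the classpath.

    ```scala
    import org.apache.predictionio.controller.{
      EmptyActualResult, EmptyEvaluationInfo, Engine, EngineFactory,
      LServing, P2LAlgorithm, PDataSource, PPreparator
    }
    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD

    // Hypothetical query, prediction, data, and model types (illustration only).
    case class Query(x: Double)
    case class Prediction(y: Double)
    case class TrainingData(points: RDD[(Double, Double)])
    case class MeanModel(meanY: Double) // small, serializable, contains no RDD

    // Parallel data source: training data is read as an RDD.
    class DataSource
      extends PDataSource[TrainingData, EmptyEvaluationInfo, Query, EmptyActualResult] {
      override def readTraining(sc: SparkContext): TrainingData =
        TrainingData(sc.parallelize(Seq((1.0, 2.0), (2.0, 4.0))))
    }

    // Parallel preparator: passes the data through unchanged.
    class Preparator extends PPreparator[TrainingData, TrainingData] {
      override def prepare(sc: SparkContext, td: TrainingData): TrainingData = td
    }

    // P2LAlgorithm: trains on RDD data, returns a model that fits on one machine.
    class Algorithm extends P2LAlgorithm[TrainingData, MeanModel, Query, Prediction] {
      override def train(sc: SparkContext, pd: TrainingData): MeanModel =
        MeanModel(pd.points.map(_._2).mean())

      override def predict(model: MeanModel, query: Query): Prediction =
        Prediction(model.meanY)
    }

    // Local serving: returns the first algorithm's prediction.
    class Serving extends LServing[Query, Prediction] {
      override def serve(query: Query, predictions: Seq[Prediction]): Prediction =
        predictions.head
    }

    object MyEngine extends EngineFactory {
      def apply() = new Engine(
        classOf[DataSource],
        classOf[Preparator],
        Map("mean" -> classOf[Algorithm]),
        classOf[Serving])
    }
    ```

    Because MeanModel is a plain serializable case class, this is the kind of model that configuration 1 would be expected to persist automatically.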

    Data Source

    PDataSource is the most commonly used data source base class and can process RDD-based data. LDataSource cannot handle RDD-based data; use it only when you have a special requirement.

    Preparator

    With PDataSource, you must pick PPreparator. Likewise, LDataSource must be paired with LPreparator.

    Algorithm

    The workhorse of the engine comes in 3 different flavors.

    P2LAlgorithm

    Produces a model that is small enough to fit on a single machine from PDataSource and PPreparator. The model cannot contain any RDD. If the produced model is serializable, PredictionIO will try to persist it automatically. In addition, P2LAlgorithm.batchPredict is already implemented for evaluation purposes.

    PAlgorithm

    Produces a model that could contain RDDs from PDataSource and PPreparator. PredictionIO will not try to persist it automatically unless the model implements PersistentModel. PAlgorithm.batchPredict must be implemented for Evaluation.
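
    A model that holds RDDs can opt into persistence by implementing PersistentModel and providing a companion loader. The following is a minimal sketch under assumptions: AlgoParams, its modelDir field, ScoreModel, and the storage path layout are hypothetical names chosen for illustration.

    ```scala
    import org.apache.predictionio.controller.{Params, PersistentModel, PersistentModelLoader}
    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD

    // Hypothetical algorithm parameters; modelDir is an illustrative field.
    case class AlgoParams(modelDir: String) extends Params

    // A model whose data stays in the cluster as an RDD.
    class ScoreModel(val scores: RDD[(String, Double)])
      extends PersistentModel[AlgoParams] {

      // Writes the RDD to storage; returning true reports that the model was saved.
      def save(id: String, params: AlgoParams, sc: SparkContext): Boolean = {
        scores.saveAsObjectFile(s"${params.modelDir}/$id/scores")
        true
      }
    }

    // Companion loader that rebuilds the model from the saved RDD.
    object ScoreModel extends PersistentModelLoader[AlgoParams, ScoreModel] {
      def apply(id: String, params: AlgoParams, sc: Option[SparkContext]): ScoreModel =
        // Assumes a SparkContext is supplied when an RDD-backed model is loaded.
        new ScoreModel(sc.get.objectFile[(String, Double)](s"${params.modelDir}/$id/scores"))
    }
    ```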

    LAlgorithm

    Produces a model that is small enough to fit on a single machine from LDataSource and LPreparator. The model cannot contain any RDD. If the produced model is serializable, PredictionIO will try to persist it automatically. In addition, LAlgorithm.batchPredict is already implemented for evaluation purposes.

    Serving

    The serving component comes in only one flavor: LServing. At the serving stage, it is assumed that the result being served is already of a human-consumable size.

    Model Persistence

    PredictionIO tries its best to persist trained models automatically. Refer to LAlgorithm.makePersistentModel, P2LAlgorithm.makePersistentModel, and PAlgorithm.makePersistentModel for descriptions of the different strategies.

  2. package core

    Core base classes of PredictionIO controller components. Engine developers should not use these directly.

  3. package data

    Provides data access for PredictionIO and any engines running on top of PredictionIO

  4. package e2

    Independent library of code that is useful for engine development and evaluation

  5. package workflow
