# Combine Features Logistic Regression (beta)

## Introduction

In recommendation or online advertising, source data often come from different channels, each characterizing a different dimension, such as:

- data including age, gender, marital status, ...
- data including internet behavior ...
- data including phone brand ...
- data including installed apps ...
- ...

Combining features from different dimensions into one usually yields better CTR (CVR), so we release Combine Features Logistic Regression (CLR).

## Quick Start

`CLR.run` is defined as below:

- each element of `data` is a single instance/sample arranged as `(Array[fregata.Vector], fregata.Num)`
  - features from the same channel should be put into the same `fregata.Vector`, in sorted order
  - the size of the `Array[fregata.Vector]` is the number of channels in the source data
  - every channel's `fregata.Vector` should be put into the `Array[fregata.Vector]` in sorted order
  - `fregata.Num` is the instance's label
- `combines` describes how to combine the features of different channels. For example, `combines = Array(Array(0,1,2), Array(2))` means that
  - the source data come from 3 different channels
  - suppose channel#1's size is r, channel#2's is m, and channel#3's is n
  - `Array(0,1,2)` means that r×m×n combined features are generated by Cartesian product, each combined feature built from 3 features selected from the 3 different channels
  - `Array(2)` means that all the features from channel#3 are kept as they are
  - based on the example above, the total number of features is r×m×n + n (see the sketch after the signature below)

```scala
def run(data: RDD[(Array[fregata.Vector], fregata.Num)], combines: Array[Array[Int]], iterationNum: Int = 1): CLRModel
```
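As a reading aid for the `combines` parameter, here is a minimal plain-Scala sketch (not part of the library; the channel sizes and the helper `combinedFeatureCount` are made up for illustration) that computes the total number of generated features for the example above:

```scala
// Hypothetical illustration of how `combines` determines the feature count.
// Assume channel sizes r = 4, m = 3, n = 5.
val channelSizes = Array(4, 3, 5)
val combines = Array(
  Array(0, 1, 2), // Cartesian product over channels #1, #2, #3 -> r*m*n features
  Array(2)        // keep channel#3's features as they are      -> n features
)

// Each inner array contributes the product of the sizes of the channels it lists.
def combinedFeatureCount(sizes: Array[Int], combines: Array[Array[Int]]): Int =
  combines.map(_.map(i => sizes(i)).product).sum

println(combinedFeatureCount(channelSizes, combines)) // 4*3*5 + 5 = 65
```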


`CLRModel.clrPredict` is defined as below:

- the structure of the `data` parameter is the same as `CLR.run`'s

```scala
def clrPredict(data: RDD[(Array[fregata.Vector], fregata.Num)]):
  RDD[((Array[fregata.Vector], fregata.Num), (fregata.Num, fregata.Num))]
```
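For orientation, the snippet below unpacks a few prediction records; the variable names are illustrative only, while the tuple shape follows the signature above (as in the Example section, the first element of the inner pair is the predicted score and the second is the predicted class):

```scala
// Assuming `pd` is the RDD returned by clrPredict; names here are illustrative.
pd.take(5).foreach {
  case ((features, label), (score, predictedClass)) =>
    println(s"label = $label, score = $score, predicted = $predictedClass")
}
```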

## Example

```scala
// fregata imports assume the library's standard package layout; adjust to your Fregata version
import fregata.spark.data.LibSvmReader
import fregata.spark.metrics.classification.{Accuracy, AreaUnderRoc}
import fregata.spark.model.classification.CLR
import org.apache.spark.{SparkConf, SparkContext}

/**
 * Created by takun on 16/9/19.
 */
object CLRExample { // wrapper object added so the example compiles as-is
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("logistic regression")
    val sc = new SparkContext(conf)
    // the a9a dataset can be downloaded from https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html#a9a
    val (_, trainData) = LibSvmReader.read(sc, "/Volumes/takun/data/libsvm/a9a", 123)
    val (_, testData) = LibSvmReader.read(sc, "/Volumes/takun/data/libsvm/a9a.t", 123)
    // a9a has a single channel, so each instance is wrapped into a one-element Array;
    // combines = Array(Array(0,0)) crosses that channel with itself, and 10 is the iteration number
    val model = CLR.run(trainData.map {
      case (x, label) => Array(x) -> label
    }, Array(Array(0, 0)), 10)
    val pd = model.clrPredict(testData.map {
      case (x, label) => Array(x) -> label
    })
    // accuracy compares the predicted class (c) against the true label (l)
    val acc = Accuracy.of(pd.map {
      case ((x, l), (p, c)) => c -> l
    })
    println(s"Accuracy = $acc")
    // AUC uses the predicted score (p) against the true label (l)
    val auc = AreaUnderRoc.of(pd.map {
      case ((x, l), (p, c)) => p -> l
    })
    println(s"AreaUnderRoc = $auc")
  }
}
```
```
Accuracy = 0.8462719567620686
AreaUnderRoc = 0.900320784272655
```