
Shall I use cosine or dot product to calculate the similarity between user & item? #17

Open
jackyhawk opened this issue May 30, 2018 · 11 comments

Comments

@jackyhawk

Shall I use cosine or dot product to calculate the similarity between user & item?

(Since there would be some negative latent factors for user & item, is cosine still suitable?)

Thanks very much

@chtran

chtran commented May 30, 2018

Hi, if you care only about similarity, not popularity, you can use cosine similarity. If you care about the popularity of items, you can use the dot product.
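A minimal sketch of the difference (plain NumPy, not qmf code; the factor vectors are made up for illustration):

```python
import numpy as np

def dot_score(u, v):
    # Dot product: sensitive to vector norms, so items whose factors
    # have larger norms (often popular items) score higher.
    return float(np.dot(u, v))

def cosine_score(u, v):
    # Cosine: norms are divided out, so only the direction (taste match)
    # matters, not the magnitude.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

user = np.array([0.5, -0.2, 0.8])
item_popular = 2.0 * user    # same direction, larger norm
item_niche = 0.5 * user      # same direction, smaller norm

# Dot product ranks the larger-norm item higher; cosine scores both 1.0.
print(dot_score(user, item_popular), dot_score(user, item_niche))
print(cosine_score(user, item_popular), cosine_score(user, item_niche))
```

Negative factor values (as in the original question) are fine for cosine; the similarity simply ranges over [-1, 1] rather than [0, 1].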

@jackyhawk jackyhawk changed the title Shall I user consine or dot product to calculate the similarity between user & item? Shall I use consine or dot product to calculate the similarity between user & item? May 31, 2018
@jackyhawk

jackyhawk commented May 31, 2018

Thanks very much, chtran.

And what's more, shall I use the item bias when I calculate the score?

Would the result be like this:

score = user_factor dot_product item_factor + item_bias

or just:

score = user_factor dot_product item_factor

?

@jackyhawk

And btw, it seems that qmf does not support user biases?

@jackyhawk

Any suggestions for this? Thanks very much

@albietz

albietz commented Jun 1, 2018

Hi @jackyhawk,

Yes, if you train your model with item biases, you should also use them when making predictions.
qmf does not have user biases because the models try to predict preferences/rankings for each user, as opposed to absolute scores (like ratings in the Netflix dataset), so adding a user offset does not change anything.

-Alberto
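The point about user biases can be checked numerically. In a small sketch (random vectors, not qmf code), a per-user bias is a constant added to every item's score for that user, so it shifts all scores equally and leaves the item ranking unchanged:

```python
import numpy as np

rng = np.random.default_rng(0)
user = rng.normal(size=8)           # one user's latent factors
items = rng.normal(size=(5, 8))     # five items' latent factors
item_bias = rng.normal(size=5)

# Prediction with item biases (no user bias).
scores = items @ user + item_bias

# A hypothetical user bias adds the same constant to every item's score...
scores_with_user_bias = scores + 3.7

# ...so the induced item ranking for this user is identical.
ranking = np.argsort(-scores)
ranking_with_user_bias = np.argsort(-scores_with_user_bias)
print(ranking)
print(ranking_with_user_bias)
```

Item biases, by contrast, differ per item, so they do change the ranking and must be kept at prediction time if they were used in training.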

@jackyhawk

jackyhawk commented Jun 1, 2018

Thanks very much, albietz.

And would it be like this:
score = user_factor dot_product item_factor + item_bias
or
score = cosine(user_factor, item_factor) + item_bias
?

And I've just pasted the test metrics of my real training output (about 5 million users, 0.5 million items, and about 0.5 billion clicks, which are our users' behavior within the latest 30 days, after filtering some outliers).
It seems that the AUC is very high, but the precision and recall (@10) are very low.
Is this a normal scenario?

And what about the train loss? It seems that a train loss between 0.05 and 0.08 works for me, but the test loss (0.223691) is not good.

====================================================
test metrics:
train loss = 0.0759575, test loss = 0.223691

18:29:02.830051 26919 MetricsEngine.cpp:41] begin metrics: epoch 9: recorded metric test_avg_auc = 0.91341,log_:1
18:29:02.830128 26919 MetricsEngine.cpp:45] epoch 9: recorded metric test_avg_auc = 0.91341
18:29:11.645699 26919 MetricsEngine.cpp:41] begin metrics: epoch 9: recorded metric test_avg_ap = 0.00194532,log_:1
18:29:11.645777 26919 MetricsEngine.cpp:45] epoch 9: recorded metric test_avg_ap = 0.00194532
18:29:14.434798 26919 MetricsEngine.cpp:41] begin metrics: epoch 9: recorded metric test_avg_p@10 = 0.0016,log_:1
18:29:14.434895 26919 MetricsEngine.cpp:45] epoch 9: recorded metric test_avg_p@10 = 0.0016
18:29:17.357926 26919 MetricsEngine.cpp:41] begin metrics: epoch 9: recorded metric test_avg_r@10 = 0.00135481,log_:1
18:29:17.358002 26919 MetricsEngine.cpp:45] epoch 9: recorded metric test_avg_r@10 = 0.00135481
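For intuition about high AUC alongside tiny precision@10, here is a synthetic sketch (made-up scores, not the real data): when a user has only a handful of positives in a very large catalog, a model can rank positives well above a random negative on average (high AUC) while still rarely placing one in the absolute top 10.

```python
import numpy as np

rng = np.random.default_rng(1)
n_items = 100_000   # large catalog (the real one above has 0.5M items)
n_pos = 5           # this user clicked only a handful of items

scores = rng.normal(size=n_items)
pos_idx = rng.choice(n_items, size=n_pos, replace=False)
scores[pos_idx] += 2.0   # positives score clearly higher on average

# AUC = probability that a random positive outranks a random negative.
neg_scores = np.delete(scores, pos_idx)
auc = float((scores[pos_idx][:, None] > neg_scores[None, :]).mean())

# Precision@10 = fraction of the top-10 items that are positives.
top10 = np.argsort(-scores)[:10]
p_at_10 = float(np.isin(top10, pos_idx).mean())

print(auc, p_at_10)
```

With 5 positives among 100,000 items, precision@10 is capped at 0.5 even for a perfect ranker, and for a merely good one it is typically 0, while AUC stays high.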

@jackyhawk

And what's more, as for the input data file, should the old data be at the beginning, or the new data? Thanks very much

@jackyhawk jackyhawk changed the title Shall I use consine or dot product to calculate the similarity between user & item? Shall I use cosine or dot product to calculate the similarity between user & item? Jun 2, 2018
@jackyhawk

jackyhawk commented Jun 2, 2018

And what's more, shall I use more data (a longer period of users' behavior) as the training dataset to improve the precision & recall @10?

And currently I just use 100 as the nfactors dimension; shall I also increase that value?

Thanks very much

@jackyhawk

Any suggestions for that? Thanks very much

@jackyhawk

I tried increasing the dimension of the latent factors (from 100 to 200 and then to 300), and it seemed that the results got better.

@jackyhawk

Any suggestions for this? Thanks very much
