Implicit

Fast Python Collaborative Filtering for Implicit Datasets.

This project provides fast Python implementations of several different popular recommendation algorithms for implicit feedback datasets:

Alternating Least Squares as described in the papers Collaborative Filtering for Implicit Feedback Datasets and Applications of the Conjugate Gradient Method for Implicit Feedback Collaborative Filtering.
Bayesian Personalized Ranking.
Item-Item Nearest Neighbour models using Cosine, TFIDF or BM25 as a distance metric.

All models have multi-threaded training routines, using Cython and OpenMP to fit the models in parallel among all available CPU cores. In addition, the ALS and BPR models both have custom CUDA kernels - enabling fitting on compatible GPU's. Approximate nearest neighbours libraries such as Annoy, NMSLIB and Faiss can also be used by Implicit to speed up making recommendations.

To install:

pip install implicit

Basic usage:

import implicit

# initialize a model
model = implicit.als.AlternatingLeastSquares(factors=50)

# train the model on a sparse matrix of item/user/confidence weights
model.fit(item_user_data)

# recommend items for a user
user_items = item_user_data.T.tocsr()
recommendations = model.recommend(userid, user_items)

# find related items
related = model.similar_items(itemid)

The examples folder has a program showing how to use this to compute similar artists on the last.fm dataset.

For more information see the documentation.

Articles about Implicit

These blog posts describe the algorithms that power this library:

There are also several other blog posts about using Implicit to build recommendation systems:

Requirements

This library requires SciPy version 0.16 or later. Running on OSX requires an OpenMP compiler, which can be installed with homebrew: brew install gcc.

GPU Support requires at least version 8 of the NVidia CUDA Toolkit. The build will use the nvcc compiler that is found on the path, but this can be overriden by setting the CUDAHOME enviroment variable to point to your cuda installation.

This library has been tested with Python 2.7, 3.5 and 3.6 on Ubuntu, OSX and Windows.

Benchmarks

Simple benchmarks comparing the ALS fitting time versus Spark and QMF can be found here.

Optimal Configuration

I'd recommend configuring SciPy to use Intel's MKL matrix libraries. One easy way of doing this is by installing the Anaconda Python distribution.

For systems using OpenBLAS, I highly recommend setting 'export OPENBLAS_NUM_THREADS=1'. This disables its internal multithreading ability, which leads to substantial speedups for this package. Likewise for Intel MKL, setting 'export MKL_NUM_THREADS=1' should also be set.

Released under the MIT License

Name		Name	Last commit message	Last commit date
Latest commit History 106 Commits
benchmarks		benchmarks
docs		docs
examples		examples
implicit		implicit
tests		tests
.gitignore		.gitignore
.travis.yml		.travis.yml
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
README.md		README.md
appveyor.yml		appveyor.yml
cuda_setup.py		cuda_setup.py
requirements.txt		requirements.txt
setup.cfg		setup.cfg
setup.py		setup.py
tox.ini		tox.ini

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Implicit

Articles about Implicit

Requirements

Benchmarks

Optimal Configuration

About

Releases

Packages

Languages

License

falschparker82/implicit

Folders and files

Latest commit

History

Repository files navigation

Implicit

Articles about Implicit

Requirements

Benchmarks

Optimal Configuration

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages