Channel: “I for one welcome our new computer overlords” » Recommender Systems

A recommendation webservice in 10 minutes


This post gives a preview of a simple-to-use, configurable webservice for the recommenders of Apache Mahout, which I developed in cooperation with the Berlin-based company Plista.

The project’s name is kornak-api (kornak is the Polish word for mahout), and it provides a simple servlet-based recommender application available under the Apache license.

kornak-api aims to be a simple-to-set-up and easy-to-use solution for small to medium-scale recommendation scenarios. You configure the recommenders you want to use, populate your database with user-item interactions and then train the recommenders. None of this requires you to write a single line of Java code!

As an example, this article will show you how to set up a recommendation webservice using the famous Movielens 1M dataset, which contains about a million ratings on a five-star scale that 6000 users gave to 4000 movies. After you have ingested the data into your database and trained your recommenders, they will be able to recommend new movies for the users in the dataset.

Please be aware that this article assumes that you are (at a general level) familiar with the concepts of recommendation mining and collaborative filtering. If you are looking for a practical introduction to those topics, I suggest you read the extremely well-written book Mahout in Action.

Prerequisites

Make sure you have Java 6, Maven, git and MySQL installed.

The first thing you should do is check out the application from github.com/plista/kornakapi via git:
git clone https://github.com/plista/kornakapi.git kornakapi
After that, you need to create a database called movielens in MySQL with two tables. The table taste_preferences will contain all of your interaction data in the form of (user,item,rating) triples. The second table taste_candidates will be used later to allow you to filter and constrain the recommendations for certain use-cases.
CREATE DATABASE movielens;
USE movielens;

CREATE TABLE taste_preferences (
  user_id bigint(20) NOT NULL,
  item_id bigint(20) NOT NULL,
  preference float NOT NULL,
  PRIMARY KEY (user_id,item_id),
  KEY item_id (item_id)
);

CREATE TABLE taste_candidates (
  label varchar(255) NOT NULL,
  item_id bigint(20) NOT NULL,
  PRIMARY KEY (label,item_id)
);

Configuration

In this example, we want to use two different recommenders: one that computes similarities between the items based on the way they were rated and another one that uses a mathematical technique called matrix factorization to find highly preferable items.

To set up those recommenders, you have to create a file named movielens.conf with the following content:
<configuration>

  <modelDirectory>/tmp/</modelDirectory>
  <numProcessorsForTraining>2</numProcessorsForTraining>

  <storageConfiguration>
    <jdbcDriverClass>com.mysql.jdbc.Driver</jdbcDriverClass>
    <jdbcUrl>jdbc:mysql://localhost/movielens</jdbcUrl>
    <username>dbuser</username>
    <password>secret</password>
  </storageConfiguration>

  <itembasedRecommenders>
    <itembasedRecommender>
      <name>itembased</name>
      <similarityClass>org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity</similarityClass>
      <similarItemsPerItem>25</similarItemsPerItem>
      <retrainAfterPreferenceChanges>1000</retrainAfterPreferenceChanges>
      <retrainCronExpression>0 0 1 * * ?</retrainCronExpression>
    </itembasedRecommender>
  </itembasedRecommenders>

  <factorizationbasedRecommenders>
    <factorizationbasedRecommender>
      <name>weighted-mf</name>
      <usesImplicitFeedback>false</usesImplicitFeedback>
      <numberOfFeatures>20</numberOfFeatures>
      <numberOfIterations>10</numberOfIterations>
      <lambda>0.065</lambda>
      <retrainAfterPreferenceChanges>1000</retrainAfterPreferenceChanges>
      <retrainCronExpression>0 0 1 * * ?</retrainCronExpression>
    </factorizationbasedRecommender>
  </factorizationbasedRecommenders>

</configuration>

This file contains all the information necessary to set up the recommendation webservice. The modelDirectory is where the recommenders save their trained models. Make sure this directory is writable, and don’t use /tmp in a real setup. The recommenders are trained in the background while the application keeps serving requests; with numProcessorsForTraining, you can configure how many cores are used for the training. The database connection is configured in storageConfiguration, which should be pretty self-explanatory.

Next up is the most important thing, the recommenders! Each recommender has a name and is automatically retrained either after a certain number of new datapoints have been added to the application (retrainAfterPreferenceChanges) or at a scheduled point in time (retrainCronExpression). The expression 0 0 1 * * ? used in the example follows the six-field, Quartz-style cron syntax and fires every day at 1:00 a.m.
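If you want to double-check which recommenders a configuration file declares before starting the service, a few lines of Python are enough. This is only a sketch and not part of kornak-api: it assumes nothing beyond the element names shown in the example above, and the helper name list_recommenders is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example configuration from this article.
EXAMPLE_CONF = """
<configuration>
  <itembasedRecommenders>
    <itembasedRecommender>
      <name>itembased</name>
      <retrainAfterPreferenceChanges>1000</retrainAfterPreferenceChanges>
      <retrainCronExpression>0 0 1 * * ?</retrainCronExpression>
    </itembasedRecommender>
  </itembasedRecommenders>
  <factorizationbasedRecommenders>
    <factorizationbasedRecommender>
      <name>weighted-mf</name>
      <retrainAfterPreferenceChanges>1000</retrainAfterPreferenceChanges>
      <retrainCronExpression>0 0 1 * * ?</retrainCronExpression>
    </factorizationbasedRecommender>
  </factorizationbasedRecommenders>
</configuration>
"""


def list_recommenders(conf_xml):
    """Return (name, retrain threshold, cron expression) for each recommender.

    Hypothetical helper: it only reads the XML, it does not talk to kornak-api.
    """
    root = ET.fromstring(conf_xml)
    found = []
    for tag in ("itembasedRecommender", "factorizationbasedRecommender"):
        for node in root.iter(tag):
            found.append((
                node.findtext("name"),
                int(node.findtext("retrainAfterPreferenceChanges")),
                node.findtext("retrainCronExpression"),
            ))
    return found
```

The names it returns are the ones you later pass as the recommender parameter in the URLs of the webservice.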

In itembasedRecommenders, you can create a couple of recommenders that use item similarity for computing recommendations. You need to set the similarity measure via similarityClass; the available measures can be found in Mahout’s JavaDoc. Furthermore, you need to tell the recommender how many similar items to compute per item (similarItemsPerItem).
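To give a feeling for what such a similarity measure computes, here is a small Python sketch of a log-likelihood-ratio similarity between two items, in the spirit of the LogLikelihoodSimilarity configured above. It is a simplified illustration, not Mahout’s code, and all function names are made up. The inputs are four counts: users who interacted with both items (k11), with the first but not the second (k12), with the second but not the first (k21), and with neither (k22).

```python
import math


def x_log_x(x):
    """x * ln(x), with the usual convention that 0 * ln(0) is 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized Shannon entropy of a group of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """Log-likelihood ratio of a 2x2 co-occurrence table."""
    row_entropy = entropy(k11 + k12, k21 + k22)
    column_entropy = entropy(k11 + k21, k12 + k22)
    matrix_entropy = entropy(k11, k12, k21, k22)
    # Guard against tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row_entropy + column_entropy - matrix_entropy))


def llr_similarity(k11, k12, k21, k22):
    """Squash the ratio into the range [0, 1); higher means more similar."""
    return 1.0 - 1.0 / (1.0 + log_likelihood_ratio(k11, k12, k21, k22))
```

Two items whose interactions are statistically independent get a similarity close to 0, while items that are always or never seen together score close to 1.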

In factorizationbasedRecommenders, you set up the recommenders that use matrix factorization. They need to know whether their data is implicit feedback such as clicks/pageviews or explicit feedback such as ratings (usesImplicitFeedback). Furthermore, you need to set the number of features to use (numberOfFeatures), the number of iterations for the training (numberOfIterations) and the regularization parameter (lambda), which you have to determine experimentally.
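As a rough intuition for these parameters: a factorization model describes every user and every item by a vector of numberOfFeatures numbers and predicts a rating as their dot product. The Python sketch below illustrates this under the assumption that lambda acts as a regularization weight, which is how it is used in Mahout’s ALS-based factorizer; whether kornak-api wires it up in exactly this way is my assumption, and the function names are hypothetical.

```python
def predict(user_features, item_features):
    """Predicted preference: dot product of the two feature vectors."""
    return sum(u * v for u, v in zip(user_features, item_features))


def regularized_error(rating, user_features, item_features, lam):
    """Squared prediction error plus a penalty on large feature values.

    A larger lam (lambda) pulls the feature vectors towards zero, which
    trades accuracy on the training data for less overfitting.
    """
    squared_error = (rating - predict(user_features, item_features)) ** 2
    penalty = lam * (
        sum(u * u for u in user_features) + sum(v * v for v in item_features)
    )
    return squared_error + penalty
```

Training repeatedly adjusts the vectors to make this error small over all known ratings, numberOfIterations times.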

That’s it with preparation and configuration, now it’s time for the fun part!

Start the webservice

Go to the directory where you checked out the source code and fire up a local Tomcat server with the following command, inserting the correct path to your configuration file:
mvn -Dkornakapi.conf=/path/to/movielens.conf tomcat:run
You should see the recommender application starting up now.

Ingestion

Now it’s time to feed the running service with some data! Download and unzip the Movielens 1M dataset from http://www.grouplens.org/node/12. Open a second shell and convert the data to Mahout’s standard input format with the following command:
cat ratings.dat | sed s/::/,/g | cut -d, -f1,2,3 > movielens1M.csv
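If sed and cut are not available on your machine, the same conversion can be sketched in a few lines of Python. The function names are made up for this example; the only assumption is that each input line has the form user::item::rating::timestamp, which is what the shell pipeline above relies on as well.

```python
def convert_line(line):
    """Turn 'user::item::rating::timestamp' into 'user,item,rating'."""
    user_id, item_id, rating = line.strip().split("::")[:3]
    return ",".join((user_id, item_id, rating))


def convert_file(src_path, dst_path):
    """Convert a whole ratings file line by line, skipping blank lines."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if line.strip():
                dst.write(convert_line(line) + "\n")


# Usage: convert_file("ratings.dat", "movielens1M.csv")
```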
After that is done, you can push the data file via HTTP POST to the recommender application:
curl -F "file=@movielens1M.csv" "http://localhost:8080/kornakapi/batchSetPreferences?batchSize=1000"
The data will now be streamed into MySQL in batches. Wait a few moments until you see the following console output:
INFO storage.MySqlStorage: imported 1000209 records in batch. done.

Training

The recommenders need to be trained before we can finally request recommendations. We kickstart this manually for each recommender by typing these URLs in a browser:

http://localhost:8080/kornakapi/train?recommender=itembased

http://localhost:8080/kornakapi/train?recommender=weighted-mf

Note that the training will be queued in the background and you won’t see any output in the browser. Look into the terminal where you started the application to see the training proceed. The recommenders and their caches will be automatically refreshed after the training has completed.
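If you prefer a script over a browser, the two training requests can be sketched in Python with nothing but the standard library. The helper names are hypothetical, and the base URL is the local one used throughout this article.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8080/kornakapi"


def train_url(recommender, base_url=BASE_URL):
    """Build the training URL for the recommender with the given name."""
    return base_url + "/train?" + urlencode({"recommender": recommender})


def trigger_training(recommenders, base_url=BASE_URL):
    """Send one GET request per recommender; training runs in the background."""
    for name in recommenders:
        with urlopen(train_url(name, base_url)) as response:
            print(name, "-> HTTP", response.status)


# Usage (requires the running service):
# trigger_training(["itembased", "weighted-mf"])
```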

Recommendations

Once the training has completed, you can request recommendations by invoking the following URI with the name of the recommender to use, the id of the user to compute recommendations for and the number of recommendations you want to get:

http://localhost:8080/kornakapi/recommend?recommender=weighted-mf&userID=12&howMany=5

It will respond with a simple JSON answer like this:
[{itemID:557,value:5.988698},{itemID:578,value:5.0461025},{itemID:1149,value:4.9268165},{itemID:572,value:4.9265957},{itemID:3245,value:4.8139095}]
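To consume such an answer from a program, something like the following Python sketch could be used. Note that the keys in the sample response above are not quoted, so a strict JSON parser would reject it; the sketch therefore quotes bare keys before parsing. That workaround and the function names are my own assumptions for illustration, not part of kornak-api.

```python
import json
import re
from urllib.parse import urlencode

BASE_URL = "http://localhost:8080/kornakapi"


def recommend_url(recommender, user_id, how_many, base_url=BASE_URL):
    """Build the URL for a recommendation request."""
    query = urlencode(
        {"recommender": recommender, "userID": user_id, "howMany": how_many}
    )
    return base_url + "/recommend?" + query


def parse_recommendations(body):
    """Parse a response body into a list of (itemID, value) pairs."""
    # Put quotes around bare keys such as itemID: so json.loads accepts them.
    quoted = re.sub(r'([{,])\s*([A-Za-z_]\w*)\s*:', r'\1"\2":', body)
    return [(entry["itemID"], entry["value"]) for entry in json.loads(quoted)]
```

Fetching the body itself is a plain HTTP GET on the URL returned by recommend_url, for example with urllib.request.urlopen.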

Filtering

It is common in recommendation scenarios that only certain items should be recommended in special situations. Imagine, for example, an online shop where products might be out of stock or a forum where posts might be outdated.

kornak-api offers a very simple solution for that. It lets you create so-called candidate sets, which you can imagine as sets of itemIDs with a chosen label.

We will create a set called testing and add the item 557 to it:

http://localhost:8080/kornakapi/addCandidate?label=testing&itemID=557

If we add the label to the recommendation request, only items contained in the candidate set will be returned:

http://localhost:8080/kornakapi/recommend?recommender=weighted-mf&userID=12&label=testing

[{itemID:557,value:5.988698}]
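Conceptually, the effect of a candidate set can be pictured with the following Python sketch. It only illustrates the semantics visible in the responses above; how kornak-api applies the restriction internally is not covered in this article, and the function name is made up.

```python
def filter_by_candidates(recommendations, candidate_item_ids):
    """Keep only the recommendations whose item is in the candidate set."""
    allowed = set(candidate_item_ids)
    return [
        (item_id, value)
        for item_id, value in recommendations
        if item_id in allowed
    ]


# The unfiltered answer from the previous section and the 'testing' set.
unfiltered = [(557, 5.988698), (578, 5.0461025), (1149, 4.9268165)]
testing = {557}
print(filter_by_candidates(unfiltered, testing))
```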

Outlook

That’s it, I hope you liked the first preview of kornak-api! In the coming weeks, more information and a detailed description of its API will be posted (and its source code will be commented :) ).

