2nd Netflix-KDD Workshop

Workshop on
Large-Scale Recommender Systems and the Netflix Prize Competition

Held in conjunction with
The 14th ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining (KDD 2008)

August 24-27, 2008, Las Vegas, NV

Accepted Papers
  • Jinlong Wu and Tiejun Li. A Modified Fuzzy C-Means Algorithm For Collaborative Filtering

    Abstract: Two major challenges for collaborative filtering problems are scalability and sparseness. Some powerful approaches have been developed to address these challenges. Two of them are Matrix Factorization (MF) and Fuzzy C-means (FCM). In this paper we combine the ideas of MF and FCM and propose a new clustering model, Modified Fuzzy C-means (MFCM). MFCM has better interpretability than MF and better accuracy than FCM. MFCM also supplies a new perspective on MF models. Two new algorithms are developed to solve this new model. Applied to the Netflix Prize data set, they achieve accuracy comparable to that of MF.

  • Gavin Potter. Putting the collaborator back into collaborative filtering

    Abstract: Most of the published approaches to collaborative filtering and recommender systems concentrate on mathematical approaches for identifying user/item preferences. This paper demonstrates that, by considering the psychological decision-making processes undertaken by the users of the system, it is possible to achieve a significant improvement in results. This approach is applied to the Netflix dataset, and it is demonstrated that it is possible to achieve a score better than the Cinematch score set at the beginning of the Netflix competition without even considering individual preferences for individual movies. The result has important implications for both the design of collaborative filtering systems and the analysis of their data.

  • Andreas Toescher, Michael Jahrer and Robert Legenstein. Improved Neighborhood-Based Algorithms for Large-Scale Recommender Systems

    Abstract: Neighborhood-based algorithms are frequently used modules of recommender systems. Usually, the choice of the similarity measure used for evaluation of neighborhood relationships is crucial for the success of such approaches. In this article we propose a way to calculate similarities by formulating a regression problem, which enables us to extract the similarities from the data in a problem-specific way. Another popular approach for recommender systems is regularized matrix factorization (RMF). We present an algorithm, neighborhood-aware matrix factorization, which efficiently includes neighborhood information in an RMF model. This leads to increased prediction accuracy. The proposed methods are tested on the Netflix dataset.

  • Tamas Kiss, Miklos Kurucz, István Nagy and Andras A. Benczur. Large-scale recommenders based on Association Rule Mining

    Abstract: In this paper we demonstrate the applicability of association rule mining in recommender systems; our methods alone reach RMSE 0.94–0.96, while in combination with the most competitive solutions we reach an RMSE improvement of 0.4%. While requiring a huge amount of computational power, association rule based recommenders apparently give predictions orthogonal to other methods (factorization, nearest neighbors and restricted Boltzmann machines) used in the large-scale collaborative filtering experiments in the Netflix Prize competition. By a somewhat unconventional choice of the ECLAT algorithm, in an implementation tuned for directly computing the rules needed to predict the rating of a given user–movie pair, we require a few seconds on average for a single prediction, which adds up to a just affordable but huge CPU time requirement somewhat above 1000 hours. By optimizing for low memory usage we were able to highly parallelize the experiments and, with slight additional effort in the most expensive steps, to compute several variants by inexpensive rule postprocessing.

  • Oscar Celma and Pedro Cano. From hits to niches? or how popular artists can bias music recommendations

    Abstract: This paper presents some experiments to analyse the popularity effect in music recommendation. Popularity is measured in terms of total playcounts, and the Long Tail model is used in order to characterize all the items of a music collection. Furthermore, metrics derived from complex network analysis are used in order to detect the influence of the most popular artists in the recommendation network. The results from the experiments reveal that, as expected by its inherent social component, the collaborative filtering approach is prone to popularity. This has some consequences for the discovery ratio as well as for Long Tail navigation. On the other hand, in both content-based and human expert-based approaches artists are linked independently of their popularity. This allows one to navigate from a mainstream artist to a Long Tail artist in just one or two clicks.

  • Domonkos Tikk, Gabor Takacs, Istvan Pilaszy and Bottyan Nemeth. Investigation of Various Matrix Factorization Methods for Large Recommender Systems

    Abstract: Matrix Factorization (MF) based approaches have proven to be efficient for rating-based recommendation systems. In this work, we propose several matrix factorization approaches with improved prediction accuracy. We introduce a novel and fast (semi)-positive MF approach that approximates the features by using positive values for either users or items. We describe a momentum-based MF approach. A transductive version of MF is also introduced, which uses information from test instances (namely the ratings users have given for certain items) to improve prediction accuracy. We describe an incremental variant of MF that efficiently handles new users/ratings, which is crucial in a real-life recommender system. A hybrid MF–neighbor-based method is also discussed that further improves the performance of MF. The proposed methods are evaluated on the Netflix Prize dataset, and we show that they can achieve very favorable RMSE and running time.
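
Several of the abstracts above refer to regularized matrix factorization and report results as RMSE. As a rough illustration only, and not the method of any paper listed here, the following Python sketch trains a small regularized MF model by stochastic gradient descent on made-up ratings and compares its training RMSE with a global-mean baseline. All names (rmse, train_mf, toy_ratings) and all hyperparameter values are assumptions chosen for the example.

```python
import math
import random


def rmse(ratings, predict):
    """Root mean squared error of predict(u, i) over (user, item, rating) triples."""
    squared_error = sum((r - predict(u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(squared_error / len(ratings))


def train_mf(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02, epochs=500, seed=0):
    """Fit r_ui ~ mu + p_u . q_i by SGD with L2 regularization on the factors."""
    rng = random.Random(seed)
    # Small random initial factors; k is the number of latent features.
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating

    def predict(u, i):
        return mu + sum(pf * qf for pf, qf in zip(P[u], Q[i]))

    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on the squared error plus the L2 penalty.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return predict


# Hypothetical toy data: (user index, item index, rating on a 1-5 scale).
toy_ratings = [
    (0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1),
    (2, 1, 2), (2, 2, 5), (3, 0, 1), (3, 2, 4),
]

global_mean = sum(r for _, _, r in toy_ratings) / len(toy_ratings)
baseline_rmse = rmse(toy_ratings, lambda u, i: global_mean)

model = train_mf(toy_ratings, n_users=4, n_items=3)
model_rmse = rmse(toy_ratings, model)

print("global-mean baseline RMSE:", round(baseline_rmse, 4))
print("matrix factorization training RMSE:", round(model_rmse, 4))
```

The comparison here is on training data only, so it shows that the factors fit the toy ratings and says nothing about held-out accuracy; a real evaluation on the Netflix Prize data would score predictions on a separate probe or quiz set.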