Recommendation Engine

From BC$ MobileTV Wiki

A Recommendation Engine is a system, program or application whose purpose is to receive as inputs a set of parameters related to a given user or system instance, and return as outputs a set of recommendations related to that user or system instance.

Recommendation Engines are similar in nature to, and in some cases synonymous with, Recommender Systems. The distinction most likely lies in how each is deployed: a Recommender System runs as an autonomous, self-contained system, whereas a Recommendation Engine functions more as a modular, universal and integrable component, for use in all types of front-end applications or backend systems, for all types of users and domains.

Investment in Recommendation Engine technologies has been minimal, or otherwise internal and company-specific, so little public information exists on how much has actually been invested in researching Recommendation Engines or recommendation algorithms. A few notable startups have nevertheless emerged, with varying degrees of success.[1] Some recent opportunities for funding this type of research have arisen through the Recommender Start-up challenge as well as the well-known, now concluded Netflix challenge.



Algorithms

A number of techniques and approaches exist for creating recommendations. Some of the most powerful and popular existing algorithms are Statistical, Behavioral, Collaborative and Personality-driven.

The best algorithm would likely use the optimum combination of the known techniques and best practices.

String Analysis

Matching

Similarity

n-Grams

N-grams are sequences of N consecutive characters or whole words (terms) extracted from text or documents. A character-based N-gram is a sequence of N consecutive characters extracted from a word. Similar words share a high proportion of their N-grams, which are typically of length 2 (bigrams) or 3 (trigrams). For instance, the word "Tika" generates the bigrams:

*T, TI, IK, KA, A* 

and trigrams:

**T, *TI, TIK, IKA, KA*, A**. 

The "*" denotes a padding space.

While the word "tiki" (as in tiki room) gives:

*T, TI, IK, KI, I* 

and

**T, *TI, TIK, IKI, KI*, I**. 

Character-based N-grams are used in measuring the similarity of character strings. Some applications using character-based N-grams are spelling checkers, stemming, and OCR.[3]
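The padded bigrams and trigrams listed above can be reproduced with a short Python sketch. The function names `char_ngrams` and `dice_similarity` are hypothetical, and the Dice coefficient is only one of several possible ways to turn shared N-grams into a similarity score; the source does not prescribe a particular measure.

```python
def char_ngrams(word, n, pad="*"):
    """Return the padded character n-grams of `word`, in order.

    The word is upper-cased and padded with n-1 pad characters on each
    side, matching the "*" padding convention used in the examples above.
    """
    padded = pad * (n - 1) + word.upper() + pad * (n - 1)
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def dice_similarity(a, b, n=2):
    """Dice coefficient over the sets of n-grams of two words (an assumed
    choice of measure): 2 * |shared| / (|grams_a| + |grams_b|)."""
    grams_a, grams_b = set(char_ngrams(a, n)), set(char_ngrams(b, n))
    if not grams_a and not grams_b:
        return 1.0  # two empty inputs are treated as identical
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


if __name__ == "__main__":
    print(char_ngrams("Tika", 2))           # bigrams of "Tika"
    print(char_ngrams("Tika", 3))           # trigrams of "Tika"
    print(dice_similarity("Tika", "tiki"))  # shared bigrams: *T, TI, IK
```

"Tika" and "tiki" share three of their five bigrams each, so the bigram-based Dice score is 2 * 3 / (5 + 5) = 0.6.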


Difference


NLP

Tagging

Term-Weighting

Sentiment Analysis

Collaborative Filtering

CF

Matrix co-efficient

Matrix Factorization/Decomposition
Pearson Correlation
Ensemble Learning


Social Network Analysis

Nearest Neighbor

Neural Networks

Neural networks all boil down to the same thing. You have:

  1. a given set of inputs
  2. network topology
  3. activation function
  4. weights on the nodes' inputs
  5. outputs
  6. a means to measure and correct error.

Each type of neural network might have its own way of doing each of those things, but they are always present. You then train the network by feeding in a series of input sets that have known output results. You run this training set as many times as you'd like, without over- or under-training (which is as much your guess as it is the next guy's), and then you're ready to roll.

Essentially, your input set can be described as a certain set of qualities that you believe have relevance to the underlying function at hand (for instance: precipitation, humidity, temperature, illness, age, location, cost, skill, time of day, day of week, work status, and gender may all have an important role in deciding whether or not a person will go golfing on a given day). You must therefore decide what exactly you are trying to recommend and under what conditions.

Your network inputs can be boolean in nature (0.0 being false and 1.0 being true, for instance) or mapped into a pseudo-continuous space (where 0.0 may mean not at all, .45 means somewhat, .8 means likely, and 1.0 means yes). This second option may give you the tools to map a confidence level for a certain input, or simply a math calculation you believe is relevant.[6]
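As a minimal sketch of the six ingredients listed above, the following Python code trains a single sigmoid neuron on a toy version of the golfing example. Everything specific here is an assumption made for illustration: the two boolean inputs (raining, weekend), the tiny training set, the learning rate, the number of epochs, and the function names. A real network would have more inputs, more nodes and a held-out set to detect over-training.

```python
import math

# 1. Inputs: (is_raining, is_weekend) encoded as 0.0 / 1.0 (hypothetical data).
# 5. Outputs: 1.0 if the person goes golfing, 0.0 otherwise.
SAMPLES = [
    ((0.0, 0.0), 0.0),
    ((0.0, 1.0), 1.0),  # dry weekend: goes golfing
    ((1.0, 0.0), 0.0),
    ((1.0, 1.0), 0.0),
]


def sigmoid(x):
    """3. Activation function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict(weights, bias, inputs):
    """2. Topology: one output node fed directly by every input."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train(samples, epochs=5000, rate=0.5):
    """6. Measure the error on known outputs and correct the weights
    (delta rule, i.e. gradient descent on the squared error)."""
    weights = [0.0] * len(samples[0][0])  # 4. weights on the node's inputs
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            out = predict(weights, bias, inputs)
            delta = (target - out) * out * (1.0 - out)
            weights = [w + rate * delta * x for w, x in zip(weights, inputs)]
            bias += rate * delta
    return weights, bias


if __name__ == "__main__":
    weights, bias = train(SAMPLES)
    for inputs, target in SAMPLES:
        print(inputs, target, round(predict(weights, bias, inputs), 2))
```

After training, the output for a dry weekend should sit well above 0.5 and the other three cases well below it. Pseudo-continuous inputs such as 0.45 or 0.8 can be passed to `predict` in exactly the same way.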

Self-Organizing Map


Statistical

Machine Learning

Clustering

Bayes Method

Association

Prediction

Fuzzy Logic


Rule Engine

An alternative to having a system look through a large data set and try to mathematically compute recommendations is to use tried and tested rule-based systems.

Inference

Inference is the use of rules to decide what action to take next (e.g. what outputs to return, which algorithm to use in the next statistical analysis task, what triggers to activate, what notifications to send, etc.).
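One common way to implement this kind of inference is forward chaining, sketched below in Python. The rule format (a tuple of required facts plus a conclusion), the fact names and the `infer` function are all hypothetical and chosen only for illustration; production rule engines use far richer rule languages and matching algorithms.

```python
def infer(facts, rules):
    """Forward chaining: keep applying rules whose conditions are all
    known facts until no rule adds anything new."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conclusion not in known and set(conditions) <= known:
                known.add(conclusion)
                changed = True
    return known


# Hypothetical rules: (facts that must all hold, fact or action to derive).
RULES = [
    (("watched_scifi", "rated_highly"), "likes_scifi"),
    (("likes_scifi",), "recommend_scifi_titles"),
    (("new_user",), "recommend_popular_titles"),
]

if __name__ == "__main__":
    print(sorted(infer({"watched_scifi", "rated_highly"}, RULES)))
```

Here the second rule fires only because the first one derived `likes_scifi`, which is the chaining behaviour that distinguishes inference from a flat lookup table.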

Description Logic

Description Logic (DL) inference, carried out by what is sometimes referred to as a Reasoner, is similar to regular inference through rules, except that the inference involves:

  1. application to "N" datasets via their "N" respective ontologies
  2. mapping of the terms in those ontologies to one another
  3. automated generation of links between datasets
  4. association of links or relationships within datasets
  5. lastly, regular outcome evaluation (as in ordinary Rule Engines), based on the "description" provided in a specific DL format.
  • See section: DL




Tools

Open Source

The following is a list of known Open Source Recommendation Engines, filters and algorithms:


Proprietary


Resources


Tutorials


External Links


References

  1. See Wikipedia list of Recommendation startups: http://en.wikipedia.org/wiki/Recommendation_search_engines#Recommendation_search_engines
  2. Sorting and Searching Algorithms: http://epaperpress.com/sortsearch/download/sortsearch.pdf
  3. Understanding information content with Apache Tika: http://www.ibm.com/developerworks/opensource/tutorials/os-apache-tika/section6.html
  4. dbrec — Music Recommendations Using DBpedia: http://www.springerlink.com/content/ej2n3l8546158453/
  5. Term-weighting approaches in automatic text retrieval (1988): http://www.apl.jhu.edu/~paulmac/744/papers/SaltonBuckley.pdf
  6. http://stackoverflow.com/a/2315324/335867
  7. Stats for Machine Learning: http://videolectures.net/bootcamp07_keller_bss/
  8. Taste (merged with Apache Mahout Aug-2011): http://taste.sourceforge.net/old.html
  9. Minion (discontinued, last updated 2011; encourages Mahout use): http://java.net/projects/minion
  10. pysuggest, last known stable release (a Python-based library for the SUGGEST Recommendation Engine): http://pypi.python.org/pypi/pysuggest/1.0
  11. Google Code pysuggest project page (no longer available): http://code.google.com/p/pysuggest/
  12. SVD Ruby Recommendation System (Ruby, no longer maintained): http://www.igvita.com/2007/01/15/svd-recommendation-system-in-ruby/
  13. Vogoo old source available at sourceforge: http://sourceforge.net/projects/vogoo/
  14. Yelp's mrJob - Powering Recommendations and now Open Source: http://www.readwriteweb.com/cloud/2010/10/yelps-mrjob-powering-recommend.php
  15. Google Suggest graduates: http://www.googlelabs.com/show_details?app_key=agtnbGFiczIwLXd3d3ITCxIMTGFic0FwcE1vZGVsGIQpDA
  16. Evri acquires Twine for a semantic search team-up: http://venturebeat.com/2010/03/11/evri-twine-radar-networks/
  17. http://hackerne.ws/item?id=2031388
  18. Amendment to Section 230 of the Communications Act of 1934 to limit the liability protection provided by such section when a provider of an interactive computer service knew or should have known such provider was making a personalized recommendation of third-party information: https://energycommerce.house.gov/sites/democrats.energycommerce.house.gov/files/documents/101421%20EC%20Section%20230%20Text.pdf

See Also

Search Engine | Recommender System | Recommendation