Recommendation Engine
A Recommendation Engine is a system, program or application whose purpose is to receive as inputs a set of parameters related to a given user or system instance, and return as outputs a set of recommendations related to that user or system instance.
Recommendation Engines are similar in nature to, and in some cases synonymous with, Recommender Systems. The distinction most likely lies in how each is deployed: a Recommender System runs as an autonomous, self-contained system, whereas a Recommendation Engine functions more as a modular, universal and integrable component for use in all types of front-end applications or backend systems, for all types of users and domains.
Investment in Recommendation Engine technologies has been minimal, or otherwise internal and company-specific, so little information exists on how much has actually been invested in researching Recommendation Engines or recommendation algorithms. However, a few notable startups have emerged with varying degrees of success.[1] Some opportunities for funding this type of research have arisen through the Recommender Start-up challenge as well as the Netflix challenge, which has since been won and concluded.
Algorithms
A number of techniques and approaches exist for creating recommendations. Some of the most powerful and popular existing approaches are Statistical, Behavioral, Collaborative and Personality-driven.
The best algorithm would likely use the optimum combination of the known techniques and best practices.
- EXACT STRING MATCHING ALGORITHMS: http://www-igm.univ-mlv.fr/~lecroq/string/[2]
String Analysis
Matching
Similarity
n-Grams
N-grams are sequences of characters or whole words (terms) extracted from text or documents. A character-based N-gram is a sequence of N consecutive characters extracted from a word. Similar words will have a high proportion of N-grams in common, typically of length 2 (bigrams) or 3 (trigrams). For instance, the word Tika results in the generation of the bigrams:
*T, TI, IK, KA, A*
and trigrams:
**T, *TI, TIK, IKA, KA*, A**.
The "*" denotes a padding space.
Meanwhile, the word "tiki" (as in tiki room) gives:
*T, TI, IK, KI, I*
and
**T, *TI, TIK, IKI, KI*, I**.
Character-based N-grams are used to measure the similarity of character strings. Some applications that use character-based N-grams are spell checking, stemming, and OCR.[3]
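The padded N-gram extraction described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `char_ngrams` and `dice_similarity` are made up for this example, and the Dice coefficient is just one of several possible ways to turn shared N-grams into a similarity score (the source does not prescribe a particular measure).

```python
def char_ngrams(word, n, pad="*"):
    """Return the padded character n-grams of a word, in order."""
    # Pad with n-1 symbols on each side so that leading and trailing
    # characters appear in as many n-grams as inner characters do.
    padded = pad * (n - 1) + word.upper() + pad * (n - 1)
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def dice_similarity(a, b, n=2):
    """Dice coefficient over the sets of n-grams of two words (assumed measure)."""
    grams_a, grams_b = set(char_ngrams(a, n)), set(char_ngrams(b, n))
    if not grams_a and not grams_b:
        return 1.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


print(char_ngrams("Tika", 2))   # ['*T', 'TI', 'IK', 'KA', 'A*']
print(char_ngrams("Tika", 3))   # ['**T', '*TI', 'TIK', 'IKA', 'KA*', 'A**']
print(dice_similarity("Tika", "tiki"))  # 3 shared bigrams out of 5 + 5 -> 0.6
```

"Tika" and "tiki" share the bigrams *T, TI and IK, which is why they score as fairly similar even though they differ in their last letter.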
Difference
NLP
- NLP
- Natural Language Full-Text Searching in MySQL: http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html
Tagging
Term-Weighting
- term frequency–inverse document frequency: wikipedia: Tf–idf[5]
- How do you implement a “Did you mean”?: http://stackoverflow.com/questions/41424/how-do-you-implement-a-did-you-mean/258290#258290
Sentiment Analysis
- wikipedia: Sentiment analysis
- Deep Dive Into Sentiment Analysis: https://dzone.com/articles/breakthrough-research-papers-and-models-for-sentim
- Sentiment Analysis on Twitter Data Using Neo4j and Google Cloud: https://www.kennybastani.com/2019/09/sentiment-analysis-on-twitter-data.html
Collaborative Filtering
- Implementing a Rating-Based Item-to-Item Recommender System in PHP/SQL: http://lemire.me/fr/abstracts/TRD01.html
- OpenSlopeOne (PHP): http://code.google.com/p/openslopeone/
Matrix co-efficient
Matrix Factorization/Decomposition
- wikipedia: Matrix factorization
- wikipedia: Matrix approximation (commonly computed using Singular Value Decomposition, or SVD)
Pearson Correlation
Ensemble Learning
Social Network Analysis
Nearest Neighbor
- wikipedia: K-nearest neighbor algorithm
- wikipedia: Nearest neighbor search
- Approximate Nearest Neighbor (ANN) in Python: http://scipy.org/scipy/scikits/wiki/AnnWrapper
Neural Networks
Neural networks all boil down to the same thing. You have:
- a given set of inputs
- network topology
- activation function
- weights on the nodes' inputs
- outputs
- a means to measure and correct error.
Each type of neural network might have its own way of doing each of those things, but they are always present. Then, you train the network by feeding in a series of input sets that have known output results. You run this training set as many times as you'd like without over- or under-training (which is as much your guess as it is the next guy's), and then you're ready to roll. Essentially, your input set can be described as a certain set of qualities that you believe have relevance to the underlying function at hand (for instance: precipitation, humidity, temperature, illness, age, location, cost, skill, time of day, day of week, work status, and gender may all have an important role in deciding whether or not a person will go golfing on a given day). You must therefore decide what exactly you are trying to recommend and under what conditions. Your network inputs can be boolean in nature (0.0 being false and 1.0 being true, for instance) or mapped in a pseudo-continuous space (where 0.0 may mean not at all, 0.45 means somewhat, 0.8 means likely, and 1.0 means yes). This second option may give you the tools to map a confidence level for a certain input, or simply a math calculation you believe is relevant.[6]
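To make the ingredients listed above concrete, here is a minimal sketch of the smallest possible "network": a single neuron with a sigmoid activation, trained on a handful of made-up golfing examples. Everything specific here is an assumption for illustration: the three inputs (precipitation, warmth, weekend), the training data, the learning rate and the number of epochs are invented, and a real recommender would use more inputs, hidden layers and a proper library.

```python
import math


def sigmoid(z):
    """Activation function: squashes any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, inputs):
    """Topology: one neuron that sums its weighted inputs plus a bias."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train(samples, epochs=5000, learning_rate=0.5):
    """Measure the error on known outputs and nudge the weights to correct it."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Hypothetical training set: (precipitation, warmth, weekend) -> went golfing?
# Inputs use the 0.0 .. 1.0 scale described above (0.45 = somewhat, 0.8 = likely).
TRAINING_SET = [
    ((0.0, 0.8, 1.0), 1.0),
    ((0.0, 0.45, 1.0), 1.0),
    ((1.0, 0.8, 1.0), 0.0),
    ((1.0, 0.45, 0.0), 0.0),
    ((0.0, 0.8, 0.0), 0.0),
    ((0.0, 0.45, 0.0), 0.0),
]

weights, bias = train(TRAINING_SET)

# Ask about a day the network has not seen: rainy, warm, weekday.
print(predict(weights, bias, (1.0, 0.8, 0.0)))
```

The printed value is the neuron's confidence that the person goes golfing; values below 0.5 can be read as "do not recommend".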
Self-Organizing Map
- wikipedia: Self-organizing map
- A Self-Organizing Map Based Knowledge Discovery for Music Recommendation Systems : http://www.springerlink.com/content/xhcyn5rj35cvncvf/fulltext.pdf
- The use of Self-Organizing Maps in Recommender Systems: http://rslab.movsom.com/paper/somrs/html/chapter7.html
Statistical
Machine Learning
Clustering
- wikipedia: Cluster analysis
- wikipedia: K-means Clustering
- wikipedia: K-armed bandit
- K-Means Clustering (PHP): http://phpir.com/clustering/
Bayes Method
- wikipedia: Bayes theorem
- wikipedia: Bayesian inference
- wikipedia: Naive Bayes classifier
- Bayes (in PHP): http://www.ibm.com/developerworks/web/library/wa-bayes1/ | Pt.2 | Pt.3
Association
- wikipedia: Association rule learning
- wikipedia: Apriori algorithm
- WINEPI algorithm looks at
- MINEPI
Prediction
- Simple linear regression with PHP - Part 1: www.ibm.com/developerworks/web/library/wa-linphp/ | Pt.2
- Viterbi Algorithm
- wikipedia: Hidden Markov model
Fuzzy Logic
- wikipedia: Fuzzy logic
- Genetic Algorithm for Hello World: http://www.puremango.co.uk/2010/12/genetic-algorithm-for-hello-world/
- Genetic (fuzzy logic) Algorithm Examples: http://www.puremango.co.uk/2011/03/genetic-algorithm-examples/
Rule Engine
An alternative to having a system look through a large data set and try to mathematically compute recommendations is to use tried and tested rule-based systems.
- See section Rule Engine
Inference
Inference is the use of rules to decide what action to take next (i.e. what outputs to return, which algorithm to use in the next statistical analysis task, what triggers to activate, what notifications to send, etc.).
- See section: Inference
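As a rough illustration of rule-based inference, the sketch below applies "if these facts hold, conclude that" rules repeatedly until nothing new can be derived (a technique usually called forward chaining). The facts, the rules and the `recommend:` naming convention are all hypothetical and chosen only for this example; production rule engines offer far richer rule languages.

```python
def infer(facts, rules):
    """Forward chaining: keep firing rules until no new fact is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # A rule fires when all of its conditions are already known.
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


# Hypothetical rules: (set of required facts, fact to conclude).
RULES = [
    (frozenset({"viewed:tent", "season:summer"}), "interest:camping"),
    (frozenset({"interest:camping"}), "recommend:sleeping_bag"),
    (frozenset({"interest:camping", "budget:high"}), "recommend:gps_unit"),
]

derived = infer({"viewed:tent", "season:summer"}, RULES)
recommendations = sorted(f for f in derived if f.startswith("recommend:"))
print(recommendations)  # ['recommend:sleeping_bag']
```

The second rule only fires because the first one derived an intermediate fact, which is the chaining behaviour that lets a small set of rules decide the next output or action.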
Description Logic
Description Logic (sometimes also referred to as a Reasoner) is similar to doing regular inference through rules, except that the inference involves:
- applying rules to "N" datasets via their "N" respective ontologies
- mapping the terms in those ontologies to one another
- automatically generating links between datasets
- associating links or relationships within datasets
- lastly, regular outcome evaluation (as in ordinary Rule Engines), based on the "description" provided in a specific DL format.
- See section: DL
Tools
Open Source
The following is a list of known Open Source Recommendation Engines, filters and algorithms:
- OpenRecommender (cross-platform)
- Apache Mahout (Java) [8][9]
- SUGGEST: http://glaros.dtc.umn.edu/gkhome/suggest/overview (Recommendation Engine which is able to filter data sets for relevant entries and suggest those to an end user)[10][11]
- cicindela2 (Perl)
- Coletivo - A simple Rails 3 recommendations engine: https://github.com/diogenes/coletivo[12]
- MyMediaProject (C#)
- LensKit (by the GroupLens/MovieLens project): http://lenskit.grouplens.org/
- Vogoo (PHP, site no longer available [13])
- mrJob[14]
Proprietary
- OpenStrands (owned by Strands Inc., fewer features available)
- Microsoft was long rumored to be working on a recommender without any official statement, but has since published: http://www.projectemporia.com/ | Infer.NET open source library
- Google Labs was working on a recommender related to Google Suggest[15]; Google is also a major public supporter of Machine Learning libraries and tools
- Yahoo! Developer Network may also be working on their own offerings behind the curtain, starting with this competition: http://learningtorankchallenge.yahoo.com/
- Taboola: http://www.taboola.com/
- Twine (acquired by Evri in March, 2010[16])
- SoMR (no longer open source)
- DirectedEdge: http://www.directededge.com/
Resources
- Open Source Recommendation Engines in Java: http://programmingbulls.com/open-source-recommendation-engine
- RECSYS - ACM Recommender Systems annual international conference: http://recsys.acm.org/
- Recked - A Night of Recommendation Technologies: http://www.recked.org/
- A survey of collaborative filtering techniques -- Advances in Artificial Intelligence 2009: http://downloads.hindawi.com/journals/aai/2009/421425.pdf (by Xiaoyuan Su and Taghi M. Khoshgoftaar)
- Collaborative Filtering for Implicit Feedback Datasets: http://www2.research.att.com/~yifanhu/PUB/cf.pdf (by Hu, Koren and Volinsky)
- Large-scale Parallel Collaborative Filtering for the Netflix Prize: http://www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/netflix_aaim08%28submitted%29.pdf (by Yunhong Zhou, Dennis Wilkinson, Robert Schreiber and Rong Pan)
- Scalable Machine Learning course notes, Lecture 8: http://alex.smola.org/teaching/berkeley2012/recommender.html (by Alex Smola - Section 2 from slide 34 in particular)
- Introduction to Machine Learning, Lecture XVI: https://class.coursera.org/ml/lecture/index (by Andrew Ng)
Tutorials
- Python - writing a basic recommendation engine from scratch using NumPy: http://software-carpentry.org/4_0/matrix/recommend/[17]
- How to Design Retail Recommendation Engines with Neo4j: http://www.youtube.com/watch?v=oMTmG4ClO5I
- Powering Real-Time Recommendations with Graph Database Technology: https://go.neo4j.com/rs/710-RRC-335/images/Neo4j_WP_Recommendations_EN_BUS.pdf
- Article recommendations and increasing engagement with OpenAI GPT-3 Embeddings: https://blog.scottlogic.com/2022/02/23/word-embedding-recommendations.html
- Building a Recommendation System using Word2vec - A Unique Tutorial with Case Study in Python: https://www.analyticsvidhya.com/blog/2019/07/how-to-build-recommendation-system-word2vec-python/
External Links
- RecSys Conference: http://recsys.acm.org/
- GoogleFight - Recommendation Engine vs. Search Engine: http://www.googlefight.com/index.php?lang=en_GB&word1=Recommendation+Engine&word2=Search+Engine
- The race to create a 'smart' Google: http://money.cnn.com/magazines/fortune/fortune_archive/2006/11/27/8394347/
- Recommendation Engine MyStrands Expands War Chest to $55m to Go Beyond Music: http://www.readwriteweb.com/archives/mystrands_55m.php
- Digg Filter, a Recommendation Engine for Digg - Interview with Founder: http://www.readwriteweb.com/archives/digg_filter_recommendation_engine_digg.php
- The Reddit Recommendation Engine: Does It Work At All?: http://www.dharmesh.com/Blog/bid/504/The-Reddit-Recommendation-Engine-Does-It-Work-At-All
- Getting Shoppers to Crave More (or just the right things?): http://money.cnn.com/2007/08/22/smbusiness/recommendation_websites.fsb/?postversion=2007082409
- Likaholix - new recommendation engine enters private beta: http://www.geeksaresexy.net/2009/03/04/likaholix-new-recommendation-engine-enters-private-beta/
- A Comprehensive Look At Digg’s Recommendation Engine: http://searchengineland.com/a-comprehensive-look-at-diggs-recommendation-engine-14470
- DIGG - Recommendation Engine whitepaper: http://blog.digg.com/wp-content/uploads/2008/06/whitepaper-recommendation-engine.pdf
- How to use the Digg Recommendation Engine to your advantage: http://www.blogstorm.co.uk/digg-recommendation-engine/
- Recommendations Online - Click here for the upsell: http://money.cnn.com/magazines/business2/business2_archive/2007/07/01/100117056/index.htm
- Who Has Time For All This Video Content?: http://www.mediapost.com/publications/?fa=Articles.showArticle&art_aid=87879
- Using Semantic Relations for Content-based Recommender Systems in Cultural Heritage: http://www.chip-project.org/presentation/wangetal-wop.pdf
- Evaluating Collaborative Filtering Recommender Systems: http://web.engr.oregonstate.edu/~herlock/papers/eval_tois.pdf
- Amazon's Page Recommender: Foreshadowing A New Web Service?: http://radar.oreilly.com/2008/07/amazon-page-recommender.html
- Trust-aware Bootstrapping of Recommender Systems: http://sra.itc.it/people/massa/publications/wrs_ecai_2006_paolo_massa_bootstrapping_trustaware_recommender_systems.pdf
- Should The Government Be More Like Google And Wikipedia?: http://www.businessinsider.com/collective-knowledge-systems-2010-1?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+typepad%2Falleyinsider%2Fsilicon_alley_insider+(Silicon+Alley+Insider)
- How to create my own recommendation engine?: http://stackoverflow.com/questions/1407841/how-to-create-my-own-recommendation-engine
- Open source film recommendation engine from Filmaster.com: http://polishlinux.org/gnu/open-source-film-recommendation-engine/
- Amazon.com Recommendations -- Item-to-Item Collaborative Filtering (original paper summarizing the E-Commerce CF Recommender work done from 1998-2003): http://www.computer.org/portal/web/csdl/doi/10.1109/MIC.2003.1167344
- Geeking with Greg (the expert who implemented Amazon's first CF Recommender): http://glinden.blogspot.com/
- +1’s -- the right recommendations right when you want them—in your search results: http://googleblog.blogspot.com/2011/03/1s-right-recommendations-right-when-you.html
- Pinterest Acquires Machine Learning Commerce Recommendation Engine Kosei: http://techcrunch.com/2015/01/21/facebook-past-google-present-pinterest-future/
- Why aren’t "recommendation engines" very effective despite today’s technology?: https://medium.com/swlh/why-arent-recommendation-engines-very-effective-despite-today-s-technology-80efe22fa595
- Creating a unified Recommender with Mahout (Machine Learning lib) + Solr (Search Engine lib) Algorithm: https://occamsmachete.com/ml/2014/10/07/creating-a-unified-recommender-with-mahout-and-a-search-engine/ (big 2013 boost in effectiveness through CCO)
- Recommenders and the Correlated Cross-Occurrence (CCO) Algorithm: https://occamsmachete.com/recommenders-and-the-correlated-cross-occurrence-algorithm/
- Adobe releases AI-powered product recommendations tool in Magento Commerce: https://www.zdnet.com/article/adobes-releases-ai-powered-product-recommendations-tool-in-magento-commerce/
- Democrats Seek To Regulate Recommendation Engines: https://www.mediapost.com/publications/article/367794/democrats-seek-to-regulate-recommendation-engines.html[18]
References
- [1] See wikipedia list of Recommendation startups: http://en.wikipedia.org/wiki/Recommendation_search_engines#Recommendation_search_engines
- [2] SORTING AND SEARCHING ALGORITHMS: http://epaperpress.com/sortsearch/download/sortsearch.pdf
- [3] Understanding information content with Apache Tika: http://www.ibm.com/developerworks/opensource/tutorials/os-apache-tika/section6.html
- [4] dbrec — Music Recommendations Using DBpedia: http://www.springerlink.com/content/ej2n3l8546158453/
- [5] Term-weighting approaches in automatic text retrieval (1988): http://www.apl.jhu.edu/~paulmac/744/papers/SaltonBuckley.pdf
- [6] http://stackoverflow.com/a/2315324/335867
- [7] Stats for Machine Learning: http://videolectures.net/bootcamp07_keller_bss/
- [8] Taste (merged with Apache Mahout Aug-2011): http://taste.sourceforge.net/old.html
- [9] Minion (discontinued-lastupdate 2011, encourages Mahout use): http://java.net/projects/minion
- [10] pysuggest Last known stable release: http://pypi.python.org/pypi/pysuggest/1.0 (is a Python-based Library for the SUGGEST Recommendation Engine)
- [11] Google Code - pysuggest Project page: http://code.google.com/p/pysuggest/ (no longer available)
- [12] SVD Ruby Recommendation System (Ruby, no longer maintained): http://www.igvita.com/2007/01/15/svd-recommendation-system-in-ruby/
- [13] Vogoo old source available at sourceforge: http://sourceforge.net/projects/vogoo/
- [14] Yelp's mrJob - Powering Recommendations and now Open Source: http://www.readwriteweb.com/cloud/2010/10/yelps-mrjob-powering-recommend.php
- [15] Google Suggest graduates: http://www.googlelabs.com/show_details?app_key=agtnbGFiczIwLXd3d3ITCxIMTGFic0FwcE1vZGVsGIQpDA
- [16] Evri acquires Twine for a semantic search team-up: http://venturebeat.com/2010/03/11/evri-twine-radar-networks/
- [17] http://hackerne.ws/item?id=2031388
- [18] Amendment to Section 230 of the Communications Act of 1934 to limit the liability protection provided by such section when a provider of an interactive computer service knew or should have known such provider was making a personalized recommendation of third-party information: https://energycommerce.house.gov/sites/democrats.energycommerce.house.gov/files/documents/101421%20EC%20Section%20230%20Text.pdf