Search Engine
A Search Engine is a query tool that finds information matching a specified set of criteria within a given set of data.
Contents
- 1 Specifications
- 2 Structure
- 3 Services
- 3.1 Google
- 3.2 Yahoo!
- 3.3 Bing
- 3.4 Twitter
- 3.5 Baidu
- 3.6 QQ
- 3.7 Yandex
- 3.8 DuckDuckGo
- 3.9 JRank
- 3.10 Blekko
- 3.11 Hakia
- 3.12 IndexTank
- 3.13 YaCy
- 3.14 Carrot2
- 3.15 Omgili
- 3.16 Oamos
- 3.17 ThumbShots
- 3.18 SearchMe
- 3.19 Quintura
- 3.20 EyePlorer
- 3.21 Kartoo
- 3.22 Clusty
- 3.23 Swoogle
- 3.24 DogPile
- 3.25 Grokker
- 3.26 Ask
- 3.27 Cuil
- 3.28 WolframAlpha
- 3.29 FeedMil
- 3.30 Dorthy
- 3.31 DeepDyve
- 3.32 FindLinks
- 3.33 MyWebSearch
- 3.34 StartPage
- 3.35 People
- 3.36 Products
- 4 Tools
- 5 Resources
- 6 Tutorials
- 7 External Links
- 8 References
- 9 See Also
Specifications
Robots.txt
- Robots.txt protocol: http://robots.org
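Python's standard library includes a robots.txt parser, `urllib.robotparser.RobotFileParser`. The minimal sketch below shows how a crawler can check whether a URL may be fetched. It assumes a hypothetical robots.txt body for `example.com` and parses it from a string, so no network access is needed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download
# https://example.com/robots.txt (e.g. via set_url() and read()).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse a robots.txt body without touching the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)
print(parser.can_fetch("MyCrawler", "https://example.com/index.html"))     # True
print(parser.can_fetch("MyCrawler", "https://example.com/private/a.html"))  # False
print(parser.crawl_delay("MyCrawler"))                                      # 5
```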
OpenSearch
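An OpenSearch description document advertises a site's search endpoint through a `<Url>` element. That element's `template` attribute contains parameters in braces, such as `{searchTerms}`, and a trailing `?` (as in `{startPage?}`) marks a parameter as optional. The following is a minimal client-side sketch of template expansion. It assumes a hypothetical template on `search.example.com`. Optional parameters without a value are left empty, and a missing required parameter is treated as an error.

```python
import re
from urllib.parse import quote

# Matches template parameters like {searchTerms}, {startPage?} or {prefix:name?}
_PARAM = re.compile(r"\{([A-Za-z][\w:.-]*)(\?)?\}")

def expand_template(template, values):
    """Fill an OpenSearch URL template from a dict of parameter values."""
    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            # Percent-encode the value so spaces and reserved characters are safe
            return quote(str(values[name]), safe="")
        if optional:
            return ""  # optional parameter with no value: substitute an empty string
        raise KeyError(f"missing required OpenSearch parameter: {name}")
    return _PARAM.sub(substitute, template)

TEMPLATE = "https://search.example.com/?q={searchTerms}&page={startPage?}&format=rss"
print(expand_template(TEMPLATE, {"searchTerms": "open search"}))
# https://search.example.com/?q=open%20search&page=&format=rss
```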
Sitemap
Sitemap specifies how to announce your website's URL layout and specific content entries to Search Engines such as Google, Yahoo! and Bing.
- Sitemap: http://sitemaps.org/ [10][11][12][13]
- Video Sitemap: http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=80472 [14][15]
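The sitemaps.org protocol describes an XML file with a `<urlset>` root in the `http://www.sitemaps.org/schemas/sitemap/0.9` namespace. The root contains one `<url>` entry per page, each with a required `<loc>` and optional fields such as `<lastmod>`. Below is a minimal standard-library sketch. It uses hypothetical `example.com` URLs and covers only `loc` and `lastmod`.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Serialize the sitemap namespace as the default xmlns (no prefix)
ET.register_namespace("", SITEMAP_NS)

def build_sitemap(entries):
    """entries: iterable of (loc, lastmod) pairs; lastmod may be None."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        if lastmod:
            ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

xml = build_sitemap([
    ("https://example.com/", "2024-01-01"),
    ("https://example.com/about", None),
])
print(xml)
```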
Structure
Corpus
An index, database, word set or link list: the initial dataset that the engine searches, ranks and draws on to build query responses.
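A common way to organize a corpus for searching is an inverted index, which maps each term to the set of documents that contain it. The sketch below is a minimal in-memory version with a naive tokenizer and AND-only queries. The document IDs and texts are made up for illustration. Real engines add stemming, term positions, ranking and on-disk storage.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit (naive)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """docs: dict of doc_id -> text. Returns term -> set of doc_ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return the doc_ids containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

docs = {1: "Open source search engines", 2: "Search engine optimization", 3: "Open web crawling"}
index = build_index(docs)
print(search(index, "open search"))  # {1}
```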
Crawler
To build a corpus from scratch, you have a few choices for where to get the data. The first and easiest option (though it comes at a steep price) is to buy a large database within a particular domain that has already been manually curated and possibly ranked, sorted and categorized.
The more labor-intensive but cost-effective way to build a corpus is to generate it yourself from a publicly available dataset, just as Google did when it began crawling the web in March 1996.[16]
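Generating a corpus from the web usually means crawling. A crawler starts from seed URLs, fetches each page, extracts its links and queues the links it has not yet seen. The breadth-first sketch below takes a `fetch` callback, a hypothetical hook that you would implement with `urllib.request` or a similar library. This design lets the sketch be exercised without network access. A production crawler would also honor robots.txt, rate-limit its requests and handle errors.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag

class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl from seed.

    fetch(url) must return the page HTML as a string, or None on failure
    (hypothetical callback). Returns a dict of url -> html (the corpus).
    """
    corpus = {}
    queue = deque([seed])
    seen = {seed}
    while queue and len(corpus) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        corpus[url] = html
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.links:
            # Resolve relative links and drop #fragments so each page is queued once
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute.startswith(("http://", "https://")) and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return corpus
```

For example, `crawl("https://example.com/", pages.get)` crawls an in-memory dict of fake pages. The same function can later be pointed at a real HTTP fetcher.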
Services
Google
Google, the industry leader, is currently the best example of a traditional search engine: a basic query interface that returns a rich set of search results.
- Google: http://google.com [17]
Yahoo!
Yahoo! is an Internet portal that also provides its own Search Engine; however, the majority of its search functionality has been outsourced to Microsoft's Bing.
- Yahoo!: http://yahoo.com [18]
Bing
Bing is Microsoft's latest foray into the search market, and the successor to MSN Search.
- Bing: http://bing.com (formerly http://msn.com) [19]
Twitter
Thanks to its widespread use, the micro-blogging service Twitter is quickly becoming the leading search engine for real-time relevance of current events and sentiment. In 2008 it acquired Summize,[20] a real-time "twitter mind" search engine.
- Twitter (live) search: http://search.twitter.com/
Baidu
- Baidu: http://www.baidu.com/ (China's #1 search engine, #3 in world by volume)
QQ
QQ, China's #1 portal and #2 search site, is also the 3rd largest internet company by revenue and the 4th largest search engine by volume.
- QQ: http://www.qq.com/
- QQ Instant Messaging: http://www.imqq.com/ (with built-in search features leading to QQ search results in browser)
- Tencent (owner of QQ): http://www.tencent.com/en-us/at/abouttencent.shtml
Yandex
- Yandex: http://yandex.ru | API (Russia's #1 search engine, #5 in world by volume)[21]
DuckDuckGo
- DuckDuckGo: http://duckduckgo.com [22][23][24]
- DuckDuckGo - Instant Answer API: https://duckduckgo.com/api
[25][26][27][28][29][30][31]
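DuckDuckGo's Instant Answer API returns JSON topic summaries, not full web search results. The sketch below assumes the following, which should be checked against the current API documentation before use:

- the endpoint is `https://api.duckduckgo.com/`
- the API accepts the `q`, `format=json`, `no_html` and `skip_disambig` parameters
- responses include the `Heading`, `AbstractText` and `RelatedTopics` fields

URL building and response parsing are kept separate from the network call so they can be tested offline.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; verify against https://duckduckgo.com/api
API_ENDPOINT = "https://api.duckduckgo.com/"

def build_query_url(query):
    """Build an Instant Answer request URL (parameter names are assumptions)."""
    params = {"q": query, "format": "json", "no_html": 1, "skip_disambig": 1}
    return API_ENDPOINT + "?" + urlencode(params)

def summarize(payload):
    """Extract heading, abstract and first-level related topic texts from a response dict."""
    # Grouped topics carry "Name"/"Topics" instead of "Text"; they are skipped here
    related = [t["Text"] for t in payload.get("RelatedTopics", []) if "Text" in t]
    return {
        "heading": payload.get("Heading", ""),
        "abstract": payload.get("AbstractText", ""),
        "related": related,
    }

def instant_answer(query, timeout=10):
    """Perform the HTTP request (requires network access)."""
    with urlopen(build_query_url(query), timeout=timeout) as resp:
        return summarize(json.load(resp))
```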
JRank
- JRank: http://www.jrank.org/ (focused on offering customized search for your website, picking up where Google left off when it deprecated/deactivated its Google Site Search & Google Search API)
Blekko
- Blekko: http://blekko.com [32][33][34]
Hakia
Hakia uses semantic approaches to building results for its search engine.
- HAKIA: http://hakia.com
IndexTank
- IndexTank - Hosted access to a large Search Index: http://indextank.com (cross-platform API)
YaCy
- YaCy - P2P search: http://yacy.net/en/ (good API)
Carrot2
- Carrot2 - open source search clustering: http://project.carrot2.org/
eTools
- eTools - Swiss commercial deployment of Carrot2 search cluster server: http://www.etools.ch/
Omgili
- Omgili (Oh My God I Love It) - forum discussion search engine: http://omgili.com/
Oamos
- Search gone Cool & Interactive: http://oamos.com/
ThumbShots
Search the world's largest human-powered thumbnail/website directory
- Thumbshots site: http://www.thumbshots.net/
- Thumbshots Integration: http://www.thumbshots.org/Products/Thumbshots/IntegrationCode/tabid/101/Default.aspx
SearchCube
SearchCube, a service from Symmetri, presents search results in a 3D visual cube layout. It uses the ThumbShots API to create thumbnails of all the pages in the cube.
- SearchCube: http://www.symmetri.com/searchcube/
SearchMe
- SearchMe - Visual Thumbnail browsing with Protoflow (iPod Coverflow) type search capabilities: http://www.searchme.com/
Quintura
- Quintura - Tag Cloud based search engine: http://www.quintura.com/
EyePlorer
- EyePlorer Visual (Radial) Search: http://eyeplorer.com/eyePlorer/
Kartoo
- Kartoo - Visual Search (Requires Flash): http://www.kartoo.com/
Clusty
Clusty is a clustering search engine that groups together (clusters) related data sources.
- Clusty - The clusterized Search Engine: http://clusty.com/
Swoogle
Search the Semantic Web on Swoogle to find existing ontologies (structures of data, but not the data itself).
- Swoogle: http://swoogle.umbc.edu/
DogPile
Aggregate several search engines' results with DogPile.
- DogPile: http://dogpile.com
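How DogPile actually merges its sources' results is not described here. For illustration, this sketch substitutes reciprocal rank fusion (RRF), a simple and widely used technique for combining several engines' ranked lists. Each URL scores the sum of `1 / (k + rank)` over the lists it appears in, with `k = 60` as a conventional default. The engine result lists in the usage line are made up.

```python
from collections import defaultdict

def reciprocal_rank_fusion(result_lists, k=60):
    """Merge several ranked lists of URLs into a single ranking.

    A URL at 1-based position `rank` in a list earns 1 / (k + rank);
    scores are summed across all lists in which the URL appears.
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for deterministic output
    return sorted(scores, key=lambda url: (-scores[url], url))

engine_a = ["a", "b", "c"]
engine_b = ["b", "a", "d"]
engine_c = ["b", "c"]
print(reciprocal_rank_fusion([engine_a, engine_b, engine_c]))  # ['b', 'a', 'c', 'd']
```

RRF needs only rank positions, not raw scores. This makes it convenient for metasearch, where each engine's internal scores are unavailable or not comparable.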
Grokker
- Grokker - Taxonomical Search and Enterprise Search Solutions: http://www.grokker.com/
Ask
Ask used to offer API access to its search capabilities but has since removed it.
- Ask.com: http://ask.com
- Unofficial documentation of Ask's Web Search API: http://www.antezeta.com/ask/ask-web-search-api.html
Cuil
- Cuil - Arguably the most over-hyped, and ultimately most disappointing, search engine ever (from former Google employees; claimed to have the world's largest index): http://cuil.com
WolframAlpha
WolframAlpha is a Knowledge Engine with one simple input field that gives access to a huge system, with trillions of pieces of curated data and millions of lines of algorithms.
- Wolfram|Alpha: http://www.wolframalpha.com/
- Participate in the Wolfram|Alpha project: Contribute a Data Stream: http://www.wolframalpha.com/participate/datastream.html
- Wolfram Alpha Braces for Overload - The computational knowledge engine expects a flood of users: http://www.technologyreview.com/web/22666/
FeedMil
- FeedMil: http://www.feedmil.com/
Dorthy
- Dorthy.com -- Search People's Dreams: http://dorthy.com
- A (Semantic) Search Engine for Dreams: http://www.readwriteweb.com/archives/dorthy_a_semantic_search_engine_for_dreams.php
DeepDyve
- DeepDyve delivers fast, easy access to the vast amounts of expert information hidden in the Deep Web: http://www.deepdyve.com/
FindLinks
- FindLinks -- The Yellow Pages Phone Book and Phone Directory: http://www.findlinks.com/
MyWebSearch
- MyWebSearch - API-specific (by content type) search tool: http://home.mywebsearch.com/
StartPage
- StartPage -- the world's most private search engine: http://startpage.com/
People
Pipl
- Pipl: http://www.pipl.com
Spokeo
- Spokeo: http://www.spokeo.com/
CV Gadget
- CV Gadget: http://www.cvgadget.com/
Thomson Reuters
- CLEAR by Thomson Reuters: https://clear.thomsonreuters.com/
Products
Octopart
- Octopart is a search engine for electronic parts: http://octopart.com/
Tools
- BlindSearch (compare results from multiple providers): http://blindsearch.fejus.com
- Open Site Explorer - SEO, Link Popularity and Backlink Checker tool: http://www.opensiteexplorer.org/
Resources
- instantsearch.js: https://community.algolia.com/instantsearch.js/
- PHP Search Engine Showdown: http://tim.oreilly.com/pub/a/php/2006/02/16/search-engine-showdown.html?page=2
- Algorithm Implementation/Strings/Levenshtein distance: https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance
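The Levenshtein distance is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. Below is a compact dynamic-programming version that keeps only two rows of the table. It also includes a hypothetical `closest_match` helper that picks the nearest candidate, e.g. for "did you mean" suggestions.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b using O(min(len(a), len(b))) memory."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string along the row
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if chars match)
            ))
        previous = current
    return previous[-1]

def closest_match(word, candidates):
    """Return the candidate with the smallest edit distance to word."""
    return min(candidates, key=lambda c: levenshtein(word, c))

print(levenshtein("kitten", "sitting"))                          # 3
print(closest_match("serch", ["starch", "search", "church"]))   # search
```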
Lucene
- Lucene: http://lucene.apache.org/
- Lucene - Quick Guide: https://www.tutorialspoint.com/lucene/lucene_quick_guide.htm
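In recent Lucene versions, the default similarity is based on BM25. The sketch below implements the textbook Okapi BM25 formula over pre-tokenized documents, with the common defaults `k1 = 1.2` and `b = 0.75`. It illustrates the scoring idea only and is not Lucene's exact implementation, which differs in details such as document-length encoding. The sample documents are made up, and the function assumes a non-empty document list.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document in docs against query_terms with Okapi BM25."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF (the "1 +" keeps it positive for very common terms)
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [["search", "engine", "search"], ["search", "index"], ["cooking", "recipes"]]
print(bm25_scores(["search"], docs))
```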
Solr
- Solr project: http://lucene.apache.org/solr/
- wikipedia: Apache Solr
- Solr -- Indexing XML with Lucene and REST: http://www.xml.com/pub/a/2006/08/09/solr-indexing-xml-with-lucene-andrest.html
- Solr tutorial: http://lucene.apache.org/solr/tutorial.html
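Solr exposes search over HTTP. A typical query goes to a collection's `/select` handler with standard parameters such as `q`, `rows`, `start`, `fl` and `wt`. The minimal standard-library client below assumes a local Solr on the default port 8983 and a hypothetical collection named `mycore`. It returns the `docs` list from the JSON `response` object.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def solr_select_url(base_url, core, query, rows=10, start=0, fields="id,score"):
    """Build a /select URL for the given core (core name is hypothetical)."""
    params = {"q": query, "rows": rows, "start": start, "fl": fields, "wt": "json"}
    return f"{base_url.rstrip('/')}/solr/{core}/select?{urlencode(params)}"

def solr_search(base_url, core, query, **kwargs):
    """Run the query against a live Solr server (requires network access)."""
    with urlopen(solr_select_url(base_url, core, query, **kwargs), timeout=10) as resp:
        body = json.load(resp)
    return body["response"]["docs"]

print(solr_select_url("http://localhost:8983/", "mycore", "title:search"))
```

`rows` and `start` provide simple paging. `fl` limits the stored fields returned, which keeps responses small.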
Tutorials
- PHP Real-Time Search Using MySQL and AJAX: http://aproductivelife.blogspot.com/2011/10/php-real-time-search-using-mysql-and.html
- Real-Time Searching of Big Data with Solr and Hadoop: http://www.slideshare.net/OpenLogic/realtime-searching-of-big-data-with-solr-and-hadoop
- Tuning Solr in Near Real Time Search environments: http://java.dzone.com/articles/tuning-solr-near-real-time
- SOLR Search Index replication with BitTorrent: http://codeascraft.com/2012/01/23/solr-bittorrent-index-replication/
- Getting the closest string match: https://stackoverflow.com/questions/5859561/getting-the-closest-string-match (the Levenshtein distance algorithm is the leading option, short of full NLP/ML)
External Links
- wikipedia: Search Engine
- wikipedia: Sitemap
- wikipedia: OpenSearch
- wikipedia: Robots.txt
- User Goals and Conversion from Search: http://www.optimizeandprophesize.com/jonathan_mendezs_blog/2006/12/user_goals_and_.html
- Mining The Thought Stream: http://www.techcrunch.com/2009/02/15/mining-the-thought-stream/
- Searching something new?: http://worldofitblog.wordpress.com/2009/03/08/searching-something-new/
- Visual Search Engines: http://www.infovis.net/printMag.php?num=198&lang=2
- 3 Flavors of Social Search: What to Expect: http://www.readwriteweb.com/archives/3_flavors_of_social_search_what_to_expect.php
- Search and the ‘25% Solution’: http://www.mkbergman.com/854/brown-bag-lunch-search-and-the-25-solution/
- People Search Engines - They Know Your Dark Secrets…And Tell Anyone: http://www.pcworld.com/article/161018/people_search_engines_they_know_your_dark_secretsand_tell_anyone.html
- The Trinitarian Formula of Search: http://www.ewriting.pamil-visions.com/2008/07/19/search-war/
- Ask Jeeves: Why Buy Interactive Search Holdings?: http://searchenginewatch.com/3337511
- Google vs. Bing -- Bing holds its own in search-off: http://www.usatoday.com/tech/columnist/edwardbaig/2009-07-01-google-vs-Bing_N.htm
- Google vs. Yahoo!: http://www.langreiter.com/exec/yahoo-vs-google.html?q=lantzilla
- The Google Sting -- Bing Is Cheating, Copying Our Search Results: http://searchengineland.com/google-bing-is-cheating-copying-our-search-results-62914
- Google, Microsoft trade barbs over Bing 'copying': http://news.cnet.com/8301-30684_3-20030265-265.html
- The Dirty Little Secrets of Search: http://www.nytimes.com/2011/02/13/business/13search.html?_r=1&src=me&ref=business
- Global Search Market Draws More than 100 Billion Searches per Month: http://www.comscore.com/Press_Events/Press_Releases/2009/8/Global_Search_Market_Draws_More_than_100_Billion_Searches_per_Month
- 7 Search Engines Google Obliterated: http://www.searchenginepeople.com/blog/engines-google-killed.html
- Solving Different URLs with Similar Text (DUST): http://www.seobythesea.com/?p=288
- Search Ninja Part 2: How to find older versions of software and much more: http://zdnet.com/blog/seo/search-ninja-part-2-how-to-find-older-versions-of-software-and-much-more/3496
- Get the Green Web 3.0 Experience on Truevert: http://www.treehugger.com/clean-technology/get-the-green-web-30-experience-on-truevert.html
- Reddit co-founder talks Facebook versus Google (video): http://www.zdnet.com/blog/facebook/reddit-co-founder-talks-facebook-versus-google-video/11323?tag=nl.e539
- What We Lose As Search Gets Personal: http://www.looksmart.com/what-we-lose-as-search-gets-personal
- Can we buy your search engine?: http://scripting.com/stories/2012/01/24/canWeBuyYourSearchEngine.html
- Building the search engine of the future, one baby step at a time: http://googleblog.blogspot.ca/2012/08/building-search-engine-of-future-one.html
- The 10 Best Search Engines of 2013: http://netforbeginners.about.com/od/navigatingthenet/tp/top_10_search_engines_for_beginners.htm
- Payloads Are Neat, but Where’s a Complete Example for Solr?: http://java.dzone.com/articles/payloads-are-neat-wheres
- IBM Sees The Next Phase Of Search: http://www.mediapost.com/publications/article/253758/ibm-sees-the-next-phase-of-search.html (with BlueMix and Watson)
- Google’s Search app on Android can now find content buried in your apps: https://techcrunch.com/2016/08/31/googles-search-app-on-android-can-now-find-content-buried-in-your-apps/
- How DuckDuckGo Is Positioning Itself to Take on Google: https://www.fool.com/investing/general/2014/04/07/how-duckduckgo-is-positioning-itself-to-take-on-go.aspx
- The Best Private Search Engine— Alternatives to Google: https://hackernoon.com/untraceable-search-engines-alternatives-to-google-811b09d5a873
- Gartner Optimistic About Visual Search As An Emerging Technology: https://www.mediapost.com/publications/article/340003/gartner-optimistic-about-visual-search-as-an-emerg.html
- He Built Google's Ad Biz, And Is Now Launching A Subscription-Based Search Engine: https://www.mediapost.com/publications/article/361241/he-built-googles-ad-biz-and-is-now-launching-a-s.html
References
- [1] OpenSearch 1.1 spec (draft 6): https://github.com/dewitt/opensearch/blob/master/opensearch-1-1-draft-6.md
- [2] OpenSearch 1.1 spec (draft 5): https://web.archive.org/web/20120510160608/http://www.opensearch.org/Specifications/OpenSearch/1.1#OpenSearch_response_elements
- [3] OpenSearch XSD: http://weblogs.asp.net/wkriebel/archive/2008/02/04/opensearch-xsd.aspx
- [4] Submit your OpenSearch provider to Amazon A9 client (used in Alexa): http://opensearch.a9.com/
- [5] Developer how to guide: http://www.opensearch.org/Documentation/Developer_how_to_guide
- [6] OpenSearch v1.1 Cheat Sheet: http://www.scribd.com/doc/6114752/OpenSearch-Cheat-Sheet-15
- [7] Introducing OpenSearch: http://www.xml.com/pub/a/2007/07/20/introducing-opensearch.html
- [8] OpenSearch Google in Windows 7: http://www.mzzt.net/2009/01/14/opensearch-google-in-windows-7/
- [9] Windows 7 Federated Search Providers: http://www.sevenforums.com/tutorials/742-windows-7-federated-search-providers.html
- [10] Sitemap Generators: http://code.google.com/p/sitemap-generators/wiki/SitemapGenerators
- [11] C# Sitemap Generator: http://sourceforge.net/projects/sitemapgen/
- [12] Google Sitemap Generator in PHP: http://www.idealog.us/2006/09/google_sitemap_.html
- [13] PHP Sitemap Generator: http://www.phpclasses.org/package/5838-PHP-Generate-sitemaps-and-notify-updates.html
- [14] How To Use Google Video XML Sitemaps For Video SEO: http://www.reelseo.com/how-video-sitemaps/
- [15] How to create video sitemap to drive more traffic: http://fourblogger.com/how-to-create-video-sitemap-more-traffic/
- [16] Founding of Google: http://www.wired.com/wired/archive/13.08/battelle.html
- [17] Google Sitemap/Webmaster Tools: http://www.google.com/webmasters/tools/
- [18] Yahoo! Site Explorer: http://siteexplorer.search.yahoo.com/submit
- [19] Bing Webmasters' Tools: http://www.bing.com/toolbox/webmasters/
- [20] Confirmed -- Twitter Acquires Summize Search Engine: http://techcrunch.com/2008/07/15/confirmed-twitter-acquires-summize-search-engine/
- [21] Yandex Tries to Solidify Search Dominance, Keep Google Down in Russia: http://searchenginewatch.com/article/2157877/Yandex-Tries-to-Solidify-Search-Dominance-Keep-Google-Down-in-Russia
- [22] A Duck & a Wiki Team Up Against the Content Farms: http://www.readwriteweb.com/archives/a_duck_a_wiki_team_up_against_the_content_farms.php
- [23] Escape your search engine Filter Bubble! - An illustrated guide by DuckDuckGo.com: http://dontbubble.us/
- [24] The Trouble With the Echo Chamber Online: http://www.nytimes.com/2011/05/29/technology/29stream.html
- [25] Ideas for DuckDuckGo Instant Answer Plugins and Data sources: https://duckduckhack.uservoice.com/forums/5168-ideas-for-duckduckgo-instant-answer-plugins/status/904946
- [26] DuckDuckGo FAQ - where do the search results come from?: http://help.dukgo.com/customer/portal/articles/216399
- [27] How can I use DuckDuckGo in my application? Is there any API for the same?: https://www.quora.com/How-can-I-use-DuckDuckGo-in-my-application-Is-there-any-API-for-the-same
- [28] DuckDuckGo API - PHP client: https://www.phpclasses.org/package/10443-PHP-Search-for-data-and-related-topics-from-DuckDuckGo.html
- [29] DuckDuckGo slams Google following EU antitrust decision: https://www.theverge.com/2018/7/20/17595612/google-antitrust-eu-duckduckgo-chrome
- [30] Twitter -- DuckDuckGo account: https://twitter.com/DuckDuckGo
- [31] Fetch DuckDuckGo Web Search Results in 20 lines of Java code: https://medium.com/@sethsubr/fetch-duckduckgo-web-search-results-in-20-lines-of-java-code-3a34ea9da085
- [32] No Joke? Blekko is 63rd Largest Pure Search Entity in the World: http://blog.searchenginewatch.com/101111-094700
- [33] The Secrets Behind Blekko's Search Technology: http://www.readwriteweb.com/hack/2010/12/the-secrets-behind-blekkos-search-technology.php
- [34] Blekko Search Engine Slashes Through the Web: http://tech.ca.msn.com/pcworld-article.aspx?cp-documentid=26184155
- [35] DZone -- Refcard #137 - Understanding Lucene: https://dzone.com/refcardz/lucene
- [36] Understanding Lucene - Powering Better Search Results: http://www.scribd.com/doc/51555360/DZone-Refcard-137-Understanding-Lucene-Powering-Better-Search-Results
See Also
Recommendation Engine | Advertising | Google | Yahoo | Bing | News | Websites | Multimedia | Local Business | Maps