* Yahoo! Webscope -- User-NewsItems (DATASETS): http://webscope.sandbox.yahoo.com/ (only available for academic use by faculty & university researchers)<ref>Yahoo Opens Largest Database to the Public: https://dzone.com/articles/yahoo-open-largest-database-to-the-public</ref>
* New York Public Library Digital Collections API: http://api.repo.nypl.org/<ref>What data is available in NYPL LOD set?: http://menus.nypl.org/data</ref>
* British National Bibliography -- Collection Metadata: https://www.bl.uk/collection-metadata/downloads#<ref>Going Meta - a series on graphs, semantics and knowledge: https://www.youtube.com/watch?v=NQqWBnyQlS4 | [https://github.com/jbarrasa/goingmeta SRC]</ref>

The LinkedData Cloud (Sep. 2008)[1]

Linked Open Data (also commonly referred to as Linking Open Data, Linked Data for short, and/or abbreviated LOD) is data that is made available for sharing and/or reuse with external sources or third parties without intellectual, social or legal limitation. [2]





See: RDF


See: RDFa

N3 tuples

See: N3


See: Turtle

Linked Open Data Community

Linked Data is about using the Web to connect related data that wasn't previously linked, or using the Web to lower the barriers to linking data currently linked using other methods. More specifically, Wikipedia defines Linked Data as "a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF." [7]
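As a rough illustration of "connecting pieces of data using URIs and RDF", the following minimal sketch models RDF statements as subject/predicate/object triples and prints them in N-Triples syntax. The `Triple` class, the `to_ntriples` helper and all `example.org` / `example.net` URIs are assumptions made up for this illustration; only the FOAF and OWL property URIs are real vocabulary terms.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement. Subject and predicate are URIs; the object is a URI or a literal."""
    subject: str
    predicate: str
    obj: str
    obj_is_uri: bool = True


def to_ntriples(triple: Triple) -> str:
    """Serialize one triple as a single N-Triples line (plain string literals only)."""
    if triple.obj_is_uri:
        obj = f"<{triple.obj}>"
    else:
        # Escape the characters that may not appear raw inside a quoted literal.
        escaped = (
            triple.obj.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r")
        )
        obj = f'"{escaped}"'
    return f"<{triple.subject}> <{triple.predicate}> {obj} ."


# Hypothetical resource URIs, used only for illustration.
ALICE = "http://example.org/people/alice"

triples = [
    Triple(ALICE, "http://xmlns.com/foaf/0.1/name", "Alice", obj_is_uri=False),
    Triple(ALICE, "http://xmlns.com/foaf/0.1/knows", "http://example.org/people/bob"),
    # owl:sameAs is the usual way to link a local URI to a record in another data set.
    Triple(ALICE, "http://www.w3.org/2002/07/owl#sameAs", "http://other.example.net/id/alice"),
]

ntriples_doc = "\n".join(to_ntriples(t) for t in triples)
print(ntriples_doc)
```

The third statement is the "linking" part: because both data sets identify the same thing by URI, a consumer that fetches one can follow the link to the other.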

LinkedData Cloud

As a graphical representation of the critical data sources publishing open and linkable data, the LinkedData Cloud is the pride and joy of the LinkedData community. Sources of the LinkedData cloud include Wikipedia, Freebase, MusicBrainz, the Linked MovieDatabase project, the Notable Names DataBase (NNDB), Government Census Data (e.g. US or Canada), FBI - Uniform Crime Reports (UCR), CIA World FactBook, BBC Programmes, CrunchBase company listings, FOAF people profiles and DOAP project profiles across the web, and many more similar projects.

Linked Data Platform

Linked Data Platform (LDP) defines a set of rules for HTTP operations on web resources, some based on RDF, to provide an architecture for read-write Linked Data on the web.
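The read-write pattern can be sketched as plain HTTP request descriptions. This is a minimal illustration, not a client library: the `HttpRequest` class, the three helper functions and the `example.org` container URL are hypothetical, and nothing is sent over the network. The header conventions shown (Turtle as the media type, a `Link` header with `rel="type"` on creation, the `Slug` naming hint, and `If-Match` on replacement) follow the W3C LDP 1.0 recommendation as far as I am aware.

```python
from dataclasses import dataclass, field

# Real LDP vocabulary term identifying a generic LDP resource.
LDP_RESOURCE = "http://www.w3.org/ns/ldp#Resource"


@dataclass
class HttpRequest:
    """Plain description of an HTTP request; building one performs no network I/O."""
    method: str
    url: str
    headers: dict = field(default_factory=dict)
    body: str = ""


def read_resource(url: str) -> HttpRequest:
    """Read: GET the resource and ask for its RDF representation as Turtle."""
    return HttpRequest("GET", url, {"Accept": "text/turtle"})


def create_in_container(container_url: str, turtle: str, slug: str) -> HttpRequest:
    """Create: POST Turtle to a container; Slug is only a naming hint to the server."""
    headers = {
        "Content-Type": "text/turtle",
        "Link": f'<{LDP_RESOURCE}>; rel="type"',
        "Slug": slug,
    }
    return HttpRequest("POST", container_url, headers, turtle)


def replace_resource(url: str, turtle: str, etag: str) -> HttpRequest:
    """Update: PUT the full new state, guarded by the ETag from an earlier GET."""
    headers = {"Content-Type": "text/turtle", "If-Match": etag}
    return HttpRequest("PUT", url, headers, turtle)


# Hypothetical container URL and resource description, for illustration only.
CONTAINER = "http://example.org/ldp/people/"
BODY = '<> a <http://xmlns.com/foaf/0.1/Person> ; <http://xmlns.com/foaf/0.1/name> "Alice" .'

create = create_in_container(CONTAINER, BODY, slug="alice")
read = read_resource(CONTAINER + "alice")
update = replace_resource(CONTAINER + "alice", BODY, etag='"abc123"')

for request in (create, read, update):
    print(request.method, request.url, request.headers)
```

A real server decides the final URL of the created resource and returns it in the `Location` response header, so the `alice` path used for the read and update above is an assumption.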


Open Data Initiative


Big Data

Big Data is, technically speaking, any large NoSQL datastore or high-volume/large traditional SQL database instance, export, archive or script. Due to marketing hype from many "Big Data providers/vendors", there has been an attempt to make Big Data synonymous with Web Analytics tools; however, these are just one part of Big Data as an overall concept. In particular, Big Data includes:

  1. storage
  2. backup
  3. replication
  4. availability
  5. optimization
  6. categorization
  7. clustering
  8. filtering
  9. querying
  10. real-time analytics
  11. data mining
  12. task automation (e.g. retrieval/reporting, printing, alerting, etc.)
  13. logging
  14. monitoring

...of large and/or high-volume (i.e. frequently accessed/updated) data sets. In simplest terms, Big Data is all the issues that traditional DBMSs are already designed to handle, but at an even larger web scale that requires additional tooling and management methodologies.

Major Data Dumps

Linked Data Sets (i.e., with Dereferenceable URIs) available as RDF Dumps

Minor Data Dumps

Unlinked Data, Valuable Data Sources

Unlinked Data (also referred to as Non-Dumped Data) is data held in a proprietary or otherwise inaccessible database whose raw data is not shared or made available.

TV & Movies

* TheTVDB: http://thetvdb.com
* Toonarific - Sample Search: http://www.toonarific.com/search_simple.php?s_search=transformers&Button_Update=Search
* AlluC: http://alluc.org
* Movie Forumz: http://movie-forumz.org
* Surf the Channel: http://surfthechannel.com



  • ElutaXML Specification - Canadian Jobs Data API: http://www.eluta.ca/elutaxml (Eluta is a search engine that specializes in just one thing: finding new job announcements at employers across Canada)
  • LittleSis*: https://littlesis.org/ (free database of who-knows-who at the heights of business & government)
  • OpenCorporates: https://opencorporates.com/ (largest open database of companies in the world)
  • LandMatrix: https://landmatrix.org/ (data visualisations & corresponding public online database on land deals and suspicious "grabs" or otherwise noteworthy "buy-ups")
  • Organized Crime & Corruption Reporting Project (OCCRP) - Investigative Dashboard: https://id.occrp.org/ (ever-expanding list of databases containing information on companies from all over the world)

Business Intelligence

Business Intelligence is the gleaning of useful information from large amounts of business-related data. This ranges from finding out at what times load balancing is needed because of higher user access to a web site/service, to discovering fraudulent financial activity in a trading system by analyzing many years' worth of transactional data.

  • See also: Analytics
  • See also: PowerBI
  • DOMO: http://www.domo.com/ (promises "Business Information to make better decisions", and to let you "manage your business from one platform")
  • Omnity: http://omnity.io/ (promises to help you "explore billions of relationships from many information sources helping you get to actionable insight rapidly")

Common Tag is an open tagging format developed to make content more connected, discoverable and engaging. Unlike free-text tags, Common Tags are references to unique, well-defined concepts, complete with metadata and their own URLs. With Common Tag, site owners can more easily create topic hubs, cross-promote their content, and enrich their pages with free data, images and widgets.







External Links


See Also

