MapReduce is a programming model and software framework for writing applications that rapidly process vast amounts of data in parallel on large clusters of compute nodes.
The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing, including an open-source implementation of the MapReduce programming model.
- Apache Hadoop: http://hadoop.apache.org/
- Overview of the Hadoop Distributed File System (HDFS): http://wiki.apache.org/hadoop/HDFS
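To make the MapReduce model described above more concrete, here is a minimal single-process sketch in plain Python. It is not Hadoop's API: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `map_reduce`) and the sample readings are made up for illustration. A real framework would run the map and reduce steps in parallel across cluster nodes and read its input from a distributed file system such as HDFS.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every input record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle_phase(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def map_reduce(records, mapper, reducer):
    """Run the three phases in sequence on one machine."""
    pairs = map_phase(records, mapper)
    return reduce_phase(shuffle_phase(pairs), reducer)


# Hypothetical job: maximum temperature per year from "year,temperature" lines.
def year_temp_mapper(line):
    year, temp = line.split(",")
    yield year, int(temp)


def max_reducer(key, values):
    return max(values)


# Made-up sample input, for illustration only.
SAMPLE_READINGS = ["1949,111", "1950,0", "1949,78", "1950,22", "1950,-11"]

if __name__ == "__main__":
    print(map_reduce(SAMPLE_READINGS, year_temp_mapper, max_reducer))
```

Because the mapper and reducer are ordinary functions with no shared state, the same job could in principle be split across many machines, which is the property that frameworks like Hadoop rely on.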
Apache Spark is a fast and general engine for large-scale data processing. Spark runs on Hadoop (using YARN), Mesos, standalone, or in the cloud. It can access diverse data sources including HDFS, Cassandra, HBase, and S3.
- Apache Spark - Lightning-fast cluster computing: http://spark.apache.org
- Practical applications for which Spark sparkles brightly: http://www.ibmbigdatahub.com/blog/practical-applications-which-spark-sparkles-brightly
- The power behind Apache Spark: http://www.ibmbigdatahub.com/blog/power-behind-apache-spark (how Spark handles the "practical applications" described in the previous link)
- Hadoop On Azure: https://www.hadooponazure.com/
- Finally! A Hadoop Hello World That Isn’t A Lame Word Count!: http://java.dzone.com/articles/finally-hadoop-hello-world
- Yahoo! Hadoop Tutorial: http://developer.yahoo.com/hadoop/tutorial/
- Hadoop Installation Tutorial (on Windows, or using VirtualBox + an Ubuntu distro): http://www.learncomputer.com/hadoop-install/
- Hadoop (local Windows install with Eclipse & Cygwin): http://ebiquity.umbc.edu/Tutorials/Hadoop/
- Downloading and installing Hadoop (natively on Linux): http://wiki.apache.org/hadoop/GettingStartedWithHadoop
- Installing Apache Hadoop - The Definitive Guide: http://oreilly.com/other-programming/excerpts/hadoop-tdg/installing-apache-hadoop.html
- Hadoop on Azure - An Introduction: http://architects.dzone.com/articles/hadoop-azure-introduction
- Getting Hadoop, Hive and HBase Up and Running in Less than 15 Minutes: http://architects.dzone.com/articles/getting-hadoop-hive-and-hbase
- Apache Spark -- How-to Tutorials: http://www.cloudera.com/content/cloudera/en/developers/home/developer-admin-resources/how-tos/apache-spark-how-tos.html
- Introduction to Hadoop MapReduce: http://dzone.com/articles/introduction-to-hadoop-mapnbspreduce (Java example using historical temperature data)
- Hadoop buzz continues to excite the cloud: http://news.cnet.com/8301-13846_3-10345769-62.html
- Hadoop adoption limps along in Enterprise - so perhaps Big Data isn't such a big deal?: http://www.zdnet.com/article/hadoop-adoption-limps-along-so-perhaps-big-data-isnt-such-a-big-deal/
See Also: NoSQL | ML | Mahout
- Hadoop Security Basics (In Under 5 Minutes): https://blog.dataiku.com/sound-smart-on-hadoop-security