Hadoop Tutorial 1 - Intro to Hadoop and HDFS architecture
After an exhausting search for Hadoop tutorials, I decided to write my own, for several reasons. First, most online training courses are not free and cost too much for a college student like me. Second, the few free ones are somewhat outdated. The good news is that Hadoop is an open-source project, so most resources can be found on the internet. The hard part is finding the right ones: the ones that help you learn in a natural, efficient and effective way. That has been my goal from the beginning, and it is what this tutorial is for. I will share my learning process and materials here. More importantly, since I am new to Hadoop myself, I can share my solutions to the common problems beginners run into. We all want to save time; otherwise, you probably would not need Hadoop. Yeah, Hadoop saves your life.
In this article, I will deal with three topics:
- What problems Hadoop solves and how it solves them
- Hadoop distributed file system (HDFS) basics and architecture
- Setting up a pseudo-distributed file system on your computer
1. Big data and Hadoop
Alright, I put "big data" in the heading to catch your eye, even though it is not our main topic. It is, however, the background Hadoop grew out of. We will begin with the Yahoo! Hadoop Tutorial. Its Module 1 covers the kinds of problems Hadoop is suited to and how Hadoop addresses them. It gives us the big picture, and I like big pictures because they make learning more efficient.
|Yahoo! Hadoop Tutorial Module 1||Link|
You may also want to download the whole Yahoo! Hadoop Tutorial as a zip file.
2. Hadoop distributed file system architecture
After reading Module 1, we know the two most important components in Hadoop are:
- Hadoop distributed file system (HDFS) for storing data
- MapReduce framework for processing data
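To make the division of labor a little more concrete, here is a minimal, single-process Python sketch of the MapReduce idea, using word count, the classic first example. It only illustrates the map, shuffle and reduce phases. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical, and this is not Hadoop's Java MapReduce API. In a real cluster, the framework runs many mappers in parallel, close to the HDFS blocks that hold the input, and ships the grouped pairs over the network to the reducers.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group every emitted value under its key, sorted by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values for one key (here, sum the counts)."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Hadoop would run mappers and reducers on many nodes in parallel;
    # this toy version runs all three phases sequentially in one process.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["Hadoop stores data in HDFS", "MapReduce processes data stored in HDFS"]
    print(word_count(docs))
```

Because each mapper looks only at its own input and each reducer sees only the values for its own key, both phases can be spread across many machines, and that independence is what lets Hadoop scale out.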
This part gives you a fuller picture of how HDFS actually works. First, read the Distributed File System Basics section of Module 2. Yes, only the basics section. Then read the HDFS Architecture guide on the Hadoop project site. The two cover closely related material and are extremely useful.
|Yahoo! Hadoop Tutorial Module 2||Link|
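While reading the architecture guide, it may help to model its core idea: a file is split into large fixed-size blocks, each block is replicated on several DataNodes, and the NameNode keeps only the metadata (which blocks make up a file, and where each block lives). The Python sketch below is a toy model under simplifying assumptions. `ToyNameNode` and the DataNode names are hypothetical, the defaults mirror Hadoop 2.x (`dfs.blocksize` of 128 MB, `dfs.replication` of 3), and replicas are placed round-robin instead of with HDFS's rack-aware placement policy.

```python
from dataclasses import dataclass, field
from typing import Dict, List

BLOCK_SIZE = 128 * 1024 * 1024  # Hadoop 2.x default dfs.blocksize (128 MB)
REPLICATION = 3                 # Hadoop default dfs.replication


@dataclass
class ToyNameNode:
    """Toy model of the NameNode's job: it stores metadata only (file -> block
    IDs, block ID -> DataNodes), never the file contents themselves."""
    datanodes: List[str]
    block_size: int = BLOCK_SIZE
    replication: int = REPLICATION
    files: Dict[str, List[int]] = field(default_factory=dict)
    block_locations: Dict[int, List[str]] = field(default_factory=dict)
    _next_block_id: int = field(default=0, init=False)
    _next_node: int = field(default=0, init=False)

    def add_file(self, path: str, size_bytes: int) -> List[int]:
        """Split a file into blocks and choose DataNodes for every replica."""
        # Ceiling division: a 300 MB file needs 3 blocks of 128 MB, and the
        # last block holds only the remainder. An empty file needs no blocks.
        num_blocks = -(-size_bytes // self.block_size)
        # Never ask for more replicas than there are DataNodes.
        replicas = min(self.replication, len(self.datanodes))
        block_ids = []
        for _ in range(num_blocks):
            block_id = self._next_block_id
            self._next_block_id += 1
            # Simplification: consecutive DataNodes, starting one node later
            # for each new block. Real HDFS uses a rack-aware policy instead.
            start = self._next_node
            self.block_locations[block_id] = [
                self.datanodes[(start + i) % len(self.datanodes)]
                for i in range(replicas)
            ]
            self._next_node = (start + 1) % len(self.datanodes)
            block_ids.append(block_id)
        self.files[path] = block_ids
        return block_ids


if __name__ == "__main__":
    nn = ToyNameNode(datanodes=["dn1", "dn2", "dn3", "dn4"])
    for block_id in nn.add_file("/user/me/logs.txt", 300 * 1024 * 1024):
        print(block_id, nn.block_locations[block_id])
```

With four DataNodes, the 300 MB file becomes three blocks (128 MB, 128 MB and a 44 MB remainder) placed on `dn1-dn3`, `dn2-dn4` and `dn3, dn4, dn1` respectively. Every block has copies on three different nodes, so in this model losing any single DataNode loses no data.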
3. Set up Hadoop on your computer and get your hands dirty
Now we can do something fun on our own computers by setting up Hadoop. For Linux and Mac users, the Hadoop project guide is very helpful: follow the instructions step by step and you will get there. At this point, we just want to try a single-node cluster. Windows users can use the Yahoo! guide. Mac users may be tempted to install Hadoop with Homebrew; however, I would recommend downloading and installing it manually. Here are my notes on installing Hadoop 2.6 on my MacBook running OS X Yosemite.
|Hadoop installation guide for Linux/Mac||Link|
|Windows installation guide||Link|
|Installing Hadoop on my Mac OS X Yosemite||Link|
I also recommend a more recent, concise guide to installing Hadoop with Homebrew, based on the official guide. I tried it successfully on my Mac running OS X Yosemite. FYI, this guide is on the second page of Google search results; as far as I can tell, the top results are all outdated or messy.
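Whichever guide you follow, the pseudo-distributed setup mostly comes down to two small XML files in Hadoop's configuration directory (`etc/hadoop` in a Hadoop 2.x download). The official single-node guide sets `fs.defaultFS` to `hdfs://localhost:9000` in `core-site.xml` and `dfs.replication` to `1` in `hdfs-site.xml`, since there is only one DataNode to hold block replicas. The Python helper below is a hypothetical convenience script, not part of Hadoop, that renders those properties in Hadoop's `<configuration>`/`<property>` format. Double-check the values against the guide for your Hadoop version, and note that writing the files overwrites any existing ones.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Dict

# Values from the pseudo-distributed section of the Hadoop 2.x single-node guide.
PSEUDO_DISTRIBUTED: Dict[str, Dict[str, str]] = {
    "core-site.xml": {"fs.defaultFS": "hdfs://localhost:9000"},
    "hdfs-site.xml": {"dfs.replication": "1"},
}


def hadoop_config_xml(properties: Dict[str, str]) -> str:
    """Render name/value pairs in Hadoop's <configuration><property> format."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode") + "\n"


def write_pseudo_distributed_config(conf_dir: Path) -> None:
    """Write both files into conf_dir, e.g. $HADOOP_HOME/etc/hadoop.
    Warning: this overwrites existing core-site.xml and hdfs-site.xml."""
    conf_dir.mkdir(parents=True, exist_ok=True)
    for filename, props in PSEUDO_DISTRIBUTED.items():
        (conf_dir / filename).write_text(hadoop_config_xml(props))


if __name__ == "__main__":
    # Only print the files here; call write_pseudo_distributed_config with
    # your own configuration directory to actually write them.
    for filename, props in PSEUDO_DISTRIBUTED.items():
        print(f"--- {filename} ---")
        print(hadoop_config_xml(props))
```

Once the files are in place, the official guide formats the file system with `bin/hdfs namenode -format` and starts the NameNode and DataNode daemons with `sbin/start-dfs.sh`.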
Congratulations! You have finished Tutorial 1. I will post the next tutorials soon!