Hadoop for Developers

1. What is Hadoop?

  1. Understanding distributed systems and Hadoop
  2. Comparing SQL databases and Hadoop
  3. Understanding MapReduce
  4. Counting words with Hadoop—running your first program
  5. History of Hadoop

2. Starting Hadoop

  1. The building blocks of Hadoop
  2. Setting up SSH for a Hadoop cluster
  3. Running Hadoop
  4. Web-based cluster UI

3. Components of Hadoop

  1. Working with files in HDFS
  2. Anatomy of a MapReduce program
  3. Reading and writing

4. Writing basic MapReduce programs

  1. Constructing the basic template of a MapReduce program
  2. Counting things
  3. Adapting for Hadoop's API changes
  4. Streaming in Hadoop
  5. Improving performance with combiners

5. Advanced MapReduce

  1. Chaining MapReduce jobs
  2. Joining data from different sources
  3. Creating a Bloom filter

6. Programming Practices

  1. Developing MapReduce programs
  2. Monitoring and debugging on a production cluster
  3. Tuning for performance

7. Cookbook

  1. Passing job-specific parameters to your tasks
  2. Probing for task-specific information
  3. Partitioning into multiple output files
  4. Inputting from and outputting to a database
  5. Keeping all output in sorted order

8. Managing Hadoop

  1. Setting up parameter values for practical use
  2. Checking the system’s health
  3. Setting permissions
  4. Managing quotas
  5. Enabling trash
  6. Removing DataNodes
  7. Adding DataNodes
  8. Managing NameNode and Secondary NameNode
  9. Recovering from a failed NameNode
  10. Designing network layout and rack awareness
  11. Scheduling jobs from multiple users

9. Running Hadoop in the cloud

  1. Introducing Amazon Web Services
  2. Setting up AWS
  3. Setting up Hadoop on EC2
  4. Running MapReduce programs on EC2
  5. Cleaning up and shutting down your EC2 instances
  6. Amazon Elastic MapReduce and other AWS services

10. Programming with Pig

  1. Installing Pig
  2. Running Pig
  3. Learning Pig Latin through Grunt
  4. Speaking Pig Latin
  5. Working with user-defined functions
  6. Working with scripts
  7. Seeing Pig in action—example of computing similar patents

11. Hadoop-Related Technologies

  1. Hive
  2. Apache Accumulo
  3. Other Hadoop-related projects

12. Case studies

  1. Converting 11 million image documents from the New York Times archive
  2. Mining data at China Mobile
  3. Recommending the best websites at StumbleUpon
  4. Building analytics for enterprise search—IBM’s Project ES2