UMBC Training Centers

Hadoop for Administrators

 

1. Hadoop Overview

  1. Why Hadoop?
  2. HDFS Concepts
  3. Blocks
  4. Namenodes and Datanodes
  5. MapReduce
  6. Interfaces
  7. Hive, Pig, HBase and other ecosystem projects

2. Planning a Hadoop Cluster

  1. General Planning
  2. Choosing Hardware
  3. Node Topologies
  4. Choosing the Software

3. Setting Up a Hadoop Cluster

  1. Cluster Setup and Installation
  2. Installing Java
  3. Creating a Hadoop User
  4. Installing Hadoop
  5. Testing the Installation
  6. SSH Configuration
  7. Hadoop Configuration
  8. Configuration Management
  9. Environment Settings
  10. Important Hadoop Daemon Properties
  11. Hadoop Daemon Addresses and Ports
  12. Other Hadoop Properties
  13. Post Install
  14. Benchmarking a Hadoop Cluster
  15. Hadoop Benchmarks
  16. User Jobs
  17. Hadoop in the Cloud
  18. Hadoop on Amazon EC2

4. Administering Hadoop

  1. HDFS
  2. Persistent Data Structures
  3. Safe Mode
  4. Audit Logging
  5. Tools
  6. Monitoring
  7. Logging
  8. Metrics
  9. Java Management Extensions
  10. Maintenance
  11. Routine Administration Procedures
  12. Commissioning and Decommissioning Nodes
  13. Upgrades

5. Managing and Scheduling Jobs

  1. Starting and stopping MapReduce jobs
  2. Hands-On Exercise: Managing jobs
  3. The FIFO Scheduler
  4. The Fair Scheduler
  5. Using the Fair Scheduler

6. Maintaining the Hadoop Cluster

  1. Checking HDFS with fsck
  2. Repairing a broken cluster
  3. Copying data with distcp
  4. Rebalancing cluster nodes
  5. Adding and removing cluster nodes
  6. Backing up and restoring data
  7. Upgrading and Migrating
  8. Examining NameNode Metadata

7. Monitoring and Optimizing the Cluster

  1. Viewing log files
  2. Using the NameNode and JobTracker web user interfaces
  3. Interpreting Job Logs
  4. Reviewing other monitoring tools
  5. Other monitoring tools
  6. General steps towards optimization
  7. Benchmarking Your Cluster

8. Using External Data Sources with Hadoop

  1. Using Sqoop: Hadoop and SQL together
  2. Investigating other methods to get data into HDFS

9. Exploring other Hadoop Projects

  1. Pig
  2. Hive
  3. HBase
  4. ZooKeeper
  5. Accumulo

10. Case Studies