Hadoop for Administrators
1. Hadoop Overview
- Why Hadoop?
- HDFS Concepts
- Blocks
- Namenodes and Datanodes
- MapReduce
- Interfaces
- Hive, Pig, HBase and other ecosystem projects
2. Planning a Hadoop Cluster
- General Planning
- Choosing Hardware
- Node Topologies
- Choosing the Software
3. Setting Up a Hadoop Cluster
- Cluster Setup and Installation
- Installing Java
- Creating a Hadoop User
- Installing Hadoop
- Testing the Installation
- SSH Configuration
- Hadoop Configuration
- Configuration Management
- Environment Settings
- Important Hadoop Daemon Properties
- Hadoop Daemon Addresses and Ports
- Other Hadoop Properties
- Post Install
- Benchmarking a Hadoop Cluster
- Hadoop Benchmarks
- User Jobs
- Hadoop in the Cloud
- Hadoop on Amazon EC2
4. Administering Hadoop
- HDFS
- Persistent Data Structures
- Safe Mode
- Audit Logging
- Tools
- Monitoring
- Logging
- Metrics
- Java Management Extensions
- Maintenance
- Routine Administration Procedures
- Commissioning and Decommissioning Nodes
- Upgrades
5. Managing and Scheduling Jobs
- Starting and stopping MapReduce jobs
- Hands-On Exercise: Managing jobs
- The FIFO Scheduler
- The Fair Scheduler
- Using the FairScheduler
6. Maintaining the Hadoop Cluster
- Checking HDFS with fsck
- Repairing a broken cluster
- Copying data with distcp
- Rebalancing cluster nodes
- Adding and removing cluster nodes
- Backing up and restoring data
- Upgrading and Migrating
- Examining NameNode Metadata
7. Monitoring and Optimizing the Cluster
- Viewing log files
- Using the NameNode and JobTracker web user interfaces
- Interpreting Job Logs
- Reviewing other monitoring tools
- General steps towards optimization
- Benchmarking Your Cluster
8. Using External Data Sources with Hadoop
- Using Sqoop: Hadoop and SQL together
- Investigating other methods to get data into HDFS
9. Exploring Other Hadoop Projects
- Pig
- Hive
- HBase
- ZooKeeper
- Accumulo
10. Case Studies