
Course Outline
1. Introduction
- Overview of Hadoop
- MapReduce
- Hadoop Distributed File System
- What Is Pig?
- Why Use Pig?
2. Installing and Running Pig
- Downloading and Installing Pig
- Running Pig
3. Grunt
- Interpreting Pig Latin Scripts
- HDFS Commands
- Controlling Pig
4. The Pig Data Model
5. Basic Pig Latin
- Input and Output
- Relational Operations
- User Defined Functions
6. Advanced Pig Latin
- Advanced Relational Operations
- Using Pig with Legacy Code
- Integrating Pig and MapReduce
- Nonlinear Data Flows
- Controlling Execution
- Pig Latin Preprocessor
7. Developing and Testing Scripts
- Development Tools
- Testing Your Scripts with PigUnit
8. Tuning Pig
- Improving Script Performance
- Improving Performance with User Defined Functions
- Using Compression in Intermediate Results
- Data Layout Optimization
- Handling Bad Records
9. Embedding Pig Latin in Python
- Compile
- Bind
- Run
- Utility Methods
10. Writing Evaluation and Filter Functions
- Writing an Evaluation Function in Java
- Algebraic Interface
- Accumulator Interface
- Python UDFs
- Writing Filter Functions
11. Writing Load and Store Functions
- Load Functions
- Store Functions
12. Pig and the Rest of the Hadoop Zoo
- Pig and Hive
- Cascading
- NoSQL Databases
- Metadata in Hadoop
13. Built-in User Defined Functions and Piggybank