UMBC Training Centers

Apache Pig

 

Course Outline

1. Introduction

  • Overview of Hadoop
  • MapReduce
  • Hadoop Distributed File System
  • What Is Pig?
  • Why Use Pig?

2. Installing and Running Pig

  • Downloading and Installing Pig
  • Running Pig

3. Grunt

  • Interpreting Pig Latin Scripts
  • HDFS Commands
  • Controlling Pig

4. The Pig Data Model

  • Data Types
  • Schemas

5. Basic Pig Latin

  • Input and Output
  • Relational Operations
  • User Defined Functions

6. Advanced Pig Latin

  • Advanced Relational Operations
  • Using Pig with Legacy Code
  • Integrating Pig and MapReduce
  • Nonlinear Data Flows
  • Controlling Execution
  • Pig Latin Preprocessor

7. Developing and Testing Scripts

  • Development Tools
  • Testing Your Scripts with PigUnit

8. Tuning Pig

  • Improving Script Performance
  • Improving Performance with User Defined Functions
  • Using Compression in Intermediate Results
  • Data Layout Optimization
  • Handling Bad Records

9. Embedding Pig Latin in Python

  • Compile
  • Bind
  • Run
  • Utility Methods

10. Writing Evaluation and Filter Functions

  • Writing an Evaluation Function in Java
  • Algebraic Interface
  • Accumulator Interface
  • Python UDFs
  • Writing Filter Functions

11. Writing Load and Store Functions

  • Load Functions
  • Store Functions

12. Pig and the Rest of the Hadoop Zoo

  • Pig and Hive
  • Cascading
  • NoSQL Databases
  • Metadata in Hadoop

13. Built-in User Defined Functions and Piggybank

  • Built-in UDFs
  • Piggybank