Pig Hadoop Tutorial Course Content
Introduction to Pig
- What Is Pig?
- Features of Pig
- Pig Use Cases
- Interacting with Pig
Basic Data Analysis with Pig
- Pig Latin Syntax
- Loading Data
- Simple Data Types
- Field Definitions
- Data Output
- Viewing the Schema
- Filtering & Sorting Data
- Commonly Used Functions
- Hands-On Exercise: Using Pig for ETL Processing
Processing Complex Data with Pig
- Storage Formats
- Complex/Nested Data Types
- Grouping
- Built-In Functions for Complex Data
- Iterating Grouped Data
- Hands-On Exercise: Analyzing Ad Campaign Data with Pig
Multi-Dataset Operations with Pig
- Techniques for Combining Data Sets
- Joining Data Sets in Pig
- Set Operations
- Splitting Data Sets
- Hands-On Exercise: Analyzing Disparate Data Sets with Pig
Extending Pig
- Adding Flexibility with Parameters
- Macros & Imports
- UDFs
- Contributed Functions
- Using Other Languages to Process Data with Pig
- Hands-On Exercise: Extending Pig with Streaming & UDFs
Pig Troubleshooting & Optimization
- Troubleshooting Pig
- Logging
- Using Hadoop’s Web UI
- Optional Demo: Troubleshooting a Failed Job with the Web UI
- Data Sampling & Debugging
- Performance Overview
- The Execution Plan
- Tips for Improving the Performance of Your Pig Jobs
Introduction to Hive
- What Is Hive?
- Hive Schema & Data Storage
- Comparing Hive to Traditional Databases
- Hive vs. Pig
- Hive Use Cases
- Interacting with Hive
Relational Data Analysis with Hive
- Hive Databases & Tables
- Basic HiveQL Syntax
- Data Types
- Joining Data Sets
- Common Built-In Functions
- Hands-On Exercise: Running Hive Queries from the Shell, Scripts, & Hue
Hive Data Management
- Hive Data Formats
- Creating Databases & Hive-Managed Tables
- Loading Data into Hive
- Altering Databases & Tables
- Self-Managed Tables
- Simplifying Queries with Views
- Storing Query Results
- Controlling Access to Data
- Hands-On Exercise: Data Management with Hive