Cloudera Developer Training for Apache Hadoop

Motivation for Hadoop
  • Problems with Traditional Large-Scale Systems
  • Requirements for a New Approach
Basic Concepts
  • Hadoop Distributed File System (HDFS)
  • MapReduce
  • Anatomy of a Hadoop Cluster
  • Other Hadoop Ecosystem Components
Writing a MapReduce Program
  • MapReduce Flow
  • Examining a Sample MapReduce Program
  • Basic MapReduce API Concepts
  • Driver Code
  • Mapper
  • Reducer
  • Streaming API
  • Using Eclipse for Rapid Development
  • New MapReduce API
Integrating Hadoop into the Workflow
  • Relational Database Management Systems
  • Storage Systems
  • Importing Data from a Relational Database Management System with Sqoop
  • Importing Real-Time Data with Flume
  • Accessing HDFS Using FuseDFS & Hoop
Delving Deeper into the Hadoop API
  • ToolRunner
  • Testing with MRUnit
  • Reducing Intermediate Data with Combiners
  • Configure & Close Methods for Map/Reduce Setup & Teardown
  • Writing Partitioners for Better Load Balancing
  • Directly Accessing HDFS
  • Using the Distributed Cache
Common MapReduce Algorithms
  • Sorting & Searching
  • Indexing
  • Machine Learning with Mahout
  • Term Frequency
  • Inverse Document Frequency
  • Word Co-Occurrence
Using Hive & Pig
  • Hive Basics
  • Pig Basics
Practical Development Tips & Techniques
  • Debugging MapReduce Code
  • Using the LocalJobRunner Mode for Easier Debugging
  • Retrieving Job Information with Counters
  • Logging
  • Splittable File Formats
  • Determining the Optimal Number of Reducers
  • Map-Only MapReduce Jobs
Advanced MapReduce Programming
  • Custom Writable & WritableComparable
  • Saving Binary Data Using SequenceFiles & Avro Files
  • Creating Input Formats & Output Formats
Joining Data Sets in MapReduce
  • Map-Side Joins
  • Secondary Sort
  • Reduce-Side Joins
Graph Manipulation in Hadoop
  • Graph Techniques
  • Representing Graphs in Hadoop
  • Implementing a Sample Algorithm: Single Source Shortest Path
Creating Workflows with Oozie
  • Motivation for Oozie
  • Workflow Definition Format