Cloudera Developer Training for Apache Spark

Cloudera Developer Training for Apache Spark Course Content

Why Spark?
  • Problems with Traditional Large-Scale Systems
  • Introducing Spark
Spark Basics
  • What Is Apache Spark?
  • Using the Spark Shell
  • Resilient Distributed Datasets (RDDs)
  • Functional Programming with Spark
Working with RDDs
  • RDD Operations
  • Key-Value Pair RDDs
  • MapReduce & Pair RDD Operations
Hadoop Distributed File System
  • Why HDFS?
  • HDFS Architecture
  • Using HDFS
Running Spark on a Cluster
  • A Spark Standalone Cluster
  • Spark Standalone Web UI
Parallel Programming with Spark
  • RDD Partitions & HDFS Data Locality
  • Working with Partitions
  • Executing Parallel Operations
Caching & Persistence
  • RDD Lineage
  • Caching Overview
  • Distributed Persistence
Writing Spark Applications
  • Spark Applications vs. Spark Shell
  • Creating the SparkContext
  • Configuring Spark Properties
  • Building & Running a Spark Application
  • Logging
Spark, Hadoop, & the Enterprise Data Center
  • Spark & the Hadoop Ecosystem
  • Spark & MapReduce
Spark Streaming
  • Example: Streaming Word Count
  • Other Streaming Operations
  • Sliding Window Operations
  • Developing Spark Streaming Applications
Common Spark Algorithms
  • Iterative Algorithms
  • Graph Analysis
  • Machine Learning
Improving Spark Performance
  • Shared Variables: Broadcast Variables & Accumulators
  • Common Performance Issues