Detailed Course Outline
HDFS Introduction
- HDFS Overview
- HDFS Components and Interactions
- Additional HDFS Interactions
- Ozone Overview
- Exercise: Working with HDFS
 
YARN Introduction
- YARN Overview
- YARN Components and Interaction
- Working with YARN
- Exercise: Working with YARN
 
Working with RDDs
- Resilient Distributed Datasets (RDDs)
- Exercise: Working with RDDs
 
Working with DataFrames
- Introduction to DataFrames
- Exercise: Introducing DataFrames
- Exercise: Reading and Writing DataFrames
- Exercise: Working with Columns
- Exercise: Working with Complex Types
- Exercise: Combining and Splitting DataFrames
- Exercise: Summarizing and Grouping DataFrames
- Exercise: Working with UDFs
- Exercise: Working with Windows
 
Hive and Spark Integration
- Hive and Spark Integration
- Exercise: Spark Integration with Hive
 
Distributed Processing Challenges
- Shuffle
- Skew
- Order
 
Spark Distributed Processing
- Spark Distributed Processing
- Exercise: Explore Query Execution Order
 
Spark Distributed Persistence
- DataFrame and Dataset Persistence
- Persistence Storage Levels
- Viewing Persisted RDDs
- Exercise: Persisting DataFrames
 
Data Engineering Service
- Create and Trigger Ad-Hoc Spark Jobs
- Orchestrate a Set of Jobs Using Airflow
- Data Lineage Using Atlas
- Auto-scaling in Data Engineering Service
 
Workload XM
- Optimize Workloads, Performance, Capacity
- Identify Suboptimal Spark Jobs
 
Appendix: Working with Datasets in Scala
- Working with Datasets in Scala
- Exercise: Using Datasets in Scala