Cloudera Designing & Building Big Data Applications (CDBBDA)

Detailed Course Outline


Application Architecture
  • Scenario Explanation
  • Overview of the Development Environment
  • Identifying Sources of Input Data
  • Selecting the Appropriate Data Collection Technique
  • Organizing and Storing Data Sets
  • Tools for Data Analysis and Processing
  • Presenting Results to the User
Defining and Using Data Sets
  • Metadata Management
  • What is Apache Avro?
  • Avro Schemas
  • Avro Schema Evolution
  • Selecting a File Format
  • Performance Considerations
Using the Kite SDK Data Module
  • What is the Kite SDK?
  • Fundamental Data Module Concepts
  • Creating New Data Sets Using the Kite SDK
  • Loading, Accessing, and Deleting a Data Set
Importing Relational Data with Apache Sqoop
  • What is Apache Sqoop?
  • Basic Imports
  • Limiting Results
  • Improving Sqoop’s Performance
  • Sqoop 2
Capturing Data with Apache Flume
  • What is Apache Flume?
  • Basic Flume Architecture
  • Flume Sources
  • Flume Sinks
  • Flume Configuration
  • Logging Application Events to Hadoop
Developing Custom Flume Components
  • Flume Data Flow and Common Extension Points
  • Custom Flume Sources
  • Developing a Flume Pollable Source
  • Developing a Flume Event-Driven Source
  • Custom Flume Interceptors
  • Developing a Header-Modifying Flume Interceptor
  • Developing a Filtering Flume Interceptor
  • Writing Avro Objects with a Custom Flume Interceptor
Managing Workflows with Apache Oozie
  • The Need for Workflow Management
  • What is Apache Oozie?
  • Defining an Oozie Workflow
  • Validation, Packaging, and Deployment
  • Running and Tracking Workflows Using the CLI
  • Hue UI for Oozie
Processing Data Pipelines with Apache Crunch
  • What is Apache Crunch?
  • Understanding the Crunch Pipeline
  • Comparing Crunch to Java MapReduce
  • Working with Crunch Projects
  • Reading and Writing Data in Crunch
  • Data Collection API
  • Functions
  • Utility Classes in the Crunch API
Working with Tables in Apache Hive
  • What is Apache Hive?
  • Accessing Hive
  • Basic Query Syntax
  • Creating and Populating Hive Tables
  • How Hive Reads Data
  • Using the RegexSerDe in Hive
Developing User-Defined Functions
  • What are User-Defined Functions?
  • Implementing a User-Defined Function
  • Deploying Custom Libraries in Hive
  • Registering a User-Defined Function in Hive
Executing Interactive Queries with Impala
  • What is Impala?
  • Comparing Hive to Impala
  • Running Queries in Impala
  • Support for User-Defined Functions
  • Data and Metadata Management
Understanding Cloudera Search
  • What is Cloudera Search?
  • Search Architecture
  • Supported Document Formats
Indexing Data with Cloudera Search
  • Collection and Schema Management
  • Morphlines
  • Indexing Data in Batch Mode
  • Indexing Data in Near Real Time
Presenting Results to Users
  • Solr Query Syntax
  • Building a Search UI with Hue
  • Accessing Impala through JDBC
  • Powering a Custom Web Application with Impala and Search
Conclusion