From Data to Insights with Google Cloud Platform (DIGCP) – Outline

Detailed Course Outline

Module 1: Introduction to Data on Google Cloud Platform

Topics:

  • Analytics Challenges Faced by Data Analysts
  • Big Data On-premise Versus on the Cloud
  • Real-world Use Cases of Companies Transformed Through Analytics on the Cloud
  • Google Cloud Project Basics

Objectives:

  • Compare data infrastructure on-premises versus on Google Cloud.
Module 2: Analyzing Large Datasets with BigQuery

Topics:

  • Data Analyst Tasks, Challenges, and Google Cloud Data Tools
  • Fundamental BigQuery Features
  • Google Cloud Tools for Analysts, Data Scientists, and Data Engineers

Objectives:

  • Identify data analyst tasks, and challenges, and introduce Google Cloud data tools
  • Explore 9 fundamental BigQuery features
  • Compare the differences in roles and toolsets between data analysts, data scientists, and data engineers
  • Access the BigQuery web UI and explore a public dataset with basic SQL
Module 3: Exploring your Data with SQL

Topics:

  • Common Data Exploration Techniques
  • Use SQL to Query Public Datasets

Objectives:

  • Compare common data exploration techniques
  • Identify the key components of a basic SQL SELECT statement and common pitfalls
  • Discuss the basics of SQL functions and how they create calculated fields with input parameters
  • Explore BigQuery public datasets
  • Troubleshoot dataset quality issues by analyzing duplicate records with SQL in the BigQuery Web UI
Module 4: Cleaning and Transforming your Data with Dataprep

Topics:

  • 5 Principles of Dataset Integrity
  • Dataset Shape and Skew
  • Clean and Transform Data using SQL
  • Introducing Dataprep by Trifacta

Objectives:

  • Characterize different dataset shapes and potential skew
  • Clean and transform data using SQL
  • Clean and transform data using Dataprep
Module 5: Visualizing Insights and Creating Scheduled Queries

Topics:

  • Data Visualization Principles
  • Common Data Visualization Pitfalls
  • Google Data Studio

Objectives:

  • Compare data visualizations and make recommendations for improvement
  • Create dashboards and visualizations with Google Data Studio
Module 6: Storing and Ingesting New Datasheets

Topics:

  • Permanent vs Temporary Data Tables
  • Ingesting New Datasheets

Objectives:

  • Differentiate between permanent and temporary data tables
  • Identify what types and formats of data BigQuery can ingest
  • Differentiate between native BigQuery table storage and external data source connections
  • Load new data into BigQuery
Module 7: Enriching your Data Warehouse with JOINs

Topics:

  • Merge Historical Data Tables with UNION
  • Introduce Table Wildcards for Easy Merges
  • Review Data Schemas: Linking Data Across Multiple Tables
  • JOIN Examples and Pitfalls

Objectives:

  • Explain when to use UNIONs and when to use JOINs
  • Identify the key pitfalls when joining and merging datasets
  • Differentiate between join types visually
  • Explain how union wildcards work and when to use them
  • Write SQL JOINs and UNIONs against a dataset in the BigQuery web UI
Module 8: Advanced Features and Partitioning your Queries and Tables for Advanced Insights

Topics:

  • Advanced Functions (Statistical, Analytic, User-defined)
  • Date-Partitioned Tables

Objectives:

  • Identify the available statistical approximation functions and user-defined functions
  • Apply large-scale record estimation with approximate aggregation functions
  • Deconstruct an analytical window query and explain when to use RANK() and PARTITION
  • Explain when to use Common Table Expressions (WITH) to break apart complex queries
Module 9: Designing Schemas that Scale: Arrays and Structs in BigQuery

Topics:

  • BigQuery Versus Traditional Relational Data Architecture
  • ARRAY and STRUCT Syntax
  • BigQuery Architecture

Objectives:

  • Differentiate between BigQuery and traditional data architecture
  • Work with ARRAYs and STRUCTs as part of nested fields in data schemas
Module 10: Optimizing Queries for Performance

Topics:

  • BigQuery Performance Pitfalls
  • Prevent Data Hotspots
  • Diagnose Performance Issues with the Query Explanation Map

Objectives:

  • Identify BigQuery performance pitfalls
  • Discuss the Query Explanation map and how to interpret MAX and AVG processing times per stage
  • Describe how to analyze and troubleshoot broken queries
Module 11: Controlling Access with Data Security Best Practices

Topics:

  • Hashing Columns
  • Authorized Views
  • IAM and BigQuery Dataset Roles
  • Access Pitfalls

Objectives:

  • Review data access roles within Google Cloud and BigQuery
  • Highlight key data access pitfalls and how to avoid them
Module 12: Predicting Visitor Return Purchases with BigQuery ML

Topics:

  • Machine Learning on Structured Data
  • Scenario: Predicting Customer Lifetime Value
  • Choosing the Right Model Type
  • Creating ML models with SQL

Objectives:

  • Explain how ML on structured data drives value
  • Describe how customer LTV can be predicted with an ML model
  • Choose the right model type for different structured data use cases
  • Create ML models with SQL
Module 13: Deriving Insights from Unstructured Data Using Machine Learning

Topics:

  • ML Drives Business Value
  • How does ML on Unstructured Data Work?
  • Choosing the Right ML Approach
  • Pre-built AI Building Blocks
  • Customizing Pre-built Models with AutoML
  • Building a Custom Model

Objectives:

  • Discuss how ML is able to drive business value
  • Explain how ML on unstructured data works
  • Differentiate between pre-built ML models, custom models, and new models when considering an AI application strategy