Introduction to Data Engineering
This module discusses the role of data engineering and makes the case for doing data engineering in the cloud.
- Module introduction
- The role of a data engineer
- Data engineering challenges
- Introduction to BigQuery
- Data lakes and data warehouses
- Transactional databases versus data warehouses
- Partner effectively with other data teams
- Manage data access and governance
- Demo: Finding PII in your dataset with the DLP API
- Build production-ready pipelines
- Google Cloud customer case study
- Lab Intro: Using BigQuery to do Analysis
- LAB: Using BigQuery to do Analysis: In this lab, you analyze two public datasets, querying them first separately and then in combination, to derive insights.
- QUIZ
Building a Data Lake
In this module, we describe what a data lake is and how to use Cloud Storage as your data lake on Google Cloud.
- Module Introduction
- Introduction to data lakes
- Data storage and ETL options on Google Cloud
- Build a data lake using Cloud Storage
- Secure Cloud Storage
- Store all sorts of data types
- Cloud SQL as a relational data lake
- Lab Intro: Loading Taxi Data into Google Cloud SQL
- LAB: Loading Taxi Data into Google Cloud SQL 2.5: In this lab, you import data from CSV text files into Cloud SQL and then carry out some basic data analysis using simple queries.
- QUIZ
Building a Data Warehouse
In this module, we discuss BigQuery as a data warehousing option on Google Cloud.
- Module Introduction
- The modern data warehouse
- Introduction to BigQuery
- Demo: Querying TB of data in seconds
- Get started with BigQuery
- Load data into BigQuery
- Lab Intro: Loading Data into BigQuery
- LAB: Loading data into BigQuery: This lab focuses on how to ingest data into tables inside of BigQuery.
- Explore schemas
- Demo: Exploring Schemas
- Schema design
- Nested and repeated fields
- Demo: Nested and repeated fields
- Design the optimal schema for BigQuery
- Lab Intro: Working with JSON and Array data in BigQuery
- LAB: Working with JSON and Array data in BigQuery 2.5: In this lab, you work with semi-structured data (ingesting JSON and Array data types) in BigQuery. You practice loading, querying, troubleshooting, and unnesting various semi-structured datasets.
- Optimize with partitioning and clustering
- Lab Intro: Partitioned Tables in BigQuery
- LAB: Partitioned Tables in Google BigQuery: This lab focuses on how to query partitioned datasets and how to create your own dataset partitions to improve query performance, which in turn reduces cost.
- Review
- QUIZ