Introduction to Building Batch Data Pipelines
This module reviews the different methods of loading data: EL, ELT, and ETL, and explains when to use each; a minimal BigQuery sketch follows the lesson list below.
- Module introduction
- EL, ELT, ETL
- Quality considerations
- How to carry out operations in BigQuery
- Shortcomings
- ETL to solve data quality issues
- QUIZ
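The EL and ELT patterns covered above can be exercised directly from the BigQuery client library. Below is a minimal sketch, assuming hypothetical bucket, dataset, and table names: the raw file is loaded as-is (EL), then transformed inside BigQuery with SQL (ELT), which is also where the quality checks from this module would live.

```python
# Minimal EL/ELT sketch using the google-cloud-bigquery client.
# Bucket, dataset, and table names are hypothetical placeholders.
from google.cloud import bigquery

client = bigquery.Client()

# EL: load the raw CSV from Cloud Storage into a staging table unchanged.
load_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,  # let BigQuery infer the schema
)
load_job = client.load_table_from_uri(
    "gs://my-bucket/events.csv",
    "my-project.my_dataset.events_raw",
    job_config=load_config,
)
load_job.result()  # block until the load finishes

# ELT: transform inside BigQuery after loading, e.g. to address quality issues.
transform_sql = """
CREATE OR REPLACE TABLE my_dataset.events_clean AS
SELECT * EXCEPT(event_ts),
       TIMESTAMP(event_ts) AS event_ts
FROM my_dataset.events_raw
WHERE event_id IS NOT NULL  -- drop rows failing a simple quality check
"""
client.query(transform_sql).result()
```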
Executing Spark on Dataproc
This module shows how to run Hadoop on Dataproc, how to leverage Cloud Storage instead of HDFS, and how to optimize your Dataproc jobs; a short PySpark sketch follows the lesson list below.
- Module introduction
- The Hadoop ecosystem
- Running Hadoop on Dataproc
- Cloud Storage instead of HDFS
- Optimizing Dataproc
- Optimizing Dataproc storage
- Optimizing Dataproc templates and autoscaling
- Optimizing Dataproc monitoring
- Lab Intro: Running Apache Spark jobs on Dataproc
- LAB: Running Apache Spark jobs on Cloud Dataproc: This lab focuses on running Apache Spark jobs on Cloud Dataproc.
- Summary
- QUIZ
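To make "Cloud Storage instead of HDFS" concrete, here is a minimal PySpark word-count sketch with hypothetical gs:// paths; on Dataproc the Cloud Storage connector resolves gs:// URIs, so only the path scheme changes relative to an HDFS job.

```python
# wordcount.py: a minimal PySpark job reading directly from Cloud Storage.
# Bucket paths are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("gcs-wordcount").getOrCreate()

# gs:// URIs work on Dataproc via the Cloud Storage connector,
# replacing hdfs:// paths with no code change beyond the scheme.
lines = spark.read.text("gs://my-bucket/input/*.txt").rdd.map(lambda r: r[0])

counts = (
    lines.flatMap(lambda line: line.split())
         .map(lambda word: (word, 1))
         .reduceByKey(lambda a, b: a + b)
)

counts.saveAsTextFile("gs://my-bucket/output/wordcount")
spark.stop()
```

You could submit it with, for example, `gcloud dataproc jobs submit pyspark wordcount.py --cluster=my-cluster --region=us-central1` (cluster name and region are placeholders).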
Serverless Data Processing with Dataflow
This module covers using Dataflow to build your data processing pipelines; two short Apache Beam sketches follow the lesson list below.
- Module introduction
- Introduction to Dataflow
- Why customers value Dataflow
- Building Dataflow pipelines in code
- Key considerations with designing pipelines
- Transforming data with PTransforms
- Lab Intro: Building a Simple Dataflow Pipeline
- LAB: A Simple Dataflow Pipeline (Python) 2.5: In this lab, you learn how to write a simple Dataflow pipeline and run it both locally and on the cloud.
- LAB: Serverless Data Analysis with Dataflow: A Simple Dataflow Pipeline (Java): In this lab, you open a Dataflow project, use pipeline filtering, and execute the pipeline locally and on the cloud using Java.
- Aggregate with GroupByKey and Combine
- Lab Intro: MapReduce in Beam
- LAB: MapReduce in Beam (Python) 2.5: In this lab, you learn how to use pipeline options and carry out Map and Reduce operations in Dataflow.
- LAB: Serverless Data Analysis with Beam: MapReduce in Beam (Java): In this lab, you identify Map and Reduce operations, execute the pipeline, and use command-line parameters.
- Side inputs and windows of data
- Lab Intro: Practicing Pipeline Side Inputs
- LAB: Serverless Data Analysis with Dataflow: Side Inputs (Python): In this lab, you try out a BigQuery query, explore the pipeline code, and execute the pipeline using Python.
- LAB: Serverless Data Analysis with Dataflow: Side Inputs (Java): In this lab, you try out a BigQuery query, explore the pipeline code, and execute the pipeline using Java.
- Creating and reusing pipeline templates
- Summary
- QUIZ
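As a companion to the PTransform and GroupByKey/Combine lessons, here is a minimal Apache Beam word-count sketch in Python; the paths are hypothetical, and by default it runs locally on the DirectRunner.

```python
# A minimal Apache Beam pipeline: Map/FlatMap transforms plus a Combine step.
# Input and output paths are hypothetical placeholders.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions()  # pass --runner=DataflowRunner etc. to run on the cloud

with beam.Pipeline(options=options) as p:
    (
        p
        | "Read" >> beam.io.ReadFromText("gs://my-bucket/input/*.txt")
        | "Split" >> beam.FlatMap(lambda line: line.split())
        | "PairWithOne" >> beam.Map(lambda word: (word, 1))
        # CombinePerKey(sum) is the efficient form of GroupByKey + per-key sum.
        | "CountPerKey" >> beam.CombinePerKey(sum)
        | "Format" >> beam.Map(lambda kv: f"{kv[0]}: {kv[1]}")
        | "Write" >> beam.io.WriteToText("gs://my-bucket/output/counts")
    )
```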
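The side-input lessons follow the same shape: a small PCollection is materialized and injected into another transform. A minimal sketch, with a made-up minimum-length threshold:

```python
# Sketch of a side input: a singleton value broadcast to every element.
import apache_beam as beam

with beam.Pipeline() as p:
    words = p | "Words" >> beam.Create(["beam", "dataflow", "pipeline", "io"])

    # A one-element PCollection used as a side input.
    min_len = p | "MinLen" >> beam.Create([5])

    long_words = words | "FilterByLen" >> beam.Filter(
        lambda word, n: len(word) >= n,
        n=beam.pvalue.AsSingleton(min_len),  # injected per element at runtime
    )
    long_words | "Print" >> beam.Map(print)
```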
Manage Data Pipelines with Cloud Data Fusion and Cloud Composer
This module shows how to manage data pipelines with Cloud Data Fusion and Cloud Composer; a sample Airflow DAG sketch follows the lesson list below.
- Module introduction
- Introduction to Cloud Data Fusion
- Components of Cloud Data Fusion
- Cloud Data Fusion UI
- Build a pipeline
- Explore data using Wrangler
- Lab Intro: Building and executing a pipeline graph in Cloud Data Fusion
- LAB: Building and Executing a Pipeline Graph with Data Fusion 2.5: This lab shows you how to use the Wrangler and Data Pipeline features in Cloud Data Fusion to clean, transform, and process taxi trip data for further analysis.
- Orchestrate work between Google Cloud services with Cloud Composer
- Apache Airflow environment
- DAGs and Operators
- Workflow scheduling
- Monitoring and Logging
- Lab Intro: An Introduction to Cloud Composer
- LAB: An Introduction to Cloud Composer 2.5: In this lab, you create a Cloud Composer environment using the GCP Console. You then use the Airflow web interface to run a workflow that verifies a data file, creates and runs an Apache Hadoop wordcount job on a Dataproc cluster, and deletes the cluster.
- QUIZ
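To make the DAG and operator concepts concrete, here is a minimal Airflow DAG sketch that mirrors the shape of the Composer lab above: create a Dataproc cluster, run a Hadoop word-count job, then delete the cluster. All project, region, cluster, and bucket values are hypothetical, and the operators assume the apache-airflow-providers-google package (which ships with Cloud Composer).

```python
# A minimal Airflow DAG sketch: create cluster -> run job -> delete cluster.
# All project/region/cluster/bucket values are hypothetical placeholders.
import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataproc import (
    DataprocCreateClusterOperator,
    DataprocDeleteClusterOperator,
    DataprocSubmitJobOperator,
)

PROJECT_ID = "my-project"
REGION = "us-central1"
CLUSTER_NAME = "composer-wordcount"

with DAG(
    dag_id="dataproc_wordcount",
    start_date=datetime.datetime(2024, 1, 1),
    schedule_interval=None,  # trigger manually from the Airflow UI
    catchup=False,
) as dag:
    create_cluster = DataprocCreateClusterOperator(
        task_id="create_cluster",
        project_id=PROJECT_ID,
        region=REGION,
        cluster_name=CLUSTER_NAME,
        cluster_config={"worker_config": {"num_instances": 2}},
    )

    wordcount = DataprocSubmitJobOperator(
        task_id="run_wordcount",
        project_id=PROJECT_ID,
        region=REGION,
        job={
            "placement": {"cluster_name": CLUSTER_NAME},
            "hadoop_job": {
                "main_jar_file_uri": "file:///usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar",
                "args": ["wordcount", "gs://my-bucket/input", "gs://my-bucket/output"],
            },
        },
    )

    delete_cluster = DataprocDeleteClusterOperator(
        task_id="delete_cluster",
        project_id=PROJECT_ID,
        region=REGION,
        cluster_name=CLUSTER_NAME,
        trigger_rule="all_done",  # delete the cluster even if the job fails
    )

    create_cluster >> wordcount >> delete_cluster
```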