This hands-on training course delivers the key concepts and expertise participants need to ingest and process data on a Hadoop cluster using up-to-date tools and techniques. Employing Hadoop ecosystem projects such as Spark (including Spark Streaming and Spark SQL), Flume, Kafka, and Sqoop, this course is strong preparation for the real-world challenges faced by Hadoop developers. With Spark, developers can write sophisticated parallel applications that deliver faster, better decisions and support interactive, real-time actions across a wide variety of use cases, architectures, and industries.
Introduction
Introduction to Apache Hadoop and the Hadoop Ecosystem
Large-Scale Systems
Data Processing on an Apache Hadoop Cluster
Importing Relational Data with Apache Sqoop
Apache Spark Basics
Working with RDDs
Aggregating Data with Pair RDDs
Writing and Running Apache Spark Applications (Scala and Java)
Configuring Apache Spark Applications
Parallel Processing in Apache Spark
RDD Persistence
Common Patterns in Apache Spark Data Processing
DataFrames and Spark SQL
Message Processing with Apache Kafka
Capturing Data with Apache Flume
Integrating Apache Flume and Apache Kafka
Apache Spark Streaming: Introduction to DStreams
Apache Spark Streaming: Processing Multiple Batches
Apache Spark Streaming: Data Sources
Conclusion
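To give a flavor of the pair-RDD aggregation pattern covered in the modules above, here is a minimal sketch in plain Python that mimics Spark's flatMap/map/reduceByKey word count. The function name and data here are illustrative only; in the course, the same pattern is written against the real Spark RDD API in Scala or Java.

```python
# Plain-Python sketch of the pair-RDD aggregation pattern (illustrative,
# not the Spark API itself).
from itertools import groupby

def word_count(lines):
    # "flatMap + map": split each line into (word, 1) key-value pairs
    pairs = [(word, 1) for line in lines for word in line.split()]
    # "reduceByKey": group the pairs by key and sum the counts per word
    pairs.sort(key=lambda p: p[0])  # groupby needs sorted input
    return {word: sum(n for _, n in group)
            for word, group in groupby(pairs, key=lambda p: p[0])}

print(word_count(["spark makes big data simple", "big data big results"]))
# e.g. 'big' appears three times across the two lines
```

In Spark, the grouping and summing would be distributed across the cluster by `reduceByKey` rather than done in local memory, which is the point of the parallel-processing and RDD-persistence modules.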
Join our public courses at our Istanbul, London, and Ankara facilities. Private classes can be organized at the location of your preference, according to your schedule.