Spark is at the forefront of distributed computing. This module is taught in Python and covers the basics of key functional programming constructs such as map, flatMap, and list comprehensions, along with core data structures.
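As an illustration of the constructs named above, the sketch below contrasts map and flatMap using only the Python standard library. The two-line input is a made-up example. Python has no built-in `flatMap`, so the sketch uses `itertools.chain.from_iterable` to flatten the mapped results, which is one common way to get the same effect. In Spark's RDD API, the corresponding calls would be `rdd.map(...)` and `rdd.flatMap(...)`.

```python
from itertools import chain

# Hypothetical sample input: two lines of text
lines = ["to be or", "not to be"]

# map: exactly one output per input (each line becomes one list of words)
word_lists = list(map(str.split, lines))

# flatMap equivalent: map each line to words, then flatten one level
words = list(chain.from_iterable(map(str.split, lines)))

# The same flattening written as a nested list comprehension
words_lc = [word for line in lines for word in line.split()]

print(word_lists)  # [['to', 'be', 'or'], ['not', 'to', 'be']]
print(words)       # ['to', 'be', 'or', 'not', 'to', 'be']
```

The nested comprehension reads left to right like the equivalent nested `for` loops. It is often the most idiomatic choice in plain Python, while `flatMap` is the natural form once the data lives in a Spark RDD.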
We cover core Spark concepts such as resilient distributed datasets (RDDs), in-memory caching, transformations and actions, and tuning and optimization. Students build functioning applications end to end and learn tooling that improves their productivity. They then apply that knowledge directly to developing, building, and deploying Spark jobs that run on large, real-world datasets in the cloud (AWS and Google Cloud Platform).