Apache Spark in the Cloud Training Course

Apache Spark has a steep initial learning curve, and it takes significant effort before results appear. This course is designed to carry participants through that first phase. On completion, participants will understand the fundamentals of Apache Spark, clearly distinguish between RDDs and DataFrames, and be able to work with the Python and Scala APIs. They will also understand executors, tasks, and other core concepts. The course follows best practices and places strong emphasis on cloud deployment, focusing on Databricks and AWS. Participants will also learn the differences between AWS EMR and AWS Glue, one of AWS's latest Spark services.

AUDIENCE:

Data Engineers, DevOps Professionals, Data Scientists

This course is available as onsite live training in Kenya or online live training.

Course Outline

Introduction:

Apache Spark within the Hadoop Ecosystem
Brief overview of Python and Scala

Foundational Concepts (Theory):

Architecture
Resilient Distributed Datasets (RDDs)
Transformations and Actions
Stages, Tasks, and Dependencies

Hands-on Workshop: Mastering the Basics in the Databricks Environment

Exercises utilizing the RDD API
  Core action and transformation functions
  PairRDDs
  Join operations
  Caching strategies
Exercises utilizing the DataFrame API
  SparkSQL
  DataFrame operations: select, filter, group, sort
  User-Defined Functions (UDFs)
Exploration of the Dataset API
Streaming

Hands-on Workshop: Deployment in the AWS Environment

Fundamentals of AWS Glue
Comparing AWS EMR and AWS Glue
Practical job examples in both environments
Analysis of advantages and disadvantages

Additional Content:

Introduction to Apache Airflow orchestration

Requirements

Programming skills (preferably in Python and Scala)

Foundational knowledge of SQL

21 Hours

Need help picking the right course?
southafrica@nobleprog.co.za or +27 (0)10 005 5793

Testimonials (3)

Having hands on session / assignments

Poornima Chenthamarakshan - Intelligent Medical Objects

Course - Apache Spark in the Cloud

1. Right balance between high level concepts and technical details. 2. Andras is very knowledgeable about his teaching. 3. Exercise

Steven Wu - Intelligent Medical Objects

Course - Apache Spark in the Cloud

Lim Meng Tee - Jobstreet.com Shared Services Sdn. Bhd.

Course - Apache Spark in the Cloud

Related Courses

Big Data Analytics with Google Colab and Apache Spark

PySpark and Machine Learning

Apache Spark Fundamentals

Administration of Apache Spark

Python and Spark for Big Data (PySpark)

Python, Spark, and Hadoop for Big Data

Stratio: Rocket and Intelligence Modules with PySpark

Related Categories

Apache Spark
