Data Engineering Integration for Developers Training Course

This course is aimed at software professionals and applies to version 10.5 of the software. It teaches learners to accelerate Data Engineering Integration through large-scale data ingestion, incremental loading, transformations, complex file processing, dynamic mappings, and Python scripting. The curriculum also covers how to use application logic in Data Engineering scenarios, with an emphasis on monitoring, troubleshooting, and best practices.

Objectives

Upon successful completion of this course, participants will be able to:

Ingest large volumes of data into Hive and HDFS
Execute incremental loads within Mass Ingestion
Conduct both initial and incremental data loads
Establish integration with relational databases using SQOOP
Apply transformations across multiple computing engines
Run mappings via JDBC in Spark mode
Implement stateful computing and windowing techniques
Handle complex file structures
Analyze hierarchical data using the Spark engine
Generate profiles and select sampling options on the Spark engine
Deploy Dynamic Mappings
Set up Audits for Mappings
Track logs via REST Operations Hub
Monitor logs through Log Aggregation and perform troubleshooting
Execute mappings within the Databricks environment
Develop mappings to access Delta Lake tables
Optimize the performance of Spark and Databricks jobs

This course is available as onsite live training in Kenya or online live training.

Course Outline

Module 1: Informatica Data Engineering Management Overview

Foundations of Data Engineering
Key features of Data Engineering Management
Advantages of Data Engineering Management
Architecture of Data Engineering Management
Developer responsibilities in Data Engineering Management
New features in Data Engineering Integration 10.4

Module 2: Ingestion and Extraction in Hadoop

Integrating DEI with Hadoop clusters
Understanding Hadoop file systems
Data ingestion into HDFS and Hive using SQOOP
Mass Ingestion to HDFS and Hive – Initial load
Mass Ingestion to HDFS and Hive – Incremental load
Lab: Configuring SQOOP to process data between Oracle and HDFS
Lab: Configuring SQOOP for data processing between an Oracle database and Hive
Lab: Developing Mapping Specifications using the Mass Ingestion Service
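
To make the difference between an initial load and an incremental load concrete, here is a minimal sketch in plain Python. It is not how the Mass Ingestion Service is implemented; it only illustrates the general watermark idea, in which each run copies the rows modified since the previous run. The function name `incremental_load`, the `updated_at` field, and the sample rows are all hypothetical.

```python
from datetime import datetime


def incremental_load(source_rows, target_rows, watermark):
    """Append rows modified after `watermark`; return the new watermark."""
    new_rows = [r for r in source_rows if r["updated_at"] > watermark]
    target_rows.extend(new_rows)
    # If nothing changed, the watermark stays where it was.
    return max((r["updated_at"] for r in new_rows), default=watermark)


# Hypothetical source table with a last-modified timestamp per row.
source = [
    {"id": 1, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "updated_at": datetime(2024, 1, 5)},
]
target = []

# Initial load: the earliest possible watermark pulls in every row.
watermark = incremental_load(source, target, datetime.min)

# A new row arrives; the next run copies only that row.
source.append({"id": 3, "updated_at": datetime(2024, 1, 9)})
watermark = incremental_load(source, target, watermark)
```

After both runs, `target` holds each source row exactly once, because the second run skips everything at or before the stored watermark.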

Module 3: Native and Hadoop Engine Strategy

Engine strategy for Data Engineering Integration
Architecture of the Hive Engine
MapReduce
Tez
Spark architecture
Blaze architecture
Lab: Executing a mapping in Spark mode
Lab: Connecting to a Deployed Application

Module 4: Data Engineering Development Process

Advanced Transformations in Data Engineering Integration using Python and Update Strategy
Hive ACID Use Case
Stateful Computing and Windowing
Lab: Building a Reusable Python Transformation
Lab: Creating an Active Python Transformation
Lab: Performing Hive Upserts
Lab: Utilizing the Windowing Function LEAD
Lab: Utilizing the Windowing Function LAG
Lab: Creating a Macro Transformation
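
The LEAD and LAG labs above deal with windowing functions that read a value from a later or an earlier row of an ordered window. As a rough illustration of those semantics only, not of the Informatica or Spark implementation, the sketch below uses plain Python lists. The helper names `lag` and `lead`, their parameters, and the sample data are assumptions made for this example, and `offset` is assumed to be a positive integer.

```python
def lag(values, offset=1, default=None):
    """For each row, return the value `offset` rows earlier, or `default`."""
    return [
        values[i - offset] if i - offset >= 0 else default
        for i in range(len(values))
    ]


def lead(values, offset=1, default=None):
    """For each row, return the value `offset` rows later, or `default`."""
    return [
        values[i + offset] if i + offset < len(values) else default
        for i in range(len(values))
    ]


# Hypothetical daily sales figures, already ordered by date.
daily_sales = [100, 120, 90, 150]

previous_day = lag(daily_sales)   # [None, 100, 120, 90]
next_day = lead(daily_sales)      # [120, 90, 150, None]
```

Rows at the edge of the window have no neighbor to read from, which is why both helpers take a `default`.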

Module 5: Complex File Processing

Data Engineering file formats – Avro, Parquet, JSON
Complex file data types – Structs, Arrays, Maps
Advanced configuration, operators, and functions
Lab: Converting flat file data objects to an Avro file
Lab: Using complex data types – Arrays, Structs, and Maps in a mapping

Module 6: Hierarchical Data Processing

Hierarchical Data Processing
Flattening Hierarchical Data
Dynamic Flattening with Schema Changes
Hierarchical Data Processing with Schema Changes
Advanced configuration, operators, and functions
Dynamic Ports
Dynamic Input Rules
Lab: Flattening a complex port in a Mapping
Lab: Building dynamic mappings using dynamic ports
Lab: Building dynamic mappings using input rules
Lab: Performing Dynamic Flattening of complex ports
Lab: Parsing Hierarchical Data on the Spark Engine
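
Flattening, as covered in this module, turns nested structs and arrays into flat rows. The sketch below shows the general idea in plain Python and makes no claim about how the Spark engine or the Developer tool performs it. The function names `flatten` and `explode`, the underscore naming scheme, and the sample order record are hypothetical.

```python
def flatten(record, parent_key="", sep="_"):
    """Turn nested dicts (structs) into one flat dict with joined key names."""
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def explode(record, array_key):
    """Emit one flattened row per element of the array under `array_key`."""
    rows = []
    for item in record[array_key]:
        row = {k: v for k, v in record.items() if k != array_key}
        row[array_key] = item
        rows.append(flatten(row))
    return rows


# Hypothetical hierarchical record: a struct plus an array of structs.
order = {
    "id": 7,
    "customer": {"name": "Ana", "city": "Nairobi"},
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B2", "qty": 1}],
}

rows = explode(order, "items")
```

Each element of `items` becomes its own row, and the parent fields `id`, `customer_name`, and `customer_city` are repeated on every row.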

Module 7: Mapping Optimization and Performance Tuning

Validation Environments
Execution Environment
Mapping Optimization
Mapping Recommendations and Insight
Scheduling, Queuing, and Node Labeling
Mapping Audits
Lab: Implementing Recommendations
Lab: Implementing Insights
Lab: Implementing Mapping Audits

Module 8: Monitoring Logs and Troubleshooting in Hadoop

Hadoop Environment Logs
Spark Engine Monitoring
Blaze Engine Monitoring
REST Operations Hub
Log Aggregator
Troubleshooting
Lab: Monitoring Mappings using REST Operations Hub
Lab: Viewing and analyzing logs using Log Aggregator

Module 9: Intelligent Structure Model

Overview of Intelligent Structure Discovery
Intelligent Structure Model
Lab: Using an Intelligent Structure Model in a Mapping

Module 10: Databricks Overview

Databricks overview
Configuration steps for Databricks
Databricks clusters
Notebooks, Jobs, and Data
Delta Lake

Module 11: Databricks Integration

Databricks Integration
Components of the Informatica and Databricks environments
Runtime process on the Databricks Spark Engine
Databricks Integration Task Flow
Prerequisites for Databricks integration
Cluster Workflows
Demo: Setting up a Databricks connection
Demo: Running a mapping with the Databricks Spark engine

Requirements

Developer Tool for Big Data Developers

Duration: 21 Hours

