Course Outline
Module 1: Informatica Data Engineering Management Overview
- Foundations of Data Engineering
- Key features of Data Engineering Management
- Advantages of Data Engineering Management
- Architecture of Data Engineering Management
- Developer responsibilities in Data Engineering Management
- New features in Data Engineering Integration 10.4
Module 2: Ingestion and Extraction in Hadoop
- Integrating DEI with Hadoop clusters
- Understanding Hadoop file systems
- Data ingestion into HDFS and Hive using Sqoop
- Mass Ingestion to HDFS and Hive – Initial load
- Mass Ingestion to HDFS and Hive – Incremental load
- Lab: Configuring Sqoop to process data between an Oracle database and HDFS
- Lab: Configuring Sqoop to process data between an Oracle database and Hive
- Lab: Developing Mapping Specifications using the Mass Ingestion Service
Module 3: Native and Hadoop Engine Strategy
- Engine strategy for Data Engineering Integration
- Architecture of the Hive Engine
- MapReduce
- Tez
- Spark architecture
- Blaze architecture
- Lab: Executing a mapping in Spark mode
- Lab: Connecting to a Deployed Application
Module 4: Data Engineering Development Process
- Advanced Transformations in Data Engineering Integration using Python and Update Strategy
- Hive ACID Use Case
- Stateful Computing and Windowing
- Lab: Building a Reusable Python Transformation
- Lab: Creating an Active Python Transformation
- Lab: Performing Hive Upserts
- Lab: Utilizing the Windowing Function LEAD
- Lab: Utilizing the Windowing Function LAG
- Lab: Creating a Macro Transformation
Module 5: Complex File Processing
- Data Engineering file formats – Avro, Parquet, JSON
- Complex file data types – Structs, Arrays, Maps
- Advanced configuration, operators, and functions
- Lab: Converting flat file data objects to an Avro file
- Lab: Using complex data types – Arrays, Structs, and Maps in a mapping
Module 6: Hierarchical Data Processing
- Hierarchical Data Processing
- Flattening Hierarchical Data
- Dynamic Flattening with Schema Changes
- Hierarchical Data Processing with Schema Changes
- Advanced configuration, operators, and functions
- Dynamic Ports
- Dynamic Input Rules
- Lab: Flattening a complex port in a Mapping
- Lab: Building dynamic mappings using dynamic ports
- Lab: Building dynamic mappings using input rules
- Lab: Performing Dynamic Flattening of complex ports
- Lab: Parsing Hierarchical Data on the Spark Engine
Module 7: Mapping Optimization and Performance Tuning
- Validation Environments
- Execution Environment
- Mapping Optimization
- Mapping Recommendations and Insight
- Scheduling, Queuing, and Node Labeling
- Mapping Audits
- Lab: Implementing Recommendations
- Lab: Implementing Insights
- Lab: Implementing Mapping Audits
Module 8: Monitoring Logs and Troubleshooting in Hadoop
- Hadoop Environment Logs
- Spark Engine Monitoring
- Blaze Engine Monitoring
- REST Operations Hub
- Log Aggregator
- Troubleshooting
- Lab: Monitoring Mappings using REST Operations Hub
- Lab: Viewing and analyzing logs using Log Aggregator
Module 9: Intelligent Structure Model
- Overview of Intelligent Structure Discovery
- Intelligent Structure Model
- Lab: Using an Intelligent Structure Model in a Mapping
Module 10: Databricks Overview
- Databricks overview
- Configuration steps for Databricks
- Databricks clusters
- Notebooks, Jobs, and Data
- Delta Lake
Module 11: Databricks Integration
- Databricks Integration
- Components of the Informatica and Databricks environments
- Runtime process on the Databricks Spark Engine
- Databricks Integration Task Flow
- Prerequisites for Databricks integration
- Cluster Workflows
- Demo: Setting up a Databricks connection
- Demo: Running a mapping with the Databricks Spark engine
Requirements
Developer Tool for Big Data Developers
Duration: 21 hours
Testimonials
It's a hands-on session.