Pentaho Data Integration Fundamentals Training Course
Pentaho Data Integration is an open-source data integration tool for defining jobs and data transformations.
In this instructor-led, live training, participants will learn how to use Pentaho Data Integration's powerful ETL capabilities and rich GUI to manage an entire big data lifecycle and maximize the value of data within their organization.
By the end of this training, participants will be able to:
- Create, preview, and run basic data transformations containing steps and hops
- Configure and secure the Pentaho Enterprise Repository
- Harness disparate sources of data and generate a single, unified version of the truth in an analytics-ready format
- Provide results to third-party applications for further processing
Audience
- Data analysts
- ETL developers
Format of the course
- Part lecture, part discussion, exercises and heavy hands-on practice
Course Outline
Introduction
Installing and Configuring Pentaho
Overview of Pentaho Features and Architecture
Understanding Pentaho's In-Memory Caching
Navigating the User Interface
Connecting to a Data Source
Configuring the Pentaho Enterprise Repository
Transforming Data
Viewing the Transformation Results
Resolving Transformation Errors
Processing a Data Stream
Reusing Transformations
Scheduling Transformations
Securing Pentaho
Integrating with Third-party Applications (Hadoop, NoSQL, etc.)
Analytics and Reporting
Pentaho Design Patterns and Best Practices
Troubleshooting
Summary and Conclusion
Requirements
- An understanding of relational databases
- An understanding of data warehousing
- An understanding of ETL (Extract, Transform, Load) concepts
Testimonials (1)
It's a hands-on session.
Vorraluck Sarechuer - Total Access Communication Public Company Limited (dtac)
Course - Talend Open Studio for ESB
Related Courses
Data Engineering Integration for Developers
21 Hours
This course is tailored for software professionals at level 10.5. It equips learners with the skills to accelerate Data Engineering Integration through techniques such as large-scale data ingestion, incremental loading, transformations, complex file processing, dynamic mappings, and Python scripting. The curriculum explores how to leverage application logic for Data Engineering scenarios while emphasizing monitoring, troubleshooting, and adherence to best practices.
Objectives
Upon successful completion of this course, participants will be able to:
- Ingest large volumes of data into Hive and HDFS
- Execute incremental loads within Mass Ingestion
- Conduct both initial and incremental data loads
- Establish integration with relational databases using SQOOP
- Apply transformations across multiple computing engines
- Run mappings via JDBC in Spark mode
- Implement stateful computing and windowing techniques
- Handle complex file structures
- Analyze hierarchical data using the Spark engine
- Generate profiles and select sampling options on the Spark engine
- Deploy Dynamic Mappings
- Set up Audits for Mappings
- Track logs via REST Operations Hub
- Monitor logs through Log Aggregation and perform troubleshooting
- Execute mappings within the Databricks environment
- Develop mappings to access Delta Lake tables
- Optimize the performance of Spark and Databricks jobs
KNIME Analytics Platform - Comprehensive Training
35 Hours
The "KNIME Analytics Platform" training offers a comprehensive overview of this free data analysis platform. The curriculum covers an introduction to data processing and analysis, installation and configuration of KNIME, building workflows, business model development methodologies, and data modeling. The course also covers advanced data analysis tools, workflow import and export, tool integration, ETL processes, data mining, visualization, extensions, and integrations with tools such as R, Java, Python, Gephi, and Neo4j. It concludes with reporting, integration with BIRT, and KNIME WebPortal.
Oracle GoldenGate
14 Hours
This instructor-led, live training in Kenya (online or onsite) is designed for system administrators and developers who aim to set up, deploy, and manage Oracle GoldenGate for data transformation.
Upon completing this training, participants will be able to:
- Install and configure Oracle GoldenGate.
- Comprehend Oracle database replication using the Oracle GoldenGate tool.
- Understand the architecture of Oracle GoldenGate.
- Configure and execute database replication and migration tasks.
- Enhance Oracle GoldenGate performance and resolve technical issues.
Pentaho Open Source BI Suite Community Edition (CE)
28 Hours
The Pentaho Open Source BI Suite Community Edition (CE) is a comprehensive business intelligence solution offering data integration, reporting, dashboarding, and loading capabilities.
Through this instructor-led live training, attendees will explore ways to fully leverage the capabilities of the Pentaho Open Source BI Suite Community Edition (CE).
Upon completing this training, participants will be equipped to:
- Install and configure the Pentaho Open Source BI Suite Community Edition (CE)
- Grasp the core concepts of Pentaho CE tools and their functionalities
- Create reports utilizing Pentaho CE
- Incorporate third-party data sources into Pentaho CE
- Handle big data and analytics tasks within Pentaho CE
Audience
- Developers
- BI Developers
Course Format
- A blend of lectures, discussions, exercises, and extensive hands-on practice
Note
- For customized training requests, please reach out to us to make arrangements.
Pentaho Data Integration Advanced
21 Hours
Pentaho Data Integration serves as a robust platform for architecting enterprise-grade ETL processes and data pipelines.
This instructor-led live training, available either online or onsite, targets experienced engineers aiming to master the creation of high-performance, enterprise-scale, and heavily automated PDI solutions.
Upon completing this course, participants will be prepared to:
- Architect large-scale ETL pipelines utilizing advanced orchestration techniques.
- Optimize complex transformations to ensure peak performance.
- Implement hybrid integration patterns alongside scripting and automation.
- Design robust, maintainable workflows ready for production deployment.
Course Format
- Architectural discussions and demonstrations led by experts.
- In-depth lab exercises addressing advanced, real-world ETL challenges.
- Practical development within a production-like environment.
Course Customization Options
- Please reach out to us if you need a customized version of this training.
Pentaho Data Integration Intermediate
21 Hours
Pentaho Data Integration serves as a robust platform for extracting, transforming, and loading data.
This instructor-led live training, available either online or onsite, is designed for intermediate practitioners looking to elevate their Pentaho Data Integration (PDI) capabilities to handle more complex transformation scenarios.
Upon completion of this training, participants will be equipped to:
- Design multi-step transformations that offer improved performance.
- Effectively work with variables, parameters, and reusable components.
- Integrate PDI with databases, APIs, and external systems.
- Implement best practices for creating maintainable and scalable ETL pipelines.
Course Format
- Interactive demonstrations alongside instructor-led explanations.
- Guided exercises and scenario-based practical sessions.
- Hands-on experience within a real-world ETL project environment.
Customization Options
- Should you require a tailored version of this course, please reach out to us to arrange customization.
Talend Administration Center (TAC)
14 Hours
This instructor-led live training, held in Kenya (online or onsite), is designed for system administrators, data scientists, and business analysts who wish to set up Talend Administration Center to deploy and manage organizational roles and tasks.
Upon completing this training, participants will be capable of:
- Installing and configuring Talend Administration Center
- Understanding and implementing core Talend management principles
- Creating, deploying, and executing business projects or tasks within Talend
- Monitoring dataset security and establishing business routines aligned with the TAC framework
- Developing a comprehensive understanding of big data applications
Talend Big Data Integration
28 Hours
This instructor-led live training in Kenya (online or onsite) is aimed at technical professionals who wish to deploy Talend Open Studio for Big Data to simplify the process of reading and processing big data.
By the end of this training, participants will be able to:
- Install and configure Talend Open Studio for Big Data.
- Connect with big data systems such as Cloudera, HortonWorks, MapR, Amazon EMR, and Apache.
- Understand and set up Open Studio's big data components and connectors.
- Configure parameters to automatically generate MapReduce code.
- Use Open Studio's drag-and-drop interface to run Hadoop jobs.
- Prototype big data pipelines.
- Automate big data integration projects.
Talend Data Stewardship
14 Hours
This instructor-led live training, conducted either online or onsite, is intended for beginner to intermediate data analysts seeking to deepen their skills and understanding in managing and improving data quality via Talend Data Stewardship.
By the end of this course, participants will be able to:
- Understand the critical role of data stewardship in preserving data quality.
- Apply Talend Data Stewardship to manage data quality tasks.
- Create, assign, and manage tasks within Talend Data Stewardship, including customizing workflows.
- Use reporting and monitoring tools to assess data quality and stewardship outcomes.
Talend Open Studio for ESB
21 Hours
In this instructor-led live training held in Kenya, participants will learn how to use Talend Open Studio for ESB to create, connect, mediate and manage services and their interactions.
By the end of this training, participants will be able to:
- Integrate, enhance and deliver ESB technologies as single packages in a variety of deployment environments.
- Understand and utilize Talend Open Studio's most used components.
- Integrate any application, database, API, or Web services.
- Seamlessly integrate heterogeneous systems and applications.
- Embed existing Java code libraries to extend projects.
- Leverage community components and code to extend projects.
- Rapidly integrate systems, applications and data sources within a drag-and-drop Eclipse environment.
- Reduce development time and maintenance costs by generating optimized, reusable code.