Hadoop For Administrators Training Course

Apache Hadoop is one of the most widely used frameworks for processing Big Data on clusters of servers. In this three-day (optionally four-day) course, participants explore the business benefits and practical use cases of Hadoop and its ecosystem. The curriculum covers planning a cluster deployment and its growth, followed by installation, maintenance, monitoring, troubleshooting, and optimization. Attendees practice bulk data loading in hands-on labs, become familiar with the various Hadoop distributions, and learn to install and manage ecosystem tools. The course concludes with a discussion of securing clusters with Kerberos.

“…The materials were very well prepared and covered thoroughly. The Lab was very helpful and well organized”
— Andrew Nguyen, Principal Integration DW Engineer, Microsoft Online Advertising

Target Audience

Hadoop system administrators

Course Format

Lectures and hands-on labs, split approximately 60% lectures and 40% labs.

This course is available as onsite live training in Kenya or online live training.

Course Outline

Introduction
- History and core concepts of Hadoop
- The Hadoop ecosystem
- Available distributions
- High-level architecture overview
- Common myths surrounding Hadoop
- Challenges associated with Hadoop (hardware and software)
- Labs: Discussion of your Big Data projects and challenges
Planning and Installation
- Choosing software and Hadoop distributions
- Cluster sizing and growth planning
- Selecting appropriate hardware and network infrastructure
- Understanding rack topology
- Installation procedures
- Implementing multi-tenancy
- Directory structures and log management
- Benchmarking techniques
- Labs: Installing a cluster and running performance benchmarks
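
As a rough illustration of the cluster sizing and growth planning topic, the Python sketch below estimates raw disk capacity from a logical data volume. The function names, the 25% headroom for temporary and non-HDFS data, and the growth parameters are assumptions made for this example and are not taken from the course material; the replication factor of 3 is the HDFS default.

```python
import math


def raw_capacity_tb(data_tb, replication=3, headroom=0.25,
                    annual_growth=0.0, years=0):
    """Estimate the raw disk capacity (TB) needed for a logical data volume.

    headroom is the fraction of raw disk kept free for temporary and
    non-HDFS data (an assumed planning figure, not a Hadoop setting).
    """
    # Project the logical data volume forward with compound growth.
    projected = data_tb * (1 + annual_growth) ** years
    # Every HDFS block is stored `replication` times.
    replicated = projected * replication
    # Only (1 - headroom) of the raw disk is available for HDFS data.
    return replicated / (1 - headroom)


def nodes_needed(raw_tb, disk_tb_per_node):
    """Number of DataNodes needed to hold raw_tb, rounded up."""
    return math.ceil(raw_tb / disk_tb_per_node)


if __name__ == "__main__":
    raw = raw_capacity_tb(100)      # 100 TB of logical data today
    print(raw)
    print(nodes_needed(raw, 48))    # hypothetical 48 TB of disk per node
```

With these assumed figures, 100 TB of logical data needs 400 TB of raw disk, or 9 nodes at 48 TB each. Real sizing also depends on compression, CPU and memory needs, and workload, which this sketch ignores.
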
HDFS Operations
- Core concepts: horizontal scaling, replication, data locality, and rack awareness
- Nodes and daemons: NameNode, Secondary NameNode, HA Standby NameNode, DataNode
- Health monitoring strategies
- Administration via command-line and browser interfaces
- Adding storage capacity and replacing defective drives
- Labs: Getting familiar with the HDFS command line
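
To make the replication concept listed above more concrete, the following Python sketch computes how many blocks and block replicas a single file occupies in HDFS. It assumes the Hadoop 2.x defaults of a 128 MB block size and a replication factor of 3; the function names are hypothetical and chosen only for this example.

```python
BLOCK_SIZE = 128 * 1024 ** 2   # assumed default dfs.blocksize (128 MB)
REPLICATION = 3                # assumed default dfs.replication


def block_count(file_size_bytes, block_size=BLOCK_SIZE):
    """Number of HDFS blocks a file is split into (integer ceiling)."""
    return -(-file_size_bytes // block_size)


def hdfs_footprint(file_size_bytes, replication=REPLICATION,
                   block_size=BLOCK_SIZE):
    """Blocks, block replicas, and raw bytes consumed by one file."""
    blocks = block_count(file_size_bytes, block_size)
    return {
        "blocks": blocks,
        "block_replicas": blocks * replication,
        # The last block only occupies the bytes it actually holds,
        # so raw usage is file size times replication.
        "raw_bytes": file_size_bytes * replication,
    }


if __name__ == "__main__":
    print(hdfs_footprint(1024 ** 3))   # a 1 GiB file
```

Under these assumptions a 1 GiB file is stored as 8 blocks and 24 block replicas, using 3 GiB of raw disk across the DataNodes.
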
Data Ingestion
- Using Flume for log collection and other data ingestion into HDFS
- Utilizing Sqoop for importing data from SQL databases to HDFS, and exporting back to SQL
- Data warehousing with Hive
- Transferring data between clusters using distcp
- Leveraging S3 as a complement to HDFS
- Best practices and architectures for data ingestion
- Labs: Setting up and utilizing Flume and Sqoop
MapReduce Operations and Administration
- Parallel computing prior to MapReduce: Comparing HPC vs. Hadoop administration
- Managing MapReduce cluster loads
- Nodes and daemons: JobTracker and TaskTracker
- Walkthrough of the MapReduce user interface
- MapReduce configuration
- Job configuration settings
- Optimizing MapReduce performance
- Preventing MapReduce errors: Guidelines for programmers
- Labs: Executing MapReduce examples
YARN: New Architecture and Capabilities
- YARN design objectives and implementation architecture
- Key components: ResourceManager, NodeManager, ApplicationMaster
- Installing YARN
- Job scheduling within YARN
- Labs: Investigating job scheduling mechanisms
Advanced Topics
- Hardware monitoring
- Cluster monitoring techniques
- Adding and removing servers, and upgrading Hadoop versions
- Backup, recovery, and business continuity planning
- Oozie job workflows
- Hadoop High Availability (HA)
- Hadoop Federation
- Securing your cluster with Kerberos
- Labs: Setting up monitoring systems
Optional Tracks
- Cloudera Manager for cluster administration, monitoring, and routine tasks; installation and usage. In this track, all exercises and labs are conducted within the Cloudera distribution environment (CDH5)
- Ambari for cluster administration, monitoring, and routine tasks; installation and usage. In this track, all exercises and labs are performed within the Ambari cluster manager and Hortonworks Data Platform (HDP 2.0)

Requirements

Proficiency in basic Linux system administration
Basic scripting capabilities

Prior knowledge of Hadoop and Distributed Computing is not mandatory, as these concepts will be introduced and explained throughout the course.

Lab Environment Setup

Zero Installation Required: Students are not required to install Hadoop software on their own machines. A functional Hadoop cluster will be provided for use.

Participants will need the following tools:

- An SSH client (Linux and Mac systems come with built-in SSH clients; PuTTY is recommended for Windows users)
- A web browser to access the cluster. We recommend Firefox with the FoxyProxy extension installed

Duration: 21 hours

Testimonials (1)

Hands-on exercises. Class should have been 5 days, but the 3 days helped to clear up a lot of questions that I had from working with NiFi already.

James - BHG Financial

Course - Apache NiFi for Administrators

Related Courses

Infomatica with Big Data (BDM)

Apache NiFi for Administrators

Apache NiFi for Developers

Python, Spark, and Hadoop for Big Data

Related Categories

Hadoop
