Course Outline
Introduction to AIOps
- Defining AIOps and its significance
- Comparing traditional monitoring with AIOps-driven observability
- Overview of AIOps architecture and essential components
Collecting and Normalizing Operational Data
- Categories of observability data: metrics, logs, and traces
- Ingesting data from diverse sources (servers, containers, cloud)
- Leveraging agents and exporters (Prometheus, Beats, Fluentd)
Data Correlation and Anomaly Detection
- Time series correlation and statistical approaches
- Applying ML models for anomaly detection
- Identifying incidents across distributed systems
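As an illustration of the statistical approaches listed above, here is a minimal sketch of rolling z-score anomaly detection on a single metric series. The function name `zscore_anomalies`, the window size, the threshold, and the sample CPU values are all assumptions made for the example and are not part of the course material.

```python
from statistics import mean, stdev


def zscore_anomalies(values, window=5, threshold=3.0):
    """Return indices of points that deviate from the trailing window.

    A point is flagged when it lies more than `threshold` sample standard
    deviations away from the mean of the previous `window` points.
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # The z-score is undefined for a perfectly flat history; skip it.
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical CPU utilisation samples (percent) with one obvious spike.
cpu = [41.0, 40.0, 42.0, 41.0, 40.0, 41.0, 95.0, 42.0, 41.0]
spikes = zscore_anomalies(cpu)
print(spikes)
```

A trailing window keeps the baseline local, so slow drifts in the metric are absorbed while sudden jumps stand out. Note that a spike inflates the standard deviation of the windows that follow it, which can mask nearby anomalies. Production systems typically use more robust estimators or trained models for that reason.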
Alerting and Noise Reduction
- Crafting intelligent alert rules and setting thresholds
- Techniques for suppression, deduplication, and alert grouping
- Integrating with platforms like Alertmanager, Slack, PagerDuty, or Opsgenie
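The deduplication and grouping techniques above can be sketched in a few lines. This is a simplified illustration, not how Alertmanager or any other listed platform implements it. The alert fields (`service`, `alertname`, `instance`, `timestamp`), the function name `group_alerts`, and the 300-second window are assumptions chosen for the example.

```python
from collections import defaultdict


def group_alerts(alerts, group_by=("service", "alertname"), dedup_window=300):
    """Drop repeated alerts and bucket the rest by shared labels.

    An alert counts as a duplicate when the same service, alert name, and
    instance fired less than `dedup_window` seconds earlier.
    """
    groups = defaultdict(list)
    last_seen = {}
    for alert in sorted(alerts, key=lambda a: a["timestamp"]):
        fingerprint = (alert["service"], alert["alertname"], alert["instance"])
        previous = last_seen.get(fingerprint)
        last_seen[fingerprint] = alert["timestamp"]
        if previous is not None and alert["timestamp"] - previous < dedup_window:
            continue  # repeat of a recent alert: suppress it
        key = tuple(alert[label] for label in group_by)
        groups[key].append(alert)
    return dict(groups)


# Hypothetical alert stream: one repeat, two instances of one problem, one unrelated alert.
alerts = [
    {"service": "api", "alertname": "HighLatency", "instance": "a1", "timestamp": 0},
    {"service": "api", "alertname": "HighLatency", "instance": "a1", "timestamp": 60},
    {"service": "api", "alertname": "HighLatency", "instance": "a2", "timestamp": 90},
    {"service": "db", "alertname": "DiskFull", "instance": "d1", "timestamp": 120},
]
grouped = group_alerts(alerts)
for key, members in grouped.items():
    print(key, len(members))
```

Here four raw alerts collapse into two notifications: one group for the latency problem across both API instances and one for the database disk alert. Grouping on fewer labels yields fewer, broader notifications, and grouping on more labels yields finer-grained ones.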
Root Cause Analysis and Visualization
- Utilizing dashboards to visualize metrics and identify trends
- Investigating events and timelines for RCA
- Tracing issues across layers using distributed tracing tools
Automation and Remediation
- Initiating automated scripts or workflows triggered by incidents
- Connecting with ITSM systems (ServiceNow, Jira)
- Use cases: self-healing, scaling, and traffic rerouting
Open Source and Commercial AIOps Platforms
- Overview of tools: Prometheus, Grafana, ELK, Moogsoft, Dynatrace
- Criteria for selecting the right AIOps platform
- Demo and hands-on practice with a chosen stack
Summary and Next Steps
Requirements
- A solid understanding of IT operations and system monitoring concepts
- Practical experience with monitoring tools or dashboards
- Familiarity with fundamental log and metric formats
Audience
- Operations teams managing infrastructure and applications
- Site Reliability Engineers (SREs)
- IT monitoring and observability teams
14 Hours