Chaos Engineering Workshop With Gremlin
Learn how to make your systems more reliable and fault-tolerant using proven chaos engineering techniques.
The principles of chaos engineering emphasize being prepared, identifying weaknesses before they become problems, and responding to system failures effectively. However, improving a system's resiliency is a gradual process, and running chaos experiments in production from the very beginning isn't a good approach.
In this chaos engineering course, you'll learn how to inject chaos into your systems in a controlled way. The course is intended for engineers who want to learn the fundamentals of chaos engineering and gain hands-on experience breaking things in a safe environment. We'll explore the practices, platforms, and tools that allow you to run chaos experiments in your organization. Problems arise whether your infrastructure runs on-premises, in a cloud environment, or in containers. This course shows you the typical problems found at different layers and how to build more resilient systems.
Available formats for this course
Duration: 2 days/16 hours of instruction
Public Classroom Pricing
GSA Price: $1,423.50
Group Rate: $1,750
Part 1: Introduction to Chaos Engineering
- Common Failure Scenarios
- What Is Chaos Engineering?
- What Is Not Chaos Engineering?
- Why Is Chaos Engineering Different From Testing?
- Why Run Chaos Experiments in Production?
- Phases of Chaos Engineering
- Principles of Chaos Engineering
- Exercise: Breaking a System
- The Importance of Having a Good Monitoring Strategy
- Prerequisites for Chaos Engineering
- Planning Your First Experiment
- Limitations of Chaos Engineering
- Exercise: Planning Your First Experiment
- Exercise: Injecting CPU Chaos
- Discussion: Does Everyone Need to Do Chaos Engineering?
Part 2: Chaos Engineering in Practice
- Chaos Engineering: It's Not Just for Netflix
- How Do Game Days Help to Run Chaos Experiments?
- How to Get Management Buy-In for Chaos Engineering
- Human Factors and Team Collaboration
- Case Studies of Outages at Big Companies
- Discussion: How Would You Put Chaos Engineering Into Practice?
- Chaos Engineering Best Practices
- Tools for Designing and Running Chaos Experiments
- Introduction to the Chaos Toolkit
- Exercise: Running a Chaos Experiment With the Chaos Toolkit
Part 3: Monitoring and Metrics
- Why Is Monitoring Critical for Chaos Engineering?
- How to Collect Events From Your Systems?
- What Are Some Basic Metrics You Must Collect?
- Instrumenting Your Applications
- Exercise: Instrumenting an Application
- Exercise: Exploring AWS CloudWatch and X-Ray
- Exercise: Monitoring Experiments With Sensu
Part 4: Introduction to Gremlin
- Chaos Engineering as a Platform
- Exploring Different Types of Attacks
- Injecting Chaos Into Your Infrastructure
- Injecting Chaos Into Your Applications
- Exercise: Running Your First Experiment in a VM
- What Are the Scenarios in Gremlin?
- Types of Integrations Within Gremlin
Part 5: Injecting Infrastructure Chaos
- Typical Problems You Can Find in an Infrastructure
- Remediations for Solving Infrastructure Incidents
- Architecting and Designing for Chaos Engineering
- Practices to Improve Resiliency: Canary Releases, Feature Flags, Circuit Breakers, etc.
- Applying Chaos Engineering Principles
- Exercise: Injecting CPU Exhaustion
- Exercise: Injecting Network Disruption
- Exercise: Injecting Latency for a Squid Proxy
Part 6: Chaos Engineering for Databases
- Why Chaos Engineering for Databases?
- Applying Chaos Engineering to Databases
- Detecting Chaos in Databases
- Automating Chaos Injection
- Exercise: Breaking an Application That Uses DynamoDB
- Exercise: Breaking an Application That Uses Cassandra
- Exercise: Breaking an Application That Uses Redis
Part 7: Chaos Engineering in Containers
- Problems With a Distributed Containerized Application
- Remediations for Solving Incidents in Kubernetes
- Applying Chaos Engineering Principles in Kubernetes and Containers
- Exercise: Installing Gremlin in Kubernetes
- Exercise: Running Chaos Experiments for Memcached in Kubernetes
Part 8: Automating Chaos Experiments
- Anatomy of Continuous Integration and Continuous Delivery (CI/CD) Pipelines
- How to Run Chaos Experiments Continuously?
- The Toolset That Helps to Shift Chaos to the Left
- Creating and Running Experiments Automatically
- Exercise: Automating Chaos With Terraform
- Exercise: Running Chaos Experiments Using Jenkins
Part 9: What's Next in Chaos Engineering?
- The Future of Chaos Engineering Principles
- The Chaos Maturity Model
- Making the Business Case for Chaos Engineering
- Discussion: How Would You Start With Chaos Engineering at Your Company?
This chaos engineering workshop is perfect for anyone who wants to run chaos experiments in their systems using Gremlin. Prior experience with IT infrastructure and software engineering is highly recommended before attending this workshop.
Professionals who benefit from this course include:
- Software Engineers
- Site Reliability Engineers
- Systems Engineers
- Network Engineers
- Anyone involved with DevOps-style workflows
- Anyone involved with IT infrastructure
In this course, you will learn how to:
- Use Gremlin as a platform for running chaos experiments
- Build resilient distributed systems
- Build and run chaos engineering experiments at your company
- Monitor experiments with Sensu
- Break applications that use DynamoDB, Cassandra, or Redis
- Run chaos experiments for Memcached in Kubernetes
- Run chaos experiments using Jenkins
- Automate chaos with Terraform