Atlassian OpsGenie – Helping DevOps Teams Stay in Control (Part 2)

In Part One, we took a high-level view of one of the most common challenges facing DevOps teams: “alert noise.”

With inputs arriving from an IT stack that keeps growing in size and complexity, plus a layer of monitoring solutions that may or may not be coordinated, many teams are barraged by an incessant flood of alerts. These alerts often lack context, which makes them difficult to prioritize. They also tend to go either to too many people or to too few. Either way, they’re not guaranteed to reach the right person at the right time, which is really the key to effective alert management and efficient resolution of service disruptions.

In Part Two, we’re going to dive deeper into the Atlassian OpsGenie application and dig into how it addresses these common issues to offer a true solution to “alert noise” and other DevOps challenges.

The core components of the Atlassian OpsGenie solution

To begin with, let’s take a quick look at the core components that make up Atlassian OpsGenie:

Users
Teams Configuration
On-call Schedules
Escalation Policies
Routing Rules
Integrations
Services
Alerts
Reporting

Users, of course, are the individuals who make up the larger DevOps team. They can also be individuals outside DevOps who need to be aware of certain issues. Within Atlassian OpsGenie, users can be thoroughly managed via unique profiles, granular permissions, and loads of personalizations that make the tool work for each team member, rather than the other way around.

Importantly, each user can control, within their personal profile, how they receive alerts, which types of alerts they receive, and what they can do with them once they arrive.

Teams are made up of individual users. Team configuration within Atlassian OpsGenie will generally mirror your organizational teams, but you’ll want to give some forethought to any exceptions or adjustments that make sense from an alert management standpoint. That’s because on-call scheduling, routing rules, and escalation policies are all managed at the team level.

Solution: Getting an alert to the right person at the right time

As you might expect, an on-call schedule reflects which team members (users) are responsible for handling alert responses at a given time. An escalation policy, in turn, sets rules around where the alert should be sent first, then if and when it needs to be sent elsewhere based on the response to the initial alert. Finally, a routing rule tells the system which escalation policy and/or on-call schedule to follow based on what sort of alert is coming in, what day and time it is, and so on.
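To make the relationship between these three rule types concrete, here is a minimal Python sketch. It is not OpsGenie's data model or API; every class and field name (`Shift`, `EscalationStep`, `route`, and so on) is a hypothetical stand-in chosen for illustration. It only shows how a routing rule selects an escalation policy, and how that policy decides who should have been notified by a given point in time.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Callable, List, Optional


@dataclass
class Shift:
    user: str
    start: time
    end: time  # exclusive; a shift may wrap past midnight

    def covers(self, t: time) -> bool:
        if self.start <= self.end:
            return self.start <= t < self.end
        return t >= self.start or t < self.end  # overnight shift


@dataclass
class OnCallSchedule:
    shifts: List[Shift]

    def on_call(self, now: datetime) -> Optional[str]:
        """Return the first user whose shift covers `now`, if any."""
        for shift in self.shifts:
            if shift.covers(now.time()):
                return shift.user
        return None


@dataclass
class EscalationStep:
    after_minutes: int  # minutes the alert has gone unacknowledged
    target: str         # a user name, or "schedule" for whoever is on call


@dataclass
class EscalationPolicy:
    schedule: OnCallSchedule
    steps: List[EscalationStep]

    def notify_list(self, now: datetime, minutes_unacked: int) -> List[str]:
        """Everyone who should have been notified by this point."""
        notified: List[str] = []
        for step in self.steps:
            if minutes_unacked >= step.after_minutes:
                who = (self.schedule.on_call(now)
                       if step.target == "schedule" else step.target)
                if who and who not in notified:
                    notified.append(who)
        return notified


@dataclass
class RoutingRule:
    matches: Callable[[dict, datetime], bool]
    policy: EscalationPolicy


def route(alert: dict, now: datetime, rules: List[RoutingRule],
          default: EscalationPolicy) -> EscalationPolicy:
    """Pick the policy of the first matching rule, else the default."""
    for rule in rules:
        if rule.matches(alert, now):
            return rule.policy
    return default


# Hypothetical team: alice covers days, bob covers nights,
# carol is the fallback for database alerts after 15 minutes.
schedule = OnCallSchedule([
    Shift("alice", time(8), time(20)),
    Shift("bob", time(20), time(8)),
])
db_policy = EscalationPolicy(schedule, [
    EscalationStep(0, "schedule"),
    EscalationStep(15, "carol"),
])
default_policy = EscalationPolicy(schedule, [EscalationStep(0, "schedule")])
rules = [RoutingRule(lambda a, now: "database" in a.get("tags", []), db_policy)]

now = datetime(2024, 1, 1, 22, 30)
policy = route({"tags": ["database"]}, now, rules, default_policy)
print(policy.notify_list(now, 0))   # ['bob']
print(policy.notify_list(now, 20))  # ['bob', 'carol']
```

In this sketch the routing rule is evaluated first, the escalation policy second, and the on-call schedule last, which mirrors the order in which the three concepts depend on each other.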

Once these rules are set, alert delivery is automated, which addresses the problem of getting each alert to the right person at the right time. But what about the problem of alerts lacking context and prioritization? That’s where OpsGenie’s integrations and services come in.

Solution: Adding context to alerts

OpsGenie comes with over 200 pre-loaded integrations with numerous input and output applications, and more are being added regularly. There’s also an API, so you can create your own integrations relatively easily. Basically, an integration means OpsGenie can talk to this other application and can either receive data from that application or feed data to it. In many cases, the data flow goes both ways.
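As a rough idea of what a home-grown integration might look like, the sketch below builds (but does not send) an HTTP request that creates an alert. The endpoint `https://api.opsgenie.com/v2/alerts`, the `GenieKey` authorization scheme, and the field names reflect our understanding of the public Opsgenie Alert API and should be treated as assumptions to confirm against the current documentation. The function name `build_create_alert_request` and the sample values are hypothetical.

```python
import json
import urllib.request
from typing import Dict, List, Optional

# Assumed endpoint; verify against the current Opsgenie API documentation.
OPSGENIE_ALERTS_URL = "https://api.opsgenie.com/v2/alerts"


def build_create_alert_request(
    api_key: str,
    message: str,
    alias: Optional[str] = None,
    description: Optional[str] = None,
    priority: str = "P3",
    tags: Optional[List[str]] = None,
    details: Optional[Dict[str, str]] = None,
) -> urllib.request.Request:
    """Build a POST request for creating an alert, without sending it."""
    payload: Dict[str, object] = {"message": message, "priority": priority}
    if alias:
        payload["alias"] = alias
    if description:
        payload["description"] = description
    if tags:
        payload["tags"] = tags
    if details:
        payload["details"] = details

    return urllib.request.Request(
        OPSGENIE_ALERTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"GenieKey {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


request = build_create_alert_request(
    api_key="your-integration-api-key",  # placeholder, not a real key
    message="Disk usage above 90% on db-1",
    alias="db-1:disk-usage",
    tags=["database", "prod"],
)
# To actually send it, pass `request` to urllib.request.urlopen(...).
```

Building the request separately from sending it keeps the example safe to run and makes the payload easy to inspect before anything reaches a live account.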

For every integration you choose to activate, you’re provided with a host of possible actions OpsGenie can take with the data it receives and/or hands out. And, importantly, it’s in these various actions that you gain the power to add context to every alert you receive.
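To illustrate what “adding context” can mean in practice, here is a small Python sketch of an enrichment step that turns a bare monitoring event into a richer alert. This is not an OpsGenie feature or API call. The event shape, the severity-to-priority mapping, and the runbook lookup are all assumptions made for the example, and `enrich` is a hypothetical helper.

```python
from typing import Dict

# Hypothetical mapping from a monitoring tool's severity to a P1-P5 priority.
SEVERITY_TO_PRIORITY = {
    "critical": "P1",
    "error": "P2",
    "warning": "P3",
    "info": "P5",
}


def enrich(event: Dict[str, str], runbooks: Dict[str, str]) -> Dict[str, object]:
    """Turn a raw monitoring event into an alert that carries context."""
    service = event.get("service", "unknown")
    environment = event.get("environment", "unknown")
    severity = str(event.get("severity", "")).lower()

    details = {
        "host": event.get("host", "unknown"),
        "environment": environment,
    }
    runbook = runbooks.get(service)
    if runbook:
        details["runbook"] = runbook  # where the responder should start

    return {
        "message": f"[{service}] {event.get('summary', 'no summary')}",
        # A stable alias lets repeats of the same problem be grouped.
        "alias": f"{service}:{event.get('check', 'unknown')}",
        "priority": SEVERITY_TO_PRIORITY.get(severity, "P3"),
        "tags": sorted({service, environment}),
        "details": details,
    }


alert = enrich(
    {
        "service": "checkout",
        "severity": "critical",
        "summary": "5xx rate above 5%",
        "check": "http-5xx",
        "host": "web-3",
        "environment": "prod",
    },
    runbooks={"checkout": "https://wiki.example.com/runbooks/checkout"},
)
print(alert["priority"], alert["alias"])  # P1 checkout:http-5xx
```

The point of the sketch is that the responder receives a priority, a grouping key, and a link to the next step, rather than a bare “something is wrong” message.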

Solution: Prioritizing alerts and building a duplicatable system

OpsGenie Services are a higher-level form of automation you can put in place once you’ve worked with the system for some time and have established a successful combination of integration actions and alert management rules. Services tie all these together:

a specific input (or input type)
a protocol the system will follow
a list of users and their predefined responsibilities in relation to that input
a list of stakeholders who have no immediate responsibility, but should be aware of the situation/outcome
a selection of reports to be generated and distributed upon completion

So, for example, if a given alert has proven to be the harbinger of a high-priority incident the last ten times you saw it, it makes sense to create a Service around that alert. Automate the protocol, notifications, and reporting. Then, as your list of established Services grows, the number of manually triaged issues flowing in gets smaller and smaller.
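One way to spot candidates for a Service is to look at your own alert history. The sketch below is a hypothetical heuristic, not something OpsGenie provides. It assumes you can export a list of `(alert alias, was it a high-priority incident?)` pairs, and it flags aliases that have appeared at least ten times and turned out to be high priority every time. The function name and thresholds are ours.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def service_candidates(
    history: Iterable[Tuple[str, bool]],
    min_occurrences: int = 10,
    min_ratio: float = 1.0,
) -> List[str]:
    """Return alert aliases that were high priority often enough to automate."""
    seen: Counter = Counter()
    high: Counter = Counter()
    for alias, was_high_priority in history:
        seen[alias] += 1
        if was_high_priority:
            high[alias] += 1

    return sorted(
        alias
        for alias, count in seen.items()
        if count >= min_occurrences and high[alias] / count >= min_ratio
    )


# Hypothetical history: only the replication-lag alert qualifies.
history = (
    [("db:replication-lag", True)] * 10
    + [("web:latency", True)] * 3        # too few occurrences
    + [("cache:evictions", False)] * 12  # frequent, but never high priority
)
print(service_candidates(history))  # ['db:replication-lag']
```

Lowering `min_ratio` widens the net to alerts that are usually, but not always, serious. Whether that trade-off is worth it depends on how costly a false escalation is for your team.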

Which brings us to the bottom line: OpsGenie solves all the biggest alert management issues facing today’s DevOps teams. Think back to the last downtime snafu you had to scramble to resolve. Now, think about how much faster and easier it would have been with Atlassian OpsGenie onboard, skyrocketing your team’s efficiency and resolution speed.

Explore Opsgenie for yourself

Get a Demo

Maxwell Traers

Technical Content Contributor, Cprime
