We’re pleased to announce that Splunk Infrastructure Monitoring now integrates with Opsgenie.
Monitoring modern systems requires DevOps teams to collect, aggregate, and alert on data from hundreds or thousands of different services in real time. Splunk Infrastructure Monitoring customers running all kinds of digital businesses (limited product releases, real-time payments, global satellite image processing) have told us that low-latency visualizations and alerting are critical for deploying with confidence and for shortening incidents when they occur. Splunk Infrastructure Monitoring's streaming architecture addresses this need by delivering alerts within seconds of receiving data.
To realize the potential of real-time alerts, you need a way to quickly route them to the right people and coordinate incident response. Splunk Infrastructure Monitoring integrates with leading notification and incident management services for exactly this purpose. Our integration with Opsgenie provides a unique, best-in-class solution for cloud monitoring and incident management that works in real time, at any scale.
In the days of monolithic architectures, a central Operations team would be responsible for keeping the product up and running. A team like this typically would have sole responsibility for setting up monitoring, paging the right people, and updating alerts as new components came online. But in the era of DevOps and microservices, the responsibility for keeping a service up and running rests with the development team who built the feature: you built it, you own it.
This doesn’t mean that each DevOps team should purchase its own on-call paging tools. For tasks like on-call paging that every team needs, it makes sense to rely on centralized solutions for the company as a whole. But changes to those centralized solutions still need to happen as fast as the self-organizing teams that require them can move. As a company scales, keeping common operational tasks self-service is critical to sustaining the high velocity of change that a microservices environment enables. When a team is planning to deploy a new microservice, they need to be able to manage the end-to-end process themselves: instrumenting metrics, tracing, monitoring, and alert routing.
Fully centralized solutions can struggle to keep up with this process, because they need to reflect organizational changes that in some cases have already been acted on. If someone needs to raise a ticket with IT every time the membership of their on-call schedule changes, then that on-call schedule might spend a lot of time out of step with reality.
This is where Opsgenie’s intelligent alert routing comes in: in Opsgenie, teams can subscribe to alerts that are relevant to them. This means that you could create one integration that would route alerts dynamically to teams.
However, the systems that send alerts to Opsgenie have never had visibility into where the alerts were going. This led to tension between the central services teams administering tools like Opsgenie, and the individual DevOps teams trying to get alerts where they need to go.
Typically, integrations with Opsgenie rely on webhooks that accept alerts before routing them to Opsgenie Teams. You can add an integration to a specific Team, or add an integration to your entire Opsgenie account that individual Teams subscribe to.
This approach works for companies with only a few teams or those using a relatively basic monitoring solution, but can quickly become impractical once users need to choose one integration from a list of hundreds or more for each alert. Furthermore, adding or removing Opsgenie Teams would require changes in the upstream monitoring product in order to send alerts to new teams.
In contrast, the Splunk Infrastructure Monitoring integration remains aware of Opsgenie Teams at all times, and presents these options when users are deciding which Teams to route an alert to. Customers with hundreds of teams can add just one Opsgenie integration that can send Splunk Infrastructure Monitoring alerts to any of them, or add and remove Teams without changing the integration on the Splunk Infrastructure Monitoring side.
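The difference between the two approaches can be sketched in a few lines of Python. This is only an illustrative model, not how either product is implemented: the class names `PerTeamWebhooks` and `TeamAwareIntegration`, the `team_directory` set (standing in for the live list of Opsgenie Teams), and the returned dictionary shape are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class PerTeamWebhooks:
    """Hypothetical model: the monitoring side stores one integration per team."""
    webhooks: Dict[str, str] = field(default_factory=dict)

    def route(self, team: str) -> str:
        # Raises KeyError for any team not yet configured upstream.
        return self.webhooks[team]


@dataclass
class TeamAwareIntegration:
    """Hypothetical model: one integration that reads the current team list."""
    api_key: str
    team_directory: Set[str]  # stand-in for the live list of Opsgenie Teams

    def available_teams(self) -> List[str]:
        # Options presented to a user choosing where to route an alert.
        return sorted(self.team_directory)

    def route(self, team: str) -> Dict[str, str]:
        if team not in self.team_directory:
            raise KeyError(f"unknown team: {team}")
        return {"integration": "opsgenie", "team": team}


# A new team appears on the Opsgenie side after the integration was created.
teams = {"payments", "search"}
integration = TeamAwareIntegration(api_key="placeholder-key", team_directory=teams)
teams.add("imaging")

per_team = PerTeamWebhooks(webhooks={"payments": "hook-1", "search": "hook-2"})
```

In this model, `integration.route("imaging")` works as soon as the team exists, while `per_team.route("imaging")` fails until someone adds a third webhook on the monitoring side.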
Log in to Splunk Infrastructure Monitoring and navigate to the Integrations page in the app, then search for or scroll to the “Opsgenie” tile. Click on it, select “New Integration” and assign it a name (e.g. “Opsgenie integration”), and add your Opsgenie Integration’s API key.
Click save, and your integration is now ready to use.
Open an alert rule or team notification in Splunk Infrastructure Monitoring, then click “Add Recipient” and choose Opsgenie from the list of available integrations to specify the team that should receive alerts.
Opsgenie users within that Team will now receive alerts from Splunk Infrastructure Monitoring.
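For readers curious what an alert addressed to a single Team can look like, here is a minimal sketch based on our understanding of Opsgenie's public Alert API, where a request body can name a team in its `responders` list and the API key is sent in a `GenieKey` authorization header. It is not the integration's actual implementation, which happens behind the scenes. The helper names, the sample message, and the default priority are assumptions for illustration, and the field names should be checked against the Opsgenie documentation before use. The sketch only builds the request and does not send it.

```python
import json
from typing import Dict

# Endpoint as we understand it from the Opsgenie Alert API; verify before use.
OPSGENIE_ALERTS_URL = "https://api.opsgenie.com/v2/alerts"


def build_team_alert(message: str, team_name: str, priority: str = "P3") -> Dict:
    """Build an alert body addressed to one Opsgenie Team (hypothetical helper)."""
    return {
        "message": message,
        "responders": [{"name": team_name, "type": "team"}],
        "priority": priority,
    }


def build_headers(api_key: str) -> Dict[str, str]:
    """Build request headers carrying the integration's API key."""
    return {
        "Authorization": f"GenieKey {api_key}",
        "Content-Type": "application/json",
    }


payload = build_team_alert("CPU above 90% on checkout-service", "payments")
headers = build_headers("placeholder-key")
body = json.dumps(payload)  # JSON string that would be POSTed to OPSGENIE_ALERTS_URL
```

Because the team is just a field in the body, routing to a different Team means changing one value and leaving the integration itself untouched.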
Scalable, maintainable alerting and incident management solutions are critical to sustaining the pace of software innovation in modern DevOps organizations. We’ve provided Opsgenie users with a single integration for Splunk Infrastructure Monitoring that easily supports hundreds of teams, ensuring that actionable, real-time alerts are always routed to the right person.
If you’re not already using Splunk Infrastructure Monitoring, get started with a free trial.
This post features contributions from Rebecca Tortell and Aaron Sun.