Status pages have become the end user’s window into your team’s operations. Companies with status pages are doing the right thing for their users: building in some transparency while mitigating frustration and reducing support contacts.
For the benefits of status pages to pay off, organizations need to treat them as something more than active wiki-pages run by support. Let’s take a look at status pages, including:
The status page basics are simple: You have a publicly accessible page that lists the state of your application services and regions, usually with the colors green, yellow and red. The primary purpose of the page is to let users know when there are issues on your side, or whether the problem is on their end.
If your business offers more than one service, each individual service would likely have its own status page, showcasing the various regions it covers and its current availability status.
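As a rough illustration of the green, yellow and red model, the sketch below rolls per-region states up into one overall state for a service by taking the worst one. The service, the region names and the "worst status wins" rule are assumptions made for the example, not something a particular status page product prescribes.

```python
from enum import IntEnum


class Status(IntEnum):
    """Status colors ordered by severity so max() picks the worst one."""
    GREEN = 0   # operational
    YELLOW = 1  # degraded
    RED = 2     # outage


def overall_status(region_statuses: dict[str, Status]) -> Status:
    """Return the worst status across all regions (GREEN if there are none)."""
    return max(region_statuses.values(), default=Status.GREEN)


# Hypothetical service with three regions.
checkout = {
    "us-east": Status.GREEN,
    "eu-west": Status.YELLOW,
    "ap-south": Status.GREEN,
}
print(overall_status(checkout).name)  # prints "YELLOW"
```

Taking the worst regional state keeps the headline honest: a service is only shown as green when every region is green.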
Status pages are simple in principle, but there is one main problem: more often than not, they favor technical users and are hard to find. If the organization doesn’t encourage status page adoption, which many people see as a risk rather than a benefit, the pages are unlikely to be used or to demonstrate value. For many companies, status pages are just for vanity.
A status page brings several added benefits:
If you have a status page, users can visit it whenever they suspect an issue or detect an incident. If they confirm there’s an issue with one of the application services on the page, they’re less likely to submit a support ticket for the impacted functionality. This helps both your employees and your customers:
The first sign of trouble for application issues usually surfaces on social media. Your social team can leverage a status page as part of responses to people who report issues.
Historical status page data can be used to validate a company’s service level agreements (SLAs) and build trust with your product’s customers.
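One way to check historical data against an SLA is to compute measured availability from recorded outage windows. The sketch below assumes the history is available as start and end timestamps; the dates, the outage durations and the 99.9% target are all hypothetical values chosen for illustration.

```python
from datetime import datetime, timedelta


def availability_percent(outages, period_start, period_end):
    """Percentage of the period not covered by an outage.

    `outages` is a list of (start, end) datetime pairs, assumed not to overlap.
    """
    period = (period_end - period_start).total_seconds()
    downtime = 0.0
    for start, end in outages:
        # Clip each outage to the reporting period.
        clipped_start = max(start, period_start)
        clipped_end = min(end, period_end)
        if clipped_end > clipped_start:
            downtime += (clipped_end - clipped_start).total_seconds()
    return 100.0 * (period - downtime) / period


# Hypothetical 30-day period with one 43-minute and one 12-minute outage.
period_start = datetime(2024, 4, 1)
period_end = period_start + timedelta(days=30)
outages = [
    (datetime(2024, 4, 3, 10, 0), datetime(2024, 4, 3, 10, 43)),
    (datetime(2024, 4, 17, 22, 5), datetime(2024, 4, 17, 22, 17)),
]
sla_target = 99.9  # assumed target, not taken from any real agreement
measured = availability_percent(outages, period_start, period_end)
print(f"{measured:.3f}% measured, SLA met: {measured >= sla_target}")
```

With 55 minutes of downtime in a 30-day period, this example lands just under the assumed 99.9% target, which allows roughly 43 minutes.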
While status pages aren’t a complete troubleshooting tool, they can act as an initial indicator when your support team gets a whiff of something going wrong. This helps support teams anticipate what they will hear from customers. A little bit of forewarning can help technical support engineers plan their outage communication and response protocol.
Status pages are basically a requirement now. Without one, especially when you’re serving a technical audience that expects it, you’re out of step with industry norms. Technical users know things break, so they appreciate transparency from companies about:
The benefits are easy enough to understand, but putting them into practice can be surprisingly hard. If you’re the type of organization that treats a status page as vanity, so that you can say you have one, then nothing more can happen until that mentality goes away.
Organizations should be more afraid of a lack of transparency than of too much visibility. In fact, organizations will likely see a bottom-line benefit from implementing status pages successfully. So, what makes a successful status page? Let’s look at the best practices.
In the early days of status pages, the pages were updated manually. This made for low utility because, when something breaks, the last thing you can expect a support team to do is manually update a page.
Today, status updates must be automated. That means the developers and support teams need to agree on a few items:
A status page needs to have more than the status of the service. When there’s degradation or a full-fledged incident, there should be accompanying text to explain:
Vendors are often guilty of creating documentation and status pages that describe services based on how the vendor itself interacts with them, which, for most organizations, reflects how the team is organized.
To the user, this categorization can be meaningless at best and confusing at worst. The services should be broken down from the user’s perspective, usually based on the individual components where they consume functionality.
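A minimal sketch of that user-first breakdown: keep a mapping from the components users recognize to the internal services behind them, and translate failures into component names before publishing. Every component and service name here is hypothetical.

```python
# Hypothetical mapping from user-facing components to the internal
# services they depend on. Users only ever see the keys.
COMPONENTS = {
    "Login": ["auth-api", "session-cache"],
    "Checkout": ["cart-svc", "payments-gateway", "auth-api"],
    "Search": ["search-indexer"],
}


def affected_components(failing_services):
    """Return the user-facing components touched by the failing services."""
    failing = set(failing_services)
    return sorted(
        name for name, deps in COMPONENTS.items() if failing & set(deps)
    )


print(affected_components(["auth-api"]))  # prints ['Checkout', 'Login']
```

The internal name `auth-api` never reaches the page; users see that Login and Checkout are affected, which is what they actually care about.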
Historical status page data isn’t a requirement. But having historical context, not just current status, can help calm people’s nerves when something is wrong. Over the long term, assuming end users see more green than yellow or red, it gives the sense that incidents are not the norm. Otherwise, people could take a single service outage as a sign the entire application is flawed.
In addition to historical data, post-incident reports and updates are great for transparency. More detailed information about critical (P1) outages, published in blog posts, articles or community portals, is also hugely beneficial for visibility and for building customer trust.
Your status page is part of your incident response activity. When there are service issues or degradation, automation will alert those who are on-call and update the status page. When incidents are resolved, the status page will automatically update itself. And then, during post-incident reviews, status pages are an artifact for historical context.
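The flow described above can be sketched as a single event handler. This is an illustration under stated assumptions rather than a real integration: `StatusPage` is an in-memory stand-in for whatever your status page tool exposes, and the event fields (`component`, `state`, `summary`) and the `notify_on_call` callback are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class StatusPage:
    """In-memory stand-in for a status page; a real one would sit behind an API."""
    current: dict = field(default_factory=dict)  # component -> status color
    history: list = field(default_factory=list)  # (time, component, status, message)

    def update(self, component, status, message):
        self.current[component] = status
        self.history.append(
            (datetime.now(timezone.utc), component, status, message)
        )


def handle_alert(page, notify_on_call, event):
    """Page on-call when an incident triggers; update the page on every change."""
    component, state = event["component"], event["state"]
    if state == "triggered":
        notify_on_call(f"{component}: {event['summary']}")
        page.update(component, "red", event["summary"])
    elif state == "resolved":
        page.update(component, "green", "Resolved")


page = StatusPage()
paged = []  # collects on-call notifications in place of a real pager
handle_alert(page, paged.append, {
    "component": "Checkout",
    "state": "triggered",
    "summary": "Elevated error rate",
})
handle_alert(page, paged.append, {"component": "Checkout", "state": "resolved"})
print(page.current, len(page.history), paged)
```

Because every change is appended to `history`, the same record that drives the live page is available later as an artifact for the post-incident review.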
A good status page strategy will create positive impacts, but of course it costs time and money. The important thing to remember is that this cost is negligible compared with the cost of doing nothing. Let’s look at some common pushback to status pages and see why it doesn’t hold up.
When service status is green, no one will pay attention. When a service’s status is yellow or red, everyone will, including your competition. And while they might try to use that against you, the response is easy: highlight your commitment to the customer.
Some will use the status page as a tool to complain. Let’s be clear: if they don’t use your status page as a complaint tool, they will use something else far worse (Reddit, Twitter, etc.). The complaints that happen on social media, without a status page, are often more subjective, and that subjectivity may build long-term negative sentiment against your brand.
Status page transparency can possibly expose internal challenges with the dev and ops teams. For example:
Customer visibility has nothing to do with the severity of these long-term faults in your development team. If anything, being public may force the team to address something that probably should have been addressed already.
Status pages can be a checklist item for most organizations. Those who leverage them to focus on the power of incident management and automation see huge gains in terms of customer satisfaction, transparency, and technical support cost-reduction.