April 26, 2024

Availability Zones: The Complete Guide for 2025

By Shanika Wickramasinghe

In the early days of cloud computing, most organizations relied on single-location data centers, which faced a higher risk of downtime and service disruption from localized disasters or hardware failures.

As a solution to these problems, cloud services like AWS introduced the concept of availability zones. This introduction was an important milestone in the evolution of cloud computing, as it facilitated high availability through geographic distribution.

Due to the success of this approach, many organizations have adopted this practice today.

What is an Availability Zone?

Before understanding availability zones, it's essential to be familiar with the concept of cloud regions, as these are related terms.

Cloud regions refer to specific geographic areas around the world that host multiple data centers equipped with the necessary infrastructure to run cloud applications.
Within these cloud regions, there are smaller subdivisions known as Availability Zones (AZs).

These Availability Zones within a cloud region are typically separated by significant distances, often many miles. As a result, if one Availability Zone is affected by a natural disaster or power outage, the other AZs can operate without interruption.

Plus, each AZ has its own independent power supply, cooling system, and network infrastructure that helps run applications uninterrupted.
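The relationship between a region and its zones can be sketched as a small data model. This is only an illustration: the `Region` and `Zone` classes and the `example-region-1` names are hypothetical and do not correspond to any provider's API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Zone:
    name: str    # hypothetical zone name, e.g. "example-region-1a"
    region: str  # name of the region this zone belongs to


@dataclass
class Region:
    name: str
    zones: list[Zone] = field(default_factory=list)

    def add_zone(self, suffix: str) -> Zone:
        """Create a zone inside this region and remember it."""
        zone = Zone(name=f"{self.name}{suffix}", region=self.name)
        self.zones.append(zone)
        return zone

    def surviving_zones(self, failed: set[str]) -> list[Zone]:
        """Zones that keep running when the zones named in `failed` go down."""
        return [z for z in self.zones if z.name not in failed]


region = Region("example-region-1")
for suffix in ("a", "b", "c"):
    region.add_zone(suffix)

# One zone fails; the other two are unaffected.
remaining = region.surviving_zones(failed={"example-region-1b"})
print([z.name for z in remaining])  # ['example-region-1a', 'example-region-1c']
```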

(Cloud service models explained: SaaS vs. PaaS vs. IaaS.)

How availability zones work

Let’s walk through, step by step, what happens when a user deploys resources to availability zones.

Step 1. First, the user logs in to their cloud service provider's management console or command-line tool. This is the interface where all configurations and deployments are managed.

Step 2. The user selects a specific geographic region based on their latency, legal, or data sovereignty requirements. Regions are broad areas that typically contain multiple Availability Zones.

Step 3. Within the chosen region, the cloud provider lists available Availability Zones. These AZs are pre-configured and already operational, established by the provider to ensure high availability and fault tolerance.

Step 4. The user decides to deploy their resources, such as virtual machines, databases, or storage. Here, they can select specific Availability Zones. Typically, the interface allows specifying one or more AZs.

If deploying virtual machines, the user would configure aspects like the operating system, network settings, and security configurations.
For databases, the user might choose replication options across AZs to ensure data durability.

Step 5. Once the configuration is set, the user executes the deployment command. The cloud service provider’s backend orchestrates the creation of these resources in the specified Availability Zones.

Step 6. The cloud provider’s system automatically places the resources in different physical locations corresponding to the chosen AZs. If the user has selected multiple AZs for their resources, the system handles the replication of data across these zones.

Step 7. For applications needing high availability, the user might set up load balancing that directs traffic to instances across multiple AZs. This setup helps in distributing the load evenly and provides a failover mechanism in case one AZ becomes unavailable.

Step 8. The cloud provider continuously monitors the health of all AZs. If an AZ fails and the user has set up load balancing and auto scaling across zones, traffic is rerouted and resources are scaled up in the remaining operational AZs automatically.
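The placement and failover behavior in the steps above can be sketched in a few lines. This is a simplified model, not a provider's actual scheduler: the function names, zone names, and instance IDs are all hypothetical, and round-robin is just one possible placement policy.

```python
from itertools import cycle


def spread_across_zones(instance_ids, zones):
    """Assign instances to zones round-robin so each zone gets a similar share."""
    placement = {}
    for instance_id, zone in zip(instance_ids, cycle(zones)):
        placement[instance_id] = zone
    return placement


def healthy_targets(placement, unhealthy_zones):
    """Instances a load balancer could still route to when some zones are down."""
    return sorted(
        instance_id
        for instance_id, zone in placement.items()
        if zone not in unhealthy_zones
    )


zones = ["zone-a", "zone-b", "zone-c"]  # hypothetical zone names
placement = spread_across_zones(["vm-1", "vm-2", "vm-3", "vm-4"], zones)
print(placement)
# {'vm-1': 'zone-a', 'vm-2': 'zone-b', 'vm-3': 'zone-c', 'vm-4': 'zone-a'}

# If zone-b becomes unavailable, traffic goes to the remaining instances.
print(healthy_targets(placement, unhealthy_zones={"zone-b"}))
# ['vm-1', 'vm-3', 'vm-4']
```

In a real deployment, the provider's load balancer and health checks play the role of `healthy_targets`; the sketch only shows why spreading instances across zones leaves capacity available when one zone is lost.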

(Related reading: the shared responsibility model.)

How AZs affect performance & optimization

Beyond maximizing availability through redundancy, availability zones also affect performance and optimization in a number of ways.

Improved performance

The latency within an AZ is generally low, allowing for faster communication. Cross-AZ communication, however, usually has higher latency. Keeping latency-sensitive traffic within a single AZ and limiting cross-AZ traffic to the tasks that need it, such as replication, can noticeably improve the performance of your system.
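One way to keep traffic inside a zone is zone-aware routing: prefer a replica in the caller's own zone and only cross zones when none is available. The sketch below assumes a hypothetical `pick_replica` helper and made-up replica and zone names.

```python
def pick_replica(caller_zone, replicas):
    """Pick a replica for a caller, preferring one in the caller's own zone.

    `replicas` is a list of (replica_name, zone_name) tuples.
    """
    if not replicas:
        raise ValueError("no replicas available")
    same_zone = [replica for replica in replicas if replica[1] == caller_zone]
    # Fall back to any replica only when the caller's zone has none.
    candidates = same_zone or replicas
    return candidates[0]


replicas = [("db-1", "zone-b"), ("db-2", "zone-a")]

print(pick_replica("zone-a", replicas))  # ('db-2', 'zone-a'), stays in zone
print(pick_replica("zone-c", replicas))  # ('db-1', 'zone-b'), cross-zone fallback
```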

Cost optimization

Transferring data between AZs can be expensive because providers typically charge for cross-AZ data transfer. An architectural pattern called “Availability Zone Affinity” can therefore be used to improve the performance of your cloud system while reducing costs.

This strategy helps in limiting cross-AZ data transfers and balancing workloads across AZs to prevent over-provisioning of resources.
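A rough estimate shows how much zone affinity can matter. The price per GB, traffic volume, and cross-AZ fractions below are hypothetical placeholders, not any provider's published rates; substitute your own figures.

```python
def monthly_cross_az_cost(total_gb, cross_az_fraction, price_per_gb):
    """Estimated monthly charge for the share of traffic that crosses zones."""
    if not 0 <= cross_az_fraction <= 1:
        raise ValueError("cross_az_fraction must be between 0 and 1")
    return total_gb * cross_az_fraction * price_per_gb


PRICE_PER_GB = 0.01          # hypothetical cross-AZ transfer price
MONTHLY_TRAFFIC_GB = 50_000  # hypothetical internal traffic volume

# Without affinity, assume 60% of requests land in a different zone.
baseline = monthly_cross_az_cost(MONTHLY_TRAFFIC_GB, 0.60, PRICE_PER_GB)

# With affinity, assume only 10% of traffic still crosses zones.
with_affinity = monthly_cross_az_cost(MONTHLY_TRAFFIC_GB, 0.10, PRICE_PER_GB)

print(f"baseline:      {baseline:.2f}")
print(f"with affinity: {with_affinity:.2f}")
print(f"saved:         {baseline - with_affinity:.2f}")
```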

(Related reading: cloud cost management & CapEx vs. OpEx.)

Resource distribution

AZs allow the user to provision computing resources and logically manage them. These resources include:

Compute instances
Storage services
Networking components

You’ll be able to deploy and manage these resources using management consoles or APIs offered by the service provider.

Fault tolerance

If an AZ shuts down, services can be configured to fail over automatically to other AZs. This minimizes the downtime your business operations face during an outage.
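The benefit of failover can be estimated with a simple probability model. The sketch assumes that zones fail independently and that a service is available as long as at least one zone is up; both are simplifications, and the 99% figure is a hypothetical input rather than a provider guarantee.

```python
def combined_availability(zone_availability, zone_count):
    """Probability that at least one of `zone_count` independent zones is up."""
    if not 0 <= zone_availability <= 1:
        raise ValueError("zone_availability must be between 0 and 1")
    if zone_count < 1:
        raise ValueError("zone_count must be at least 1")
    # All zones must be down at once for the service to be unavailable.
    return 1 - (1 - zone_availability) ** zone_count


for count in (1, 2, 3):
    print(count, f"{combined_availability(0.99, count):.6f}")
```

Under these assumptions, each additional zone multiplies the chance of a total outage by the single-zone failure probability, which is why even two zones improve availability considerably.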

(Read our full fault tolerance explainer.)

Data sovereignty & compliance in AZs

Since AZs are geographically separated zones within a region, and these regions could be located in different countries, complying with data sovereignty requirements is crucial for businesses utilizing AZs.

Data sovereignty is the concept of data being subject to the laws of the country in which it is located. To comply with these laws and regulations, organizations need to make sure that their data stays within the boundaries of certain countries and regions.

Different laws might apply to data located in different countries. For example, the European Union has the General Data Protection Regulation (GDPR), which governs how the personal data of individuals in the EU is collected, processed, and protected. Businesses that violate these laws can face heavy fines. Therefore, maintaining compliance with data protection laws is important for a business.
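A basic residency check can flag resources deployed outside the regions your policy allows. The region names, resource names, and the `residency_violations` helper are hypothetical; a real check would read the resource inventory from your provider.

```python
ALLOWED_REGIONS = {"eu-example-1", "eu-example-2"}  # hypothetical policy


def residency_violations(resources, allowed_regions):
    """Return the names of resources deployed outside the allowed regions.

    `resources` maps a resource name to the region it is deployed in.
    """
    return [
        name
        for name, region in resources.items()
        if region not in allowed_regions
    ]


resources = {
    "customer-db": "eu-example-1",
    "analytics-bucket": "us-example-1",
    "backup-store": "eu-example-2",
}

print(residency_violations(resources, ALLOWED_REGIONS))  # ['analytics-bucket']
```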

Complying with data sovereignty requirements allows businesses to maintain control over their data. This can help protect the data from unauthorized access and security breaches. This is especially important for organizations handling valuable data like:

Personal info
Financial data
Health records

By meeting data sovereignty requirements, you help ensure that your data remains accessible in the event of a disaster, instead of facing legal issues or other challenges when you try to access it under time-sensitive circumstances.

Major cloud providers: Availability Zone comparison

While there are many cloud providers that offer Availability Zones, AWS, Google Cloud, and Azure are the three largest. Let's focus on them and compare the size and network capabilities of their Availability Zones.

Google

Google has cloud resources spread out worldwide, with 106 AZs in 35 Cloud Regions to provide speed and reliability to businesses globally. By utilizing Google Cloud, you’ll be able to select geographical zones that best meet your needs based on availability and proximity, supporting compliance requirements.

Zones in Google Cloud regions are located relatively close to each other, allowing for high-bandwidth communication.

AWS

As of 2023, Amazon Web Services has a total of 102 AZs across 32 Cloud Regions, with plans to add four more regions and 12 more AZs. Each AZ has independent power, cooling, and physical security, and the AZs within a region are linked by redundant, low-latency network connections.

AWS AZs also allow you to scale your applications up or down by adding resources within zones.

Azure

Microsoft Azure offers high availability to protect your applications and data from disasters and disruptions. It has over 60 regions, and each region that supports Availability Zones has at least three of them.

Besides the standard features like independent power, cooling, and network infrastructure, Azure AZs meet compliance and regulatory requirements for critical applications. They also offer built-in security for data transfers within Zones and across regions.

Choosing an AZ service

Choosing the right cloud service and AZ setup might not be make or break, but it will affect how well your business performs. Consider the following factors before choosing:

Each provider offers a different range of services. For instance, Google Cloud might offer certain exclusive services that are very useful for your business.
Pricing models differ among these three providers. Consider the cost implications for your organization before making a financial commitment to a service.
Always anticipate growth. Select the service with the scalability options that work best for you.
Make sure the service provider you choose meets the regulatory compliance requirements of your field.
Choose a service with zones you can easily access, ideally close to your users. This can reduce latency and help you avoid complications when you need to access your data in a hurry.

Complexities with AZs

Although AZs and cloud regions help businesses improve availability, they come with their own complexities:

Availability Zones do not span regions, so if an entire region fails, access to the whole application could be lost unless it is also deployed in another region.
Spreading workloads across zones protects against the failure of a single zone, but the system remains vulnerable to hardware failures, software bugs, internet routing issues, and natural disasters that affect more than one zone.
An outage in a single availability zone can be very problematic for systems like Kubernetes if the control plane runs only in that zone. The control plane needs to be deployed across multiple zones to keep the cluster available.
Depending on the use case, there can be latency issues when communicating across AZs, even though they are designed to be close enough for low latency.
Service Level Agreements (SLAs) may differ depending on the provider and on how resources are deployed across AZs, so check the SLA that applies to your setup, as it determines the uptime guarantee.
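When comparing SLAs, it helps to translate an uptime percentage into the downtime it permits. The percentages below are illustrative inputs, not the terms of any specific provider's SLA, and the 30-day month is an assumption.

```python
def allowed_downtime_minutes(sla_percent, days=30):
    """Minutes of downtime per period that an uptime SLA still permits."""
    if not 0 <= sla_percent <= 100:
        raise ValueError("sla_percent must be between 0 and 100")
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


for sla in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_minutes(sla)
    print(f"{sla}% uptime allows about {minutes:.1f} minutes of downtime per 30 days")
```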

Best practices for Availability Zones

Configure at least two availability zones, with each fleet having one subnet per unique AZ.
Scale horizontally and replicate data across AZs so your system stays resilient against AZ failures.
To overcome regional failures, replicate your data across cloud regions.
Choose the cloud region closest to your users to keep latency as low as possible.
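The first practice can be checked automatically before a deployment. The sketch below assumes a hypothetical fleet description that maps each zone to its subnet IDs; the `validate_fleet` helper and all names are illustrative.

```python
def validate_fleet(subnets_by_zone):
    """Check a fleet layout against the 'two AZs, one subnet per AZ' guideline.

    `subnets_by_zone` maps a zone name to the list of subnet IDs used there.
    Returns a list of human-readable problems; an empty list means it passes.
    """
    problems = []
    if len(subnets_by_zone) < 2:
        problems.append("fleet uses fewer than two availability zones")
    for zone, subnets in sorted(subnets_by_zone.items()):
        if len(subnets) != 1:
            problems.append(
                f"{zone} has {len(subnets)} subnets, expected exactly one"
            )
    return problems


good_fleet = {"zone-a": ["subnet-1"], "zone-b": ["subnet-2"]}
bad_fleet = {"zone-a": ["subnet-1", "subnet-3"]}

print(validate_fleet(good_fleet))  # []
print(validate_fleet(bad_fleet))
```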

Get in the zone

Availability Zones represent a critical advancement in cloud computing, offering both enhanced resilience and better operational efficiency. Their strategic implementation across multiple geographies helps mitigate the impact of localized failures and supports continuous service delivery.

For businesses, this means a reliable infrastructure that supports both growth and compliance with various regulatory requirements.

As technology evolves, the importance of wisely selecting and managing Availability Zones will only increase, making it imperative for organizations to stay informed and proactive in their cloud strategy.

Shanika Wickramasinghe

Shanika Wickramasinghe is a software engineer by profession and a graduate in Information Technology. Her specialties are Web and Mobile Development. Shanika considers writing the best medium to learn and share her knowledge. She is passionate about everything she does, loves to travel and enjoys nature whenever she takes a break from her busy work schedule. She also writes for her Medium blog sometimes. You can connect with her on LinkedIn.