December 03, 2024

What Are Distributed Systems?

By Chrissy Kidd

Distributed systems might be complicated…luckily, the concept is easy to understand!

A distributed system is simply any environment where multiple computers or devices are working on a variety of tasks and components, all spread across a network. Components within distributed systems split up the work, coordinating efforts to complete a given job more efficiently than if only a single device ran it.

It makes sense that we’re seeing more and more distributed systems: the internet enables all of us to work remotely, and many computing jobs today are too complex for a single computer to handle alone. This is the huge advantage: working efficiently across geographies and teams. We wouldn’t be able to do most of this without distributed systems.

In this article, we’ll explore the operation of such systems, the challenges and risks of these platforms, and the myriad benefits of distributed computing.

What are distributed systems?

Distributed systems generally consist of multiple interconnected devices or computers that work together to perform a task that is beyond the capacity of a single system. These systems work by collaborating, sharing resources and coordinating processes to handle complex workloads. (Distributed systems are the entire basis of the internet, after all.) Distributed systems are essential in situations when the workload is subject to change, such as:

E-commerce traffic on Cyber Monday
A sudden wave of web traffic in response to news about your organization

Historically, distributed computing was expensive, complex to configure, and difficult to manage. Thanks to SaaS, PaaS, and IaaS solutions, however, distributed computing has become more streamlined and affordable for businesses of all stripes and sizes.

Today, all types of computing jobs — from database management to video games — use distributed computing. In fact, many types of software, such as cryptocurrency systems, scientific simulations, blockchain technologies and AI platforms, wouldn’t be possible at all without these platforms.

Features of distributed systems

Because they draw on the capabilities of other computing devices and processes, distributed systems can offer features that would be difficult or impossible to develop on a single system.

This includes things like performing an off-site server and application backup — if the master catalog doesn’t see the segment bits it needs for a restore, it can ask the other off-site node or nodes to send the segments. Virtually everything you do now with a computing device takes advantage of the power of distributed systems, whether that’s sending an email, playing a game or reading this article on the web.

Examples of distributed systems

Here are some very common examples of distributed systems:

Telecommunications networks that support mobile and internet networks
Graphical and video-rendering systems
Scientific computing, such as protein folding and genetic research
Airline and hotel reservation systems
Multiuser video conferencing systems
Cryptocurrency processing systems (e.g. Bitcoin)
Peer-to-peer file-sharing systems
Distributed community computing systems
Multiplayer video games
Global, distributed retailers and supply chain management

How distributed systems work

A distributed system begins with a task. Let’s pretend you need to render a video to create a finished product.

The application (really, the distributed applications) managing this task — like a video editor on a client computer — splits the job into pieces. In this simple example, the algorithm gives one frame of the video to each of a dozen different computers (or nodes) to complete the rendering. Once the frame is complete, the managing application gives the node a new frame to work on. This process continues until the video is finished and all the pieces are put back together.

A system like this doesn’t have to stop at just 12 nodes: the job may be distributed among hundreds or thousands of nodes, turning a task that might have taken days for a single computer to complete into one that is finished in a matter of minutes.
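The frame-by-frame hand-off described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a local thread pool stands in for the dozen networked nodes, and `render_frame` and `render_video` are hypothetical names, not part of any real rendering tool.

```python
from concurrent.futures import ThreadPoolExecutor

NODE_COUNT = 12  # the dozen nodes from the example above


def render_frame(frame_number: int) -> str:
    """Stand-in for the expensive rendering a real node would perform."""
    return f"frame-{frame_number:04d}.png"


def render_video(total_frames: int, node_count: int = NODE_COUNT) -> list[str]:
    """Hand frames to idle workers and reassemble the results in frame order."""
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        # map() gives each free worker the next frame and returns results
        # in the original frame order, regardless of which finished first.
        return list(pool.map(render_frame, range(total_frames)))


rendered = render_video(total_frames=48)
print(len(rendered), rendered[0], rendered[-1])
```

In a real deployment the workers would be separate machines reached over a network, and the managing application would also need to handle nodes that fail partway through a frame.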

When thinking about the challenges of a distributed computing platform, the trick is to break it down into a series of interconnected patterns. Simplifying the system into smaller, more manageable and more easily understood components helps abstract a complicated architecture. Several patterns are commonly used to describe distributed systems, such as:

Command and query responsibility segregation (CQRS)
Two-phase commit (2PC)

Different combinations of patterns are used to design distributed systems, and each approach has unique benefits and drawbacks.
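To make one of these patterns concrete, here is a minimal, in-process sketch of the two-phase commit idea: a coordinator asks every participant to vote, then commits only if all of them agree. The `Participant` class and the service names are hypothetical, and a production implementation would also need timeouts, durable logs, and crash recovery, none of which are shown here.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """Hypothetical node that can vote on and then apply a transaction."""
    name: str
    can_commit: bool = True
    log: list[str] = field(default_factory=list)

    def prepare(self) -> bool:
        # Phase 1: vote yes only if this node is able to commit.
        self.log.append("prepared" if self.can_commit else "refused")
        return self.can_commit

    def commit(self) -> None:
        self.log.append("committed")

    def rollback(self) -> None:
        self.log.append("rolled back")


def two_phase_commit(participants: list[Participant]) -> bool:
    """Commit everywhere only if every participant votes yes."""
    # Phase 1 (voting): ask every participant to prepare.
    votes = [p.prepare() for p in participants]
    # Phase 2 (completion): commit on a unanimous yes, otherwise roll back.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False


ok = two_phase_commit([Participant("orders"), Participant("payments")])
blocked = [Participant("orders"), Participant("payments", can_commit=False)]
failed = two_phase_commit(blocked)
print(ok, failed, blocked[0].log)
```

The second call shows the drawback as well as the benefit: a single participant that cannot commit forces every other participant to roll back.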

Types of distributed systems

There are many models and architectures of distributed systems in use today.

Client-server systems, the most traditional and simple type of distributed system, involve a multitude of networked computers that interact with a central server for data storage, data processing, or other common goal.
Peer-to-peer networks distribute workloads among hundreds or thousands of computers all running the same software.
Cell phone networks are an advanced distributed system, sharing workloads among handsets, switching systems and internet-based devices.

At this point, you might realize this: The most common forms of distributed systems today operate over the internet, handing off workloads to dozens of cloud-based virtual server instances that are created as needed, then terminated when the task is complete.

Key characteristics of a distributed system

So now that we “get” what distributed systems are, we can start to assign key features to them. Here’s what good distributed systems have in common:

Scalability. The ability to grow as the size of the workload increases is an essential feature of distributed systems, accomplished by adding additional processing units or nodes to the network as needed.
Concurrency. Distributed system components run simultaneously. They’re also characterized by the lack of a “global clock,” so tasks can occur out of sequence and at different rates.
Availability and fault tolerance. If one node fails, the remaining nodes can continue to operate without disrupting the overall computation.
Heterogeneity. In most distributed systems, the nodes and components are asynchronous and run on different hardware, middleware, software and operating systems. This allows the distributed systems to be extended with the addition of new components.
Replication. Distributed systems enable shared information and messaging, ensuring consistency between redundant resources, such as software or hardware components, thus improving fault tolerance, reliability, and accessibility.
Transparency. The end user sees a distributed system as a single computational unit (a single app) rather than as its underlying parts, allowing users to interact with a single logical device rather than being concerned with the system’s architecture.
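Several of these characteristics can be seen together in a small sketch. Assuming a hypothetical `Replica` class that holds a copy of the same data on each node, the function below keeps answering when one node fails (fault tolerance through replication) while the caller sees only a single result (transparency).

```python
class NodeDown(Exception):
    """Raised by a hypothetical replica that cannot serve requests."""


class Replica:
    def __init__(self, name: str, data: dict[str, str], healthy: bool = True):
        self.name = name
        self.data = data
        self.healthy = healthy

    def read(self, key: str) -> str:
        if not self.healthy:
            raise NodeDown(self.name)
        return self.data[key]


def read_with_failover(replicas: list[Replica], key: str) -> tuple[str, str]:
    """Try each replica in turn; return (value, name of the replica that answered)."""
    for replica in replicas:
        try:
            return replica.read(key), replica.name
        except NodeDown:
            continue  # this node failed, so fall through to the next copy
    raise RuntimeError("no healthy replica could serve the request")


shared = {"order-42": "shipped"}
cluster = [
    Replica("node-a", shared, healthy=False),  # simulated failure
    Replica("node-b", shared),
    Replica("node-c", shared),
]
value, served_by = read_with_failover(cluster, "order-42")
print(value, served_by)  # shipped node-b
```

This sketch sidesteps the hard part of replication, which is keeping the copies consistent when data changes; here every replica simply shares one dictionary.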

Benefits, challenges & risks of distributed systems

Before moving on further, let's discuss the advantages, risks and challenges of distributed systems.

Benefits of distributed systems

Distributed systems offer a number of advantages over monolithic, or single, systems:

Scalability & flexibility. It is easier to add computing power as the need for services grows. In most cases today, you can add servers to a distributed system on the fly, increasing performance and further reducing time to completion.
Fault tolerance. Distributed systems reduce the risks involved with having a single point of failure, bolstering reliability and fault tolerance.
Reliability. A well-designed distributed system can withstand failures in one or more of its nodes without severely impacting performance. In a monolithic system, the entire application goes down if the server goes down.
Speed. Heavy traffic can bog down a single server, impacting performance for everyone. The scalability of distributed databases and other distributed systems makes them easier to maintain and helps them sustain high performance levels.
Geo-distribution. Distributed content delivery is both intuitive for any internet user, and vital for global organizations.

(Know the differences between CDNs & load balancers.)

Challenges of distributed systems

Distributed systems are considerably more complex than monolithic computing environments, and raise a number of challenges around design, operations and maintenance. These include:

More opportunities for failure. The more systems added to a computing environment, the more opportunity there is for failure. If a system is not carefully designed and a single node crashes, the entire system can go down. While distributed systems are designed to be fault tolerant, that fault tolerance is neither automatic nor foolproof.
Synchronization process challenges. Distributed systems work without a global clock, so careful programming is required to keep processes properly synchronized and to prevent transmission delays from causing errors and data corruption. In a complex system, such as a multiplayer video game, synchronization can be challenging, especially on a public network that carries data traffic.
Imperfect scalability. Doubling the number of nodes in a distributed system doesn’t necessarily double performance. Architecting an effective distributed system that maximizes scalability is a complex undertaking that needs to take into account load balancing, bandwidth management, and other issues.
More complex security. Managing a large number of nodes in a heterogeneous or globally distributed environment creates numerous security challenges. A single weak link in a file system or larger distributed system network can expose the entire system to attack.
Increased complexity. Distributed systems are more complex to design, manage and understand than traditional computing environments.
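The imperfect scalability point can be illustrated with Amdahl's law, a standard formula that the article itself does not name. It estimates the best-case speedup when only part of a job can be spread across nodes. The 90% parallel fraction below is an assumption chosen for illustration, and the formula ignores network and coordination overhead, which make real results worse.

```python
def amdahl_speedup(parallel_fraction: float, nodes: int) -> float:
    """Theoretical speedup when only part of a job can be spread across nodes."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / nodes)


# Assume 90% of the job can run in parallel (illustrative value only).
for nodes in (1, 2, 4, 8, 16):
    print(nodes, round(amdahl_speedup(0.9, nodes), 2))
```

Under this assumption, going from 8 to 16 nodes raises the theoretical speedup only from about 4.7x to 6.4x, because the serial 10% of the job never gets faster.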

Risks of distributed systems

The challenges of distributed systems create a number of correlating risks.

Security. Distributed systems are as vulnerable to attack as any other system, but their distributed nature creates a much larger attack surface that exposes organizations to threats.
Risk of network failure. Distributed systems are beholden to public networks to transmit and receive data. If one segment of the internet becomes unavailable or overloaded, distributed system performance may decline.
Governance and control issues. Distributed systems lack the governability of monolithic, single-server-based systems, creating auditing and adherence issues around data privacy laws. Globally distributed environments are challenging when it comes to providing certain levels of assurance and understanding exactly where data resides.
Cost control. Unlike centralized systems, the scalability of distributed systems allows administrators to easily add capacity as needed, which can also increase costs. Pricing for cloud-based distributed computing systems is based on usage (such as the amount of memory and CPU power consumed over time). If demand suddenly spikes, you might face a massive bill.

(Related reading: cloud cost trends & the cost of downtime.)

Real-world guidance: How to set up a distributed system

Distributed deployments can range from tiny, single-department deployments on local area networks to large-scale, global deployments. In addition to their size and overall complexity, organizations can consider deployments based on:

The size and capacity of their computer network
The amount of data they’ll consume
How frequently they run processes and whether they'll be scheduled or ad hoc
The number of users accessing the system
Capacity of their data center
The necessary data fidelity and availability requirements

The management of distributed systems is simplified by deploying:

Container orchestrators (Kubernetes is the prime example) that offer automated scaling, deployment, and operation of containers across the cluster of hosts.
Databases that provide a consistent data layer, ensuring that all the nodes in the system can access the same data while supporting data replication for fault tolerance.

Distributed deployments are categorized as departmental, small enterprise, medium enterprise, or large enterprise. By no means formal, these categories are a starting point for planning the needed resources to implement a distributed computing system. Importantly, expect distributed systems to evolve over time, transitioning from departmental to small enterprise as the enterprise grows and expands.

Tracking what goes on in distributed systems

We know clearly that, for all their benefits, distributed systems are complicated. Knowing what goes on within — the observability of that system — is a distinct advantage. Luckily, it’s one you can achieve with distributed tracing.

Without distributed tracing, a globally distributed system environment would be impossible to monitor effectively.

Distributed tracing, sometimes called distributed request tracing, is a method for monitoring applications, typically those built on a microservices architecture, which are commonly deployed on distributed systems. Distributed tracing is closely tied to distributed computing: it is commonly used to monitor the operations of applications running on distributed systems.

In software development and operations, tracing is used to follow the course of a transaction as it travels through an application. For example, an online credit card transaction can be traced as it winds its way from a customer’s initial purchase to the verification and approval process to the completion of the transaction. A tracing system monitors this process step by step, helping a developer to uncover bugs, bottlenecks, latency issues, or other problems with the application.

Distributed tracing is necessary because of the considerable complexity of modern software architectures. A distributed tracing system is designed to operate on a distributed services infrastructure, where it can track multiple applications and processes simultaneously across numerous concurrent nodes and computing environments.
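The core mechanic of tracing can be sketched briefly: one trace ID is created at the start of a transaction and attached to every step, so the steps can later be stitched back together and timed. Everything here is a simplified assumption. The three service names are hypothetical, the `COLLECTED` list stands in for a tracing backend, and the sleeps simulate work. A real tracing system passes the ID between machines, typically in request metadata.

```python
import time
import uuid
from dataclasses import dataclass


@dataclass
class Span:
    trace_id: str       # shared by every step of one transaction
    service: str        # which service did the work
    duration_ms: float  # how long that step took


COLLECTED: list[Span] = []  # stand-in for a tracing backend


def traced(service: str, trace_id: str, work) -> None:
    """Run one step of the transaction and record a span for it."""
    start = time.perf_counter()
    work()
    elapsed_ms = (time.perf_counter() - start) * 1000
    COLLECTED.append(Span(trace_id, service, elapsed_ms))


def handle_purchase() -> str:
    """Follow one credit card purchase through three hypothetical services."""
    trace_id = uuid.uuid4().hex  # created once, then passed to every hop
    traced("checkout", trace_id, lambda: time.sleep(0.001))
    traced("verification", trace_id, lambda: time.sleep(0.02))
    traced("settlement", trace_id, lambda: time.sleep(0.001))
    return trace_id


trace = handle_purchase()
steps = [s for s in COLLECTED if s.trace_id == trace]
slowest = max(steps, key=lambda s: s.duration_ms)
print([s.service for s in steps], "slowest:", slowest.service)
```

Filtering by trace ID recovers the full path of a single transaction, and comparing span durations is how a bottleneck such as a slow verification step would surface.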

Applying access control in distributed systems

Administrators use a variety of approaches to manage access control in distributed computing environments. The approaches range from traditional access control lists (ACLs) to role-based access control (RBAC).

One of the most promising access control mechanisms for distributed systems is attribute-based access control (ABAC), which controls access to objects and processes using rules that include information about the user, the action requested, and the environment of that request. Administrators can also refine these types of rules to restrict access to certain times of day or certain locations.
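As a rough illustration of how an ABAC rule combines these attributes, the sketch below evaluates a single hypothetical policy. The attribute names, the "finance" department, and the business-hours window are all assumptions made up for this example. Real ABAC systems express rules in a policy language rather than in application code.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class Request:
    department: str  # attribute of the user
    action: str      # the action requested
    location: str    # attribute of the environment
    at: time         # attribute of the environment (time of day)


def is_allowed(req: Request) -> bool:
    """Hypothetical rule: finance staff may read from the office, 09:00 to 17:00."""
    return (
        req.department == "finance"
        and req.action == "read"
        and req.location == "office"
        and time(9, 0) <= req.at <= time(17, 0)
    )


in_hours = is_allowed(Request("finance", "read", "office", time(10, 30)))
after_hours = is_allowed(Request("finance", "read", "office", time(22, 0)))
print(in_hours, after_hours)  # True False
```

The same user and the same action produce different decisions depending on the time of the request, which is the kind of refinement that role membership alone cannot express.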

Distributed systems vs. microservices: what's the difference?

Although there are some similarities between microservices and distributed systems, they are not the same.

Microservices are an approach to design where an application is broken into multiple smaller services that can be deployed independently.
Distributed systems consist of multiple computers that work together to perform a single task.

The main difference is that microservices focus on flexibility and making the system modular. On the other hand, distributed systems focus on resource sharing and making the system scalable.

By comparison, a service-oriented architecture (SOA) is a broader design approach where multiple services communicate over a network. We can consider microservices to be a refined version of SOA, with more focus on independent deployment and lightweight communication.

Distributed systems aren’t going away

Distributed systems are well-positioned to dominate computing as we know it for the foreseeable future, and almost any type of application or service will incorporate some form of distributed computing. The need for always-on, available-anywhere computing isn’t disappearing anytime soon.

This posting does not necessarily represent Splunk's position, strategies or opinion.

Chrissy Kidd

Chrissy Kidd is a technology writer, editor, and speaker. The managing editor for Splunk Learn, Chrissy has covered a variety of tech topics, including cybersecurity, software development, and sustainable technology. She's particularly interested in how tech intersects with our daily lives.
