Federated learning in artificial intelligence refers to the practice of training AI models across multiple independent, decentralized training sessions.
Traditionally, AI models required a single, centralized dataset and a unified system of training. While this method is a straightforward approach to training an AI tool, putting all of that data in one location can present undue risk and vulnerabilities, especially when that data needs to move around.
That’s where federated learning comes in — as a response to the privacy and security concerns brought on by traditional machine learning methods.
In this article, we’ll explore how AI training works, how federated learning differs and what benefits organizations can expect from adopting federated learning models. Let’s dig in!
Before we dive into the details, let’s start with a quick overview of AI training, or machine learning.
AI training is the process of teaching a system to acquire, interpret and learn from an incoming source of data. To accomplish those three goals, AI models go through a few steps:
As a first step, an AI model is fed a large amount of prepared data and is asked to make decisions based on that information. This data is generally filled with all kinds of tags or targets that tell the system how it should categorize the information — like thousands of sets of training wheels all guiding the system toward the desired outcome. Engineers make adjustments here until they see acceptable results and an acceptable degree of accuracy on the tagged data.
Once the AI model has trained on its initial data set and adjustments have been made, the system is fed a new set of prepared data, and performance is evaluated. This is where we keep an eye out for any unexpected variables, and verify that the system is behaving as expected.
After passing validation, the system is ready to meet its first set of real-world data. This data is unprepared and contains none of the tags or targets that the initial training and validation sets included. If the system can parse this unstructured data and return accurate results, it's ready to go to work. If not, it heads back to the training stage for another iteration of the process.
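The three stages above — train on tagged data, validate and adjust, then evaluate on held-out data — can be sketched as a toy loop. This is a minimal illustration, not any specific framework's API: the one-dimensional threshold classifier, the `train` and `accuracy` helpers, and the 0.9 validation cutoff are all assumptions made for the example.

```python
import random

def train(examples, epochs=50, lr=0.1):
    """Fit a 1-D threshold classifier: predict class 1 if x > w."""
    w = 0.0
    for _ in range(epochs):
        for x, label in examples:      # tagged training data
            pred = 1 if x > w else 0
            w += lr * (pred - label)   # nudge the threshold toward fewer errors
    return w

def accuracy(w, examples):
    return sum((1 if x > w else 0) == y for x, y in examples) / len(examples)

random.seed(0)
# Labeled data: class 1 clusters above 5, class 0 below it.
data = [(random.uniform(0, 5), 0) for _ in range(50)] + \
       [(random.uniform(5, 10), 1) for _ in range(50)]
random.shuffle(data)
train_set, valid_set, test_set = data[:60], data[60:80], data[80:]

w = train(train_set)                 # 1. train on tagged data
if accuracy(w, valid_set) < 0.9:     # 2. validate; adjust and retrain if needed
    w = train(train_set, epochs=200)
print(f"test accuracy: {accuracy(w, test_set):.2f}")  # 3. evaluate on held-out data
```

In a real pipeline, the "adjustment" step would mean tuning hyperparameters or revisiting the data, and the final evaluation would use genuinely unseen production data rather than a held-out slice.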
In the traditional machine learning paradigm, all of that training data sits in one location — meaning data may need to be exchanged between a central cloud location and the model.
This presents some serious issues:
AI models train on large datasets. These datasets are often stored in silos, and a single, unified, centralized data platform may not be available to store them in their entirety. Different datasets may be highly relevant and valuable for AI model training, yet subject to strict data privacy limitations: the data may be proprietary or contain sensitive personally identifiable information about end users. As a result, only a limited number of users may be authorized to access the relevant datasets.
The use of sensitive data may be subject to stringent compliance regulations and liability for damages in the event of a data breach or security incident. Data anonymization in data transfer, data storage and data pipeline processes may not be possible if the algorithms do not allow such provisions or if the necessary resources are not available.
In a traditional machine learning approach, data distribution can be highly homogeneous, and that can be a problem.
AI models that are trained on a limited curated data set from a few sources may not adequately learn and represent the complete data distribution of these sources. If the training data does not represent a large and diverse volume of its underlying distribution, the learned model may not be able to infer (predict) accurately.
This can also create learning imbalances and biases that are highly discouraged in real-world applications. For instance, if the training data is curated from specific demographic groups, the models may underperform on data from other demographic groups that were not available during training. A scalable learning model should be able to train on large volumes of data from many sources simultaneously. This is sometimes referred to as global training, which may be impossible due to the limitations stated above.
(Learn more about bias & similar ethical concerns in AI.)
Traditional ML algorithms often require access to large amounts of data for training. This data might contain sensitive information, and the process of collecting, storing, and sharing this data can increase the risk of data exposure, breaches, or leaks.
Furthermore, while the model is still in development, it can be an attack vector itself through what professionals refer to as “model inversion.” By exploiting ML model vulnerabilities, attackers can infer sensitive information about individual data points used in the training process. This involves querying the model with carefully crafted inputs to learn about the training data or individual records, which could compromise the privacy and security of the dataset as a whole.
So how does federated learning work, and how does it solve these key problems?
To start, in federated learning, instead of using the so-called “global” training regime described above, the training process is divided into independent and localized sessions.
A base model is prepared using a generic, large dataset — this model is then copied and sent out to local devices for training. Models can be trained on smartphones, IoT devices or local servers that house data relevant to the task the model is aiming to solve. The local data generated by these devices will be used to fine-tune the model.
As the models train, there will be small iterative updates happening within them as they get closer to achieving their desired level of performance — these small updates are called gradients. Rather than sending the full local dataset back from the device, federated learning sends only the model's gradients back to the central server.
As all of these gradients are sent to that central server, the system can average all of the update information and create a reflection of the combined learning of all participating devices.
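The copy-out, local-training and server-side averaging steps just described can be sketched as a toy federated averaging loop. This is an illustrative sketch, not a production framework: the threshold model, the `local_update` and `make_device_data` helpers, and the choice of three devices and five rounds are all assumptions made for the example.

```python
import random

random.seed(1)

def make_device_data(boundary, n=40):
    """Private data held on one device; class 1 lies above the device's boundary."""
    neg = [(random.uniform(boundary - 3, boundary), 0) for _ in range(n // 2)]
    pos = [(random.uniform(boundary, boundary + 3), 1) for _ in range(n // 2)]
    return neg + pos

def local_update(global_w, local_data, lr=0.05, epochs=20):
    """Fine-tune a copy of the global threshold on local data, then return
    only the accumulated update -- the raw data never leaves the device."""
    w = global_w
    for _ in range(epochs):
        for x, label in local_data:
            pred = 1 if x > w else 0
            w += lr * (pred - label)
    return w - global_w

# Three devices whose local data distributions differ slightly.
devices = [make_device_data(b) for b in (4.0, 5.0, 6.0)]

global_w = 0.0                               # base model from the central server
for _ in range(5):                           # five federated training rounds
    updates = [local_update(global_w, d) for d in devices]
    global_w += sum(updates) / len(updates)  # average updates into the global model

print(f"global threshold after training: {global_w:.2f}")
```

The key property mirrors the text: each device sends back only its small update, and the server's average reflects the combined learning of all participants without ever seeing their data.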
To reiterate the basic structure of a federated learning model: a base model is copied out to local devices, each device trains on its own data, and only the resulting updates are sent back and averaged into the global model.
Much like a traditional AI learning model, this process is repeated multiple times until the model reaches a state where it can perform well across diverse and varied datasets. Once the desired level of performance is achieved and confirmed, the global model is ready for deployment.
What we’ve just described is a simple case of federated machine learning. As this process has become more commonplace, organizations have seen a number of operational benefits.
Modern federated AI algorithms may use a variety of training regimes, data processing and parameter-updating mechanisms, depending on the performance goals and the challenges facing federated AI.
This posting does not necessarily represent Splunk's position, strategies or opinion.