In today's data-driven landscape, organizations are confronted with an overwhelming volume of data, which is often accompanied by budgetary constraints. To address these challenges, a thoughtful data tiering strategy is crucial. This can be done by developing the practice of:
After that, powerful data management and federated search capabilities become imperative: with these, you’ll have the flexibility to access data sets across different platforms and correlate them when you need to, based on the use case at hand.
Our goal at Splunk is to make data management and accessibility easy and flexible for our customers, so you can gain value from your voluminous data more efficiently. To that end, we’ve made a couple of big announcements this year: Splunk Edge Processor and Federated Search for Amazon S3.
Splunk Edge Processor is a service offering deployed at the edge with a data control plane accessible from Splunk Cloud Platform. It is designed to help customers transform data more efficiently close to the data source, control where data is placed, and gain better visibility into data in motion. With Edge Processor, customers can filter, transform, and route data from the edge into Splunk indexes or Amazon S3 buckets.
Federated Search for Amazon S3, on the other hand, is a new capability that allows customers to search data from their Amazon S3 buckets directly from Splunk Cloud Platform without the need to ingest it into Splunk.
In this blog, we will dive into how Splunk Edge Processor and Federated Search for Amazon S3 can help build and implement data strategies to efficiently maximize the value derived from your data.
When addressing data transformation, Splunk Edge Processor is designed to extract only the critical data, employing data reduction techniques to streamline data ingestion into Splunk indexes.
Capturing and cleaning data close to the source is crucial, especially for sensitive data sets that cannot leave the organization's network boundaries. This way, organizations can ensure that only essential, clean data gets ingested into Splunk. Any extraneous data? You can store that in an external data store such as Amazon S3.
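To make the filter, mask, and route idea concrete, here is a minimal Python sketch. It is not Edge Processor code and not SPL2; the field names (`level`, `raw`), the card-number pattern, and the rule that only warnings and errors count as "critical" are all assumptions made for illustration.

```python
import re

# Hypothetical pattern for a card-like number; real masking rules
# would be specific to your organization's data.
CARD_PATTERN = re.compile(r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?(\d{4})\b")


def mask_sensitive(raw: str) -> str:
    """Replace all but the last four digits of a card-like number."""
    return CARD_PATTERN.sub(lambda m: "XXXX-XXXX-XXXX-" + m.group(1), raw)


def route(events):
    """Mask every event, then split events by destination.

    Returns (to_index, to_archive): events meant for a Splunk index and
    events meant for external storage such as an Amazon S3 bucket.
    """
    to_index, to_archive = [], []
    for event in events:
        cleaned = {**event, "raw": mask_sensitive(event["raw"])}
        # Assumption: only warnings and errors are critical enough to index.
        if event["level"] in {"ERROR", "WARN"}:
            to_index.append(cleaned)
        else:
            to_archive.append(cleaned)
    return to_index, to_archive


events = [
    {"level": "ERROR", "raw": "payment failed card=4111 1111 1111 1234"},
    {"level": "DEBUG", "raw": "cache warm-up complete"},
    {"level": "INFO", "raw": "user login ok"},
]
to_index, to_archive = route(events)
```

In this sketch the masking happens before routing, so the sensitive value never leaves the "edge" in clear text regardless of which destination an event ends up in.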
Now, let's look at how you can implement these policies on edge processors.
In addition to the two major announcements, Splunk also announced an updated version of its search language, SPL2. SPL2 caters to users with diverse query language backgrounds, blending SPL and SQL syntax for familiarity. Its features include built-in functions, the ability to create custom functions and custom data types, comment integration, and many more, making for concise and powerful data queries.
Now imagine this: anything that can be implemented in SPL2 can be implemented in Edge Processor! That means that any task you implement using SPL2 can be part of your Edge Processor pipelines, including:
All this to say: you can now build data pipelines specific to your organization’s needs.
Today, Splunk Edge Processor can receive data from many different sources, such as Universal Forwarders, HTTP Event Collector, and syslog, and route data to destinations including Splunk Cloud Platform, Splunk Enterprise, and Amazon S3. Check out the full list of supported sources and destinations.
In recent years, Amazon S3 has become one of the most popular storage platforms because of its ease of use and storage capabilities. It serves many different use cases: web applications writing data to S3, storage of analytical data, storage for compliance and long-term retention, and many more.
Now with Splunk Federated Search for Amazon S3 you can make these data sets available to Splunk — which means you can use Splunk’s powerful search language to explore them and correlate these data sets with data in Splunk. Yes, this includes data that an Edge Processor sends to Amazon S3.
And an added benefit that Edge Processor provides: data written by Edge Processor is partitioned by time and stored in JSON format in Amazon S3. This enables Splunk Federated Search for S3 to work with the dataset efficiently.
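Why does time partitioning help? A search over a time range only needs to read the partitions that overlap that range. The Python sketch below illustrates the idea with a hypothetical daily key layout (`year=/month=/day=`); the actual object key format that Edge Processor writes may differ, so treat the prefix format as an assumption.

```python
from datetime import datetime, timedelta, timezone


def partition_prefix(ts: datetime) -> str:
    # Hypothetical layout; the real Edge Processor key format may differ.
    return ts.strftime("year=%Y/month=%m/day=%d/")


def prefixes_for_range(start: datetime, end: datetime) -> list[str]:
    """List the daily partition prefixes a search over [start, end] must read."""
    day = start.replace(hour=0, minute=0, second=0, microsecond=0)
    prefixes = []
    while day <= end:
        prefixes.append(partition_prefix(day))
        day += timedelta(days=1)
    return prefixes


start = datetime(2023, 10, 30, 22, 0, tzinfo=timezone.utc)
end = datetime(2023, 11, 1, 3, 0, tzinfo=timezone.utc)
needed = prefixes_for_range(start, end)
```

Here a search spanning roughly 29 hours touches three daily prefixes, and every other partition in the bucket can be skipped without being scanned.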
Federated Search for Amazon S3 works by integrating Splunk with AWS Glue Data Catalog, which provides the schema and metadata that Splunk Cloud Platform needs to interpret compatible datasets in Amazon S3. This integration allows Splunk to search various data formats such as JSON, CSV, Parquet, and ORC, as well as compressed files (bzip, gzip), and many more.
This integration enhances the search capabilities for Splunk users, providing a comprehensive and streamlined data exploration experience.
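The role a catalog plays can be illustrated with a small schema-on-read sketch in Python. This does not call AWS Glue or Splunk; `SCHEMA` is a hypothetical stand-in for a catalog table definition, and the column names are invented for the example. The point is that a single schema lets differently formatted (and optionally gzip-compressed) files be read into the same typed rows.

```python
import csv
import gzip
import io
import json

# Hypothetical stand-in for a catalog table definition: column names and types.
SCHEMA = [("ts", str), ("player_id", int), ("score", float)]


def read_rows(data: bytes, fmt: str, compressed: bool = False) -> list[dict]:
    """Decode raw object bytes into typed rows using the shared schema."""
    if compressed:
        data = gzip.decompress(data)
    text = data.decode("utf-8")
    if fmt == "json":
        # One JSON object per line.
        records = [json.loads(line) for line in text.splitlines() if line.strip()]
    elif fmt == "csv":
        # Headerless CSV: the schema supplies the column names.
        names = [name for name, _ in SCHEMA]
        records = [dict(zip(names, row)) for row in csv.reader(io.StringIO(text)) if row]
    else:
        raise ValueError(f"unsupported format: {fmt}")
    # Cast every column to the type the schema declares.
    return [{name: cast(rec[name]) for name, cast in SCHEMA} for rec in records]


csv_bytes = gzip.compress(b"2023-11-01T00:00:00Z,42,99.5\n")
json_bytes = b'{"ts": "2023-11-01T00:00:05Z", "player_id": 7, "score": 12}\n'
rows = read_rows(csv_bytes, "csv", compressed=True) + read_rows(json_bytes, "json")
```

Both inputs end up as rows with the same three typed fields, which is what makes it possible to query them together.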
Now that we have learned how Edge Processor and Federated Search for Amazon S3 work together to simplify data management and reach, let's see this in action. Here’s a video of how Buttercup Enterprises, a fictional gaming company, is looking into using Splunk Edge Processor and Federated Search for Amazon S3 to solve their data engineering problems.
There is no limit to what customers can do with Federated Search and Edge Processor. This blog is meant as a primer on these two features and as a source of ideas for applying them to a specific challenge in your organization.
If you are a current Splunk Cloud Platform customer hosted in the US, EMEA (Dublin, Frankfurt, Germany), UK (London), or APAC (Tokyo, Japan and Singapore) Splunk Cloud regions, you can get access to Edge Processor today. Contact your Splunk sales representative, or send an email to EdgeProcessor@splunk.com with your company name, Splunk Cloud stack name, and Splunk Cloud region. If you are a Splunk Cloud Platform customer hosted in another Splunk Cloud region, you can also contact your Splunk sales representative or send an email to be added to the list of customers to enable once Edge Processor is available in your region.
For more about Edge Processor, including release plans to support additional sources, destinations, and new functionality, see release notes and documentation.
This blog was co-authored by Joseph Kandatilparambil, Principal Technical Marketing Engineer, and Raja Tamilarasan, Senior Sales Engineer at Splunk.