If you collect (or plan to collect) data from Google Cloud Platform (GCP), you will have noticed that your main option for ingesting that data has been Splunk’s Google Cloud Platform Add-On.
However, many customers are adopting “serverless” cloud services to deliver their cloud solutions. There are many reasons for this, but chiefly that serverless services remove the overhead of server or container management, scale automatically, and are delivered as part of the cloud platform. Although the Add-On can “pull” data from GCP, a new library of functions provides a serverless “push” capability that replicates and extends the Add-On’s functionality, sending data into Splunk via the HTTP Event Collector (HEC). This “push” capability gives GCP customers something similar to what AWS customers have with Kinesis Firehose or Lambda functions.
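For context, HEC accepts events as JSON over HTTPS. A minimal sketch of the “push” model, with a placeholder URL, token, and sourcetype:

    # Push a single test event to a Splunk HEC endpoint (placeholder URL and token)
    curl "https://splunk.example.com:8088/services/collector/event" \
      -H "Authorization: Splunk 11111111-2222-3333-4444-555555555555" \
      -d '{"event": {"message": "hello from GCP"}, "sourcetype": "google:gcp:pubsub:message"}'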
As these functions run as a service inside GCP, they automatically scale to the demands of the workload and are billed by usage only (i.e. not always on). Because the core service is also fault-tolerant, there is no need to size or configure a failover or high-availability setup of VMs running the Add-On. Another advantage is that no Service Accounts or Keys need to be shared outside of your GCP environment.
SO WHAT'S IN THE BOX?
The PubSub Function collects any message placed in a PubSub topic that it subscribes to. In general, these messages are logs exported from Stackdriver, providing service monitoring, audit, and activity logs from GCP. The diagram below shows how the function works.
Logs are collected from Stackdriver via a Log Export, which publishes them into a PubSub Topic; more than one Log Export can be aggregated into a single topic if needed. (This configuration is the same as for the Add-On.) Once the function is installed in your GCP Project, it subscribes to the PubSub Topic. On receiving a PubSub message, the function sends the contents onwards to Splunk via HEC. If sending an event to HEC fails, the message is placed on a Retry PubSub Topic; on a schedule, the events in the Retry Topic are periodically flushed into Splunk HEC (or returned to the topic if they fail again).
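As a rough sketch, the Stackdriver-to-PubSub wiring can be created with the gcloud CLI. The topic name, sink name, and log filter below are placeholders; note that the sink’s writer identity also needs permission to publish to the topic.

    # Create the topic the PubSub Function will subscribe to
    gcloud pubsub topics create splunk-logs-topic

    # Export matching Stackdriver logs into that topic
    gcloud logging sinks create splunk-log-sink \
      pubsub.googleapis.com/projects/my-project/topics/splunk-logs-topic \
      --log-filter='resource.type="gce_instance"'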
The Metrics Function requests metrics from Stackdriver for your infrastructure components. A list of metrics can be configured per function, along with the frequency/interval at which they are polled. As with the PubSub Function, any failed messages are sent to the Retry PubSub Topic to be resent to Splunk later.
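For example, the polling interval could be driven by Cloud Scheduler publishing to a trigger topic. The names and five-minute schedule below are illustrative, not the library’s required configuration:

    # Trigger the Metrics Function every five minutes via a PubSub message
    gcloud scheduler jobs create pubsub metrics-trigger \
      --schedule="*/5 * * * *" \
      --topic=metrics-function-trigger \
      --message-body="{}"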
The Metrics Function can be configured to send the metrics into Splunk in two formats: as individual events (similar to the Add-On’s output) or in Splunk’s metrics index format.
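As a rough illustration, the two HEC payload shapes differ as follows; the metric name and values are placeholders, and the second form follows Splunk’s documented metrics-over-HEC format:

    # Event format: the measurement arrives as an ordinary event
    {"time": 1596240000, "event": {"metric": "compute.googleapis.com/instance/cpu/utilization", "value": 0.42}}

    # Metrics index format: "event" is the literal string "metric" and values go in "fields"
    {"time": 1596240000, "event": "metric", "fields": {"metric_name": "compute.googleapis.com/instance/cpu/utilization", "_value": 0.42}}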
The GCS Function sends data from a GCS Bucket into Splunk via HEC. The function is triggered when a new object is inserted (Finalised/Created) into a bucket. It reads events from the object, splitting them with the same settings as the sourcetype definition in Splunk (the Line Breaker). If an object is large, its contents are split into batches of events before being sent to Splunk. This way there is no limit (apart from GCS limitations) to the size of object that can be sent to Splunk, and the events in the objects are distributed across the indexers (via the Load Balancer).
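A deployment sketch, assuming placeholder names for the function, bucket, HEC endpoint, token, and environment variables:

    # Deploy the GCS Function, triggered when objects are finalised in the bucket
    gcloud functions deploy gcs-to-splunk \
      --runtime=python37 \
      --trigger-event=google.storage.object.finalize \
      --trigger-resource=my-log-bucket \
      --set-env-vars=HEC_URL=https://splunk.example.com:8088,HEC_TOKEN=REDACTED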
This functionality is currently not available with the GCP Add-On.
As an additional capability alongside the GCS Function, another function can be used to request GCP Asset information (an inventory). This function can be scheduled to call the Cloud Asset API, which writes the asset information into a GCS Bucket. If a GCS Function is set up to be triggered by new objects in that bucket, it will send the asset data into Splunk.
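A sketch of the export step itself; the project, bucket, and content type below are placeholders:

    # Export the project's asset inventory to a GCS bucket
    gcloud asset export \
      --project=my-project \
      --content-type=resource \
      --output-path=gs://my-asset-bucket/assets.json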
The function library also includes a Retry Function, used in conjunction with a PubSub Topic and a Cloud Schedule. The Retry PubSub Topic collects any events or metrics (or batches of events) that could not be sent to Splunk HEC, usually because of connectivity issues between GCP and the Load Balancer/Indexers. If one of the functions encounters a failure, it sends the event into the Retry PubSub Topic; one topic can be shared across multiple functions (many-to-one rather than one-to-one). A Cloud Schedule triggers the Retry Function at a nominated time (defined by a cron schedule). The function simply picks the events/metrics up from the topic and sends them on to the original HEC URL. If a send fails during the retry, the event remains on the topic to be retried at the next scheduled call.
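The retry wiring might look like the following; the topic and job names and the ten-minute cron are illustrative:

    # A shared retry topic that all of the functions publish failures to
    gcloud pubsub topics create splunk-retry-topic

    # Flush the retry topic every ten minutes by triggering the Retry Function
    gcloud scheduler jobs create pubsub retry-flush \
      --schedule="*/10 * * * *" \
      --topic=retry-function-trigger \
      --message-body="{}"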
The contents of the library are available for download here: https://github.com/splunk/splunk-gcp-functions
The repository contains documentation and a set of examples showing how to set up the functions. The examples include scripts that clone the repository and install/set up the functions (with gcloud CLI commands) in a basic example configuration. These scripts can be reused in your cloud automation/orchestration builds to automate deployment of the functions into your GCP Projects.
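To explore the examples, clone the repository from Cloud Shell or any environment with gcloud configured:

    # Fetch the function library and its example setup scripts
    git clone https://github.com/splunk/splunk-gcp-functions.git
    cd splunk-gcp-functions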
Thanks for reading!
Paul