Partners

July 20, 2021

5 Minute Read

Taking Inventory of Your Google Cloud

By Matthew Hite

Splunk is committed to using inclusive and unbiased language. This blog post might contain terminology that we no longer use. For more information on our updated terminology and our stance on biased language, please visit our blog post. We appreciate your understanding as we work towards making our community more inclusive for everyone.

Splunk Cloud Architect Paul Davies recently authored and released the GCP Application Template, a blueprint of visualizations, reports, and searches focused on Google Cloud use cases. Many of the reports included in his application require Google Cloud asset inventory data to be periodically generated and sent into Splunk. But HOW exactly do you craft that inventory generation pipeline so you can "light-up" Paul's application dashboards and reports?

In this blog post, I'll describe and compare three methods operators can use to ingest Google Cloud asset inventory data into Splunk. The first method leverages a "pull" data ingest strategy while the other two methods I cover are "push" based. For each ingest method, I'll provide detailed setup instructions or provide pointers to them. Finally, I'll make a personal recommendation on what method I would choose if it was my own environment.

Note: Export vs. Feed

Google provides both a batch export and a change feed view of asset data. The GCP Application Template leverages data generated by the batch export data API and does not support the TemporalAsset schema used in the asset change feed API.

Pull

Method #1 - Pull From Bucket

The first method involves using Cloud Scheduler to trigger a Cloud Function on a regular schedule. This Cloud Function then sends a batch export request to the Asset Inventory API. This results in a bulk export of the current cloud asset inventory to a Cloud Storage bucket. Finally, the Splunk Add-on for Google Cloud Platform (GCP-TA) is configured to periodically monitor and ingest new files appearing in the Cloud Storage bucket. The following diagram illustrates this process.

Components

Cloud Scheduler
Cloud Pub/Sub
Cloud Function (asset export)
Cloud Storage
GCP-TA storage input

Pros

Simple setup
Pull solution avoids Splunk HEC whitelisting considerations
GCP-TA is developed and supported by Splunk
Minimal Google Cloud infrastructure cost to implement

Cons

Large inventories may exceed file size limitations of GCP-TA GCS input (~250 MB)

Setup

Please refer to the detailed setup instructions in the GitHub repository.

Push

Method #2 - Serverless push-to-Splunk

This method leverages Cloud Functions not only for triggering an export of asset inventory data, but also to perform the delivery to a Splunk HEC. Cloud Scheduler regularly triggers a Cloud Function which in turn is responsible for initiating an Asset Inventory API bulk export to Cloud Storage. A second Cloud Function is configured to trigger on bucket object create/finalize events. This function will split the exported files into smaller files if necessary and deliver them directly to a Splunk HEC. Should the delivery fail, the messages are placed into a Pub/Sub topic for later redelivery attempts. The following diagram illustrates this process.

Components

Pros

Solution should incur minimal Google Cloud infrastructure costs
Automated redelivery attempts of deadletters
GCP Application Template is primarily tested with this method as it is written by the same author

Cons

Not a Splunk or Google supported solution
Whitelisting Cloud Function to Splunk HEC traffic requires additional complexity of Cloud NAT and advanced Serverless VPC Access connector configuration

Setup

Details on how to configure this method can be found in the splunk-gcp-functions GitHub repository.

Method #3 - Dataflow

The final method leverages Dataflow batch and streaming jobs to facilitate delivery of asset inventory data to a Splunk HEC. This approach uses Cloud Scheduler to regularly trigger a Cloud Function which in turn is responsible for initiating an Asset Inventory API bulk export to Cloud Storage. Another Cloud Function receives an event trigger when the export operation is complete. This function then starts a batch Dataflow job which converts newline-delimited JSON files into Pub/Sub messages and publishes them to a topic. In parallel, a streaming Dataflow pipeline is also running which subscribes to the aforementioned topic and delivers them to a Splunk HEC.

The following diagram illustrates this process.

Components

Cloud Scheduler
Cloud Function (asset export)
Asset Inventory API
Cloud Storage
Cloud Function (launch Dataflow batch job)
Batch Dataflow job
Pub/Sub
Streaming Dataflow job
Splunk HEC

Pros

Dataflow templates developed and supported by Google
Natural choice for environments already leveraging streaming Dataflow to Splunk HEC log delivery pipeline
Dead letter topic support for undeliverable messages
Easy to monitor delivery backlog through Google Cloud Pub/Sub metrics
Encryption of HEC token using KMS

Cons

Both Dataflow templates are in beta, pre-GA launch stage
More complex setup required compared to other methods
Additional costs of running a Dataflow cluster
Dataflow worker to Splunk HEC whitelisting requires Cloud NAT to ensure static egress IPs

Setup

Please refer to the detailed setup instructions in the GitHub repository.

Recommendations

In this blog, I've described three methods for ingesting cloud asset inventory data into Splunk. But which is "the best?" Like many things in cloud, there isn't one universally "right" answer.

If you're just getting started and just want to get the reports up and running, I recommend trying the "pull from bucket" method described in option #1. The number of moving pieces is minimal and it is relatively easy to setup. It's also dirt cheap in comparison to other approaches and will likely take you pretty far.

If you're already using Dataflow to stream Cloud Logging events into Splunk, then I recommend pursuing option #3. Customers usually find themselves turning to the Dataflow method for sending Cloud Logging events to Splunk as they scale out their infrastructure and need to ensure what they deploy is easily monitorable, horizontally-scalable, and fault-tolerant. Why not leverage that same infrastructure investment to deliver asset inventory data to Splunk?

If option #3 is not within your reach or you view it as "overkill," the "serverless push-to-Splunk" method described in option #2 may be preferable. It has some of the same fault-tolerant and horizontally scalable properties I praise when positioning Dataflow but without the cost of running a batch streaming Dataflow job. Keep in mind that neither Google or Splunk support can assist in the operation of this method, however. You could find yourself "on your own" should things go wrong -- if you're building a production pipeline, skip this method. If you're having fun in the lab, go for it.

Conclusion

Whether you're just experimenting in a lab environment or building a production Dataflow pipeline, one of the methods described in this blog should have your cloud inventory ingest requirements covered. And once you've got that data into Splunk, you'll be generating dashboards and reports with the GCP Application Template in no time!

Resources

Matthew Hite

Matt Hite is Splunk's resident Google Cloud expert. When not at work, Matt can usually be found in his studio attempting to make electronic music or more likely curating his 80s music collection.

Partners 1 Min Read

Splunk Cloud to Launch on AWS Europe (Milan) Region

We're excited to announce the availability of Splunk Cloud on AWS Cloud Italy from 28th June 2024. Read on to learn more about what that means for your business.

Partners 2 Min Read

Enhance Your Security Posture with Splunk + Google Workspace

With the launch of the Splunk Add-On for Google Workspace, Splunk customers now have a Splunk-supported, high-quality option for the collection and preparation of critical audit events from their Google Workspace deployment.

Partners 8 Min Read

Smart, Secure and Sustainable Manufacturing - How Splunk and Google Cloud Are Helping Manufacturers to Skate Where the Puck is Going

Splunk and Google Cloud are helping organizations realize smart, secure and sustainable manufacturing. Manufacturers can now leverage new complementary solutions to stay ahead of the curve and thrive in the data age.

About Splunk

The world’s leading organizations rely on Splunk, a Cisco company, to continuously strengthen digital resilience with our unified security and observability platform, powered by industry-leading AI.

Our customers trust Splunk’s award-winning security and observability solutions to secure and improve the reliability of their complex digital environments, at any scale.

Learn more about Splunk

Subscribe to our blog

Get the latest articles from Splunk straight to your inbox.

Connect with Splunk on X

Follow @Splunk

Connect with Splunk on Instagram

Follow @Splunk

See Splunk Perspectives blog for execs

Get Perspectives

Taking Inventory of Your Google Cloud

Note: Export vs. Feed

Pull

Method #1 - Pull From Bucket

Components

Pros

Cons

Setup

Push

Method #2 - Serverless push-to-Splunk

Components

Pros

Cons

Setup

Method #3 - Dataflow

Components

Pros

Cons

Setup

Recommendations

Conclusion

Resources

Related Articles

Splunk Cloud to Launch on AWS Europe (Milan) Region

Enhance Your Security Posture with Splunk + Google Workspace

Smart, Secure and Sustainable Manufacturing - How Splunk and Google Cloud Are Helping Manufacturers to Skate Where the Puck is Going

About Splunk

Subscribe to our blog

Connect with Splunk on X

Connect with Splunk on Instagram

See Splunk Perspectives blog for execs