At AWS re:Invent 2016, Splunk released several AWS Lambda blueprints to help you stream logs, events, and alerts from more than 15 AWS services into Splunk, giving you enhanced security and operational insights into your AWS infrastructure and applications. In this blog post, we'll walk you step by step through using one of these blueprints, the Lambda blueprint for CloudWatch Logs, to stream AWS CloudWatch Logs via AWS Lambda into Splunk for near real-time analysis and visualization, as depicted in the diagram below. In the following example, we are interested in streaming VPC Flow Logs, which are stored in CloudWatch Logs. VPC Flow Logs capture information about all the IP traffic going to and from network interfaces, and are therefore instrumental for security analysis and troubleshooting. That said, the following mechanism applies to any logs stored in CloudWatch Logs.
Splunk supports numerous ways to get data in, from monitoring local files or streaming wire data, to pulling data from remote 3rd-party APIs, to receiving data over syslog, tcp/udp, or http.
One example of pulling data from remote sources is the widely popular Splunk Add-on for AWS which reliably collects data from various AWS services.
One example of pushing data is using an AWS Lambda function to stream events over HTTPS to Splunk HTTP Event Collector (HEC).
These pull and push models apply to different use cases and come with different considerations. This post pertains to the push model, which is particularly applicable to microservice architectures and event-driven computing such as AWS Lambda. Since there are no dedicated pollers to manage and orchestrate, the 'push' model generally offers lower operational overhead, automatic scaling with event volume, and near real-time delivery.
The following guide uses VPC Flow logs as an example CloudWatch log stream. If you already have a CloudWatch log stream from VPC Flow logs or other sources, you can skip to step 2, replacing VPC Flow logs references with your specific data type.
1a. Create a Flow Logs role to give the VPC Flow Logs service permission to publish logs into CloudWatch Logs. Go ahead and create a new IAM role with the following IAM policy attached:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents",
                "logs:DescribeLogGroups",
                "logs:DescribeLogStreams"
            ],
            "Effect": "Allow",
            "Resource": "*"
        }
    ]
}
Take note of the role name, say vpcFlowLogsRole, as you'll need it in a subsequent step.
You'll also need to set a trust relationship on this role to allow the Flow Logs service to assume it. Click 'Edit Trust Relationship' under the 'Trust Relationships' tab of the newly created role, delete any existing policy, then paste the following:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Principal": {
                "Service": "vpc-flow-logs.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
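If you prefer to script step 1a, here is a minimal sketch using Python and boto3 (the AWS SDK for Python, which requires configured AWS credentials). The role and policy names are the examples from this guide, and create_flow_logs_role is a hypothetical helper, not part of any AWS or Splunk tooling:

```python
import json

# Trust policy: lets the VPC Flow Logs service assume the role.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "vpc-flow-logs.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

# Permissions policy: lets the role publish into CloudWatch Logs.
PERMISSIONS_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "logs:CreateLogGroup",
            "logs:CreateLogStream",
            "logs:PutLogEvents",
            "logs:DescribeLogGroups",
            "logs:DescribeLogStreams",
        ],
        "Resource": "*",
    }],
}

def create_flow_logs_role(role_name="vpcFlowLogsRole"):
    """Create the IAM role and attach the inline policy (not invoked here)."""
    import boto3  # deferred import so the sketch parses without AWS credentials
    iam = boto3.client("iam")
    iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(TRUST_POLICY),
    )
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName="vpcFlowLogsPolicy",
        PolicyDocument=json.dumps(PERMISSIONS_POLICY),
    )
```
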
1b. Enable Flow Logs on your VPC(s) from the AWS VPC console as described in the AWS VPC docs. For the rest of this guide, let's say you specified vpcFlowLogs as the destination CloudWatch Logs group, which we'll reference in a subsequent step. Within a few minutes, you should start seeing flow log records in the CloudWatch Logs console under that log group.
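Step 1b can also be done via the API. Another boto3 sketch; enable_vpc_flow_logs is a hypothetical helper, and role_arn is the ARN of the vpcFlowLogsRole created in step 1a:

```python
def enable_vpc_flow_logs(vpc_id, role_arn, log_group="vpcFlowLogs"):
    """Enable Flow Logs on one VPC, delivering to CloudWatch Logs (sketch)."""
    import boto3  # deferred import so the sketch parses without AWS credentials
    ec2 = boto3.client("ec2")
    return ec2.create_flow_logs(
        ResourceIds=[vpc_id],
        ResourceType="VPC",
        TrafficType="ALL",                  # capture accepted and rejected traffic
        LogGroupName=log_group,             # destination CloudWatch Logs group
        DeliverLogsPermissionArn=role_arn,  # role from step 1a
    )
```
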
Now that you have flow logs being recorded, we’ll start setting up the data pipeline from the end, that is Splunk, working our way backward.
2a. Install the Splunk Add-on for AWS. Note that since we'll be using Splunk HEC, we will *not* be relying on any modular input from the Add-on to collect from CloudWatch Logs or VPC Flow Logs. However, we will leverage the data parsing logic (i.e. sourcetypes) that already exists in the Add-on to automatically parse the VPC Flow Log records and extract the fields.
2b. Create an HEC token from Splunk Enterprise. Refer to Splunk HEC docs for detailed instructions.
When configuring the input settings, make sure to specify "aws:cloudwatchlogs:vpcflow" as the sourcetype. This is important for enabling automatic field extractions. Make sure to take note of your new HEC token value.
Note: For Splunk Cloud deployments, HEC must be enabled by Splunk Support.
Here's how the data input settings should look:
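Before wiring up Lambda, you can sanity-check the new token by posting a test event directly to HEC. A sketch using only the Python standard library; the /services/collector/event endpoint and the "Splunk &lt;token&gt;" authorization header are standard HEC conventions, while build_hec_payload and send_test_event are hypothetical helper names:

```python
import json
import urllib.request

def build_hec_payload(event, sourcetype):
    """Wrap a raw event in the JSON envelope that HEC expects."""
    return {"event": event, "sourcetype": sourcetype}

def send_test_event(hec_url, hec_token):
    """POST one test event to HEC, e.g. hec_url="https://splunk.example.com:8088"."""
    payload = build_hec_payload("test event", sourcetype="aws:cloudwatchlogs:vpcflow")
    req = urllib.request.Request(
        hec_url + "/services/collector/event",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": "Splunk " + hec_token},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # HEC replies {"text": "Success", "code": 0}
```
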
The pipeline stage prior to Splunk HEC is AWS Lambda. The function will be executed by CloudWatch Logs whenever new logs arrive in the subscribed log group, streaming those records to Splunk. Luckily, there's already a Lambda blueprint published by Splunk for exactly that purpose.
3a. Create Lambda function using the “CloudWatch Logs to Splunk” Lambda blueprint from AWS console by clicking here. Alternatively, you can navigate to AWS Lambda console, click ‘Create a Lambda function’, then search for ‘splunk’ under ‘Select blueprint’. At that point you can select splunk-cloudwatch-logs-processor Lambda blueprint.
3b. Configure the Lambda function trigger. Select 'CloudWatch Logs' as the trigger if it's not already selected. Then specify vpcFlowLogs as the log group. Enter a name for 'Filter Name', say vpcFlowLogsFilter. You can optionally enter a value for 'Filter pattern' if you want to restrict what gets delivered to Lambda. Before clicking 'Next', make sure 'Enable trigger' is checked. This is how the form should look:
This is also known as a CloudWatch Logs subscription filter, which effectively creates a real-time feed of log events from the chosen log group, in this case vpcFlowLogs.
Note that when adding this Lambda trigger from the AWS console, Lambda adds the required permissions for the CloudWatch Logs service to invoke this particular Lambda function.
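If you create the subscription filter outside the console, the CloudWatch Logs put_subscription_filter API does the same job. A boto3 sketch; subscribe_lambda_to_log_group is a hypothetical helper, and lambda_arn is the ARN of the function created in step 3a:

```python
def subscribe_lambda_to_log_group(lambda_arn,
                                  log_group="vpcFlowLogs",
                                  filter_name="vpcFlowLogsFilter"):
    """Create a CloudWatch Logs subscription filter targeting Lambda (sketch)."""
    import boto3  # deferred import so the sketch parses without AWS credentials
    logs = boto3.client("logs")
    logs.put_subscription_filter(
        logGroupName=log_group,
        filterName=filter_name,
        filterPattern="",           # empty pattern: deliver every log event
        destinationArn=lambda_arn,  # the splunk-cloudwatch-logs-processor function
    )
```

Keep in mind that, unlike the console flow, going through the API means you must also grant the CloudWatch Logs service permission to invoke the function yourself (via the Lambda add-permission API).
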
3c. Configure the Lambda function. The function already implements the necessary logic to process the CloudWatch Logs data, including decoding and decompressing the payload and breaking out the individual log events before sending them to Splunk HEC. You'll need to set the following required parameters as Lambda environment variables: SPLUNK_HEC_URL (the URL of your HEC endpoint) and SPLUNK_HEC_TOKEN (the HEC token you created in step 2b).
Note that AWS Lambda encrypts the environment variables at rest using a Lambda service key by default. Environment variables are decrypted automatically by AWS Lambda when the function is invoked. While not required for the purposes of this setup, you also have the option to encrypt the environment variables before deploying the Lambda function. For more information, see Create a Lambda function using Environment Variables to Store Sensitive Information.
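For context on what the blueprint does under the hood: CloudWatch Logs delivers its payload to Lambda as base64-encoded, gzipped JSON under an "awslogs" key. The blueprint itself is written in Node.js; the equivalent decoding step is sketched here in Python, with a synthetic payload built inline for illustration:

```python
import base64
import gzip
import json

def decode_cloudwatch_logs_payload(event):
    """Decode the base64+gzip payload CloudWatch Logs delivers to Lambda
    and return the individual log events, as the blueprint does internally."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    data = json.loads(gzip.decompress(compressed))
    return data["logEvents"]  # each item has "id", "timestamp", "message"

# Build a synthetic payload to demonstrate the round trip.
_sample = {
    "logGroup": "vpcFlowLogs",
    "logStream": "eni-example",
    "logEvents": [{"id": "1", "timestamp": 0,
                   "message": "2 123456789010 eni-example 10.0.0.1 10.0.0.2 "
                              "443 1024 6 10 840 1418530010 1418530070 ACCEPT OK"}],
}
_payload = {"awslogs": {"data": base64.b64encode(
    gzip.compress(json.dumps(_sample).encode("utf-8"))).decode("ascii")}}

events = decode_cloudwatch_logs_payload(_payload)
```
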
At this point, you can click ‘Next’ after reviewing your Lambda configuration which should look as follows:
After a few minutes, you should start seeing events in Splunk Enterprise.
You can search by sourcetype:
sourcetype="aws:cloudwatchlogs:vpcflow"
Or by source, which the Lambda function sets to a default value of "lambda:&lt;functionName&gt;":
source="lambda:vpcFlowLogsProcessor"
By using Lambda-based data ingestion, not only can you benefit from the simple setup above, but you can also leverage the advanced dashboards and sophisticated traffic and security analysis of VPC Flow Logs that come with the Splunk App for AWS. If you set the correct sourcetype, for example "aws:cloudwatchlogs:vpcflow" in the case of VPC Flow Logs as shown above, the relevant dashboards should populate automatically. Once the app is installed, navigate to the Splunk App for AWS and view the "VPC Flow Logs: Traffic Analysis" dashboard under the Traffic & Access dropdown menu and the "VPC Flow Logs: Security Analysis" dashboard under the Security dropdown menu:
If you're not seeing events in Splunk, you can troubleshoot one pipeline stage at a time in the direction of the data flow: confirm that flow log records are arriving in the vpcFlowLogs CloudWatch Logs group, check the Lambda function's invocation metrics and execution logs for errors, and verify the HEC URL and token configured on the function.
We’ve shown you how you can configure a low-overhead & highly scalable data pipeline to stream your valuable CloudWatch Logs into your existing Splunk Enterprise by leveraging AWS Lambda & Splunk HEC together. That data pipeline enables near real-time processing & analysis of data by Splunk Enterprise.
As an example CloudWatch Logs source, we used VPC Flow Logs. That data is critical to understanding the traffic in a VPC and any security considerations. However, note that VPC Flow Logs are themselves captured and published every few minutes, so the analysis of VPC Flow Logs can only be as fresh as those batch intervals.
Click here to get started with Lambda blueprints for Splunk directly from your AWS console. We look forward to seeing how you'll leverage the power of AWS Lambda and Splunk HEC to build your own serverless architectures and data pipelines. Leave us a note below with any feedback or comments, or on Splunk Answers for any questions you may have.
----------------------------------------------------
Thanks!
Roy Arsan