The following is a guest blog post from Iman Roodbaei, Senior Cloud Operational Engineer at General Electric, and Vijay Kota, Splunk Consultant for General Electric.
Please use this blog post in conjunction with the "Configure CloudWatch inputs for the Splunk Add-on for AWS" documentation for any additional reference.
Why Lambda & Splunk?
We created an S3 bucket (cloudwatchmetrics) that uses an SNS notification and an SQS queue subscription.
We set up a dead-letter queue for the SQS queue that feeds the input, so that invalid messages are stored there. For information about SQS dead-letter queues and how to configure them, see the AWS documentation.
We also configured the SQS visibility timeout to prevent multiple inputs from receiving and processing the same message more than once; we recommend setting the SQS visibility timeout to 5 minutes or longer.
If the visibility timeout for a message is reached before the message has been fully processed by the SQS-based S3 input, the message reappears in the queue and is retrieved and processed again. In that case, we need to ensure that the reprocessing does not result in duplicate data.
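One way to guard against duplicates when a message is redelivered is to make processing idempotent. The following is a minimal sketch, not the mechanism the Splunk Add-on for AWS itself uses: the `ProcessedKeyTracker` class, the `handle_messages` helper, and the simplified message shape (`bucket`, `key`, `etag`) are all hypothetical, and the tracker keeps its state in memory only, so a real deployment would need a durable store.

```python
class ProcessedKeyTracker:
    """Remembers which S3 objects were already handled (in memory only)."""

    def __init__(self):
        self._seen = set()

    def should_process(self, bucket, key, etag):
        """Return True the first time an object version is seen, False afterwards."""
        fingerprint = (bucket, key, etag)
        if fingerprint in self._seen:
            return False
        self._seen.add(fingerprint)
        return True


def handle_messages(messages, tracker, process):
    """Call process(message) once per distinct S3 object, skipping redeliveries.

    `messages` is a list of simplified dicts (hypothetical shape), not raw SQS payloads.
    Returns the number of messages that were actually processed.
    """
    handled = 0
    for msg in messages:
        if tracker.should_process(msg["bucket"], msg["key"], msg["etag"]):
            process(msg)
            handled += 1
    return handled
```

If the same message is passed in twice, for example because its visibility timeout expired mid-processing, the second copy is skipped instead of being processed again.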
Want more information about SQS visibility timeout and how to configure it? Check out the AWS documentation, "What is Amazon Simple Queue Service?".
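The dead-letter queue and visibility timeout settings described above could be applied with boto3 roughly as follows. This is a sketch under assumptions: the helper names, the `max_receive_count` default of 5, and the ARN and queue URL passed in are illustrative and do not come from our environment. `VisibilityTimeout` and `RedrivePolicy` are standard SQS queue attributes.

```python
import json


def build_queue_attributes(dead_letter_queue_arn,
                           visibility_timeout_seconds=300,
                           max_receive_count=5):
    """Build the SQS attribute map for a 5-minute visibility timeout plus a DLQ.

    max_receive_count (an assumed value) is how many times a message may be
    received before SQS moves it to the dead-letter queue.
    """
    return {
        # SQS expects attribute values as strings
        "VisibilityTimeout": str(visibility_timeout_seconds),
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": dead_letter_queue_arn,
            "maxReceiveCount": max_receive_count,
        }),
    }


def apply_queue_attributes(queue_url, attributes):
    """Apply the attributes to an existing queue (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3 installed

    boto3.client("sqs").set_queue_attributes(QueueUrl=queue_url, Attributes=attributes)
```

Keeping the attribute builder separate from the API call makes it easy to review the settings before they are applied to a queue.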
Function policy:
{
"Version": "2012-10-17",
"Id": "default",
"Statement": [{
"Sid": "lambda-58e7749c-51e2-40ab-b1de-beb567661f8d",
"Effect": "Allow",
"Principal": {
"Service": "events.amazonaws.com"
},
"Action": "lambda:InvokeFunction",
"Resource": "arn:aws:lambda:us-west-2:1234567890123:function:GatherMetricsAndPostIntoSplunk",
"Condition": {
"ArnLike": {
"AWS:SourceArn": "arn:aws:events:us-west-2:1234567890123:rule/ScheduleLambdaCloudWatchMetrics"
}
}
}]
}
Execution Role:
{
"roleName": "lambda_splunk_elb",
"policies": [{
"document": {
"Version": "2012-10-17",
"Statement": [{
"Sid": "LogThings",
"Effect": "Allow",
"Action": [
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:PutLogEvents"
],
"Resource": "arn:aws:logs:*:*:*"
},
{
"Sid": "SQSThings",
"Effect": "Allow",
"Action": [
"sqs:ListQueues",
"sqs:GetQueue*"
],
"Resource": "arn:aws:sqs:*:1234567890123:*"
},
{
"Sid": "SNSThings",
"Effect": "Allow",
"Action": [
"sns:Publish"
],
"Resource": "arn:aws:sns:*:1234567890123:*"
}
]
},
"name": "oneClick_lambda_basic_execution_1492625956312",
"type": "inline"
},
{
"document": {
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Action": [
"cloudformation:DescribeChangeSet",
"cloudformation:DescribeStackResources",
"cloudformation:DescribeStacks",
"cloudformation:GetTemplate",
"cloudformation:ListStackResources",
"cloudwatch:*",
"cognito-identity:ListIdentityPools",
"cognito-sync:GetCognitoEvents",
"cognito-sync:SetCognitoEvents",
"dynamodb:*",
"ec2:DescribeSecurityGroups",
"ec2:DescribeSubnets",
"ec2:DescribeVpcs",
"events:*",
"iam:GetPolicy",
"iam:GetPolicyVersion",
"iam:GetRole",
"iam:GetRolePolicy",
"iam:ListAttachedRolePolicies",
"iam:ListRolePolicies",
"iam:ListRoles",
"iam:PassRole",
"iot:AttachPrincipalPolicy",
"iot:AttachThingPrincipal",
"iot:CreateKeysAndCertificate",
"iot:CreatePolicy",
"iot:CreateThing",
"iot:CreateTopicRule",
"iot:DescribeEndpoint",
"iot:GetTopicRule",
"iot:ListPolicies",
"iot:ListThings",
"iot:ListTopicRules",
"iot:ReplaceTopicRule",
"kinesis:DescribeStream",
"kinesis:ListStreams",
"kinesis:PutRecord",
"kms:ListAliases",
"lambda:*",
"logs:*",
"s3:*",
"sns:ListSubscriptions",
"sns:ListSubscriptionsByTopic",
"sns:ListTopics",
"sns:Publish",
"sns:Subscribe",
"sns:Unsubscribe",
"sqs:ListQueues",
"sqs:SendMessage",
"tag:GetResources",
"xray:PutTelemetryRecords",
"xray:PutTraceSegments"
],
"Resource": "*"
}]
},
"name": "AWSLambdaFullAccess",
"id": "ANPAI6E2CYYMI4XI7AA5K",
"type": "managed",
"arn": "arn:aws:iam::aws:policy/AWSLambdaFullAccess"
}
]
}
Modification in Splunk:
Add these lines to props.conf:
[aws:cloudwatch:metrics]
SHOULD_LINEMERGE = False
pulldown_type = true
INDEXED_EXTRACTIONS = JSON
ADD_EXTRA_TIME_FIELDS = False
KV_MODE = none
TIMESTAMP_FIELDS = metric_timestamp
#TIME_FORMAT = %s.%Q
TIME_FORMAT = %s
category = Metrics
description = Comma-separated value format for metrics. Must have metric_timestamp, metric_name, and _value fields.
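To illustrate the event shape this sourcetype expects (a JSON object with `metric_timestamp` in epoch seconds to match `TIME_FORMAT = %s`, plus `metric_name` and `_value`), here is a minimal sketch of converting one CloudWatch datapoint into such an event. The `datapoint_to_event` helper and the choice of the `Average` statistic are assumptions for illustration; this is not the code of our Lambda function.

```python
import json
from datetime import datetime, timezone


def datapoint_to_event(metric_name, datapoint, statistic="Average"):
    """Serialize one CloudWatch datapoint as a JSON metrics event.

    `datapoint` follows the shape returned by CloudWatch GetMetricStatistics:
    a dict with a datetime under "Timestamp" and one key per requested statistic.
    """
    timestamp = datapoint["Timestamp"]
    if timestamp.tzinfo is None:
        # Treat naive datetimes as UTC so the epoch value is unambiguous
        timestamp = timestamp.replace(tzinfo=timezone.utc)
    return json.dumps({
        "metric_timestamp": int(timestamp.timestamp()),  # epoch seconds
        "metric_name": metric_name,
        "_value": datapoint[statistic],
    })
```

For example, a `CPUUtilization` datapoint stamped 2020-01-01 00:00:00 UTC produces an event whose `metric_timestamp` is `1577836800`.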
Current Metrics List:
namespace = 'AWS/EC2'
metric_list = ["CPUUtilization",
"DiskReadBytes",
"DiskWriteBytes",
"DiskReadOps",
"DiskWriteOps",
"NetworkOut",
"NetworkIn",
"NetworkPacketsOut",
"NetworkPacketsIn",
"StatusCheckFailed",
"StatusCheckFailed_Instance",
"StatusCheckFailed_System",
]
# Load balancer metrics mapped to their CloudWatch units.
# The original listing gives no variable name or namespace for this mapping;
# "elb_metric_units" is a placeholder name.
elb_metric_units = {
"ProcessedBytes": "Bytes",
"NewFlowCount": "Count",
"ActiveFlowCount": "Count",
"TCP_Client_Reset_Count": "Count",
"TCP_Target_Reset_Count": "Count",
"TCP_ELB_Reset_Count": "Count",
"ConsumedLCUs": "Count",
"HealthyHostCount": "Count",
"UnHealthyHostCount": "Count",
"RequestCount": "Count",
"HTTPCode_Target_5XX_Count": "Count",
"HTTPCode_Target_4XX_Count": "Count",
"HTTPCode_Target_2XX_Count": "Count",
"TargetResponseTime": "Seconds",
"TargetConnectionErrorCount": "Count",
"HTTPCode_ELB_4XX_Count": "Count",
"HTTPCode_ELB_5XX_Count": "Count",
"HTTPCode_ELB_2XX_Count": "Count",
}
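As a rough illustration of how a metric list like this can drive CloudWatch queries, the sketch below builds one `get_metric_statistics` request per metric name. It is an assumption-laden example, not our Lambda function: the helper names, the `InstanceId` dimension, the 5-minute window and period, and the `Average` statistic are all illustrative choices.

```python
from datetime import datetime, timedelta, timezone


def build_metric_requests(namespace, metric_names, instance_id, end_time,
                          window_minutes=5, period_seconds=300):
    """Return one CloudWatch GetMetricStatistics parameter dict per metric name."""
    start_time = end_time - timedelta(minutes=window_minutes)
    return [
        {
            "Namespace": namespace,
            "MetricName": name,
            "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
            "StartTime": start_time,
            "EndTime": end_time,
            "Period": period_seconds,
            "Statistics": ["Average"],
        }
        for name in metric_names
    ]


def fetch_datapoints(request):
    """Run one request against CloudWatch (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3 installed

    return boto3.client("cloudwatch").get_metric_statistics(**request)["Datapoints"]
```

A caller could pass `'AWS/EC2'` and the EC2 metric names from the list above, then loop over the returned requests with `fetch_datapoints`.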
Sample data that our lambda function is sending to S3:
Here are sample dashboards that were built based on the mstats query: