The following is a guest blog post from Iman Roodbaei, Senior Cloud Operational Engineer at General Electric, and Vijay Kota, Splunk Consultant for General Electric.
Please use this blog post in conjunction with the "Configure CloudWatch inputs for the Splunk Add-on for AWS" documentation for any additional reference.
Why Lambda & Splunk?
We created an S3 bucket (cloudwatchmetrics) that uses notification (SNS) and queue subscription (SQS).
We set up a dead-letter queue for the SQS queue that feeds the input, so that invalid messages are stored there. For information about SQS dead-letter queues and how to configure them, see this AWS documentation.
We also configured the SQS visibility timeout to prevent multiple inputs from receiving and processing the same message more than once; we recommend setting the SQS visibility timeout to 5 minutes or longer.
If the visibility timeout for a message is reached before the message has been fully processed by the SQS-based S3 input, the message reappears in the queue and is retrieved and processed again. In that case, we need to ensure that it does not result in duplicate data!
Want more information about SQS visibility timeout and how to configure it? Check out the AWS documentation, "What is Amazon Simple Queue Service?".
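The queue settings described above can be sketched with boto3 roughly as follows. This is a minimal sketch, not the exact configuration used here: the helper names, the maxReceiveCount of 5, and the queue URL and ARN arguments are illustrative assumptions. VisibilityTimeout and RedrivePolicy are standard SQS queue attributes.

```python
import json


def build_queue_attributes(dead_letter_queue_arn,
                           visibility_timeout_seconds=300,
                           max_receive_count=5):
    """Build the attribute map for a source queue backed by a dead-letter queue.

    max_receive_count is an assumed value, not one taken from this post.
    """
    return {
        # SQS expects attribute values as strings; the timeout is in seconds,
        # so 300 corresponds to the recommended 5 minutes.
        "VisibilityTimeout": str(visibility_timeout_seconds),
        # A message received more than max_receive_count times is moved
        # to the dead-letter queue instead of being retried again.
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": dead_letter_queue_arn,
            "maxReceiveCount": str(max_receive_count),
        }),
    }


def apply_queue_attributes(queue_url, dead_letter_queue_arn):
    """Apply the attributes to an existing queue (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the builder above works without boto3

    sqs = boto3.client("sqs")
    sqs.set_queue_attributes(
        QueueUrl=queue_url,
        Attributes=build_queue_attributes(dead_letter_queue_arn),
    )
```

Keeping the attribute builder separate from the AWS call makes it easy to check the values before touching a live queue.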
Function policy:
{
"Version": "2012-10-17",
"Id": "default",
"Statement": [{
"Sid": "lambda-58e7749c-51e2-40ab-b1de-beb567661f8d",
"Effect": "Allow",
"Principal": {
"Service": "events.amazonaws.com"
},
"Action": "lambda:InvokeFunction",
"Resource": "arn:aws:lambda:us-west-2:1234567890123:function:GatherMetricsAndPostIntoSplunk",
"Condition": {
"ArnLike": {
"AWS:SourceArn": "arn:aws:events:us-west-2:1234567890123:rule/ScheduleLambdaCloudWatchMetrics"
}
}
}]
}
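A function policy like the one above is typically what Lambda's add_permission call produces when a scheduled CloudWatch Events rule is allowed to invoke a function. The sketch below shows one way to create an equivalent statement with boto3. The helper names and the statement ID are hypothetical, and the ARN in the usage comment uses a placeholder account ID.

```python
def build_invoke_permission(function_name, rule_arn, statement_id):
    """Return keyword arguments for the Lambda add_permission API call."""
    return {
        "FunctionName": function_name,
        "StatementId": statement_id,
        "Action": "lambda:InvokeFunction",
        # Only the CloudWatch Events service may invoke the function...
        "Principal": "events.amazonaws.com",
        # ...and only on behalf of this specific rule.
        "SourceArn": rule_arn,
    }


def grant_invoke_permission(function_name, rule_arn, statement_id):
    """Attach the permission to the function (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the builder above works without boto3

    lambda_client = boto3.client("lambda")
    lambda_client.add_permission(
        **build_invoke_permission(function_name, rule_arn, statement_id)
    )


# Example call (placeholder account ID and statement ID):
# grant_invoke_permission(
#     "GatherMetricsAndPostIntoSplunk",
#     "arn:aws:events:us-west-2:123456789012:rule/ScheduleLambdaCloudWatchMetrics",
#     "allow-scheduled-invoke",
# )
```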
Execution Role:
{
"roleName": "lambda_splunk_elb",
"policies": [{
"document": {
"Version": "2012-10-17",
"Statement": [{
"Sid": "LogThings",
"Effect": "Allow",
"Action": [
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:PutLogEvents"
],
"Resource": "arn:aws:logs:*:*:*"
},
{
"Sid": "SQSThings",
"Effect": "Allow",
"Action": [
"sqs:ListQueues",
"sqs:GetQueue*"
],
"Resource": "arn:aws:sqs:*:1234567890123:*"
},
{
"Sid": "SNSThings",
"Effect": "Allow",
"Action": [
"sns:Publish"
],
"Resource": "arn:aws:sns:*:1234567890123:*"
}
]
},
"name": "oneClick_lambda_basic_execution_1492625956312",
"type": "inline"
},
{
"document": {
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Action": [
"cloudformation:DescribeChangeSet",
"cloudformation:DescribeStackResources",
"cloudformation:DescribeStacks",
"cloudformation:GetTemplate",
"cloudformation:ListStackResources",
"cloudwatch:*",
"cognito-identity:ListIdentityPools",
"cognito-sync:GetCognitoEvents",
"cognito-sync:SetCognitoEvents",
"dynamodb:*",
"ec2:DescribeSecurityGroups",
"ec2:DescribeSubnets",
"ec2:DescribeVpcs",
"events:*",
"iam:GetPolicy",
"iam:GetPolicyVersion",
"iam:GetRole",
"iam:GetRolePolicy",
"iam:ListAttachedRolePolicies",
"iam:ListRolePolicies",
"iam:ListRoles",
"iam:PassRole",
"iot:AttachPrincipalPolicy",
"iot:AttachThingPrincipal",
"iot:CreateKeysAndCertificate",
"iot:CreatePolicy",
"iot:CreateThing",
"iot:CreateTopicRule",
"iot:DescribeEndpoint",
"iot:GetTopicRule",
"iot:ListPolicies",
"iot:ListThings",
"iot:ListTopicRules",
"iot:ReplaceTopicRule",
"kinesis:DescribeStream",
"kinesis:ListStreams",
"kinesis:PutRecord",
"kms:ListAliases",
"lambda:*",
"logs:*",
"s3:*",
"sns:ListSubscriptions",
"sns:ListSubscriptionsByTopic",
"sns:ListTopics",
"sns:Publish",
"sns:Subscribe",
"sns:Unsubscribe",
"sqs:ListQueues",
"sqs:SendMessage",
"tag:GetResources",
"xray:PutTelemetryRecords",
"xray:PutTraceSegments"
],
"Resource": "*"
}]
},
"name": "AWSLambdaFullAccess",
"id": "ANPAI6E2CYYMI4XI7AA5K",
"type": "managed",
"arn": "arn:aws:iam::aws:policy/AWSLambdaFullAccess"
}
]
}
Modification in Splunk:
Add these lines to props.conf:
[aws:cloudwatch:metrics]
SHOULD_LINEMERGE = False
pulldown_type = true
INDEXED_EXTRACTIONS = JSON
ADD_EXTRA_TIME_FIELDS = False
KV_MODE = none
TIMESTAMP_FIELDS = metric_timestamp
#TIME_FORMAT = %s.%Q
TIME_FORMAT = %s
category = Metrics
description = Comma-separated value format for metrics. Must have metric_timestamp, metric_name, and _value fields.
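With this stanza, each event is expected to be a JSON object whose metric_timestamp field holds whole seconds since the Unix epoch (TIME_FORMAT = %s), alongside metric_name and _value. The sketch below shows one way a Lambda function might serialize a datapoint into that shape. The function name, the namespace field, and the dimension fields are assumptions for illustration, not the exact payload used here.

```python
import json
from datetime import datetime, timezone


def to_metric_event(namespace, metric_name, value, timestamp, dimensions=None):
    """Serialize one datapoint as a single-line JSON event.

    timestamp must be a timezone-aware datetime. The namespace field and the
    dimension fields are assumed extras, not required by the sourcetype.
    """
    event = dict(dimensions or {})
    event.update({
        "namespace": namespace,
        # TIME_FORMAT = %s parses whole seconds since the Unix epoch.
        "metric_timestamp": int(timestamp.timestamp()),
        "metric_name": metric_name,
        "_value": value,
    })
    return json.dumps(event)


if __name__ == "__main__":
    print(to_metric_event(
        "AWS/EC2",
        "CPUUtilization",
        12.5,
        datetime(2020, 1, 1, tzinfo=timezone.utc),
        {"InstanceId": "i-0example"},
    ))
```

The required fields are written last so that a dimension with a clashing name cannot overwrite them.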
Current Metrics List:
namespace = 'AWS/EC2'
metric_list = ["CPUUtilization",
"DiskReadBytes",
"DiskWriteBytes",
"DiskReadOps",
"DiskWriteOps",
"NetworkOut",
"NetworkIn",
"NetworkPacketsOut",
"NetworkPacketsIn",
"StatusCheckFailed",
"StatusCheckFailed_Instance",
"StatusCheckFailed_System",
]
# Load balancer metrics mapped to their units. The line that opened this
# block is missing from the post, so the variable name here is a placeholder.
elb_metric_units = {
"ProcessedBytes": "Bytes",
"NewFlowCount": "Count",
"ActiveFlowCount": "Count",
"TCP_Client_Reset_Count": "Count",
"TCP_Target_Reset_Count": "Count",
"TCP_ELB_Reset_Count": "Count",
"ConsumedLCUs": "Count",
"HealthyHostCount": "Count",
"UnHealthyHostCount": "Count",
"RequestCount": "Count",
"HTTPCode_Target_5XX_Count": "Count",
"HTTPCode_Target_4XX_Count": "Count",
"HTTPCode_Target_2XX_Count": "Count",
"TargetResponseTime": "Seconds",
"TargetConnectionErrorCount": "Count",
"HTTPCode_ELB_4XX_Count": "Count",
"HTTPCode_ELB_5XX_Count": "Count",
"HTTPCode_ELB_2XX_Count": "Count",
}
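A list of metric names like this can be turned into CloudWatch requests, one per metric. The sketch below assumes a 5-minute window, a 300-second period, the Average statistic, and a single set of dimensions. None of these values come from the original function, and the helper names are hypothetical. get_metric_statistics is the standard boto3 CloudWatch call.

```python
from datetime import datetime, timedelta, timezone


def build_metric_requests(namespace, metric_names, dimensions, end_time,
                          window_minutes=5, period_seconds=300):
    """Build one get_metric_statistics request per metric name."""
    start_time = end_time - timedelta(minutes=window_minutes)
    return [
        {
            "Namespace": namespace,
            "MetricName": name,
            # CloudWatch expects dimensions as a list of Name/Value pairs.
            "Dimensions": [
                {"Name": key, "Value": value}
                for key, value in dimensions.items()
            ],
            "StartTime": start_time,
            "EndTime": end_time,
            "Period": period_seconds,
            "Statistics": ["Average"],
        }
        for name in metric_names
    ]


def fetch_datapoints(metric_requests):
    """Yield (metric name, datapoint) pairs (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the builder above works without boto3

    cloudwatch = boto3.client("cloudwatch")
    for request in metric_requests:
        response = cloudwatch.get_metric_statistics(**request)
        for datapoint in response["Datapoints"]:
            yield request["MetricName"], datapoint


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    for request in build_metric_requests(
            "AWS/EC2", ["CPUUtilization", "NetworkIn"],
            {"InstanceId": "i-0example"}, now):
        print(request["MetricName"], request["StartTime"], request["EndTime"])
```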
Sample data that our Lambda function is sending to S3:
Here are sample dashboards that were built based on the mstats query: