Containers have become the preferred unit of abstraction for developers deploying application code at a time when speed and scale drive competitive advantage. This is especially true in software-driven organizations building innovative customer-facing applications as part of digital transformation initiatives. However, monitoring containerized applications is easier said than done.
We strive to ensure out-of-the-box integrations with open-source container technologies, including Docker and Kubernetes, as well as all the major public cloud container services, such as those offered by AWS, GCP, and Azure. In this blog post, we’ll elaborate on how you can use Splunk Infrastructure Monitoring to monitor Amazon ECS, for which we have some very exciting updates.
The highly dynamic and ephemeral nature of containerized environments makes them very hard to monitor using traditional infrastructure monitoring or APM solutions. With hundreds, if not thousands, of components being spun up and down every day or even every hour, batch-based monitoring solutions cannot keep up with the churn and in many cases cannot even see the new components, let alone enable multi-dimensional real-time analysis.
The Splunk Infrastructure Monitoring streaming analytics engine, SignalFlow, is the only one that can keep up with the dynamic nature of containerized environments without compromising performance or being overwhelmed by alert storms. Some of our customers are running the most demanding container-based production environments in the industry, with millions of components being churned on a daily basis.
Amazon Elastic Container Service (ECS) is Amazon’s container orchestration service for Docker containers. ECS allows you to automate the deployment and scheduling of containers without having to install and operate your own container orchestration and cluster management software. ECS automatically deploys containers in EC2 hosts running in a customer’s VPC. It is integrated with several popular AWS services, including Amazon CloudWatch for monitoring.
You launch applications on an ECS cluster by defining, scheduling, and running tasks, with each task running up to 10 containers. Multiple copies of the same task can be simultaneously maintained with a service, which runs a specified number of task instances, and restarts those that become unhealthy or stop unexpectedly.
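The relationship between tasks and services described above can be sketched in a few lines of Python. This is a simplified illustration, not the ECS scheduler itself: the `Task` and `Service` classes, the `reconcile` method, and the container names are all hypothetical, and the only figure taken from the text is the limit of 10 containers per task.

```python
from dataclasses import dataclass, field
from itertools import count

MAX_CONTAINERS_PER_TASK = 10  # limit stated in the paragraph above

_task_ids = count(1)  # hypothetical ID generator for this sketch


@dataclass
class Task:
    containers: list
    task_id: int = field(default_factory=lambda: next(_task_ids))
    healthy: bool = True

    def __post_init__(self):
        if not 1 <= len(self.containers) <= MAX_CONTAINERS_PER_TASK:
            raise ValueError("a task runs between 1 and 10 containers")


@dataclass
class Service:
    containers: list   # container names that every copy of the task runs
    desired_count: int  # number of task copies the service maintains
    tasks: list = field(default_factory=list)

    def reconcile(self):
        """Drop unhealthy tasks, then launch replacements up to desired_count."""
        self.tasks = [t for t in self.tasks if t.healthy]
        while len(self.tasks) < self.desired_count:
            self.tasks.append(Task(list(self.containers)))
        return self.tasks


service = Service(containers=["web", "log-router"], desired_count=3)
service.reconcile()               # launches tasks 1, 2 and 3
service.tasks[0].healthy = False  # simulate task 1 becoming unhealthy
service.reconcile()               # task 1 is dropped and task 4 replaces it
print([t.task_id for t in service.tasks])  # prints [2, 3, 4]
```

The point of the sketch is that task identities change while the service definition stays constant, which is why monitoring keyed to individual hosts or containers struggles in this kind of environment.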
Amazon CloudWatch provides metrics on CPU and memory utilization for your entire ECS cluster as well as for the services within your clusters. Additional system-level metrics are provided depending on whether you use AWS Fargate to launch your services. Users not using Fargate own and monitor the EC2 instances that containers run on, so CPU and memory utilization metrics are available at the cluster, service, and task level.
Several options exist for monitoring Amazon ECS with Splunk.
For basic infrastructure and container monitoring, we provide the Amazon CloudWatch integration. Amazon CloudWatch collects standard metrics on containers and other ECS constructs.
Splunk Infrastructure Monitoring ingests data through the CloudWatch integration and automatically populates built-in dashboards and the Infrastructure Navigator so you can quickly visualize metrics. In most cases, CloudWatch is configured by default to aggregate and report metrics at 5-minute resolution (with a few specific services reporting at 1-minute resolution). Visualizations and alerts based on this data update at the same cadence.
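To make the resolution concrete, here is a minimal sketch of what a request for ECS metrics from CloudWatch could look like. It is not how the Splunk integration is implemented; it only builds the parameters for CloudWatch's `GetMetricStatistics` API using the `AWS/ECS` namespace and its `ClusterName` and `ServiceName` dimensions. The helper name `ecs_metric_query`, the cluster and service names, and the one-hour lookback are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def ecs_metric_query(cluster, service=None, metric="CPUUtilization",
                     period_seconds=300, lookback_minutes=60, now=None):
    """Build keyword arguments for CloudWatch's GetMetricStatistics call."""
    end = now or datetime.now(timezone.utc)
    dimensions = [{"Name": "ClusterName", "Value": cluster}]
    if service is not None:
        dimensions.append({"Name": "ServiceName", "Value": service})
    return {
        "Namespace": "AWS/ECS",
        "MetricName": metric,
        "Dimensions": dimensions,
        "StartTime": end - timedelta(minutes=lookback_minutes),
        "EndTime": end,
        "Period": period_seconds,  # 300 s matches the 5-minute resolution above
        "Statistics": ["Average", "Maximum"],
    }


# "demo-cluster" and "demo-service" are placeholder names.
query = ecs_metric_query("demo-cluster", service="demo-service")
print(query["Period"], len(query["Dimensions"]))  # prints: 300 2

# With boto3 installed and AWS credentials configured, the query could be sent as:
#   import boto3
#   points = boto3.client("cloudwatch").get_metric_statistics(**query)["Datapoints"]
```

With a 300-second period, an hour of data yields at most 12 datapoints per metric, which is the granularity any chart or alert built on it inherits.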
For greater insight into the services running in your containers, as well as finer-grained monitoring of container metrics, you can now use the Smart Agent, our open-source metrics agent with seamless installation and dynamic configuration of metrics. This is a new option that we just recently made available. Since its introduction, the Smart Agent has become a very popular option for sending data to Splunk Infrastructure Monitoring because it greatly reduces the time spent setting up metrics plug-ins by providing automatic service discovery and configuration for monitoring content. We are excited to now make it available also for ECS deployments!
As opposed to populating dashboards only for Amazon ECS, the Smart Agent automatically detects containers in your environment and the applications running in them, then generates the relevant service dashboards based on what was discovered.
The Smart Agent is also capable of submitting metrics to Splunk at 1-second resolution, which makes it extremely useful for monitoring environments with a high degree of churn, where containers appear and disappear within seconds because of auto scaling or load balancing.
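A small simulation shows why resolution matters under churn. It treats each report as a point-in-time sample, which is a simplification of how real collectors aggregate data, and the container names and lifetimes are invented for the example.

```python
def observed_containers(lifetimes, interval_seconds, window_seconds):
    """Return the names of containers alive at one or more sampling instants.

    lifetimes maps a container name to (start, stop) in seconds, stop exclusive.
    Samples are taken at t = 0, interval, 2 * interval, ... inside the window.
    """
    sample_times = range(0, window_seconds, interval_seconds)
    return {
        name
        for name, (start, stop) in lifetimes.items()
        if any(start <= t < stop for t in sample_times)
    }


# Hypothetical lifetimes over a 10-minute window.
lifetimes = {
    "long-running-api": (0, 600),
    "autoscaled-worker": (310, 355),  # lives for 45 seconds
    "batch-job": (42, 49),            # lives for 7 seconds
}

coarse = observed_containers(lifetimes, interval_seconds=300, window_seconds=600)
fine = observed_containers(lifetimes, interval_seconds=1, window_seconds=600)
print(sorted(coarse))  # prints ['long-running-api']
print(sorted(fine))    # prints ['autoscaled-worker', 'batch-job', 'long-running-api']
```

At a 5-minute interval only the long-lived container is ever seen; at 1-second resolution all three are.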
Pre-built dashboards display metrics for each individual ECS cluster and the services running in it, as well as for your ECS deployment as a whole. Below are a few examples.
For an overall view of your ECS deployment, Splunk Infrastructure Monitoring provides charts displaying the number of active clusters, services, and tasks. Additional charts highlight the top clusters and services by resource utilization.
Each minute, the ECS container agent calculates the percentage of CPU and memory currently being used by each task running on the container instance. Splunk Infrastructure Monitoring provides an ECS cluster dashboard that shows CPU and memory utilization not only for an entire cluster, but also for the specific services with the highest resource utilization. A count of running services and tasks is provided as well.
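The arithmetic behind these percentages can be sketched as follows. The sketch assumes that cluster utilization is the CPU units used by tasks divided by the CPU units registered by the container instances, and that service utilization is the units used divided by the units reserved for that service. All service names and numbers are made up, and the exact formulas should be checked against the CloudWatch documentation for ECS.

```python
def utilization_percent(used_units, capacity_units):
    """Percentage of capacity in use; capacity must be positive."""
    if capacity_units <= 0:
        raise ValueError("capacity_units must be positive")
    return 100.0 * used_units / capacity_units


# Hypothetical services: (CPU units used by their tasks, CPU units reserved).
services = {
    "checkout": (512, 1024),
    "search": (768, 1024),
    "auth": (256, 2048),
}
# Hypothetical cluster of two container instances registering 2048 CPU units each.
registered_cpu_units = [2048, 2048]

cluster_pct = utilization_percent(
    sum(used for used, _ in services.values()),
    sum(registered_cpu_units),
)

# Rank services from highest to lowest utilization, as a "top services" chart would.
top_services = sorted(
    ((name, utilization_percent(used, reserved))
     for name, (used, reserved) in services.items()),
    key=lambda item: item[1],
    reverse=True,
)

print(cluster_pct)   # prints 37.5
print(top_services)  # prints [('search', 75.0), ('checkout', 50.0), ('auth', 12.5)]
```

Note that a cluster can look lightly loaded overall (37.5% here) while an individual service is running much closer to its reservation, which is why the per-service breakdown is useful.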
You can also drill down to a particular ECS service to see the number of tasks being run as well as CPU and memory utilization.
Our Smart Agent will automatically discover containers deployed via ECS, then provide a heatmap view showing your entire fleet of containers or compute instances, with dashboards displaying aggregated system metrics across your environment. You can also select an individual container to see the specific applications running on it. It’s important to note as well that Smart Agent supports all ECS container networking modes: host, bridge and awsvpc.
When the Smart Agent is installed on your compute instances, a heatmap view is generated that sorts instances according to a variety of system metrics (CPU, memory, and disk utilization, network traffic, disk operations). This view can easily be adjusted to group instances by dimensions such as AWS availability zone or the specific services that they support.
Active containers are automatically detected by the Smart Agent, allowing Splunk Infrastructure Monitoring to populate dashboards with individual container metrics.
Because the Smart Agent also detects the services running in each container, dashboards are automatically populated with relevant metrics. In this example, the Smart Agent detects a Docker container running Elasticsearch and generates content for both the container and the Elasticsearch service.
Splunk Infrastructure Monitoring has been built from the ground up to meet the requirements of modern container-based applications and environments. At the core of our solution is the most scalable high-performance analytics solution for monitoring across the entire stack. Splunk helps you see the behavior of your container fleet, the health of the services they power, and the metrics on the applications inside your containers. If you’re not already using Splunk Infrastructure Monitoring, get started with a 14-day trial.
Thanks,
Aaron Sun and Alberto Farronato