We are excited to announce the availability of the Splunk App for HashiCorp Vault. Using this app, organizations can seamlessly ingest and visualize performance metrics and audit logs in Splunk to investigate, monitor, analyze and act on Vault data across DevSecOps use cases.
Organizations are adopting microservices to accelerate code releases, scale efficiently and provide developer autonomy. While it is widely accepted that a microservice architecture brings agility and resiliency, it also introduces operational complexity due to the distributed nature of microservices. Unlike monoliths with unitary application logic, microservices spread the application logic across multiple services that need to communicate securely. Machine-to-machine communication (between servers, virtual machines, containers, applications, microservices, etc.) requires proper authentication using API keys, credentials or certificates. DevSecOps teams often grapple with how to consolidate secrets, avoid secret sprawl, use credentials securely in applications through ephemeral secrets, manage keys and implement identity across multiple clouds.
HashiCorp Vault is a widely used platform to secure, store and tightly control access to API keys, passwords, certificates and encryption keys for protecting sensitive data used in dynamic infrastructure and microservices. Vault manages and securely stores secrets in a single central system, giving portability and a consistent interface to manage application security (AppSec) across on-premises infrastructure and multiple clouds.
With the Splunk App for HashiCorp Vault, you can get insights on the following:
Vault Infrastructure: Vault performs infrastructure-intensive operations. Offloading encryption and decryption from applications to Vault can be CPU-intensive, especially when several concurrent threads from multiple microservices make requests. Vault can handle the most demanding applications on a massive scale; however, it is vital to monitor CPU utilization and saturation, because a CPU that is too busy may not be able to process new requests, which impacts the end-user experience.
Keeping tabs on Disk IO and saturation is important, so you know that your storage infrastructure has adequate resources.
The same goes for monitoring network metrics. Vault keeps a detailed log of all requests and responses. Because every operation with Vault is an API request/response, the audit log contains every interaction with Vault, including errors. Vault offers multiple audit devices, such as a local file or the socket audit device for logging over the network, and audit data can be sent to the Splunk HTTP Event Collector (HEC), for example. Vault operations are impacted if Vault cannot write to the logging system, so it is critical to monitor network health and performance. The Splunk App gives visibility into network performance metrics for Vault clusters.
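As a rough illustration of what forwarding one audit entry to HEC involves, the sketch below builds (but does not send) an HTTP request for the HEC event endpoint. The host name, token, sourcetype and index are hypothetical placeholders, and the audit entry is a simplified stand-in rather than a complete Vault record; in practice the data inputs described in HashiCorp's guide take care of this step.

```python
import json
import urllib.request


def build_hec_request(base_url, token, event, sourcetype, index=None):
    """Build, without sending, a POST request for Splunk's HEC event endpoint."""
    payload = {"event": event, "sourcetype": sourcetype}
    if index is not None:
        payload["index"] = index
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/services/collector/event",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # HEC authenticates with the "Splunk <token>" scheme.
            "Authorization": "Splunk " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# All values below are hypothetical and only for illustration.
request = build_hec_request(
    base_url="https://splunk.example.com:8088",
    token="00000000-0000-0000-0000-000000000000",
    event={"type": "response", "request": {"path": "secret/data/app"}, "error": ""},
    sourcetype="vault_audit_example",
    index="vault_example",
)
# Passing `request` to urllib.request.urlopen would send it; that is omitted here.
```

Building the request separately from sending it makes the payload easy to inspect before any network traffic is generated.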
Vault Internals: Vault collects various runtime metrics about the performance of different libraries and subsystems. Splunk App uses this telemetry for getting a better view of what Vault is doing.
Lease metrics: For every secret, Vault creates a lease and metadata about the secret such as its time to live (TTL) and whether to renew it. Monitoring lease metrics gives visibility into how often secrets are getting used. A sudden spike in leases eligible to expire indicates an increase in traffic to the application. On the other hand, an unexpected drop could mean Vault can't keep up with the requests.
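To make the idea of a "sudden spike" or "unexpected drop" concrete, here is a minimal sketch that compares each lease-count sample against the mean of the preceding samples. The window size and the two ratios are assumptions chosen for illustration, not recommended thresholds, and the sample values are invented.

```python
from statistics import mean


def classify_lease_samples(samples, window=5, spike_ratio=2.0, drop_ratio=0.5):
    """Label each sample 'spike', 'drop' or 'normal' relative to the trailing mean."""
    labels = []
    for i, value in enumerate(samples):
        history = samples[max(0, i - window):i]
        if len(history) < window:
            labels.append("normal")  # not enough history to judge yet
            continue
        baseline = mean(history)
        if baseline > 0 and value >= baseline * spike_ratio:
            labels.append("spike")
        elif baseline > 0 and value <= baseline * drop_ratio:
            labels.append("drop")
        else:
            labels.append("normal")
    return labels


# Invented lease counts sampled at a fixed interval.
lease_counts = [100, 102, 98, 101, 99, 250, 100, 40]
labels = classify_lease_samples(lease_counts)
```

With these invented numbers, the sixth sample is labeled a spike and the last one a drop; real thresholds should come from the normal ranges observed in your own environment.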
Leadership metrics: Vault supports high availability with an active leader node and other standby nodes. If the leader node fails or becomes sealed, one of the standby nodes takes leadership and continues to process requests. It is important to keep track of leadership changes, as frequent changes can point to a systemic failure or a security event. For instance, the storage layer may encounter an unrecoverable error, a server may restart, or unintended API calls may seal a Vault node. The Splunk App provides complete visibility into the Vault status (sealed or unsealed), leadership changes and leadership setup failures.
Resource consumption metrics: Additionally, it is important to note how many resources Vault consumes, such as the memory allocated, available and used by the system. Pay attention to time spent in garbage collection (GC), and tune memory availability as necessary to minimize GC pauses. The number of goroutines indicates the load on the system, and a sudden change in this metric can point to a system-wide issue.
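One simple way to reason about "frequent leadership changes" is to count how often the observed leader differs between consecutive samples inside a trailing time window. The sketch below assumes a hypothetical list of (timestamp in seconds, leader name) observations; the node names, window and alert threshold are invented for illustration.

```python
def count_leader_changes(observations, window_seconds, now):
    """Count leader transitions among observations inside the trailing window.

    `observations` is a list of (timestamp_seconds, leader_name) sorted by time.
    """
    recent = [(t, leader) for t, leader in observations if now - t <= window_seconds]
    changes = 0
    for (_, previous), (_, current) in zip(recent, recent[1:]):
        if previous != current:
            changes += 1
    return changes


# Invented observations: one sample per minute.
observations = [
    (0, "vault-1"),
    (60, "vault-1"),
    (120, "vault-2"),
    (180, "vault-1"),
    (240, "vault-1"),
    (300, "vault-3"),
]
changes = count_leader_changes(observations, window_seconds=240, now=300)
needs_attention = changes >= 3  # hypothetical alert threshold
```

A single change after planned maintenance is usually expected; several changes in a short window are the pattern worth investigating.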
Other operational metrics: Monitor core Vault metrics such as latency to list, get, put, delete secrets in/out of storage, login requests, checking tokens, ACL fetch, etc. as they add to the latency to fulfill Vault requests.
Vault provides an audit trail that shows who has requested data, so you can proactively identify and prevent unauthorized access to your data.
Tokens are the core method for authentication within Vault. The Splunk App provides granular visibility into the number, lifespan and type of tokens, and into the authentication methods used. You can isolate tokens with a long time-to-live (TTL), identify the users or services holding such tokens, and advise DevOps teams on the best practice of using ephemeral tokens with a short TTL whenever possible.
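The kind of check described above can be sketched as a filter over token records. The record layout here (`display_name`, `token_type`, `ttl_seconds`) is a simplified, hypothetical shape rather than Vault's exact audit schema, and the 24-hour limit is an assumed policy used only for illustration.

```python
from collections import defaultdict

MAX_TTL_SECONDS = 24 * 60 * 60  # hypothetical policy: flag TTLs above 24 hours


def long_lived_tokens(token_events, max_ttl=MAX_TTL_SECONDS):
    """Group TTLs that exceed max_ttl by the identity holding the token."""
    flagged = defaultdict(list)
    for event in token_events:
        if event["ttl_seconds"] > max_ttl:
            flagged[event["display_name"]].append(event["ttl_seconds"])
    return dict(flagged)


# Invented token records for illustration.
token_events = [
    {"display_name": "approle-ci", "token_type": "service", "ttl_seconds": 3600},
    {"display_name": "userpass-alice", "token_type": "service", "ttl_seconds": 2764800},
    {"display_name": "approle-batch", "token_type": "batch", "ttl_seconds": 900},
    {"display_name": "userpass-alice", "token_type": "service", "ttl_seconds": 604800},
]
flagged = long_lived_tokens(token_events)
```

Grouping by identity makes it easier to follow up with the team that owns the long-lived tokens.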
Keep an eye on the usage of secrets by application services, batch processes, automation tooling and other Vault clients. The Splunk App provides insights into how many secrets are in the Vault system and who is accessing them. You also get out-of-the-box visibility into the number of requests by path, role name and entity ID, as well as errors by path, so you can proactively correct applications that access invalid paths.
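For a sense of what "requests by path" and "errors by path" mean in terms of raw data, the sketch below tallies JSON audit lines. It assumes each line is a JSON object with a `type` field, a nested `request.path` and an `error` string, which is modeled on Vault's JSON audit format but heavily simplified; the sample lines are invented.

```python
import json
from collections import Counter


def summarize_audit_lines(lines):
    """Return (requests_by_path, errors_by_path) counted from response entries."""
    requests_by_path = Counter()
    errors_by_path = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if entry.get("type") != "response":
            continue  # count each interaction once, via its response entry
        path = entry.get("request", {}).get("path", "unknown")
        requests_by_path[path] += 1
        if entry.get("error"):
            errors_by_path[path] += 1
    return requests_by_path, errors_by_path


# Invented, simplified audit lines.
sample_lines = [
    '{"type": "request", "request": {"path": "secret/data/app"}}',
    '{"type": "response", "request": {"path": "secret/data/app"}, "error": ""}',
    '{"type": "response", "request": {"path": "secret/data/app"}, "error": ""}',
    '{"type": "response", "request": {"path": "secret/data/typo"}, "error": "permission denied"}',
]
requests_by_path, errors_by_path = summarize_audit_lines(sample_lines)
```

A path that shows up mostly in the error counter, such as the invented `secret/data/typo` here, is a likely sign of a misconfigured client.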
Getting started with Splunk App for HashiCorp Vault is straightforward. Install the app from Splunkbase for your Splunk Cloud or Splunk Enterprise environment. HashiCorp provides step-by-step instructions on how to configure data inputs and indexes to get data into Splunk. Additionally, the guide is full of practical recommendations on considering the right metrics to monitor, setting thresholds and estimating normal ranges. Read more about the Splunk App and see it in action.
Join Mark Gritter and Darshana Sivakumar from HashiCorp at the Splunk DevOps Insights and Innovation Virtual Event on July 22nd, 2020. In their session, they will cover Vault use cases and operationalizing Vault with Splunk App. Register here and save your spot. We look forward to seeing you there!
----------------------------------------------------
Thanks!
Amit Sharma