When monitoring application performance or troubleshooting an issue in production, context is key. The more information you have, the faster you can prevent or detect a user-impacting issue. Observability tools offer many features, such as code profiling, to help contextualize your data. In this post, I’ll discuss what code profiling is and show an example of how it works.
Code profiling gives engineers code-level visibility into resource bottlenecks to help troubleshoot service performance issues. Engineers can continuously measure how their code affects CPU and memory usage and where it slows service performance. Before delving further into code profiling, let's define a few terms:
Code profiling collects call stacks from production environments: an agent sends periodic snapshots of those call stacks, via a collector, to the APM backend. The APM solution then visualizes code performance through charts or flame graphs, helping engineers understand whether their code is performing poorly.
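To make the idea of periodic call stack snapshots concrete, here is a minimal sketch in plain Java. It is not how the Splunk agent is implemented; the class name `StackSampler`, the method names, and the interval are all hypothetical. It relies only on the standard `Thread.getAllStackTraces()` call, which returns the current stack of every live thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of periodic call stack sampling; not the Splunk agent. */
final class StackSampler {

    /** One snapshot: every live thread mapped to its current call stack. */
    static Map<Thread, StackTraceElement[]> snapshot() {
        return Thread.getAllStackTraces();
    }

    /** Takes up to `count` snapshots, pausing `intervalMillis` between them. */
    static List<Map<Thread, StackTraceElement[]>> sample(int count, long intervalMillis) {
        List<Map<Thread, StackTraceElement[]>> snapshots = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            snapshots.add(snapshot());
            if (i < count - 1) {
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop sampling early.
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return snapshots;
    }

    public static void main(String[] args) {
        // Assumed values: 3 snapshots, 100 ms apart.
        for (Map<Thread, StackTraceElement[]> snap : sample(3, 100)) {
            snap.forEach((thread, frames) -> {
                if (frames.length > 0) {
                    // frames[0] is the method the thread was executing when sampled.
                    System.out.println(thread.getName() + " -> " + frames[0]);
                }
            });
        }
    }
}
```

A real profiling agent would ship these snapshots to a backend instead of printing them, and would take care to keep the sampling overhead low.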
Splunk APM offers code profiling through the AlwaysOn Profiling® tool. This year at AWS re:Invent, attendees could participate in an AWS GameDay session that walked them through identifying and resolving a performance-impacting code issue using Splunk APM and AlwaysOn Profiling.
GameDay teams were presented with a Java web app that had a hidden long-running call. By implementing Splunk APM and configuring the Splunk AlwaysOn Profiling tool, teams were able to trace the code issue down to the specific file and line number.
During the session, teams quickly set up Splunk APM by following the guided walkthrough available within the Splunk Observability Cloud UI. Once they had instrumented their application to send data to Splunk APM using OpenTelemetry, teams were able to view application metrics, service maps, and business workflows within minutes.
For additional troubleshooting information, teams implemented AlwaysOn Profiling with a simple update to the command line used to launch their application. After AlwaysOn Profiling was enabled, call stacks for the application services were available within Splunk APM.
Teams were then able to troubleshoot and identify the long-running call impacting application performance. AlwaysOn Profiling displayed the call stack information in both table and flame graph format. The call stack provided information about the individual methods and calls made by the code within the selected service flow.
In the table on the left, we can see the name of each method executed, the amount of time spent executing that call, and how many times that call shows up in the call stack. Right away, teams could see that there was a long-running sleep call. Selecting and expanding the sleep call shows that one of the traces takes significantly longer than the others.
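As a rough sketch of how a table like this can be derived from sampled call stacks, the Java below counts how many samples each method appears in and estimates time as count multiplied by the sampling interval. This is a common sampling-profiler approximation, not a description of how Splunk APM computes its table; the class name `StackTable`, the sample data, and the 10-second interval are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

/** Hypothetical aggregation of sampled call stacks into a per-method table. */
final class StackTable {

    /** Counts, for each method name, the number of sampled stacks it appears in. */
    static Map<String, Integer> countByMethod(List<List<String>> sampledStacks) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (List<String> stack : sampledStacks) {
            // De-duplicate so a recursive method is counted once per sample.
            for (String method : new LinkedHashSet<String>(stack)) {
                counts.merge(method, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Returns the table rows ordered from most to least frequently sampled. */
    static List<Map.Entry<String, Integer>> sortedByCount(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> rows = new ArrayList<>(counts.entrySet());
        rows.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        return rows;
    }

    public static void main(String[] args) {
        long intervalMillis = 10_000; // assumed sampling interval
        // Made-up samples; each inner list is one call stack, innermost frame first.
        List<List<String>> samples = List.of(
                List.of("Thread.sleep", "DoorChecker.precheck", "main"),
                List.of("Thread.sleep", "DoorChecker.precheck", "main"),
                List.of("render", "main"));
        for (Map.Entry<String, Integer> row : sortedByCount(countByMethod(samples))) {
            System.out.printf("%-24s samples=%d  ~%d ms%n",
                    row.getKey(), row.getValue(), row.getValue() * intervalMillis);
        }
    }
}
```

A method that shows up in many consecutive samples, such as a sleep call, is a strong hint that the thread is spending most of its time there.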
Once the longest-running sleep call was selected, teams could see the stack trace for that call displayed on the right. The stack trace showed that the long-running call occurred in the precheck method, on line 36 of DoorChecker.java. After identifying the issue, teams were able to update the application code and fix the long-running call.
Using Splunk APM with AlwaysOn Profiling enabled helped teams quickly identify, locate and resolve the issue. Application issues are not always as simple as system downtime; slow performance can also lead to unhappy customers. The performance and code-level insight gained from APM and code profiling tools can lead to faster issue resolution and optimized application code.
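The file and line number in a stack trace come straight from the frame metadata that the JVM records. The sketch below shows how the first application frame beneath a JDK call such as sleep could be picked out of a stack using the standard `StackTraceElement` accessors. The class `FrameLocator`, the package prefixes it filters on, and the `com.example` package name are assumptions for illustration; only the method name, file name, and line number come from the GameDay example.

```java
/** Hypothetical helper that finds and describes the first non-JDK frame in a stack. */
final class FrameLocator {

    /** Returns the first frame that is not in a java., jdk., or sun. package, or null. */
    static StackTraceElement firstApplicationFrame(StackTraceElement[] stack) {
        for (StackTraceElement frame : stack) {
            String cls = frame.getClassName();
            if (!cls.startsWith("java.") && !cls.startsWith("jdk.") && !cls.startsWith("sun.")) {
                return frame;
            }
        }
        return null;
    }

    /** Formats a frame as "method in file, line N". */
    static String describe(StackTraceElement frame) {
        return frame.getMethodName() + " in " + frame.getFileName()
                + ", line " + frame.getLineNumber();
    }

    public static void main(String[] args) {
        // Hand-built stack mirroring the example; the package name is made up.
        StackTraceElement[] stack = {
            new StackTraceElement("java.lang.Thread", "sleep", "Thread.java", -2),
            new StackTraceElement("com.example.DoorChecker", "precheck", "DoorChecker.java", 36),
        };
        // Prints: precheck in DoorChecker.java, line 36
        System.out.println(describe(firstApplicationFrame(stack)));
    }
}
```

Line numbers are only available when the class was compiled with debug information, which is the default for most Java build tools.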
AlwaysOn Profiling capabilities are currently available for Java, .NET and Node.js applications, with more languages coming soon. Automatic instrumentation without profiling support is also available for Go and several other languages. To experience the speed and power of code profiling for yourself, start a Splunk Observability Cloud free trial.