I recently had a customer ask me how to calculate funnels in Splunk. His source data consisted of custom application logs, but this method will work with any logs that have a field representing a unique visitorID.
In this context, a “funnel” is a calculation that shows what percentage of visitors progressed through each step in a process, usually a purchase process. So, for example, a classic funnel would show how many people visited a site, clicked on a product page, added the item to their shopping cart, and then purchased the item.
In Splunk, of course, this is simple, as long as you are familiar with the appendcols function. The appendcols function allows you to “glue” two independent searches together into the same table of results. In the case of a funnel, you would use the first search:
sourcetype="funnel" event_type="page_view" earliest="01/09/2014:23:00:00" latest="01/09/2014:24:00:00" | stats dc(VisitorID) as total_visitors
And then you would “glue” the second search onto it using the appendcols command.
sourcetype="funnel" event_type="page_view" earliest="01/09/2014:23:00:00" latest="01/09/2014:24:00:00" | stats dc(VisitorID) as total_visitors | appendcols [search sourcetype="funnel" event_type="product_view" earliest="01/09/2014:23:00:00" latest="01/09/2014:24:00:00" | stats dc(VisitorID) as visitors_who_viewed_products] |
Note that I’ve set time parameters for each search in the query itself. It’s actually not necessary for the first search — the time parameters in the interface would handle that if I hadn’t explicitly set them in the query. However, the second search triggered by the colspan function does require time parameters to be set.
Now, You could continue the “gluing” until you’ve included every step in the process:
sourcetype="funnel" event_type="page_view" earliest="01/09/2014:23:00:00" latest="01/09/2014:24:00:00"| stats dc(VisitorID) as total_visitors | appendcols [search sourcetype="funnel" event_type="product_view" earliest="01/09/2014:23:00:00" latest="01/09/2014:24:00:00"| stats dc(VisitorID) as visitors_who_viewed_products] | appendcols [search sourcetype="funnel" event_type="add_to_cart" earliest="01/09/2014:23:00:00" latest="01/09/2014:24:00:00" | stats dc(VisitorID) as visitors_who_added_items_to_carts] | appendcols [search sourcetype="funnel" event_type="check_out" earliest="01/09/2014:23:00:00" latest="01/09/2014:24:00:00"| stats dc(VisitorID) as visitors_who_checked_out]
If you want the results to be percentages, add an “eval” for each value:
sourcetype="funnel" event_type="page_view" earliest="01/09/2014:23:00:00" latest="01/09/2014:24:00:00"| stats dc(VisitorID) as total_visitors | appendcols [search sourcetype="funnel" event_type="product_view" earliest="01/09/2014:23:00:00" latest="01/09/2014:24:00:00"| stats dc(VisitorID) as visitors_who_viewed_products] | appendcols [search sourcetype="funnel" event_type="add_to_cart" earliest="01/09/2014:23:00:00" latest="01/09/2014:24:00:00"| stats dc(VisitorID) as visitors_who_added_items_to_carts] | appendcols [search sourcetype="funnel" event_type="check_out" earliest="01/09/2014:23:00:00" latest="01/09/2014:24:00:00" | stats dc(VisitorID) as visitors_who_checked_out] | eval visitors_who_viewed_products = round((visitors_who_viewed_products/total_visitors) * 100) . "%" | eval visitors_who_added_items_to_carts = round((visitors_who_added_items_to_carts/total_visitors) * 100) . "%" | eval visitors_who_checked_out = round((visitors_who_checked_out/total_visitors) * 100) . "%"
And if you want to clean it up in a few other ways, play with the “fields” command (to set the order), “transpose” command (to turn it vertical), and “rename” command (to rename the newly created “row1” field to something pretty).
sourcetype="funnel" event_type="page_view" earliest="01/09/2014:23:00:00" latest="01/09/2014:24:00:00"| stats dc(VisitorID) as total_visitors | appendcols [search sourcetype="funnel" event_type="product_view" earliest="01/09/2014:23:00:00" latest="01/09/2014:24:00:00"| stats dc(VisitorID) as visitors_who_viewed_products] | appendcols [search sourcetype="funnel" event_type="add_to_cart" earliest="01/09/2014:23:00:00" latest="01/09/2014:24:00:00"| stats dc(VisitorID) as visitors_who_added_items_to_carts] | appendcols [search sourcetype="funnel" event_type="check_out" earliest="01/09/2014:23:00:00" latest="01/09/2014:24:00:00" | stats dc(VisitorID) as visitors_who_checked_out] | eval visitors_who_viewed_products = round((visitors_who_viewed_products/total_visitors) * 100) . "%" | eval visitors_who_added_items_to_carts = round((visitors_who_added_items_to_carts/total_visitors) * 100) . "%" | eval visitors_who_checked_out = round((visitors_who_checked_out/total_visitors) * 100) . "%" | fields total_visitors visitors_who_viewed_products visitors_who_added_items_to_carts visitors_who_checked_out | rename total_visitors AS "All Visitors" visitors_who_viewed_products AS "Viewed Products" visitors_who_added_items_to_carts AS "Added Items" visitors_who_checked_out AS "Checked Out" | transpose | rename "row 1" AS "Purchase Funnel"
My brilliant colleague Gilberto Castillo noted that there is a much more efficient way to do this, as long as your data allows you to define each step in your funnel based on a single field like “event_type” (If you wanna get really fancy, you could create a single field by using the eval function to concatenate the multiple fields into one field.)
His approach bypasses the use of the appendcols function entirely and just counts unique visitors by event_type.
sourcetype="funnel" event_type="*" | rex "(?<VisitorID>[0-9a-zA-Z]+)\s+event" | eventstats dc(VisitorID) AS all_visitors | search (event_type="product_view" OR event_type="add_to_cart" OR event_type="check_out") | stats values(all_visitors) AS all_visitors dc(VisitorID) AS dc_event_type by event_type | eval percentage=round(dc_event_type/all_visitors*100,2) | fields - all_visitors dc_event_type
Try it out with your own data, or use the sample data I used to create this dashboard. You’ll find it below.
Thanks!
-S.
SAMPLE DATA: funnel.log (index as sourcetype “funnel” and extract the VisitorID field with the regex “(?<VisitorID>[0-9a-zA-Z]+)\s+event”)
----------------------------------------------------
Thanks!
Sondra Russell
The Splunk platform removes the barriers between data and action, empowering observability, IT and security teams to ensure their organizations are secure, resilient and innovative.
Founded in 2003, Splunk is a global company — with over 7,500 employees, Splunkers have received over 1,020 patents to date and availability in 21 regions around the world — and offers an open, extensible data platform that supports shared data across any environment so that all teams in an organization can get end-to-end visibility, with context, for every interaction and business process. Build a strong data foundation with Splunk.