This summer, the Splunk4Good team hosted their second annual Splunktern classes, educating interns on our product and applying Splunk to open data sources. Participating Splunkterns formed teams, identified social issues, and then used Splunk to promote change for social good. Final projects were presented to a panel of executives and the winning team was awarded a $1,000 charitable donation for their project’s cause.
In this series, we’ll hear how Splunkterns selected their project topics and how this impacted their intern experience at Splunk. Stay tuned to learn who was crowned the winning team.
This first post comes from Priyanka Nayak (Software Engineer intern), Anisha Dangoria (Financial Planning & Analysis intern), and Yiran Jia (Software Engineer intern).
When choosing a topic for our project, the three of us agreed on one thing we thought could be improved: the Bay Area public transit system. Public transportation is an important asset for any city, especially one with a dense population and heavy business activity like San Francisco. Efficient public transportation can have a large social and economic impact on Bay Area commuters, which is why we wanted to analyze this topic further for our project.
We used Splunk to understand current traveler patterns for targeted spending, illustrate the correspondence between Ford GoBike usage and BART ridership, and extrapolate broader trends in commuting growth to forecast future BART ridership. Our research showed that the number of SF super commuters (people whose daily commute time exceeds 90 minutes) has increased by 112% over the last decade. With over 265,000 people commuting into SF each day for work, we believe that scaling public transportation will greatly improve the commuter experience and reduce pollution in the future.
We first used Splunk visualization and statistics features to evaluate the three busiest stations (Embarcadero, Powell, and Montgomery) and how trip patterns vary by time of day. Then, we related current BART ridership to bike share usage by computing the hourly Bike-to-BART ratio for each station. Shown is the plot of this ratio for Montgomery station from 5 AM to 11 PM: the number of passengers using a Ford GoBike at Montgomery divided by the total number of people getting off BART at Montgomery in that same hour.
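A minimal Python sketch of the hourly Bike-to-BART ratio described above. This is not the Splunk search used in the project; the function name, the input format (one hour-of-day value per trip), and the sample numbers are all assumptions made for illustration.

```python
from collections import Counter

def bike_to_bart_ratio(bike_trip_hours, bart_exit_hours):
    """Return {hour: bike trips / BART exits} for one station.

    Both arguments are lists holding the hour of day (0-23) of each
    record. Hours with no BART exits are skipped to avoid dividing by zero.
    """
    bikes = Counter(bike_trip_hours)
    exits = Counter(bart_exit_hours)
    return {hour: bikes[hour] / exits[hour] for hour in sorted(exits)}

# Made-up sample records for a single station (not real ridership data).
sample_bike_hours = [8, 8, 9, 17]
sample_bart_hours = [8, 8, 8, 8, 9, 9, 17, 17, 17, 17]

ratios = bike_to_bart_ratio(sample_bike_hours, sample_bart_hours)
for hour, ratio in ratios.items():
    print(f"{hour:02d}:00  bike-to-BART ratio = {ratio:.2f}")
```

Running the same function once per station would give ratios that can be compared across stations, hour by hour.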
From the data, we made three major observations:
A strong indication that bike share usage corresponds to BART ridership during peak transit hours
Possibly a high percentage of people transferring from bike to BART and vice versa for the daily commute
A difference in Bike-BART ratio among Bay Area stations
Based on these findings, we suggested that bike-share companies increase the availability of bikes at peak times and at several key locations, and allocate more marketing resources to stations with a low Bike-to-BART ratio in order to raise awareness of their service. We then predicted future BART usage and discussed how it may affect the bike share industry, using linear regression to estimate how BART ridership has grown over the years and time series analysis to forecast how ridership will change in the future.
We also analyzed how Ford GoBike could allocate bikes to increase usage and profit. After aggregating the passenger-level transit data into yearly station totals, we used the Splunk Machine Learning Toolkit to identify a general trend of ridership growth.
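To make the trend-fitting idea concrete, here is a small Python sketch of an ordinary least-squares line fitted to yearly totals. It is a stand-in for illustration only: the project used the Splunk Machine Learning Toolkit, and the function names and the yearly figures below are invented, not project data.

```python
def fit_linear_trend(years, totals):
    """Fit totals = slope * year + intercept by ordinary least squares."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(totals) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, totals))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(year, slope, intercept):
    """Extrapolate the fitted line to a given year."""
    return slope * year + intercept

# Invented yearly station totals (arbitrary units), for illustration only.
sample_years = [2014, 2015, 2016, 2017]
sample_totals = [100, 110, 120, 130]

slope, intercept = fit_linear_trend(sample_years, sample_totals)
next_year_estimate = predict(2018, slope, intercept)
print(f"growth per year: {slope:.1f}, 2018 estimate: {next_year_estimate:.1f}")
```

A straight line captures only the long-run direction of growth; it says nothing about weekly or seasonal swings.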
To model and predict BART ridership more accurately, we used a time series analysis, which takes into account cycles, trends, seasonality, and random noise. Our time series model matches the weekly ups and downs of commuters almost perfectly for each station, and from the plot, we could also discern subtle monthly patterns. Using this result, both BART and Ford GoBike could prepare for future growth in ridership and respond to changes in real time.
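The weekly pattern mentioned above can be illustrated with a deliberately simple Python sketch: average the counts at each position in the 7-day cycle and repeat that profile forward. This is a seasonal-mean forecast only. It ignores trend and noise, it is far simpler than the model used in the project, and the function names and daily counts are assumptions made up for the example.

```python
def weekly_profile(daily_counts, period=7):
    """Average count at each position in the cycle (e.g. each weekday).

    Assumes at least `period` observations, ordered by day.
    """
    buckets = [[] for _ in range(period)]
    for i, count in enumerate(daily_counts):
        buckets[i % period].append(count)
    return [sum(bucket) / len(bucket) for bucket in buckets]

def seasonal_mean_forecast(daily_counts, horizon, period=7):
    """Forecast the next `horizon` days by repeating the averaged cycle."""
    profile = weekly_profile(daily_counts, period)
    start = len(daily_counts)
    return [profile[(start + k) % period] for k in range(horizon)]

# Two invented weeks of daily exits (Mon..Sun), for illustration only.
sample_counts = [100, 110, 120, 110, 100, 40, 30,
                 110, 120, 130, 120, 110, 50, 40]

forecast = seasonal_mean_forecast(sample_counts, horizon=3)
print(forecast)
```

A fuller model would add a trend term and a longer seasonal cycle on top of this weekly profile.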
Overall, by optimizing the operation of our public transportation, our project could potentially help scale the bike-sharing market and support daily operations of the BART system. Given the powerful capabilities of Splunk indexing and search, our solution could be scaled across multiple levels. Most importantly, however, doing this Splunk4Good project allowed us to see how society can benefit from public transportation, and how it can lead to a more sustainable future.
To learn more about Splunk’s commitment to research, education and community service, visit the Splunk Pledge website.