This past summer, the Splunk for Good and Splunk University Recruiting teams hosted their third annual Splunktern classes, educating interns on our product and on applying Splunk to open data sources. Participating Splunkterns formed teams, identified social issues, and then used Splunk to promote change for social good. Final projects were presented to a panel of executives, and the winning team was awarded a $1,000 charitable donation for their project's cause.
In this series, we’ll hear how Splunkterns selected their project topics and how this shaped their intern experience at Splunk. Check out the first post, “San Francisco Fire Safety,” by Kathyhan Nguyen (Software Developer in Test Intern) and Julianne Li (Business Operations Intern).
When thinking about a topic for our project, we decided to focus on the city of San Francisco because, with 884,363 residents, it is currently the second most densely populated city in the United States. With this in mind, we realized that any fire among the tightly clustered buildings and homes could cause serious injuries and even deaths. So we centered our project on spreading fire safety awareness and on using Splunk to help prevent future fires.
Our first step was to search for San Francisco open datasets that we believed would be useful. One of the primary datasets we used contained information on San Francisco fire incidents from 2003 to 2019, with details about where each fire occurred, how much damage it caused, the number of casualties and fatalities, and the fire department’s response. A second dataset contained a record of fire complaints received by the San Francisco Fire Department. After loading this data into Splunk and mapping it onto a shapefile that divided the city into blocks, we were able to get a general sense of which areas had a more extensive history of fire occurrences.
To make sense of the datasets we had, we used Splunk to create a fire risk safety score from several potential contributing factors, such as location, injuries and fatalities, complaints, and medical support. The shapefile we found also recorded the year in which the buildings on each block were built. We used the Zillow API to obtain the median house value for each neighborhood in San Francisco and factored it into our calculation of the fire risk score. The score ranged from zero through five, with zero meaning no reported fire incidents and five indicating the most severe fire incidents.
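To illustrate the idea of a per-block score, here is a minimal sketch in Python. It is not our actual Splunk calculation: the `BlockStats` fields, the weights, and the thresholds are all hypothetical values chosen for illustration, and it uses only a subset of the factors mentioned above.

```python
from dataclasses import dataclass


@dataclass
class BlockStats:
    """Per-block totals (hypothetical field names, for illustration only)."""
    fire_incidents: int
    injuries: int
    fatalities: int
    complaints: int


# Hypothetical weights and cut-offs; the real score used different inputs.
WEIGHTS = {"fire_incidents": 1.0, "injuries": 2.0, "fatalities": 4.0, "complaints": 0.5}
THRESHOLDS = (2, 5, 10, 20)  # raw-value boundaries between scores 1..5


def fire_risk_score(stats: BlockStats) -> int:
    """Map a block's history to a 0-5 score; 0 means no reported fires."""
    if stats.fire_incidents == 0:
        return 0
    raw = (
        WEIGHTS["fire_incidents"] * stats.fire_incidents
        + WEIGHTS["injuries"] * stats.injuries
        + WEIGHTS["fatalities"] * stats.fatalities
        + WEIGHTS["complaints"] * stats.complaints
    )
    # Start at 1 (at least one fire) and add a point per threshold crossed.
    return 1 + sum(raw >= t for t in THRESHOLDS)


if __name__ == "__main__":
    print(fire_risk_score(BlockStats(fire_incidents=3, injuries=1, fatalities=1, complaints=2)))
```

Bucketing a weighted sum keeps the score easy to explain on a map legend, at the cost of making the result sensitive to where the cut-offs are placed.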
We then used Splunk’s Machine Learning Toolkit in an attempt to predict whether or not a bad fire would occur in a given area. A bad fire was defined as any occurrence with a fire score of 3, 4, or 5. We trained a logistic regression model on half of our data using the following factors: average year built, number of fire complaints, latitude, and longitude. Upon testing on the other half, we were able to predict the occurrence of a bad fire with 67% accuracy. To better visualize these results, we mapped them onto the same shapefile.
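For readers who want to see the shape of this workflow outside Splunk, the following is a rough sketch in plain Python. It is a stand-in, not the Machine Learning Toolkit itself: the data is synthetic, the coordinate ranges are only approximate, and every function name here (`synthetic_blocks`, `run_demo`, and so on) is hypothetical. It keeps the same four features, the same "score of 3 or higher" label, and the same half-and-half split.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_scaler(rows):
    """Per-feature mean and standard deviation, computed on training rows."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def scale(row, means, stds):
    return [(v - m) / s for v, m, s in zip(row, means, stds)]


def train(X, y, lr=0.5, epochs=300):
    """Logistic regression fitted with batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, xi):
    """Return 1 (bad fire) when the predicted probability is at least 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0


def synthetic_blocks(n, rng):
    """Made-up blocks: [avg year built, complaints, latitude, longitude]."""
    blocks = []
    for _ in range(n):
        year = rng.randint(1900, 2010)
        complaints = rng.randint(0, 20)
        lat = rng.uniform(37.70, 37.81)      # rough San Francisco bounds
        lon = rng.uniform(-122.52, -122.36)
        # Invented relationship so the toy model has something to learn.
        score = min(5, max(0, complaints // 4 + rng.choice([-1, 0, 0, 1])))
        blocks.append(([year, complaints, lat, lon], 1 if score >= 3 else 0))
    return blocks


def run_demo(seed=0, n=200):
    rng = random.Random(seed)
    blocks = synthetic_blocks(n, rng)
    rng.shuffle(blocks)
    half = len(blocks) // 2
    train_set, test_set = blocks[:half], blocks[half:]   # 50/50 split
    means, stds = fit_scaler([x for x, _ in train_set])
    X_train = [scale(x, means, stds) for x, _ in train_set]
    w, b = train(X_train, [y for _, y in train_set])
    hits = sum(predict(w, b, scale(x, means, stds)) == y for x, y in test_set)
    return hits / len(test_set), len(train_set), len(test_set)


if __name__ == "__main__":
    acc, n_train, n_test = run_demo()
    print(f"trained on {n_train} blocks, tested on {n_test}, accuracy={acc:.2f}")
```

Because the labels here come from an invented rule, the accuracy this sketch prints says nothing about real fire risk; it only shows how the train, test, and score steps fit together.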
As a result, we were able to map the output of both our fire risk safety score and the machine learning model onto every block in San Francisco. We showed existing fires from the San Francisco fire dataset in blue and fires predicted by Splunk MLTK in green. An interesting finding was that most of the predicted fires fell in the most social part of the city, where restaurants and bars are concentrated and people like to hang out, shown in yellow on the map.
By creating a machine learning model using factors believed to contribute to a greater fire risk, our project could help inform people about which areas are more prone to fires and about their causes. Ideally, we want to equip at-risk areas with knowledge about how to respond when fires occur, educating people about how to deal with fires in their homes or in public buildings. This project also underscored the need for more accurate and thorough data collection. With a more comprehensive dataset, we could greatly improve our machine learning model and predict the occurrence of a fire with accuracy closer to 100%.
We hope to continue to make a greater impact on the community with this project. First, we hope to keep helping to monitor and forecast fire occurrences, especially in busy areas, and to identify their contributing factors. Second, we hope to spread fire safety awareness and educate people on how to deal with fires in houses or buildings. Lastly, we hope to partner with non-profits like the Red Cross and let them know which fire-prone areas to focus on for their free fire detector installation initiative.