We’re all attuned to the potential business impact of downtime, so we’re grateful that Splunk Observability helps us be proactive about reliability and resilience with end-to-end visibility into our environment.
To keep up with skyrocketing demand for its delivery services during the pandemic, Rappi needed scalable, powerful observability tools to ensure customers could place and receive orders quickly and reliably.
With the Splunk platform, Rappi meets high shopper expectations for smooth ordering via mobile apps and websites, enabling fast delivery of local goods and services to doorsteps in nine countries.
The answer is simple: Rappi. In more than 250 cities throughout Latin America, Rappi offers consumers on-demand delivery of restaurant meals, groceries and other goods and services from local merchants. Founded in 2015, the company has expanded into travel and financial services as it grows its business across Argentina, Brazil, Chile, Colombia, Costa Rica, Ecuador, Mexico, Peru and Uruguay.
Throughout the years, Rappi has risen to meet customers’ expectations for speed and convenience, from offering reliable performance for its mobile apps and website to delivering merchandise fast, often in under 30 minutes. But as orders skyrocketed 300% during the COVID-19 pandemic, maintaining both speed and availability became a mounting challenge for Rappi’s IT team.
Prior to 2019, Rappi’s AWS hosts numbered in the hundreds. As its environment grew, the DevOps team encountered serious problems with its legacy application performance monitoring software. Alerts took minutes to receive, and the legacy tool’s insufficient sampling methods made it difficult to pinpoint issues. Once identified, those problems could then take hours — or even longer — to fix.
To fuel growth and resilience, Rappi turned to Splunk Observability Cloud for a more robust approach to infrastructure monitoring and troubleshooting, application performance monitoring, real user monitoring and synthetic monitoring.
“The more complex our architecture became, the harder it was for us to detect problems,” says Alejandro Comisario, executive vice president of engineering at Rappi. By switching to Splunk Observability Cloud, Rappi has gained end-to-end visibility into its distributed microservices-based architecture, which includes Amazon Elastic Container Service and Kubernetes clusters. With a more agile approach and real-time observability from Splunk, the Rappi IT team now efficiently manages more than 1,000 microservices, 6,000 hosts and 15,000 containers, all while slashing mean time to resolution (MTTR) by over 90%.
Increased demand breeds a higher expectation for reliable, resilient services across Rappi’s mobile app and infrastructure. Splunk Observability Cloud is a key component of Rappi’s success, helping the team see, understand and act on real-time data in one place.
“A single dashboard provides data for engineering, DevOps, site reliability engineering, SecOps, peer engineering and microservices, operations and business metrics,” Comisario says. “If something happens at Rappi, and we don’t see it on our Splunk dashboard, it’s actually not happening at all.”
As with any e-commerce company, the success metrics that matter most to Rappi’s business stakeholders revolve around conversion to orders. Splunk’s observability tools help the IT team support those goals and deliver a smooth purchase experience for Rappi’s 7.5 million weekly active users by quickly detecting problems with Rappi’s mobile app, infrastructure or backend services.
“Splunk Observability Cloud helps us make blazing-fast decisions,” says Comisario. Ensuring brisk web page loads and frictionless mobile app transactions has helped Rappi grow to process more than 8.8 million orders each month.
“If we notice the home screen for any of our key business verticals is taking more than two seconds to load, we get concerned,” says Jose Felipe Lopez, engineering manager for Rappi. “With Splunk Observability Cloud, our development team gains instant intelligence to support our goals of always offering customers outstanding services.”
Fixing issues more than 90% faster was a major achievement for Rappi’s IT team members — but they didn’t stop there. To further improve uptime and performance, Rappi turned to Splunk On-Call, which sends metadata-rich issue notifications to any mobile device. With automations for scheduling and escalations, Splunk On-Call helps the right members of Rappi’s incident response team get notified of any problems right away. Since Rappi responders can now view incident context and audit trails on their phones, they resolve issues even more quickly — before they impact the customer experience or company revenues.
Rappi’s mobile app development team strives to release new app versions every two weeks via a phased rollout. As each phase launches, Lopez’s team keeps an eye out for issues, watching for any spikes in app crash rates. “If our app starts crashing for a lot of users, it can affect our order volume, revenue, NPS ratings and customer retention,” says Lopez. “We’re all attuned to the potential business impact of app errors or poor performance, so we’re grateful that Splunk Observability Cloud helps us be proactive about reliability and resilience with end-to-end, real-time visibility into our environment.”
The Rappi team experienced this firsthand when Splunk On-Call helped them fix a crash that inconveniently arose over a weekend. Upon receiving a notification, a team member investigated on their phone and confirmed that a spike in crash activity was impacting too many Rappi users. By drilling into the data, they identified an empty code string affecting the app. With a quick fix to an API, the team member immediately resolved the problem without updating the app, and orders flowed smoothly again through the rest of the weekend.