Tracing is one of the hottest topics in observability and monitoring. There are myriad open-source projects, new startups, and entire conferences dedicated to it. But here’s the truth: While everyone building and maintaining software has heard of tracing, only a select few engineers are using it. In most cases, engineering effort and business resources would be better spent on traditional logging and metrics.
RELATED CONTENT: The APM Buyers Guide
So why is tracing a trending topic? One of the main reasons is vanity: showing transactions between services and visualizing them makes for good conference presentations and sales demos. At the core though, what people really want when they discuss tracing is context propagation: the ability to correlate the same request between the backend and frontend, between multiple services. I’m not saying this isn’t a valuable part of Tracing; I’m saying you can get one without the other.
To do Tracing properly requires creating a whole new observability stack. This new stack will easily be 10x more expensive than your existing logging and metrics solutions. It’s much more efficient to have a logline and throw in some key identifiers, than to format your data completely differently and make things exponentially more complex.
Developers should focus on getting more value out of metrics and logs, instead of worrying about tracing just because “the cool kids are doing it.” According to the recent report published by logz.io, 66% of respondents don’t do tracing at all while 73% are still using log management and analysis to gain observability.
By the cool kids, of course, I mean companies like Uber and Pinterest. Similar to Kubernetes and serverless, these are all interesting technologies — but we are losing our ability to contextualize them. Not all companies have the resources of Google or Facebook. Not all companies have the same scale or face the same challenges. Some of the most popular websites in the world are still built using PHP and WordPress. If you want to be successful, it’s important to understand what your company actually needs to deliver a high-quality, reliable service. The problems being solved with tracing inside Pinterest and Uber… the majority of companies simply don’t have or care little about. At Observability Practitioner’s Summit, Suman Karumuri, a senior staff engineer, shared how at Slack a small team is the main consumer for tracing data. For most businesses, the questions are more straightforward, e.g. are 99% of my customers getting high-quality service? Which HTTP endpoints and requests are failing to meet performance targets? These kinds of questions can easily be answered by logs and metrics.
I’ll conclude by saying that most teams will get better ROI by effectively leveraging traditional logs and metrics over embracing a completely new and complex tracing approach. Simple improvements, such as making it easy to add and remove observability into your software and clearly defining what services need better observability in the first place, will make a world of difference.
The famous Google SRE book shows how they scaled their company focusing first and foremost on using a metrics aggregation solution known as borgmon (a Prometheus predecessor). If Google built a global spanning, highly-reliable infrastructure using nothing but traditional metrics — I’m betting you can too.