Distributed Tracing: What is the “cost” of instrumentation?

Pratham
3 min read · Apr 17, 2021

Distributed Tracing

Ok, what the hell is “distributed tracing”?

Microservices architecture is on the rise and is extensively used to power applications and services that we use on a daily basis. Netflix, Amazon, and eBay, to name a few, are built on microservices.

With a microservices architecture, a user request will typically span multiple services across different servers before the response is stitched together and sent back to the user. This makes monitoring and debugging harder and reduces global visibility into the system.

Distributed tracing, also called distributed request tracing, is a method used to profile and monitor applications, especially those built on a microservices architecture. It is a diagnostic technique that observes requests as they propagate through a distributed system, revealing how a set of services coordinate to handle each individual user request. Distributed tracing requires that software developers add instrumentation to the application code.

OpenTracing provides an API specification that lets you add instrumentation to application code in a vendor-neutral manner.
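As a quick illustration, here is roughly what manual instrumentation looks like with the OpenTracing Java API. This is a hedged sketch, not code from the article: it assumes the v0.33-style `io.opentracing` API, and the class, method, and operation names (`OrderService`, `processOrder`, `process-order`) are made up.

```java
import io.opentracing.Span;
import io.opentracing.Tracer;
import io.opentracing.util.GlobalTracer;

public class OrderService {
    // Any OpenTracing-compatible tracer (Jaeger, Zipkin, ...) can be
    // registered with GlobalTracer at startup; this code stays the same.
    private final Tracer tracer = GlobalTracer.get();

    public void processOrder(String orderId) {
        // Start a span covering this unit of work
        Span span = tracer.buildSpan("process-order").start();
        try {
            span.setTag("order.id", orderId);
            // ... actual business logic ...
        } finally {
            span.finish(); // always report the span, even on errors
        }
    }
}
```

Because the code only depends on the OpenTracing interfaces, swapping the concrete tracer (the “vendor”) requires no changes to the instrumented code.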

Cost Of Instrumentation

Services usually talk to each other through some form of inter-process communication (IPC), and many of the frameworks that enable such communication provide built-in support for instrumentation, making it simple to enable distributed tracing. The usage and performance of these frameworks have been studied extensively; the OpenTracing Blog is probably a good place to start.

So now let’s come to the significant question: what is the cost of instrumentation?
In other words, what is the performance impact of adding instrumentation to the application code?

To measure the cost we will use the Jaeger tracer, with JMH to capture the metrics.

JMH provides an API to consume CPU cycles in time that varies linearly with the specified token value. We will use this to mock a long-running job, which we will then instrument.

import org.openjdk.jmh.infra.Blackhole;

// We need to consume or return the result to avoid JVM dead-code
// elimination
public long processLongJob(long token) {
    Blackhole.consumeCPU(token);
    return token;
}
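A minimal sketch of how such a benchmark might be wired up with JMH. This assumes the `jmh-core` and OpenTracing dependencies are on the classpath; the class name, span name, and parameter values below are illustrative, not the exact harness behind the numbers that follow.

```java
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

import io.opentracing.Span;
import io.opentracing.Tracer;
import io.opentracing.noop.NoopTracerFactory;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class InstrumentationBenchmark {

    @Param({"100", "1000", "10000"})
    long token;

    // Swap in a default-configured JaegerTracer here to measure its cost
    Tracer tracer = NoopTracerFactory.create();

    @Benchmark
    public long baseline() {
        Blackhole.consumeCPU(token);
        return token;
    }

    @Benchmark
    public long traced() {
        Span span = tracer.buildSpan("long-job").start();
        try {
            Blackhole.consumeCPU(token);
            return token;
        } finally {
            span.finish();
        }
    }
}
```

Keeping the un-instrumented `baseline` method in the same benchmark class lets JMH report both variants under identical conditions.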

We will measure the metrics with no instrumentation, with a NoOpTracer, and with a JaegerTracer using default initialization values.

Table 1: Benchmark Numbers

As seen above, the No Instrumentation and NoOpTracer scores are comparable, with virtually no impact on performance, while JaegerTracer costs ~1.5x, though the relative overhead decreases as the number of CPU cycles consumed increases.

Let’s look at the average time per fixed number of CPU cycles. Since the CPU cycles consumed vary linearly with the token value, we can divide the time by the token count. The table below summarizes the results above as per-token scores for the different instrumentation techniques.
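The normalization itself is a simple division. With hypothetical numbers (the article’s actual measurements are in Table 2, which is an image here), turning average time per call into time per token looks like this:

```java
public class NormalizeScores {
    // Per-token score: average time divided by token count. Since
    // Blackhole.consumeCPU(token) burns CPU in time linear in the token
    // value, this makes runs with different token values comparable.
    static double perTokenScore(double avgTimeNs, long token) {
        return avgTimeNs / token;
    }

    public static void main(String[] args) {
        // Hypothetical measurements, for illustration only: if the average
        // time scales linearly with the token, the per-token score is flat.
        System.out.println(perTokenScore(3500.0, 1000));   // 3.5 ns/token
        System.out.println(perTokenScore(35000.0, 10000)); // 3.5 ns/token
    }
}
```

A flat per-token score across token values is what lets us compare the instrumentation variants independently of how long the mocked job runs.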

Table 2: Benchmark Summary

Plotting the table with the token value on the X axis and the scores on the Y axis, it becomes clear that the more CPU-intensive the code being instrumented, the smaller the impact on performance.

Performance Graph

Based on the measurements above, for any function whose total execution time exceeds 1 ms, the cost of instrumentation is negligible.

That’s all folks. Till next time.
