OpenTelemetry (OTel) in resmoke
OTel is one of two systems we use to capture metrics from resmoke. For mongo-tooling-metrics, please see the documentation here.
What Do We Capture
Using OTel, we capture the following:
- How long a resmoke suite takes to run (a collection of js tests)
- How long each test in a suite takes to run (a single js test)
- Duration of hooks before and after test/suite
- Resmoke archiver (when there is a failure we archive core dumps)
To see this visually, navigate to the resmoke dataset and view a recent trace.
A look at source code
Configuration
The bulk of configuration is done in the _set_up_tracing(...) method in configure_resmoke.py#L164. This method includes documentation on how it works.
BatchedBaggageSpanProcessor
See the documentation in batched_baggage_span_processor.py#L8.
FileSpanExporter
See the documentation in file_span_exporter.py#L16.
Capturing Data
We mostly capture data by using a decorator on methods. The following example is taken from job.py#L200:
from opentelemetry import trace

TRACER = trace.get_tracer("resmoke")

@TRACER.start_as_current_span("func_name")
def func_name(...):
    span = trace.get_current_span()  # the span opened by the decorator
    span.set_attribute("attr1", True)
This approach is convenient because the decorator captures exceptions and other failures, and because the span is closed automatically, a user can never forget to close it. On occasion we also start a span using a with statement in Python. However, the decorator is preferred, since the with form has more of an impact on the readability of the surrounding code. This example is taken from job.py#L215:
with TRACER.start_as_current_span("func_name", attributes={}):
    func_name(...)
    ...
Insights We Have Made (so far)
Using this dashboard and this query, we can see the most expensive single js tests. We plan to make tickets for teams to fix these long-running tests, both for cloud savings and for developer time savings.