mirror of https://github.com/mongodb/mongo.git synced 2024-11-27 23:27:11 +01:00
mongodb/docs/testing/otel_resmoke.md
Alexander Neben 71f830ce49 SERVER-87034 Initial markdown format (#19276)
GitOrigin-RevId: 64e388007ec1ac3744537253540995af628bcc00
2024-02-27 19:58:04 +00:00

OpenTelemetry (OTel) in resmoke

OTel is one of two systems we use to capture metrics from resmoke. For the other, mongo-tooling-metrics, please see the documentation here.

What Do We Capture

Using OTel, we capture the following:

  1. How long a resmoke suite (a collection of js tests) takes to run
  2. How long each test in a suite (a single js test) takes to run
  3. How long the hooks before and after each test/suite take to run
  4. The resmoke archiver (when there is a failure, we archive core dumps)

To see this visually, navigate to the resmoke dataset and view a recent trace.

A look at the source code

Configuration

The bulk of the configuration is done in the _set_up_tracing(...) method in configure_resmoke.py#L164. The method is documented in the source, which explains how it works.

BatchedBaggageSpanProcessor

See the documentation in batched_baggage_span_processor.py#L8.

FileSpanExporter

See the documentation in file_span_exporter.py#L16.

Capturing Data

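We capture most data by putting a decorator on methods. This example is taken from job.py#L200:

from opentelemetry import trace

TRACER = trace.get_tracer("resmoke")

# The decorator opens a span on each call and closes it on return or exception.
@TRACER.start_as_current_span("func_name")
def func_name(*args, **kwargs):
    # Attach attributes to the span that the decorator just opened.
    span = trace.get_current_span()
    span.set_attribute("attr1", True)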

This approach is convenient because the decorator captures exceptions and other failures, and a user can never forget to close a span. On occasion we also start a span using a with statement in Python. However, the decorator is preferred because the with form, shown below, adds a level of indentation and makes the code harder to read. This example is taken from job.py#L215:

with TRACER.start_as_current_span("func_name", attributes={}):
    func_name(...)
    ...
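
To illustrate why both forms give the same guarantees, here is a minimal stdlib-only sketch of a span helper that works both as a decorator and in a with statement. It is a simplified model, not the OpenTelemetry implementation: fake_span, run_test, run_examples and the FINISHED list are hypothetical names, and a real span records far more than a name, an error and a duration.

```python
import time
from contextlib import contextmanager

# Hypothetical in-memory sink standing in for a span exporter.
FINISHED = []


@contextmanager
def fake_span(name):
    """Toy span: records name, duration and any exception, then always closes."""
    record = {"name": name, "error": None}
    start = time.perf_counter()
    try:
        yield record
    except Exception as exc:
        # Note the failure on the span, then let it propagate to the caller.
        record["error"] = repr(exc)
        raise
    finally:
        # Runs on success and on failure, so a span is never left open.
        record["duration_s"] = time.perf_counter() - start
        FINISHED.append(record)


# Decorator form: the function body stays at its original indentation.
@fake_span("run_test")
def run_test(should_fail):
    if should_fail:
        raise RuntimeError("test failed")
    return "ok"


def run_examples():
    """Exercise both forms and return the recorded spans."""
    FINISHED.clear()
    run_test(False)
    try:
        run_test(True)
    except RuntimeError:
        pass
    # with form: same guarantees, but the wrapped code is indented one level.
    with fake_span("teardown") as span:
        span["attr1"] = True
    return list(FINISHED)
```

Calling run_examples() yields three records: a clean run_test span, a run_test span whose error field holds the RuntimeError, and a teardown span carrying attr1. In every case the finally block closes the span, which is the property the decorator gives us for free.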

Insights We Have Gained (so far)

Using this dashboard and this query, we can see the most expensive individual js tests. We plan to file tickets for teams to fix these long-running tests, which should save both cloud costs and developer time.
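
As a rough illustration of the kind of question such a query answers, the sketch below ranks test spans by mean duration. The record layout (name, kind, duration_s), the sample values and the slowest_tests helper are all made up for this example; they do not reflect resmoke's actual span attributes or the dashboard's query language.

```python
from collections import defaultdict

# Hypothetical, hand-written span records; real spans carry different fields.
SAMPLE_SPANS = [
    {"name": "jstests/a.js", "kind": "test", "duration_s": 12.0},
    {"name": "jstests/b.js", "kind": "test", "duration_s": 95.5},
    {"name": "CleanEveryN", "kind": "hook", "duration_s": 3.0},
    {"name": "jstests/a.js", "kind": "test", "duration_s": 14.0},
    {"name": "jstests/c.js", "kind": "test", "duration_s": 40.0},
]


def slowest_tests(spans, top_n=2):
    """Return (test name, mean duration) pairs, slowest first."""
    durations = defaultdict(list)
    for span in spans:
        if span["kind"] == "test":  # ignore hook and suite spans
            durations[span["name"]].append(span["duration_s"])
    means = {name: sum(vals) / len(vals) for name, vals in durations.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)[:top_n]
```

With the sample data, slowest_tests(SAMPLE_SPANS) puts jstests/b.js first and jstests/c.js second, while the two runs of jstests/a.js are averaged into a single entry.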