2020-09-30 20:48:43 +02:00
|
|
|
# PrimaryOnlyService
|
|
|
|
|
|
|
|
The PrimaryOnlyService machinery provides a way to register tasks that should run only when current
|
|
|
|
node is Primary, and should be driven to completion across replica set failovers on the new
|
|
|
|
Primary. It is intended to be used by tasks that can be modeled as a state machine with a single
|
|
|
|
MongoDB document containing the current state, which newly-elected Primaries can use to rebuild the
|
|
|
|
state of the task after failover and pick up where the old Primary left off.
|
|
|
|
|
|
|
|
## Classes
|
|
|
|
|
|
|
|
There are three main classes/interfaces that make up the PrimaryOnlyService machinery.
|
|
|
|
|
|
|
|
### PrimaryOnlyServiceRegistry
|
|
|
|
|
|
|
|
The PrimaryOnlyServiceRegistry is a singleton that is installed as a decoration on the
|
2024-02-27 20:47:14 +01:00
|
|
|
ServiceContext at startup and lives for the lifetime of the mongod process. During mongod global
|
2020-09-30 20:48:43 +02:00
|
|
|
startup, all PrimaryOnlyServices must be registered against the PrimaryOnlyServiceRegistry before
|
|
|
|
the ReplicationCoordinator is started up (as it is the ReplicationCoordinator startup that starts up
|
|
|
|
the registered PrimaryOnlyServices). Specific PrimaryOnlyServices can be looked up from the registry
|
|
|
|
at runtime, and are handed out by raw pointer, which is safe since the set of registered
|
2024-02-27 20:47:14 +01:00
|
|
|
PrimaryOnlyServices does not change during runtime. The PrimaryOnlyServiceRegistry is itself a
|
2020-09-30 20:48:43 +02:00
|
|
|
[ReplicaSetAwareService](../src/mongo/db/repl/README.md#ReplicaSetAwareService-interface), which is
|
|
|
|
how it receives notifications about changes in and out of Primary state.
|
|
|
|
|
|
|
|
### PrimaryOnlyService
|
|
|
|
|
2024-02-27 20:47:14 +01:00
|
|
|
The PrimaryOnlyService interface is used to define a new Primary Only Service. A PrimaryOnlyService
|
2020-09-30 20:48:43 +02:00
|
|
|
is a grouping of tasks (Instances) that run only when the node is Primary and are resumed after
|
2024-02-27 20:47:14 +01:00
|
|
|
failover. Each PrimaryOnlyService must declare a unique, replicated collection (most likely in the
|
2020-09-30 20:48:43 +02:00
|
|
|
admin or config databases), where the state documents for all Instances of the service will be
|
2024-02-27 20:47:14 +01:00
|
|
|
persisted. At stepUp, each PrimaryOnlyService will create and launch Instance objects for each
|
2020-09-30 20:48:43 +02:00
|
|
|
document found in this collection. This is how PrimaryOnlyService tasks get resumed after failover.
|
|
|
|
|
|
|
|
### PrimaryOnlyService::Instance/TypedInstance
|
|
|
|
|
|
|
|
The PrimaryOnlyService::Instance interface is used to contain the state and core logic for running a
|
|
|
|
single task belonging to a PrimaryOnlyService. The Instance interface includes a "run()" virtual
|
|
|
|
method which is provided an executor which is used to run all work that is done on behalf of the
|
|
|
|
Instance. Implementations should not extend PrimaryOnlyService::Instance directly, instead they
|
|
|
|
should extend PrimaryOnlyService::TypedInstance, which allows individual Instances to be looked up
|
2024-02-27 20:47:14 +01:00
|
|
|
and returned as pointers to the proper Instance sub-type. The InstanceID for an Instance is the \_id
|
2020-09-30 20:48:43 +02:00
|
|
|
field of its state document.
|
|
|
|
|
|
|
|
## Defining a new PrimaryOnlyService
|
|
|
|
|
|
|
|
To define a new PrimaryOnlyService one must add corresponding subclasses of both PrimaryOnlyService
|
2024-02-27 20:47:14 +01:00
|
|
|
and PrimaryOnlyService::TypedInstance. The PrimaryOnlyService subclass just exists to specify what
|
2020-09-30 20:48:43 +02:00
|
|
|
collection state documents for this service are stored in, and to hand out corresponding Instances
|
2024-02-27 20:47:14 +01:00
|
|
|
of the proper type. Most of the work of a new PrimaryOnlyService will be implemented in the
|
2020-09-30 20:48:43 +02:00
|
|
|
PrimaryOnlyService::Instance subclass. PrimaryOnlyService::Instance subclasses will be responsible
|
|
|
|
for running the work they need to perform to complete their task, as well as for managing and
|
|
|
|
synchronizing their own in-memory and on-disk state. No part of the PrimaryOnlyService **machinery**
|
2024-02-27 20:47:14 +01:00
|
|
|
ever performs writes to the PrimaryOnlyService state document collections. All writes to a given
|
2020-09-30 20:48:43 +02:00
|
|
|
Instance's state document (including creating it initially and deleting it when the work has been
|
2024-02-27 20:47:14 +01:00
|
|
|
completed) are performed by Instance implementations. This means that for the majority of
|
2020-09-30 20:48:43 +02:00
|
|
|
PrimaryOnlyServices, the first step of its Instance's run() method will be to insert an initial
|
|
|
|
state document into the state document collection, to ensure that the Instance is now persisted and
|
2024-02-27 20:47:14 +01:00
|
|
|
will be resumed after failover. When an Instance is resumed after failover, it is provided the
|
|
|
|
current version of the state document as it exists in the state document collection. That document
|
2020-09-30 20:48:43 +02:00
|
|
|
can be used to rebuild the in-memory state for this Instance so that when run() is called it knows
|
|
|
|
what state it is in and thus what work still needs to be performed, and what work has already been
|
|
|
|
completed by the previous Primary.
|
|
|
|
|
|
|
|
To see an example bare-bones PrimaryOnlyService implementation to use as a reference, check out the
|
|
|
|
TestService defined in this unit test: https://github.com/mongodb/mongo/blob/master/src/mongo/db/repl/primary_only_service_test.cpp
|
|
|
|
|
|
|
|
## Behavior during state transitions
|
|
|
|
|
|
|
|
At stepUp, each PrimaryOnlyService queries its state document collection, and for each document
|
|
|
|
found, creates and launches a PrimaryOnlyService::Instance initialized off of the state
|
|
|
|
document. This happens asynchronously relative to the core replication stepUp process - there is no
|
|
|
|
guarantee that when stepUp completes and the RSTL lock is dropped that the PrimaryOnlyServices have
|
|
|
|
finished rebuilding all their Instances. At stepDown all Instances are interrupted, but the threads
|
|
|
|
running their work are not joined, and the Instance objects containing their in-memory state are not
|
|
|
|
released, until the next stepUp. This is done to reduce the likelihood of blocking within the state
|
|
|
|
transition process and delaying it for the entire node. This behavior does, however, guarantee that
|
|
|
|
there will never be two Instances of the same PrimaryOnlyService with the same InstanceID running at
|
|
|
|
the same time on the same node.
|
|
|
|
|
|
|
|
### Interrupting Instances at stepDown
|
|
|
|
|
|
|
|
At stepDown, there are 3 main ways that Instances are interrupted and we guarantee that no more work
|
2024-02-27 20:47:14 +01:00
|
|
|
is performed on behalf of any PrimaryOnlyServices. The first is that the executor provided to each
|
2020-09-30 20:48:43 +02:00
|
|
|
Instance's run() method gets shut down, preventing any more work from being scheduled on behalf of
|
2024-02-27 20:47:14 +01:00
|
|
|
that Instance. The second is that all OperationContexts created on threads (Clients) that are part
|
2020-09-30 20:48:43 +02:00
|
|
|
of an Executor owned by a PrimaryOnlyService get interrupted. The third is that each individual
|
|
|
|
Instance is explicitly interrupted, so that it can unblock any work running on threads that are
|
2024-02-27 20:47:14 +01:00
|
|
|
_not_ a part of an executor owned by the PrimaryOnlyService that are dependent on that Instance
|
2020-09-30 20:48:43 +02:00
|
|
|
signaling them (e.g. commands that are waiting on the Instance to reach a certain state). Currently
|
|
|
|
this happens via a call to an interrupt() method that each Instance must override, but in the future
|
2021-03-19 19:28:46 +01:00
|
|
|
this is likely to change to signaling a CancellationToken owned by the Instance instead.
|
2020-09-30 20:48:43 +02:00
|
|
|
|
|
|
|
## Instance lifetime
|
|
|
|
|
|
|
|
Instances are held by shared_ptr in their parent PrimaryOnlyService. Each PrimaryOnlyService
|
2024-02-27 20:47:14 +01:00
|
|
|
releases all Instance shared_ptrs it owns on stepDown. Additionally, a PrimaryOnlyService will
|
2020-09-30 20:48:43 +02:00
|
|
|
release an Instance shared_ptr when the state document for that Instance is deleted (via an
|
2024-02-27 20:47:14 +01:00
|
|
|
OpObserver). Since generally speaking it is logic from an Instance's run() method that will be
|
2020-09-30 20:48:43 +02:00
|
|
|
responsible for deleting its state document, such logic needs to be careful as the moment the state
|
|
|
|
document is deleted, the corresponding PrimaryOnlyService is no longer keeping that Instance alive.
|
|
|
|
If an Instance has any additional logic or internal state to update after deleting its state
|
|
|
|
document, it must extend its own lifetime by capturing a shared_ptr to itself by calling
|
2023-04-26 19:50:58 +02:00
|
|
|
shared_from_this() before deleting its state document.
|