# PrimaryOnlyService

The PrimaryOnlyService machinery provides a way to register tasks that should run only when the
current node is Primary and that, after a replica set failover, should be driven to completion on
the new Primary. It is intended for tasks that can be modeled as a state machine whose current state
is stored in a single MongoDB document. A newly-elected Primary uses that document to rebuild the
state of the task after failover and pick up where the old Primary left off.

## Classes

There are three main classes/interfaces that make up the PrimaryOnlyService machinery.

### PrimaryOnlyServiceRegistry

The PrimaryOnlyServiceRegistry is a singleton that is installed as a decoration on the
ServiceContext at startup and lives for the lifetime of the mongod process. During mongod global
startup, all PrimaryOnlyServices must be registered against the PrimaryOnlyServiceRegistry before
the ReplicationCoordinator is started up (as it is the ReplicationCoordinator startup that starts up
the registered PrimaryOnlyServices). Specific PrimaryOnlyServices can be looked up from the registry
at runtime, and are handed out by raw pointer, which is safe since the set of registered
PrimaryOnlyServices does not change during runtime. The PrimaryOnlyServiceRegistry is itself a
[ReplicaSetAwareService](../src/mongo/db/repl/README.md#ReplicaSetAwareService-interface), which is
how it receives notifications about changes in and out of Primary state.
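
The registration and lookup rules above can be illustrated with a small standalone model. This is
only a sketch under stated assumptions: `ToyRegistry`, `ToyService`, and every method name below are
hypothetical stand-ins, not the real server classes or their signatures. It shows why handing out
raw pointers is safe once the set of registered services is frozen at startup.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a PrimaryOnlyService; it only tracks whether it is running.
class ToyService {
public:
    explicit ToyService(std::string name) : _name(std::move(name)) {}
    const std::string& name() const { return _name; }
    bool running() const { return _running; }
    void onStepUp() { _running = true; }
    void onStepDown() { _running = false; }

private:
    std::string _name;
    bool _running = false;
};

// Hypothetical stand-in for the registry. Services may only be added before startup.
class ToyRegistry {
public:
    void registerService(std::unique_ptr<ToyService> service) {
        assert(!_started);  // Registration after startup is a programming error in this model.
        std::string name = service->name();
        _services.emplace(std::move(name), std::move(service));
    }

    // After this call the set of services never changes again.
    void onStartup() { _started = true; }

    // Returning a raw pointer is safe because entries are never removed after startup.
    ToyService* lookupService(const std::string& name) const {
        auto it = _services.find(name);
        if (it == _services.end()) {
            return nullptr;
        }
        return it->second.get();
    }

    // The registry fans state-transition notifications out to every registered service.
    void onStepUp() {
        for (auto& entry : _services) {
            entry.second->onStepUp();
        }
    }
    void onStepDown() {
        for (auto& entry : _services) {
            entry.second->onStepDown();
        }
    }

private:
    bool _started = false;
    std::map<std::string, std::unique_ptr<ToyService>> _services;
};
```

In this model a caller registers each service, calls `onStartup()` once, and from then on may cache
the pointer returned by `lookupService()` for the lifetime of the registry.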

### PrimaryOnlyService

The PrimaryOnlyService interface is used to define a new Primary Only Service. A PrimaryOnlyService
is a grouping of tasks (Instances) that run only when the node is Primary and are resumed after
failover. Each PrimaryOnlyService must declare a unique, replicated collection (most likely in the
admin or config databases), where the state documents for all Instances of the service will be
persisted. At stepUp, each PrimaryOnlyService will create and launch Instance objects for each
document found in this collection. This is how PrimaryOnlyService tasks get resumed after failover.

### PrimaryOnlyService::Instance/TypedInstance

The PrimaryOnlyService::Instance interface contains the state and core logic for running a single
task belonging to a PrimaryOnlyService. The Instance interface includes a virtual run() method,
which is provided an executor that is used to run all work done on behalf of the Instance.
Implementations should not extend PrimaryOnlyService::Instance directly. Instead, they should extend
PrimaryOnlyService::TypedInstance, which allows individual Instances to be looked up and returned as
pointers to the proper Instance sub-type. The InstanceID for an Instance is the \_id field of its
state document.
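
To make the role of a typed layer concrete, here is a minimal standalone sketch of the general
pattern: an untyped lookup on the service, wrapped by a templated base class that casts the result
to the concrete sub-type. `InstanceBase`, `InstanceStore`, `TypedInstanceSketch`, and
`CounterInstance` are hypothetical names chosen for illustration, and the real
PrimaryOnlyService::TypedInstance signatures may well differ.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical untyped base, standing in for the Instance interface.
class InstanceBase {
public:
    virtual ~InstanceBase() = default;
};

// Hypothetical service that only knows about the untyped base class.
class InstanceStore {
public:
    void add(const std::string& id, std::shared_ptr<InstanceBase> instance) {
        _instances[id] = std::move(instance);
    }

    std::shared_ptr<InstanceBase> lookupInstance(const std::string& id) const {
        auto it = _instances.find(id);
        if (it == _instances.end()) {
            return nullptr;
        }
        return it->second;
    }

private:
    std::map<std::string, std::shared_ptr<InstanceBase>> _instances;
};

// Typed layer: sub-types derive from this so that lookups return the proper pointer type.
// The cast is only valid if every instance stored in the service is of type Derived.
template <class Derived>
class TypedInstanceSketch : public InstanceBase {
public:
    static std::shared_ptr<Derived> lookup(const InstanceStore& store, const std::string& id) {
        return std::static_pointer_cast<Derived>(store.lookupInstance(id));
    }
};

// A concrete instance type with some task-specific state.
class CounterInstance : public TypedInstanceSketch<CounterInstance> {
public:
    int count = 0;
};
```

With this shape, `CounterInstance::lookup(store, id)` yields a `std::shared_ptr<CounterInstance>`
directly, so callers never have to downcast by hand.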

## Defining a new PrimaryOnlyService

To define a new PrimaryOnlyService, add corresponding subclasses of both PrimaryOnlyService and
PrimaryOnlyService::TypedInstance. The PrimaryOnlyService subclass exists only to specify which
collection stores the state documents for this service, and to hand out corresponding Instances of
the proper type. Most of the work of a new PrimaryOnlyService is implemented in the
PrimaryOnlyService::Instance subclass. Instance subclasses are responsible for running the work
needed to complete their task, as well as for managing and synchronizing their own in-memory and
on-disk state.

No part of the PrimaryOnlyService **machinery** ever performs writes to the PrimaryOnlyService state
document collections. All writes to a given Instance's state document (including creating it
initially and deleting it when the work has been completed) are performed by Instance
implementations. This means that for the majority of PrimaryOnlyServices, the first step of an
Instance's run() method is to insert an initial state document into the state document collection,
which ensures that the Instance is persisted and will be resumed after failover. When an Instance is
resumed after failover, it is provided the current version of its state document as it exists in the
state document collection. The Instance can use that document to rebuild its in-memory state, so
that when run() is called it knows what state it is in, what work the previous Primary already
completed, and what work still needs to be performed.
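
The write-your-own-state-document pattern described above can be sketched as a tiny standalone
state machine. Everything here is an assumption made for illustration: `Phase`, `StateDoc`,
`ToyCollection` (a `std::map` standing in for the replicated collection), and `ToyTask` are
hypothetical, and the `maxSteps` parameter exists only to simulate a failover part-way through the
task.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

enum class Phase { kStarted, kStepOneDone, kStepTwoDone };

// Stand-in for a state document of the form {_id, phase}.
struct StateDoc {
    std::string id;
    Phase phase;
};

// Stand-in for the replicated state document collection, keyed by _id.
using ToyCollection = std::map<std::string, StateDoc>;

class ToyTask {
public:
    // Brand-new task: nothing has been persisted yet.
    explicit ToyTask(std::string id) : _id(std::move(id)) {}

    // Resumed task: in-memory state is rebuilt from the persisted document.
    explicit ToyTask(const StateDoc& doc) : _id(doc.id), _phase(doc.phase), _persisted(true) {}

    // Performs at most maxSteps steps and returns the names of the steps executed by this call.
    // The task itself performs every write to its state document.
    std::vector<std::string> run(ToyCollection& coll, int maxSteps) {
        std::vector<std::string> executed;
        for (int i = 0; i < maxSteps && !_finished; ++i) {
            if (!_persisted) {
                // First step: persist the initial state so the task survives failover.
                coll.insert({_id, StateDoc{_id, _phase}});
                _persisted = true;
                executed.push_back("insert");
            } else if (_phase == Phase::kStarted) {
                _phase = Phase::kStepOneDone;
                coll.at(_id).phase = _phase;
                executed.push_back("stepOne");
            } else if (_phase == Phase::kStepOneDone) {
                _phase = Phase::kStepTwoDone;
                coll.at(_id).phase = _phase;
                executed.push_back("stepTwo");
            } else {
                // All work is done: the task deletes its own state document.
                coll.erase(_id);
                _finished = true;
                executed.push_back("delete");
            }
        }
        return executed;
    }

private:
    std::string _id;
    Phase _phase = Phase::kStarted;
    bool _persisted = false;
    bool _finished = false;
};
```

If the first `ToyTask` is stopped after two steps, a second `ToyTask` constructed from the
surviving document skips the insert and the first step and continues from the recorded phase.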

For a bare-bones PrimaryOnlyService implementation to use as a reference, see the TestService
defined in this unit test: https://github.com/mongodb/mongo/blob/master/src/mongo/db/repl/primary_only_service_test.cpp

## Behavior during state transitions

At stepUp, each PrimaryOnlyService queries its state document collection and, for each document
found, creates and launches a PrimaryOnlyService::Instance initialized from that state document.
This happens asynchronously relative to the core replication stepUp process: there is no guarantee
that the PrimaryOnlyServices have finished rebuilding all their Instances by the time stepUp
completes and the RSTL is dropped. At stepDown, all Instances are interrupted, but the threads
running their work are not joined, and the Instance objects containing their in-memory state are not
released, until the next stepUp. This is done to reduce the likelihood of blocking within the state
transition process and delaying it for the entire node. This behavior does, however, guarantee that
there will never be two Instances of the same PrimaryOnlyService with the same InstanceID running at
the same time on the same node.

### Interrupting Instances at stepDown

At stepDown, Instances are interrupted in three main ways, which together guarantee that no more
work is performed on behalf of any PrimaryOnlyService. First, the executor provided to each
Instance's run() method is shut down, preventing any more work from being scheduled on behalf of
that Instance. Second, all OperationContexts created on threads (Clients) that are part of an
executor owned by a PrimaryOnlyService are interrupted. Third, each individual Instance is
explicitly interrupted, so that it can unblock any work that runs on threads that are _not_ part of
an executor owned by the PrimaryOnlyService and that depends on that Instance signaling it (e.g.
commands that are waiting on the Instance to reach a certain state). Currently this happens via a
call to an interrupt() method that each Instance must override, but in the future this is likely to
change to signaling a CancellationToken owned by the Instance instead.
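
The third mechanism, unblocking external waiters, can be sketched with standard C++ primitives. This
is a simplified model rather than the server's implementation: `WaitableInstance`, `WaitStatus`,
`waitForCompletion()`, and `markComplete()` are hypothetical names, and a plain mutex and condition
variable stand in for whatever future or promise machinery a real Instance would use.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <string>

// Result handed back to a waiter: either success or the reason for the interruption.
struct WaitStatus {
    bool ok;
    std::string reason;
};

class WaitableInstance {
public:
    // Called from a thread outside the service's executor (e.g. a command thread).
    // Blocks until the instance either completes or is interrupted.
    WaitStatus waitForCompletion() {
        std::unique_lock<std::mutex> lk(_mutex);
        _cv.wait(lk, [this] { return _done; });
        return _result;
    }

    // Called by the instance's own work when it reaches the awaited state.
    void markComplete() {
        _finish(WaitStatus{true, ""});
    }

    // Called at stepDown so that waiters fail promptly instead of blocking forever.
    void interrupt(const std::string& reason) {
        _finish(WaitStatus{false, reason});
    }

private:
    void _finish(const WaitStatus& result) {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            if (_done) {
                return;  // The first outcome wins; later calls are ignored.
            }
            _done = true;
            _result = result;
        }
        _cv.notify_all();
    }

    std::mutex _mutex;
    std::condition_variable _cv;
    bool _done = false;
    WaitStatus _result{false, ""};
};
```

A command thread blocked in `waitForCompletion()` wakes up as soon as either `markComplete()` or
`interrupt()` is called. Because only the first outcome is recorded, a late completion cannot mask
an interruption that has already been reported.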

## Instance lifetime

Instances are held by shared_ptr in their parent PrimaryOnlyService. Each PrimaryOnlyService
releases all Instance shared_ptrs it owns on stepDown. Additionally, a PrimaryOnlyService will
release an Instance shared_ptr when the state document for that Instance is deleted (via an
OpObserver). Because the logic that deletes the state document generally runs from the Instance's
own run() method, that logic needs to be careful: the moment the state document is deleted, the
corresponding PrimaryOnlyService is no longer keeping that Instance alive. If an Instance has any
additional logic to run or internal state to update after deleting its state document, it must
extend its own lifetime by calling shared_from_this() and capturing the resulting shared_ptr before
deleting its state document.
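
The lifetime hazard above is easiest to see in a small standalone model. In this sketch,
`OwnerService`, `SelfOwningInstance`, `onStateDocDeleted()`, and the `DeleteDocFn` callback are all
hypothetical: the callback stands in for the document delete plus the OpObserver reaction, which in
this toy model happen synchronously.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

class SelfOwningInstance : public std::enable_shared_from_this<SelfOwningInstance> {
public:
    using DeleteDocFn = std::function<void(const std::string&)>;

    SelfOwningInstance(std::string id, DeleteDocFn deleteDoc)
        : _id(std::move(id)), _deleteDoc(std::move(deleteDoc)) {}

    // Final step of the task. Returns how many owners keep the instance alive after the delete.
    long finish() {
        // Extend our own lifetime BEFORE deleting the state document.
        std::shared_ptr<SelfOwningInstance> self = shared_from_this();

        // The owning service drops its shared_ptr as a side effect of this call.
        _deleteDoc(_id);

        // Safe only because `self` still keeps *this alive.
        _cleanedUp = true;
        return self.use_count();
    }

private:
    std::string _id;
    DeleteDocFn _deleteDoc;
    bool _cleanedUp = false;
};

// Hypothetical owner that holds instances by shared_ptr, keyed by instance id.
class OwnerService {
public:
    void adopt(const std::string& id, std::shared_ptr<SelfOwningInstance> instance) {
        _instances[id] = std::move(instance);
    }

    // Stand-in for the OpObserver hook that fires when a state document is deleted.
    void onStateDocDeleted(const std::string& id) {
        _instances.erase(id);
    }

    std::size_t size() const { return _instances.size(); }

private:
    std::map<std::string, std::shared_ptr<SelfOwningInstance>> _instances;
};
```

Without the `self` capture, the write to `_cleanedUp` in `finish()` would touch an object that the
service has already destroyed. With it, the instance is destroyed only when `finish()` returns and
`self` goes out of scope.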