
SERVER-47244 Add the Auto-splitting section to the sharding arch guide

Haley Connelly 2020-06-15 23:01:06 +00:00 committed by Evergreen Agent
parent 128ea14211
commit 7419262b63


@@ -314,6 +314,88 @@ The config server replica set durably stores settings for the maximum chunk size
should be automatically split and balanced.
## Auto-splitting
When the mongos routes an update or insert to a chunk, the chunk may grow beyond the configured
chunk size (specified by the server parameter maxChunkSizeBytes) and trigger an auto-split, which
partitions the oversized chunk into smaller chunks. The shard that houses the chunk is responsible
for the following steps, outlined in the sketch below:
* determining if the chunk should be auto-split
* selecting the split points
* committing the split points to the config server
* refreshing the routing table cache
* updating in-memory chunk size estimates
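
At a high level, the steps fit together roughly as in the sketch below. Every name here is a
hypothetical stand-in (the real logic is spread across the classes listed under the code references
at the end of this section), and each step is covered in the subsections that follow.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for the real machinery (ChunkManager, ChunkWritesTracker,
// ChunkSplitter, splitVector, splitChunk), trivially defined so the sketch compiles.
using SplitPoints = std::vector<std::string>;
bool chunkShouldBeAutoSplit() { return true; }                 // size estimate vs. threshold
SplitPoints selectSplitPoints() { return {}; }                 // splitVector
void commitSplitPointsToConfigServer(const SplitPoints&) {}
void refreshRoutingTableCache() {}
void updateInMemoryChunkSizeEstimates(const SplitPoints&) {}

// Illustrative outline of one auto-split on the shard that owns the chunk.
void autoSplitChunkOutline() {
    if (!chunkShouldBeAutoSplit())
        return;
    const SplitPoints splitPoints = selectSplitPoints();
    if (splitPoints.empty())
        return;  // no usable split points; the task is abandoned
    commitSplitPointsToConfigServer(splitPoints);
    refreshRoutingTableCache();
    updateInMemoryChunkSizeEstimates(splitPoints);
}
```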
### Deciding when to auto-split a chunk
The server schedules an auto-split if:
1. it detects that the chunk exceeds a threshold based on the maximum chunk size
2. there is not already a split in progress for the chunk
Every time an update or insert gets routed to a chunk, the server tracks the bytes written to the
chunk in memory through the collection's ChunkManager. The ChunkManager has a ChunkInfo object for
each of the collection's entries in the local config.chunks. Through the ChunkManager, the server
retrieves the chunk's ChunkInfo and uses its ChunkWritesTracker to increment the estimated chunk
size.
Even if the new size estimate exceeds the maximum chunk size, the server still needs to check that
there is not already a split in progress for the chunk. If the ChunkWritesTracker is locked, there
is already a split in progress on the chunk and trying to start another split is prohibited.
Otherwise, if the chunk is oversized and no split is already in progress for the chunk, the server
submits the chunk to the ChunkSplitter to be auto-split.
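
The bookkeeping behind both checks can be pictured as a small per-chunk tracker with an atomic byte
counter and a split-in-progress flag. The sketch below is simplified and its method names are
approximations; the real interface is in ChunkWritesTracker (see the code references).

```cpp
#include <atomic>
#include <cstdint>

// Simplified sketch of a per-chunk writes tracker; the real ChunkWritesTracker
// (see the code references below) is the authoritative version.
class WritesTrackerSketch {
public:
    // Called on the write path every time an update/insert is routed to the chunk.
    void addBytesWritten(uint64_t bytes) {
        _bytesWritten.fetch_add(bytes, std::memory_order_relaxed);
    }

    // Illustrative threshold: the real check compares the estimate against a
    // threshold derived from the configured maximum chunk size.
    bool shouldSplit(uint64_t maxChunkSizeBytes) const {
        return _bytesWritten.load(std::memory_order_relaxed) > maxChunkSizeBytes;
    }

    // The write path only looks at the flag: a locked tracker means a split is
    // already running for this chunk and no new split should be started.
    bool isLockedForSplitting() const {
        return _splitInProgress.load(std::memory_order_relaxed);
    }

    // Used by the split task itself: only one caller can win the lock.
    bool acquireSplitLock() {
        bool expected = false;
        return _splitInProgress.compare_exchange_strong(expected, true);
    }

    void releaseSplitLock() { _splitInProgress.store(false, std::memory_order_relaxed); }

    // Reset the estimate so it can count writes arriving during an auto-split.
    void clearBytesWritten() { _bytesWritten.store(0, std::memory_order_relaxed); }

private:
    std::atomic<uint64_t> _bytesWritten{0};
    std::atomic<bool> _splitInProgress{false};
};
```

On the write path the server would call addBytesWritten, consult shouldSplit and
isLockedForSplitting, and only then hand the chunk to the ChunkSplitter.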
### The auto-split task
The ChunkSplitter is a replica set primary-only service that manages the process of auto-splitting
chunks. The ChunkSplitter runs auto-split tasks asynchronously, so multiple chunks can undergo an
auto-split concurrently, provided they don't overlap.
To prepare for the split point selection process, the ChunkSplitter flags that an auto-split for the
chunk is in progress. There may be incoming writes to the original chunk while the split is in
progress. For this reason, the estimated data size in the ChunkWritesTracker for this chunk is
reset, and the same counter is used to track the number of bytes written to the chunk while the
auto-split is in progress.
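
Assuming a tracker like the one sketched in the previous subsection, the hand-off might look roughly
like this; the detached thread is only a stand-in for the ChunkSplitter's own task executor, and all
names are hypothetical.

```cpp
#include <atomic>
#include <cstdint>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-ins for what one auto-split task needs.
struct SplitRequest {
    std::string ns;      // collection namespace
    std::string minKey;  // chunk bounds, simplified to strings
    std::string maxKey;
};
std::vector<std::string> selectSplitPoints(const SplitRequest&) { return {}; }  // splitVector
void commitSplitAndRefresh(const SplitRequest&, const std::vector<std::string>&) {}
void releaseSplitInProgressFlag(const SplitRequest&) {}

// Sketch only: the split-in-progress flag was set when the task was scheduled, the
// byte counter is cleared so it now counts writes arriving during the split, and the
// remaining work happens off the write path.
void submitAutoSplit(SplitRequest request, std::atomic<uint64_t>& chunkBytesWritten) {
    chunkBytesWritten.store(0);

    std::thread([request = std::move(request)] {
        const auto splitPoints = selectSplitPoints(request);
        if (!splitPoints.empty()) {
            commitSplitAndRefresh(request, splitPoints);
        }
        releaseSplitInProgressFlag(request);  // whether or not a split happened
    }).detach();
}
```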
splitVector manages the bulk of the split point selection logic. First, the data size and number of
records are retrieved from the storage engine to approximate the number of keys that each chunk
partition should have. This number is calculated such that, if each document were uniform in size,
each resulting chunk would hold roughly half of maxChunkSizeBytes worth of data.
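
For instance, with made-up numbers, the target number of keys per resulting chunk works out like
this:

```cpp
#include <cstdint>
#include <iostream>

int main() {
    // Illustrative values as the storage engine might report them for the collection.
    const int64_t dataSizeBytes = 256LL * 1024 * 1024;     // 256 MB of data
    const int64_t numRecords = 524288;                     // number of documents
    const int64_t maxChunkSizeBytes = 64LL * 1024 * 1024;  // configured maximum chunk size

    const int64_t avgDocSizeBytes = dataSizeBytes / numRecords;              // 512 bytes
    // Aim for each resulting chunk to hold about half the maximum chunk size.
    const int64_t keysPerChunk = maxChunkSizeBytes / (2 * avgDocSizeBytes);  // 65536 keys

    std::cout << "pick a split point roughly every " << keysPerChunk << " keys\n";
    return 0;
}
```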
If the actual data size is less than the maximum chunk size, no splits are made at all.
Additionally, if all documents in the chunk have the same shard key, no splits can be made. In this
case, the chunk may be classified as a jumbo chunk.
In the general case, splitVector (sketched below):
* performs an index scan on the shard key index
* adds every k'th key to the vector of split points, where k is the approximate number of keys each chunk should have
* returns the split points
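
A minimal sketch of that scan, assuming the keys arrive in shard key order and keysPerChunk was
derived as in the previous example (the real splitVector also caps the number of split points and
copes with long runs of identical keys):

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Walk the shard key index in order and record every keysPerChunk-th key as a
// split point. Keys are simplified to strings here.
std::vector<std::string> pickSplitPoints(const std::vector<std::string>& orderedShardKeys,
                                         int64_t keysPerChunk) {
    std::vector<std::string> splitPoints;
    int64_t keysSinceLastSplit = 0;
    for (const auto& key : orderedShardKeys) {
        if (++keysSinceLastSplit >= keysPerChunk) {
            splitPoints.push_back(key);  // boundary between two of the resulting chunks
            keysSinceLastSplit = 0;
        }
    }
    return splitPoints;
}
```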
If no split points were returned, the auto-split task is abandoned.
If split points are successfully generated, the ChunkSplitter executes the final steps of the
auto-split task, sketched after this list, where the shard:
* commits the split points to config.chunks on the config server by removing the document containing
the original chunk and inserting new documents corresponding to the new chunks indicated by the
split points
* refreshes the routing table cache
* replaces the original oversized chunk's ChunkInfo with a ChunkInfo object for each partition. The
estimated data size for each new chunk is the number of bytes written to the original chunk while the
auto-split was in progress
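
The last of those steps amounts to replacing one stale entry with an entry per partition, each
seeded with the bytes written during the split. The types below are simplified stand-ins for
ChunkInfo:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Simplified stand-in for the per-chunk routing metadata the shard keeps in memory.
struct ChunkInfoSketch {
    std::string minKey;
    std::string maxKey;
    uint64_t estimatedBytesWritten;  // feeds future auto-split decisions
};

// Replace the original oversized chunk's entry with one entry per new chunk. Each new
// estimate starts at the number of bytes written while the auto-split was in progress.
std::vector<ChunkInfoSketch> rebuildChunkInfos(const ChunkInfoSketch& original,
                                               const std::vector<std::string>& splitPoints,
                                               uint64_t bytesWrittenDuringSplit) {
    std::vector<ChunkInfoSketch> newChunks;
    std::string lowerBound = original.minKey;
    for (const auto& splitPoint : splitPoints) {
        newChunks.push_back({lowerBound, splitPoint, bytesWrittenDuringSplit});
        lowerBound = splitPoint;
    }
    newChunks.push_back({lowerBound, original.maxKey, bytesWrittenDuringSplit});
    return newChunks;
}
```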
### Top Chunk Optimization
While there are several optimizations in the auto-split process that won't be covered here, it's
worthwhile to note the concept of top chunk optimization. If the chunk being split is the first or
last one in the collection, the assumption is that the chunk is likely to see more insertions
if the user is inserting in ascending/descending order with respect to the shard key. So, in top
chunk optimization, the first (or last) key in the chunk is set as a split point. Once the split
points are committed to the config server and the shard refreshes its CatalogCache, the
ChunkSplitter tries to move the top chunk out of the shard to prevent the hot spot from sitting on a
single shard.
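
Sketched with similarly simplified types, the optimization only changes which split points are
proposed; migrating the resulting top chunk is requested afterwards:

```cpp
#include <string>
#include <vector>

// Simplified description of the chunk being split.
struct ChunkBoundsSketch {
    bool isFirstChunkOfCollection;  // chunk's min is the collection's global min
    bool isLastChunkOfCollection;   // chunk's max is the collection's global max
};

// Top chunk optimization: also split at the chunk's first (or last) key so the
// resulting boundary chunk is small enough to be moved to another shard.
void applyTopChunkOptimization(const ChunkBoundsSketch& chunk,
                               const std::string& firstKeyInChunk,
                               const std::string& lastKeyInChunk,
                               std::vector<std::string>& splitPoints) {
    if (chunk.isFirstChunkOfCollection) {
        splitPoints.insert(splitPoints.begin(), firstKeyInChunk);
    }
    if (chunk.isLastChunkOfCollection) {
        splitPoints.push_back(lastKeyInChunk);
    }
    // After the split points are committed and the CatalogCache is refreshed, the
    // ChunkSplitter tries to move the resulting top chunk off this shard.
}
```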
#### Code references
* [**ChunkInfo**](https://github.com/mongodb/mongo/blob/18f88ce0680ab946760b599437977ffd60c49678/src/mongo/s/chunk.h#L44) class
* [**ChunkManager**](https://github.com/mongodb/mongo/blob/master/src/mongo/s/chunk_manager.h) class
* [**ChunkSplitter**](https://github.com/mongodb/mongo/blob/master/src/mongo/db/s/chunk_splitter.h) class
* [**ChunkWritesTracker**](https://github.com/mongodb/mongo/blob/master/src/mongo/s/chunk_writes_tracker.h) class
* [**splitVector**](https://github.com/mongodb/mongo/blob/18f88ce0680ab946760b599437977ffd60c49678/src/mongo/db/s/split_vector.cpp#L61) method
* [**splitChunk**](https://github.com/mongodb/mongo/blob/18f88ce0680ab946760b599437977ffd60c49678/src/mongo/db/s/split_chunk.cpp#L128) method
* [**commitSplitChunk**](https://github.com/mongodb/mongo/blob/18f88ce0680ab946760b599437977ffd60c49678/src/mongo/db/s/config/sharding_catalog_manager_chunk_operations.cpp#L316) method where chunk splits are committed
## Auto-balancing