MongoDB Sharding
Sharding
Besides replica sets, MongoDB provides another type of cluster, the sharded cluster, which is designed to keep pace with massive data growth.
When MongoDB stores a vast amount of data, a single machine may not be sufficient to store the data or provide acceptable read and write throughput. In such cases, we can split the data across multiple machines to enable the database system to store and process more data.
Why Use Sharding
- In a replica set, every write operation must go to the primary node.
- Latency-sensitive queries are also served by the primary node.
- A single replica set is limited to 12 nodes.
- Memory on a single machine runs out when request volume is high.
- Local disk space is insufficient.
- Vertical scaling is expensive.
MongoDB Sharding
The following diagram illustrates the sharded cluster structure in MongoDB:
The diagram primarily includes the following components:
Shard: Stores the actual data chunks. In a production environment, each shard role is typically filled by a replica set of several machines to avoid a single point of failure.
Config Server: A mongod instance that stores the metadata for the entire cluster, including chunk information.
Query Routers: The front-end routers (mongos) that clients connect to. They make the whole cluster appear as a single database, so front-end applications can use it transparently.
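Once such a cluster is running, all three component types can be inspected from any mongos connection. A minimal sketch, assuming the mongos from the example below is listening on port 40000:

/usr/local/mongoDB/bin/mongo --port 40000
mongos> db.printShardingStatus()

db.printShardingStatus() lists the registered shards, the sharding metadata version, and which databases and collections have sharding enabled.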
Sharding Example
The sharding structure port distribution is as follows:
Shard Server 1: 27020
Shard Server 2: 27021
Shard Server 3: 27022
Shard Server 4: 27023
Config Server: 27100
Route Process: 40000
Step 1: Start Shard Server
[root@100 /]# mkdir -p /www/mongoDB/shard/s0
[root@100 /]# mkdir -p /www/mongoDB/shard/s1
[root@100 /]# mkdir -p /www/mongoDB/shard/s2
[root@100 /]# mkdir -p /www/mongoDB/shard/s3
[root@100 /]# mkdir -p /www/mongoDB/shard/log
[root@100 /]# /usr/local/mongoDB/bin/mongod --port 27020 --dbpath=/www/mongoDB/shard/s0 --logpath=/www/mongoDB/shard/log/s0.log --logappend --fork
....
[root@100 /]# /usr/local/mongoDB/bin/mongod --port 27023 --dbpath=/www/mongoDB/shard/s3 --logpath=/www/mongoDB/shard/log/s3.log --logappend --fork
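Before continuing, it is worth confirming that each shard mongod actually started. A quick, purely illustrative check against the first shard (repeat for ports 27021-27023):

[root@100 /]# /usr/local/mongoDB/bin/mongo --port 27020 --eval "db.runCommand({ ping: 1 })"

A reply containing "ok" : 1 means the instance is reachable; otherwise check the matching log file under /www/mongoDB/shard/log/.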
Step 2: Start Config Server
[root@100 /]# mkdir -p /www/mongoDB/shard/config
[root@100 /]# /usr/local/mongoDB/bin/mongod --port 27100 --dbpath=/www/mongoDB/shard/config --logpath=/www/mongoDB/shard/log/config.log --logappend --fork
Note: We can start the service just like a regular MongoDB service, without adding the --shardsvr and --configsvr parameters. These parameters only change the default startup port, so here we specify the ports ourselves.
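For comparison, the flag-based startup the note refers to would look roughly like the line below; with --configsvr and no explicit --port, mongod would listen on its default config-server port 27019 rather than the 27100 used in this example:

[root@100 /]# /usr/local/mongoDB/bin/mongod --configsvr --dbpath=/www/mongoDB/shard/config --logpath=/www/mongoDB/shard/log/config.log --logappend --fork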
Step 3: Start Route Process
/usr/local/mongoDB/bin/mongos --port 40000 --configdb localhost:27100 --fork --logpath=/www/mongoDB/shard/log/route.log --chunkSize 500
In the mongos startup parameters, chunkSize specifies the size of a chunk in MB; here it is set to 500MB (if omitted, mongos uses its default of 64MB).
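The chunk size can also be changed later, without restarting mongos, by updating the settings collection in the config database. A brief sketch (the value is in MB):

mongos> use config
mongos> db.settings.save({ _id: "chunksize", value: 64 })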
Step 4: Configure Sharding
Next, we log into mongos using the MongoDB Shell and add the Shard nodes.
[root@100 shard]# /usr/local/mongoDB/bin/mongo admin --port 40000
MongoDB shell version: 2.0.7
connecting to: 127.0.0.1:40000/admin
mongos> db.runCommand({ addshard:"localhost:27020" })
{ "shardAdded" : "shard0000", "ok" : 1 }
......
mongos> db.runCommand({ addshard:"localhost:27029" })
{ "shardAdded" : "shard0009", "ok" : 1 }
mongos> db.runCommand({ enablesharding:"test" }) // enable sharded storage for the test database
{ "ok" : 1 }
mongos> db.runCommand({ shardcollection: "test.log", key: { id: 1, time: 1 } })
{ "collectionsharded" : "test.log", "ok" : 1 }
Step 5: Connect the Application
No major changes are needed in the program code; simply connect to the database through mongos on port 40000, just as you would with a regular MongoDB instance.
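For example, a shell session through mongos looks exactly like one against a standalone mongod; the document fields below are only illustrative:

[root@100 shard]# /usr/local/mongoDB/bin/mongo test --port 40000
mongos> db.log.insert({ id: 1, time: new Date() })
mongos> db.log.find({ id: 1 })

Drivers behave the same way: point the connection at localhost:40000 and mongos transparently routes each operation to the correct shard.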