Introduction
MongoDB is a highly scalable NoSQL database manager designed for working with big data. It is document-oriented and uses JSON-like documents for storing key-value pairs. This NoSQL solution features optional schemas and uses the BASE transaction model.
Sharding is a MongoDB feature that enables easy scaling. As a method for data distribution across multiple database instances, sharding allows MongoDB to support high throughput operations on large data sets.
This article will show you how to deploy a sharded MongoDB cluster using Docker and Docker Compose.
Prerequisites
- Docker installed.
- Docker Compose installed.
- MongoDB Client Application.
What is Sharding?
Sharding is the process in MongoDB that enables horizontal scaling of a database. With sharding, the system divides the data into subsets called shards. Shards are often deployed as replica sets stored on multiple machines for high availability.
Each sharded cluster in MongoDB consists of the following components:
- Config servers store the cluster configuration and metadata. One of the servers acts as the primary server, and others act as secondary.
- Shards contain data subsets. One shard is primary, while others are secondary.
- Query router enables client applications to interact with the cluster.
The diagram below illustrates the architecture of a sharded MongoDB cluster.
How to Deploy a Sharded Cluster in MongoDB
To deploy a fully functional MongoDB sharded cluster, deploy each cluster element separately. Below are the steps for sharded cluster deployment using Docker containers and Docker Compose.
Note: The tutorial uses a single test machine to deploy all cluster elements. While it is possible to implement sharding in this way, MongoDB recommends using a separate machine for each member of each deployed replica set in a production environment.
Step 1: Deploy a Config Server Replica Set
Start by deploying a replica set of config servers for storing configuration settings and cluster metadata.
1. Create a directory and navigate to it.
mkdir config && cd config
2. Use a text editor to create a docker-compose.yaml file.
nano docker-compose.yaml
3. Write the configuration you want to deploy. The example below defines three config server replicas in the services
section and three Docker volumes for persistent data storage in the volumes
section.
version: '3'
services:
configs1:
container_name: configs1
image: mongo
command: mongod --configsvr --replSet cfgrs --port 27017 --dbpath /data/db
ports:
- 10001:27017
volumes:
- configs1:/data/db
configs2:
container_name: configs2
image: mongo
command: mongod --configsvr --replSet cfgrs --port 27017 --dbpath /data/db
ports:
- 10002:27017
volumes:
- configs2:/data/db
configs3:
container_name: configs3
image: mongo
command: mongod --configsvr --replSet cfgrs --port 27017 --dbpath /data/db
ports:
- 10003:27017
volumes:
- configs3:/data/db
volumes:
configs1: {}
configs2: {}
configs3: {}
Each config server replica requires the following parameters:
- Name. Choose any name you want. Numbered names are recommended for easier instance management.
- Name of the Docker container. Choose any name.
- Docker image. Use the
mongo
image available on Docker Hub. mongod
command. The command specifies the instance is a config server (--configsvr
) and part of a replica set (--replSet
). Furthermore, it defines the default port (27017
) and the path to the database.- Ports. Map the default Docker port to an external port of your choosing.
- Volumes. Define the database path on a permanent storage volume.
Save and exit the file when you finish.
4. Apply the configuration with the docker-compose
command:
docker-compose -f [path-to-file]/docker-compose.yaml up -d
The system confirms the successful deployment of the MongoDB config servers.
5. Check the running containers in Docker.
docker ps
All three config server replicas show as separate containers with different external ports.
Alternatively, you can use the docker-compose
command to list only the containers relevant to the deployment:
docker-compose -f config/docker-compose.yaml ps
6. Check the Docker volumes:
docker volume ls
7. Use the Mongo client application to log in to one of the config server replicas:
mongo mongodb://[ip-address]:[port]
As a result, the MongoDB shell command prompt appears:
8. Initiate the replicas in MongoDB by using the rs.initiate()
method. The configsvr
field set to true
is required for config server initiation.
rs.initiate(
{
_id: "cfgrs",
configsvr: true,
members: [
{ _id : 0, host : "[ip-address]:[port]" },
{ _id : 1, host : "[ip-address]:[port]" },
{ _id : 2, host : "[ip-address]:[port]" }
]
}
)
If the operation is successful, the "ok"
value in the output is 1
. Conversely, if an error occurs, the value is 0
, and an error message is displayed.
Press Enter to exit the secondary and return to the primary instance.
9. Use the rs.status()
method to check the status of your instances.
rs.status()
Step 2: Create Shard Replica Sets
After setting up a config server replica set, create shards that will contain your data. The example below shows how to create and initiate only one shard replica set, but the process for each subsequent shard is the same.
1. Create and navigate to the directory where you will store shard-related manifests.
mkdir shard && cd shard
2. Create a docker-compose.yaml file with a text editor.
nano docker-compose.yaml
3. Configure shard instances. Below is an example of a docker-compose.yaml that defines three shard replica sets and three permanent storage volumes.
version: '3'
services:
shard1s1:
container_name: shard1s1
image: mongo
command: mongod --shardsvr --replSet shard1rs --port 27017 --dbpath /data/db
ports:
- 20001:27017
volumes:
- shard1s1:/data/db
shard1s2:
container_name: shard1s2
image: mongo
command: mongod --shardsvr --replSet shard1rs --port 27017 --dbpath /data/db
ports:
- 50002:27017
volumes:
- shard1s2:/data/db
shard1s3:
container_name: shard1s3
image: mongo
command: mongod --shardsvr --replSet shard1rs --port 27017 --dbpath /data/db
ports:
- 50003:27017
volumes:
- shard1s3:/data/db
volumes:
shard1s1: {}
shard1s2: {}
shard1s3: {}
The YAML file for the shard replica set contains specifications similar to the config server specifications. The main difference is in the command
field for each replica, where the mongod
command for shards is issued with the --shardsvr
option. As a result, MongoDB recognizes the servers as shard instances.
When you finish, save and exit the file.
4. Use Docker Compose to apply the replica set configuration.
docker-compose -f shard/docker-compose.yaml up -d
The output confirms the successful creation of the Docker containers.
5. Log in to one of the replicas using the mongo
command.
mongo mongodb://10.0.2.15:20001
6. Initiate the replica set with rs.initiate()
.
rs.initiate(
{
_id: "shard1rs",
members: [
{ _id : 0, host : "10.0.2.15:20001" },
{ _id : 1, host : "10.0.2.15:20002" },
{ _id : 2, host : "10.0.2.15:20003" }
]
}
)
If the initiation is successful, the output value of "ok"
is 1
.
Note: Replica set names must be unique for each shard replica set you add to the cluster.
Step 3: Start a mongos Instance
A mongos instance acts as a query router, i.e., an interface between the cluster and client apps. Follow the steps below to set it up in your cluster.
1. Create a directory for your mongos configuration and navigate to it.
mkdir mongos && cd mongos
2. Create a Docker Compose file.
nano docker-compose.yaml
3. Configure the mongos instance. For example, the file below creates a mongos instance and exposes it to port 30000
. The command section should contain the --configdb
option, followed by references to the addresses of config server replicas.
version: '3'
services:
mongos:
container_name: mongos
image: mongo
command: mongos --configdb cfgrs/10.0.2.15:10001,10.0.2.15:10002,10.0.2.15:10003 --bind_ip 0.0.0.0 --port 27017
ports:
- 30000:27017
Save the file and exit.
4. Next, apply the configuration with docker-compose
:
docker-compose -f mongos/docker-compose.yaml up -d
The output shows Docker has created the mongos instance container.
Check the running containers in Docker:
docker ps
After deploying three config server replicas, three shard replicas, and one mongos instance, the output shows seven containers based on the mongo image.
Step 4: Connect to the Sharded Cluster
With all the instances up and running, the rest of the cluster configuration takes place inside the cluster. Connect to the cluster using the mongo
command:
mongo mongodb://[mongos-ip-address]:[mongos-port]
The MongoDB shell command prompt appears.
Step 5: Add Shards to the Cluster
Use the sh.addshard()
method and connect the shard replicas to the cluster:
sh.addShard("[shard-replica-set-name]/[shard-replica-1-ip]:[port],[shard-replica-2-ip]:[port],[shard-replica-3-ip]:[port]")
The output shows that the system successfully added the shards to the cluster. The "ok"
value is 1
:
Check the status with the sh.status()
method:
sh.status()
The output lists the active shards in the shards
section:
Step 6: Enable Sharding for a Database
Enable sharding for each database you plan to use it on. Use the sh.enableSharding()
method, followed by the database name.
sh.enableSharding("[database-name")
The example below enables sharding for the database named testdb
:
Note: Database workloads require high memory density and large storage capacity to perform well. Our BMC database servers are workload-optimized and support all major databases.
Step 7: Shard a Collection
There are two ways to shard a collection in MongoDB. Both methods use the sh.shardCollection()
method:
- Range-based sharding produces a shard key using multiple fields and creates contiguous data ranges based on the shard key values.
- Hashed sharding forms a shard key using a single field's hashed index.
To shard a collection using range-based sharding, specify the field to use as a shard key, and set its value to 1
:
sh.shardCollection("[database].[collection]", { [field]: 1 } )
Use the same syntax to set up hashed sharding. This time set the value of the field to "hashed"
:
sh.shardCollection("[database].[collection]", { [field]: "hashed" } )
Note: Once you shard a collection, you cannot unshard it.
Conclusion
After reading this article, you will know how to deploy a sharded MongoDB cluster using Docker and Docker Compose. If you are interested in how MongoDB compares against popular database management solutions, read MongoDB vs. MySQL and Cassandra vs. MongoDB.
Additionally, for an overview of the best open-source DBMSs, read 8 Best Open-Source Databases.