How MongoDB uses sharding to assign replica sets to a server cluster

  • 2020-05-27 07:29:07
  • OfStack

About replica sets

A replica set keeps the same data synchronized across a group of machines.
Replica sets provide data redundancy and increase data availability. Keeping copies of the data on multiple servers avoids the data loss caused by the failure of a single server.
Replication also protects against hardware failure and service interruptions, and the extra copies of the data can be used for disaster recovery or backups.

In some scenarios, replica sets can also be used to scale read capacity, since clients can send read and write operations to different servers.
Keeping copies in different data centers can likewise improve the availability of data for distributed applications.

A mongodb replica set is a group of mongod instances that hold the same data set. The primary accepts all writes, and the other instances apply the primary's operations so that their data stays in sync.
The primary accepts all client write operations. A replica set can have only one primary, because only a single writable instance can guarantee consistency of the data. The primary records its operations in the oplog.
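The primary-plus-oplog design described above can be sketched in a few lines of Python. This is a toy model of the idea, not mongodb's actual implementation: the primary applies each write to its own data and appends it to an oplog, and a secondary replays that oplog to reach the same state.

```python
# Toy model of primary/secondary replication via an oplog.
# Illustration only; not MongoDB's actual implementation.

class Node:
    def __init__(self):
        self.data = {}    # the node's copy of the data
        self.oplog = []   # ordered log of applied operations

    def apply(self, op):
        """Apply a single operation and record it in the oplog."""
        kind, key, value = op
        if kind == "set":
            self.data[key] = value
        elif kind == "delete":
            self.data.pop(key, None)
        self.oplog.append(op)

primary = Node()
secondary = Node()

# Only the primary accepts writes...
primary.apply(("set", "name", "dog"))
primary.apply(("set", "user_id", 1))
primary.apply(("delete", "name", None))

# ...and the secondary copies the primary's oplog and replays it.
for op in primary.oplog[len(secondary.oplog):]:
    secondary.apply(op)

print(secondary.data)  # → {'user_id': 1}, identical to primary.data
```

Because every member applies the same ordered operations, each secondary ends up as an exact reflection of the primary's data.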


     Client Application Driver
           Writes | Reads
               Primary
     Replication /   \ Replication
        Secondary     Secondary

Secondary nodes copy the primary's oplog and apply the operations to their own copies of the data, so each secondary is a reflection of the primary's data set. If the primary becomes unavailable, a new primary is elected. By default, read operations go to the primary, but you can set a read preference to direct reads to secondary nodes.
You can add an arbiter node (which holds no data and cannot be elected) to keep the number of voting members odd, ensuring that elections can always produce a clear majority. Arbiters do not require dedicated hardware.
An arbiter always keeps its arbiter role.
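The arithmetic behind keeping an odd number of voting members is simple majority voting. A small sketch of my own (not mongodb code) shows why an arbiter helps a two-node set:

```python
# Why replica sets keep an odd number of voting members:
# a primary can only be elected by a strict majority of votes.

def majority(voting_members: int) -> int:
    """Votes needed to win an election."""
    return voting_members // 2 + 1

# Two data-bearing nodes alone: majority is 2, so if either node is
# down or partitioned away, the survivor cannot elect itself primary.
print(majority(2))  # → 2

# Adding an arbiter (votes, holds no data) makes 3 voters: majority
# is still 2, so any two reachable members can elect a primary.
print(majority(3))  # → 2
```

This is why the tutorial below uses three data-bearing members per replica set: any single failure still leaves a majority.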

1. Asynchronous replication
Secondaries replicate the primary's operations asynchronously, so a replica set may return data to the client program that does not yet reflect the latest writes.
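The effect of asynchronous replication on reads can be illustrated with a toy model (my own sketch, not mongodb internals): the secondary replays the primary's oplog in batches, so a read routed to it between batches is stale.

```python
# Toy illustration of replication lag: the secondary replays the
# primary's oplog asynchronously, so reads from it can be stale.

primary_oplog = []
primary_data = {}
secondary_data = {}
applied = 0  # how far the secondary has replayed the oplog

def write(key, value):
    """A write is applied on the primary immediately."""
    primary_data[key] = value
    primary_oplog.append((key, value))

def replicate(batch=1):
    """The secondary catches up some operations at a time."""
    global applied
    for key, value in primary_oplog[applied:applied + batch]:
        secondary_data[key] = value
    applied = min(applied + batch, len(primary_oplog))

write("counter", 1)
write("counter", 2)
replicate(batch=1)     # only the first op has been replayed so far
print(secondary_data)  # → {'counter': 1}: a stale read
replicate(batch=1)
print(secondary_data)  # → {'counter': 2}: caught up
```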

2. Automatic failover
If the primary loses contact with the other members for more than 10 seconds, the remaining members elect a new primary.
The secondary that receives a majority of the votes becomes the new primary.

Replica sets also provide options for applications: members can be placed in different data centers,
and members can be given different priorities to control the outcome of elections.

Converting a replica set to a sharded cluster
1. Deploy a test replica set
Create the first replica set, named firstset:
1.1 The members of the replica set will use the following data directories:


  /data/example/firstset1
  /data/example/firstset2
  /data/example/firstset3

Create directory:


mkdir -p /data/example/firstset1 /data/example/firstset2 /data/example/firstset3

1.2 start three instances of mongodb at other terminals, as follows:


mongod --dbpath /data/example/firstset1 --port 10001 --replSet firstset --oplogSize 700 --rest --fork --logpath /data/example/firstset1/firstset1.log --logappend --nojournal --directoryperdb
mongod --dbpath /data/example/firstset2 --port 10002 --replSet firstset --oplogSize 700 --rest --fork --logpath /data/example/firstset2/firstset2.log --logappend --nojournal --directoryperdb
mongod --dbpath /data/example/firstset3 --port 10003 --replSet firstset --oplogSize 700 --rest --fork --logpath /data/example/firstset3/firstset3.log --logappend --nojournal --directoryperdb

The --oplogSize option caps each mongodb instance's operation log at 700MB. Without this parameter, the default is 5% of free disk space; limiting the oplog size lets each instance start a little faster.
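The 5%-of-free-disk default is easy to quantify. A quick back-of-envelope calculation (my own illustration; exact defaults vary by mongodb version and platform) shows why capping it at 700MB matters on a large partition:

```python
# Rough illustration of the oplog-size tradeoff described above:
# without --oplogSize, the default is about 5% of free disk space,
# which on a large partition dwarfs the 700MB cap used here.

def default_oplog_mb(free_disk_mb: float) -> float:
    """Approximate default oplog size: 5% of free disk space."""
    return free_disk_mb * 0.05

free_mb = 200 * 1024  # e.g. a partition with 200GB free
print(default_oplog_mb(free_mb))        # → 10240.0 MB, i.e. 10GB
print(default_oplog_mb(free_mb) / 700)  # roughly 14x the 700MB cap
```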
1.3 connect to one of the mongodb instances with the mongo shell


mongo mongo01:10001/admin

If you are running in a production environment, or on machines with different hostnames or IP addresses, replace mongo01 with the appropriate name.
1.4 initialize the replica set on mongo shell


var config = {
  "_id" : "firstset",
  "members" : [
    {"_id" : 0, "host" : "mongo01:10001"},
    {"_id" : 1, "host" : "mongo01:10002"},
    {"_id" : 2, "host" : "mongo01:10003"}
  ]
}
rs.initiate(config);
{
    "info" : "Config now saved locally. Should come online in about a minute.",
    "ok" : 1
}

or


db.runCommand(
  {"replSetInitiate" :
    {"_id" : "firstset",
    "members" : [
      {"_id" : 0, "host" : "mongo01:10001"},
      {"_id" : 1, "host" : "mongo01:10002"},
      {"_id" : 2, "host" : "mongo01:10003"}
      ]
    }
  }
)

1.5 create and insert data in mongo shell:


use mydb
switched to db mydb
animal = ["dog", "tiger", "cat", "lion", "elephant", "bird", "horse", "pig", "rabbit", "cow", "dragon", "snake"];
for(var i=0; i<100000; i++){
  name = animal[Math.floor(Math.random()*animal.length)];
  user_id = i;
  boolean = [true, false][Math.floor(Math.random()*2)];
  added_at = new Date();
  number = Math.floor(Math.random()*10001);
  db.test_collection.save({"name":name, "user_id":user_id, "boolean": boolean, "added_at":added_at, "number":number });
}

The loop above inserts 100,000 documents into the collection test_collection, which can take several minutes depending on the system.
The script adds documents of the following form (field values are random):

{ "name" : "dog", "user_id" : 123, "boolean" : true, "added_at" : ISODate("..."), "number" : 456 }
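The document shape created by the shell loop can also be sketched in Python. This sketch is illustrative only: it builds the same shape of document in memory rather than inserting into MongoDB.

```python
# Python sketch of the documents the mongo shell loop creates.
# Builds them in memory only; no MongoDB connection involved.
import random
from datetime import datetime

animals = ["dog", "tiger", "cat", "lion", "elephant", "bird",
           "horse", "pig", "rabbit", "cow", "dragon", "snake"]

def make_doc(user_id: int) -> dict:
    return {
        "name": random.choice(animals),
        "user_id": user_id,
        "boolean": random.choice([True, False]),
        "added_at": datetime.now(),
        "number": random.randint(0, 10000),  # matches floor(random()*10001)
    }

docs = [make_doc(i) for i in range(100000)]
print(len(docs))        # → 100000
print(sorted(docs[0]))  # → ['added_at', 'boolean', 'name', 'number', 'user_id']
```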

2. Deploy the sharding infrastructure
Create three config servers to hold the metadata for the cluster.
For a development or test environment a single config server is sufficient, but production requires three config servers. Config servers need very few resources, since they only hold the cluster's metadata.
2.1 create the data file save directory of the configuration server:


  /data/example/config1
  /data/example/config2
  /data/example/config3

Create directory:


mkdir -p /data/example/config1 /data/example/config2 /data/example/config3

2.2 start the configuration server on another terminal:


mongod --configsvr --dbpath /data/example/config1 --port 20001 --fork --logpath /data/example/config1/config1.log --logappend
mongod --configsvr --dbpath /data/example/config2 --port 20002 --fork --logpath /data/example/config2/config2.log --logappend
mongod --configsvr --dbpath /data/example/config3 --port 20003 --fork --logpath /data/example/config3/config3.log --logappend

2.3 launch the mongos instance under another terminal:
mongos --configdb mongo01:20001,mongo01:20002,mongo01:20003 --port 27017 --chunkSize 1 --fork --logpath /data/example/mongos.log --logappend
If you are using the test data created above, or any test environment, you can use the minimum chunk size (1MB). The default chunk size is 64MB, which means a cluster must hold at least 64MB of data before mongodb's automatic sharding begins to act.
Do not use such a small chunk size in a production environment.
The --configdb option specifies the config servers. The mongos instance runs on mongodb's default port, 27017.
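The effect of --chunkSize is easy to see with a little arithmetic. This is my own simplified sketch; the real balancer also considers migration thresholds and shard key ranges:

```python
# Why a 1MB chunk size makes sharding kick in quickly on test data:
# mongodb splits a collection into chunks of at most chunkSize MB
# and balances chunks across shards. (Simplified model.)
import math

def chunk_count(data_mb: float, chunk_size_mb: float) -> int:
    """Minimum number of chunks needed to hold the data."""
    return math.ceil(data_mb / chunk_size_mb)

data_mb = 64  # roughly the size of the test collection from step 1.5
print(chunk_count(data_mb, 64))  # → 1: with the 64MB default, a single
                                 #   chunk, so there is nothing to balance
print(chunk_count(data_mb, 1))   # → 64: with --chunkSize 1, plenty of
                                 #   chunks to spread across two shards
```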
2.4 add the first shard through mongos by executing the following commands in a new terminal:
2.4.1 connect to the mongos instance


mongo mongo01:27017/admin

2.4.2 add the first shard using the addShard command


db.runCommand( { "addShard" : "firstset/mongo01:10001,mongo01:10002,mongo01:10003" } )

2.4.3 the following information appears, indicating success:


{ "shardAdded" : "firstset", "ok" : 1 }

3. Deploy another test replica set
Create a second replica set, named secondset:
3.1 The members of the replica set will use the following data directories:


  /data/example/secondset1
  /data/example/secondset2
  /data/example/secondset3

Create directory:


mkdir -p /data/example/secondset1 /data/example/secondset2 /data/example/secondset3

3.2 start three instances of mongodb on other terminals, as follows:


mongod --dbpath /data/example/secondset1 --port 30001 --replSet secondset --oplogSize 700 --rest --fork --logpath /data/example/secondset1/secondset1.log --logappend --nojournal --directoryperdb
mongod --dbpath /data/example/secondset2 --port 30002 --replSet secondset --oplogSize 700 --rest --fork --logpath /data/example/secondset2/secondset2.log --logappend --nojournal --directoryperdb
mongod --dbpath /data/example/secondset3 --port 30003 --replSet secondset --oplogSize 700 --rest --fork --logpath /data/example/secondset3/secondset3.log --logappend --nojournal --directoryperdb

3.3 connect to one of the mongodb instances with the mongo shell


mongo mongo01:30001/admin

3.4 initialize the replica set on mongo shell


db.runCommand(
  {"replSetInitiate" :
    {"_id" : "secondset",
    "members" : [
      {"_id" : 0, "host" : "mongo01:30001"},
      {"_id" : 1, "host" : "mongo01:30002"},
      {"_id" : 2, "host" : "mongo01:30003"}
      ]
    }
  }
)

3.5 add the new replica set to the sharded cluster. Connect to the mongos instance again and run:


db.runCommand( { "addShard" : "secondset/mongo01:30001,mongo01:30002,mongo01:30003" } )

A success message is returned:


{ "shardAdded" : "secondset", "ok" : 1 }

3.6 verify that both shards were added successfully by running the listShards command:


db.runCommand({listShards:1})
{
  "shards" : [
    {
      "_id" : "firstset",
      "host" : "firstset/mongo01:10001,mongo01:10002,mongo01:10003"
    },
    {
      "_id" : "secondset",
      "host" : "secondset/mongo01:30001,mongo01:30002,mongo01:30003"
    }
  ],
  "ok" : 1
}
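The host strings in the listShards output follow the pattern setname/host1,host2,.... A small helper of my own (hypothetical, for illustration; not part of any mongodb driver) shows how to pull them apart:

```python
# Parse the "host" strings returned by listShards, which have the
# form "<replica set name>/<host:port>,<host:port>,...".
def parse_shard_host(host: str):
    set_name, _, members = host.partition("/")
    return set_name, members.split(",")

name, members = parse_shard_host(
    "firstset/mongo01:10001,mongo01:10002,mongo01:10003")
print(name)     # → firstset
print(members)  # → ['mongo01:10001', 'mongo01:10002', 'mongo01:10003']
```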

