A Brief Discussion of MySQL Semi-synchronous Replication

2021-07-10 21:00:55
OfStack
 
Brief introduction 
 
MySQL achieves high availability of its storage system through replication. The replication methods MySQL currently supports are:

Asynchronous replication: The simplest in principle and the best in performance. However, there is a high probability that master and standby data become inconsistent.
Semi-synchronous replication: Compared with asynchronous replication, it sacrifices some performance to improve data consistency between master and standby (although there are still cases where master and standby data are inconsistent).
Group replication (Group Replication): Based on the Paxos algorithm, it provides strongly consistent distributed data replication. As long as a majority of machines survive, the system remains available. Compared with semi-synchronous replication, Group Replication offers higher data consistency and system availability.

This article mainly discusses MySQL semi-synchronous replication. 
 
Basic flow of semi-synchronous replication 
 
MySQL semi-synchronous replication is built on top of MySQL asynchronous replication. MySQL supports two slightly different types of semi-synchronous replication, AFTER_SYNC and AFTER_COMMIT, selected by rpl_semi_sync_master_wait_point.

When semi-synchronous replication is enabled, the Master waits for a response from a Slave, or for a timeout, before returning. When the Slave times out, semi-synchronous replication degrades to asynchronous replication. This is one of the problems with MySQL semi-synchronous replication. This article does not discuss the Slave timeout case (that is, asynchronous replication is not discussed).
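
The timeout fallback described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not MySQL's implementation: the `Master` class and its `commit` method are hypothetical names, the slave's reply is simulated with a `threading.Event`, and the model never switches back to semi-synchronous mode once it has degraded.

```python
import threading


class Master:
    """Toy master that waits for a slave ack, with a timeout (hypothetical)."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s   # stands in for the ack timeout setting
        self.semi_sync = True        # becomes False after the first timeout

    def commit(self, ack_event):
        """Return which replication behaviour applied to this commit."""
        if not self.semi_sync:
            return "async"           # already degraded: do not wait at all
        if ack_event.wait(self.timeout_s):
            return "semi-sync"       # the ack arrived in time
        self.semi_sync = False       # timed out: degrade to asynchronous
        return "timed-out"


master = Master(timeout_s=0.05)

acked = threading.Event()
acked.set()                          # the slave answers immediately
first = master.commit(acked)

silent = threading.Event()           # the slave never answers
second = master.commit(silent)       # waits 0.05 s, then gives up
third = master.commit(silent)        # returns at once, no waiting
print(first, second, third)
```

The point of the sketch is the third commit: after one timeout the master no longer waits, so later transactions lose the semi-synchronous guarantee.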
 
Basic flow of semi-synchronous replication in AFTER_SYNC mode

The AFTER_SYNC mode was introduced in MySQL 5.7 and is the default semi-synchronous replication mode in MySQL 5.7:
 
Prepare the transaction in the storage engine(s). 

Write the transaction to the binlog, flush the binlog to disk. 

Wait for at least one slave to acknowledge the reception for the binlog events for the transaction. 

Commit the transaction to the storage engine(s). 

Basic flow of semi-synchronous replication in AFTER_COMMIT mode

Semi-synchronous replication in MySQL 5.5 and 5.6 supports only AFTER_COMMIT:
 
Prepare the transaction in the storage engine(s). 

Write the transaction to the binlog, flush the binlog to disk. 

Commit the transaction to the storage engine(s). 

Wait for at least one slave to acknowledge the reception for the binlog events for the transaction. 

Summary of AFTER_SYNC and AFTER_COMMIT 
 
AFTER_SYNC: The log is replicated to the Slave first, and then the Master commits.

Every transaction committed on the master has been replicated to the slave.

A transaction that has been replicated to the slave is not necessarily committed on the master (for example, the master goes down after replicating the log to the slave but before committing).

AFTER_COMMIT: The Master commits first, and then the log is replicated to the Slave.

A transaction committed on the master has not necessarily been replicated to the slave (for example, the master goes down after committing but before the log can be replicated to the slave).

Every transaction that has been replicated to the slave is certain to have been committed on the master.

Clearly, AFTER_COMMIT cannot guarantee data consistency when the master goes down (the master may go down after committing but before replicating the log to the slave). The rest of this article discusses only the AFTER_SYNC mode.
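
The two step orderings and the guarantees summarized above can be checked with a small model that injects a crash before every step. This is a simplified sketch: the step names, `run` and `guarantees` are made up for illustration, "wait_ack completed" is treated as "the slave has the binlog", and crash recovery after restart is deliberately left out.

```python
# Step orderings taken from the two flows described in this article.
FLOWS = {
    "AFTER_SYNC":   ["prepare", "flush_binlog", "wait_ack", "engine_commit"],
    "AFTER_COMMIT": ["prepare", "flush_binlog", "engine_commit", "wait_ack"],
}


def run(mode, crash_before=None):
    """Run the steps of `mode`, stopping before `crash_before`.

    Returns the set of steps that completed; None means no crash.
    """
    done = set()
    for step in FLOWS[mode]:
        if step == crash_before:
            break
        done.add(step)
    return done


def guarantees(mode):
    """Check two properties over every possible crash point of `mode`."""
    committed_implies_replicated = True
    replicated_implies_committed = True
    for crash in FLOWS[mode] + [None]:
        done = run(mode, crash)
        if "engine_commit" in done and "wait_ack" not in done:
            committed_implies_replicated = False
        if "wait_ack" in done and "engine_commit" not in done:
            replicated_implies_committed = False
    return committed_implies_replicated, replicated_implies_committed


for mode in FLOWS:
    print(mode, guarantees(mode))
```

In this model AFTER_SYNC keeps the first property and loses the second, while AFTER_COMMIT does the opposite, which matches the summary.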

Starting with MySQL 5.7.3, the number of Slave acknowledgements that semi-synchronous replication waits for is configurable: rpl_semi_sync_master_wait_for_slave_count.
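
Waiting for a configurable number of acknowledgements can be sketched as follows. The function name `acks_consumed_until_ready` and the slave ids are hypothetical, and counting repeated acks from the same slave only once is an assumption of this sketch rather than a statement about MySQL internals.

```python
def acks_consumed_until_ready(ack_stream, wait_for_slave_count):
    """Return how many ack messages were read before enough distinct
    slaves had acknowledged, or None if the stream ended first."""
    seen = set()
    for consumed, slave_id in enumerate(ack_stream, start=1):
        seen.add(slave_id)           # a repeated ack from one slave counts once
        if len(seen) >= wait_for_slave_count:
            return consumed
    return None


acks = ["s1", "s1", "s2"]            # hypothetical arrival order of acks
print(acks_consumed_until_ready(acks, 1))
print(acks_consumed_until_ready(acks, 2))
print(acks_consumed_until_ready(acks, 3))
```

A larger count means a transaction survives the loss of more machines, at the cost of waiting for the slowest of the required slaves.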

Analysis of Anomalies in AFTER_SYNC Mode 
 
Exception 1: After the master goes down, a master-standby switch takes place.

1. The master executes transaction T and goes down before the binlog of transaction T is flushed to disk. The slave is promoted to master. After the old master restarts, crash recovery rolls back transaction T. Master and standby data are consistent.

2. The master executes transaction T and goes down after the binlog of transaction T is flushed to disk but before the slave's ACK is received (a pending log exists). The slave is promoted to master.

2.1 The slave has not received the binlog of transaction T. After the old master restarts, crash recovery commits the pending log directly. Master and standby data are inconsistent.

2.2 The slave has received the binlog of transaction T. Master and standby data are consistent.

Exception 2: After the master goes down, no master-standby switch takes place. Only case 2.1 of Exception 1 needs to be considered.

After the master restarts, it commits the pending log directly. At this point master and standby data are inconsistent, and one of the following happens:

The slave connects to the master and obtains the binlog of transaction T through asynchronous replication. Master and standby data become consistent.

The slave has not yet replicated the binlog of transaction T when the master goes down again and its disk is damaged. Master and standby data remain inconsistent, and the data of transaction T is lost.
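
The cases above reduce to two questions: was the binlog of transaction T flushed on the master before the crash, and did the slave receive it? A minimal sketch, assuming (as the analysis does) that crash recovery commits T exactly when its binlog reached disk; `after_restart` is a made-up helper name.

```python
from itertools import product


def after_restart(binlog_flushed, slave_received):
    """Return (master_has_T, slave_has_T) after the crashed master recovers."""
    master_has_t = binlog_flushed                   # recovery commits the pending log
    slave_has_t = binlog_flushed and slave_received  # nothing to receive if not flushed
    return master_has_t, slave_has_t


for flushed, received in product([False, True], repeat=2):
    master_has_t, slave_has_t = after_restart(flushed, received)
    verdict = "consistent" if master_has_t == slave_has_t else "inconsistent"
    print(f"flushed={flushed} received={received} -> {verdict}")
```

Only the combination "flushed on the master, not received by the slave" comes out inconsistent, which is the pending log case.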

Exception handling 
 
From this brief analysis of the abnormal cases, we can see that semi-synchronous replication needs to handle one special situation: the master restarts after going down while a pending log (binlog for which no slave reply was received) exists.

When no master-standby switch takes place after the master goes down:

After crash recovery, the master waits for a slave to connect and replicate, until at least one slave has replicated the binlog of all committed transactions (SHOW MASTER STATUS on the master and SELECT MASTER_POS_WAIT() on the slave).
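
The "has the slave caught up" check comes down to comparing binlog coordinates, that is, a binlog file name plus a position inside that file. The sketch below only shows the comparison; it does not talk to a server. The helper names and the `mysql-bin.NNNNNN` file names are assumptions for illustration.

```python
def parse_coord(file_name, pos):
    """Turn ('mysql-bin.000003', 154) into the sortable tuple (3, 154)."""
    index = int(file_name.rsplit(".", 1)[1])   # numeric suffix of the binlog file
    return index, pos


def slave_caught_up(master_coord, slave_coord):
    """True if the slave's coordinate is at or beyond the master's."""
    return parse_coord(*slave_coord) >= parse_coord(*master_coord)


master_at = ("mysql-bin.000003", 500)          # hypothetical master position
print(slave_caught_up(master_at, ("mysql-bin.000002", 900)))
print(slave_caught_up(master_at, ("mysql-bin.000003", 500)))
print(slave_caught_up(master_at, ("mysql-bin.000010", 4)))
```

Comparing the file index before the position matters: a slave at position 900 of an older file is still behind a master at position 500 of a newer one.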
 
When a master-standby switch takes place after the master goes down:

When the old master restarts, crash recovery rolls back the pending log. (Manually truncate the unreplicated part of the master's binlog?)
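
How the rollback would actually be performed is left open above. The sketch below only illustrates how the pending part could be identified, by comparing the old master's transaction sequence with the new master's. Representing binlogs as lists of transaction ids and the name `pending_suffix` are assumptions made for this example.

```python
def pending_suffix(old_master_txns, new_master_txns):
    """Return (common_prefix_length, pending_transactions).

    pending_transactions are those the old master logged after the point
    where the two histories stop agreeing.
    """
    common = 0
    for old_txn, new_txn in zip(old_master_txns, new_master_txns):
        if old_txn != new_txn:
            break
        common += 1
    return common, old_master_txns[common:]


old_master = ["T1", "T2", "T3"]      # T3 was flushed but never acknowledged
new_master = ["T1", "T2", "T4"]      # promoted slave, already accepted T4
print(pending_suffix(old_master, new_master))
```

Everything after the common prefix on the old master is what would have to be rolled back or truncated before it could rejoin as a slave.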
 
Discussion

Why does crash recovery commit the pending log directly after the master restarts, instead of retrying the request for the slave's reply?

Both asynchronous and semi-synchronous replication in MySQL are initiated by the slave: the slave actively connects to the master to synchronize the binlog.

If no master-standby switch has occurred, the machine cannot know after restarting which machine is its slave.

If a master-standby switch has occurred, the machine is no longer the master and no slave will connect to it. If it kept waiting, it would never be able to work normally.

Summary

MySQL semi-synchronous replication has the following issues:

When the Slave times out, it degrades to asynchronous replication.
When the Master goes down, data consistency cannot be guaranteed and manual intervention is required.
Replication is serial.

Because these master-standby data problems affect the 7*24 high availability that Internet services require, major companies have released their own "patches": Tencent's TDSQL, WeChat's PhxSQL, Alibaba's AliSQL and NetEase's InnoSQL.

In MySQL 5.7, MySQL officially introduced a new replication mode: MySQL Group Replication.
 