A full record of Mysql deadlock troubleshooting process

2021-07-06 11:59:06
OfStack

Preface

The database deadlocks that came into contact before were all caused by the lock order not being 1 when updating in batches, but last week, it encountered a deadlock that was difficult to understand. I took this opportunity to re-learn the deadlock knowledge and common deadlock scenarios of mysql. After many investigations and discussions with colleagues, we finally found the cause of this deadlock problem, and gained a lot. Although we are back-end programmers, we don't need to analyze the source code related to locks in depth like DBA1, but if we can master the basic deadlock troubleshooting methods, it will be of great benefit to our daily development.

PS: This article will not cover the basics of deadlock, but the locking principles of mysql can be found in the Resources section of this article.

Deadlock cause

First, introduce the database and table, because it involves the real data inside the company, so the following are simulated, which will not affect the specific analysis.

We use the 5.5 version of the mysql database, the transaction isolation level is the default RR (Repeatable-Read), using the innodb engine. Assume that an test table exists:


CREATE TABLE `test` (
 `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
 `a` int(11) unsigned DEFAULT NULL,
 PRIMARY KEY (`id`),
 UNIQUE KEY `a` (`a`)
) ENGINE=InnoDB AUTO_INCREMENT=100 DEFAULT CHARSET=utf8;

The structure of the table is simple, with one primary key id and one only index a. The data in the table are as follows:


mysql> select * from test;
+----+------+
| id | a |
+----+------+
| 1 | 1 |
| 2 | 2 |
| 4 | 4 |
+----+------+
3 rows in set (0.00 sec)

The actions where deadlock occurs are as follows:

步骤	事务1	事务2
1		begin
2		delete from test where a = 2;
3	begin
4	delete from test where a = 2; （事务1卡住）
5	提示出现死锁：ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction	insert into test (id, a) values (10, 2);

Then we can pass SHOW ENGINE INNODB STATUS; To view the deadlock log:


------------------------
LATEST DETECTED DEADLOCK
------------------------
170219 13:31:31
*** (1) TRANSACTION:
TRANSACTION 2A8BD, ACTIVE 11 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 376, 1 row lock(s)
MySQL thread id 448218, OS thread handle 0x2abe5fb5d700, query id 18923238 renjun.fangcloud.net 121.41.41.92 root updating
delete from test where a = 2
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 923 n bits 80 index `a` of table `oauthdemo`.`test` trx id 2A8BD lock_mode X waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 4; hex 00000002; asc ;;
 1: len 4; hex 00000002; asc ;;
*** (2) TRANSACTION:
TRANSACTION 2A8BC, ACTIVE 18 sec inserting
mysql tables in use 1, locked 1
4 lock struct(s), heap size 1248, 3 row lock(s), undo log entries 2
MySQL thread id 448217, OS thread handle 0x2abe5fd65700, query id 18923239 renjun.fangcloud.net 121.41.41.92 root update
insert into test (id,a) values (10,2)
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 0 page no 923 n bits 80 index `a` of table `oauthdemo`.`test` trx id 2A8BC lock_mode X locks rec but not gap
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 4; hex 00000002; asc ;;
 1: len 4; hex 00000002; asc ;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 923 n bits 80 index `a` of table `oauthdemo`.`test` trx id 2A8BC lock mode S waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 4; hex 00000002; asc ;;
 1: len 4; hex 00000002; asc ;;
*** WE ROLL BACK TRANSACTION (1)

Analysis

Read the deadlock log

When encountering a deadlock, the first step is to read the deadlock log. The deadlock log is usually divided into two parts, the first part of which describes what lock Transaction 1 is waiting for:


170219 13:31:31
*** (1) TRANSACTION:
TRANSACTION 2A8BD, ACTIVE 11 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 376, 1 row lock(s)
MySQL thread id 448218, OS thread handle 0x2abe5fb5d700, query id 18923238 renjun.fangcloud.net 121.41.41.92 root updating
delete from test where a = 2
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 923 n bits 80 index `a` of table `oauthdemo`.`test` trx id 2A8BD lock_mode X waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 4; hex 00000002; asc ;;
 1: len 4; hex 00000002; asc ;;

From the log, we can see that Transaction 1 is currently executing delete from test where a = 2 That statement is applying for an X lock on the index a, so prompt lock_mode X waiting .

Then the lower half of the log describes the locks currently held by Transaction 2 and the locks waiting:


*** (2) TRANSACTION:
TRANSACTION 2A8BC, ACTIVE 18 sec inserting
mysql tables in use 1, locked 1
4 lock struct(s), heap size 1248, 3 row lock(s), undo log entries 2
MySQL thread id 448217, OS thread handle 0x2abe5fd65700, query id 18923239 renjun.fangcloud.net 121.41.41.92 root update
insert into test (id,a) values (10,2)
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 0 page no 923 n bits 80 index `a` of table `oauthdemo`.`test` trx id 2A8BC lock_mode X locks rec but not gap
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 4; hex 00000002; asc ;;
 1: len 4; hex 00000002; asc ;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 923 n bits 80 index `a` of table `oauthdemo`.`test` trx id 2A8BC lock mode S waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 4; hex 00000002; asc ;;
 1: len 4; hex 00000002; asc ;;

From the log HOLDS THE LOCKS(S) Block we can see that transaction 2 holds an X lock on index a and is a record lock (Record Lock). The lock was requested through the delete statement executed by Transaction 2 in Step 2. Because it is an Where a = 2, a record lock is requested instead of an next-key lock because it is an Where index-only equivalent query in RR isolation mode.

From the log WAITING FOR THIS LOCK TO BE GRANTED Block, we can see that transaction 2 is applying for S lock, that is, shared lock. The lock is requested by the insert into test (id, a) values (10, 2) statement. The insert statement normally applies for an exclusive lock, which is the X lock, but here the S lock appears. This is because the a field is a 1-only index, so the insert statement is performed once before insertion duplicate key In order for this check to succeed, you need to apply for an S lock to prevent other transactions from modifying the a field.

So why did the S lock fail? This is a lock application for the same field that needs to be queued. The S lock is preceded by an unsuccessful X lock, so the S lock must wait, so a circular wait is formed and a deadlock appears.

By reading the deadlock log, we can clearly know what kind of circular wait is formed by two transactions, and then analyze it, we can infer the cause of circular wait in reverse, that is, the cause of deadlock formation.

Deadlock formation flow chart

In order to let everyone better understand the causes of deadlock formation, we will explain the process of deadlock formation in the form of tables:

步骤	事务1	事务2
1		begin
2		delete from test where a = 2; 执行成功，事务2占有a=2下的X锁，类型为记录锁。
3	begin
4	delete from test where a = 2; 事务1希望申请a=2下的X锁，但是由于事务2已经申请了1把X锁，两把X锁互斥，所以X锁申请进入锁请求队列。
5	出现死锁，事务1权重较小，所以被选择回滚（成为牺牲品）。	insert into test (id, a) values (10, 2); 由于a字段建立了唯1索引，所以需要申请S锁以便检查duplicate key，由于插入的a的值还是2，所以排在X锁后面。但是前面的X锁的申请只有在事务2commit或者rollback之后才能成功，此时形成了循环等待，死锁产生。

Expand

In the process of checking deadlock, a colleague also found that the above scenario will produce another deadlock, which cannot be reproduced manually, but can only be reproduced in high concurrency scenarios.

The log corresponding to the damn lock is not posted here. The core difference with the last deadlock is that the lock waiting for transaction 2 is changed from S lock to X lock, that is, lock_mode X locks gap before rec insert intention waiting .

Let's explain the flow of damn lock generation in detail through the table:

步骤	事务1	事务2
1		begin
2		delete from test where a = 2; 执行成功，事务2占有a=2下的X锁，类型为记录锁。
3	begin
4		【insert第1阶段】insert into test (id, a) values (10, 2); 事务2申请S锁进行duplicate key进行检查。检查成功。
5	delete from test where a = 2; 事务1希望申请a=2下的X锁，但是由于事务2已经申请了1把X锁，两把X锁互斥，所以X锁申请进入锁请求队列。
6	出现死锁，事务1权重较小，所以被选择回滚（成为牺牲品）。	【insert第2阶段】insert into test (id, a) values (10, 2); 事务2开始插入数据，S锁升级为X锁，类型为insert intention。同理，X锁进入队列排队，形成循环等待，死锁产生。

Summarize

When checking deadlock, we first need to analyze the scene of circular waiting according to the deadlock log, then analyze the locking type and sequence according to the SQL executed by each transaction at present, and infer how to form circular waiting in reverse, so that we can find the cause of deadlock.