A summary of pitfalls when using primary keys and rowid in MySQL

  • 2021-12-13 17:25:19
  • OfStack

Preface

We may have heard of the concept of rowid in MySQL, but it is hard to observe in practice, which inevitably leaves some doubts, such as:

How can we observe the existence of rowid? What is the relationship between rowid and the primary key? What hidden dangers exist in the use of primary keys? How can we understand the potential bottleneck of rowid and verify it through debugging?

This article discusses these questions in a test environment based on MySQL 5.7.19.

Question 1. How can we observe the existence of rowid

We might as well illustrate it with a case.

I remember one day, while compiling backup statistics, I wrote an SQL statement. When I saw the execution result, I realized the statement was incomplete. After finishing the statistics work, I came back to analyze this SQL statement.


mysql> select backup_date, count(*) piece_no from redis_backup_result;
+-------------+----------+
| backup_date | piece_no |
+-------------+----------+
| 2018-08-14  |    40906 |
+-------------+----------+
1 row in set (0.03 sec)

Given the business characteristics, there could not possibly be that many records in a single day, so something was clearly wrong. Where was the problem?

Looking at the SQL carefully, I found that group by was missing. Let's sample 10 rows of data.


mysql> select backup_date from redis_backup_result limit 10;
+-------------+
| backup_date |
+-------------+
| 2018-08-14  |
| 2018-08-14  |
| 2018-08-14  |
| 2018-08-15  |
| 2018-08-15  |
| 2018-08-15  |
| 2018-08-15  |
| 2018-08-15  |
| 2018-08-15  |
| 2018-08-15  |
+-------------+
10 rows in set (0.00 sec)

In earlier versions, the database parameter sql_mode defaulted to empty, so this was not checked, and the statement was allowed from a syntax point of view. Higher versions no longer allow it; for example, in 5.7 ONLY_FULL_GROUP_BY is part of the default sql_mode. The solution is simple: after adding group by, the results are as expected.


mysql> select backup_date, count(*) piece_no from redis_backup_result group by backup_date;
+-------------+----------+
| backup_date | piece_no |
+-------------+----------+
| 2018-08-14  |        3 |
| 2018-08-15  |      121 |
| 2018-08-16  |      184 |
| 2018-08-17  |     3284 |
| 2018-08-18  |     7272 |
| 2018-08-19  |     7272 |
| 2018-08-20  |     7272 |
| 2018-08-21  |     7272 |
| 2018-08-22  |     8226 |
+-------------+----------+
9 rows in set (0.06 sec)

Still, the parsing logic is curious: it looks as though MySQL parsed row 1 and then performed the count(*) operation, which is not apparent from the execution plan.

Looking at it from another angle, we can see that the table has more than 40,000 records.


mysql> select count(*) from redis_backup_result;
+----------+
| count(*) |
+----------+
|    40944 |
+----------+
1 row in set (0.01 sec)

For verification, we can use _rowid to make a preliminary check.

When an InnoDB table has no explicit primary key, InnoDB generates a 6-byte auto-incrementing rowid, which can be queried with select _rowid from the table, as follows:


mysql> select _rowid from redis_backup_result limit 5;
+--------+
| _rowid |
+--------+
|    117 |
|    118 |
|    119 |
|    120 |
|    121 |
+--------+
5 rows in set (0.00 sec)

With that, a preliminary idea can be implemented.


mysql> select _rowid, count(*) from redis_backup_result;
+--------+----------+
| _rowid | count(*) |
+--------+----------+
|    117 |    41036 |
+--------+----------+
1 row in set (0.03 sec)

Then we can go one step further and emulate rownum. MySQL does not support this feature natively, so it has to be implemented indirectly, here with a user variable.


mysql> SELECT @rowno:=@rowno+1 as rowno, r._rowid
    -> from redis_backup_result r, (select @rowno:=0) t limit 20;
+-------+--------+
| rowno | _rowid |
+-------+--------+
|     1 |    117 |
|     2 |    118 |
|     3 |    119 |
|     4 |    120 |
|     5 |    121 |
|     6 |    122 |
|     7 |    123 |
|     8 |    124 |
|     9 |    125 |
|    10 |    126 |
|    11 |    127 |
|    12 |    128 |
|    13 |    129 |
|    14 |    130 |
|    15 |    131 |
|    16 |    132 |
|    17 |    133 |
|    18 |    134 |
|    19 |    135 |
|    20 |    136 |
+-------+--------+
20 rows in set (0.00 sec)
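The user-variable trick above simply carries a counter across rows while they are scanned. A minimal sketch of the same row-numbering logic (in Python, with the sample _rowid values from the output above):

```python
# Emulate MySQL's @rowno:=@rowno+1 user-variable trick:
# a counter is initialized once and incremented as each row is scanned.
rows = [117, 118, 119, 120, 121]  # sample _rowid values

rowno = 0  # plays the role of (select @rowno:=0) t
numbered = []
for r in rows:
    rowno += 1  # @rowno := @rowno + 1, evaluated once per row
    numbered.append((rowno, r))

print(numbered)  # [(1, 117), (2, 118), (3, 119), (4, 120), (5, 121)]
```

MySQL 8.0 provides ROW_NUMBER() as a native window function, but on 5.7 the user-variable workaround is the usual approach.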

The complete statement is as follows:


mysql> SELECT @rowno:=@rowno+1 as rowno, r._rowid, backup_date, count(*)
    -> from redis_backup_result r, (select @rowno:=0) t;
+-------+--------+-------------+----------+
| rowno | _rowid | backup_date | count(*) |
+-------+--------+-------------+----------+
|     1 |    117 | 2018-08-14  |    41061 |
+-------+--------+-------------+----------+
1 row in set (0.02 sec)

This case makes it obvious: MySQL takes the record in row 1 and then performs the count(*) operation over the whole table.
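The behavior can be sketched as a simplified model (illustrative Python, not MySQL's actual implementation): with ONLY_FULL_GROUP_BY disabled, the whole table acts as one group, a non-aggregated column yields a value from one arbitrary row (in practice the first row scanned), and count(*) still covers all rows.

```python
# Simplified model of "select backup_date, count(*)" without GROUP BY
# when ONLY_FULL_GROUP_BY is disabled: the whole table is one group,
# the non-aggregated column comes from one arbitrary row (here: the first).
def aggregate_without_group_by(rows, column):
    first = rows[0][column]      # value taken from the first row scanned
    return first, len(rows)      # count(*) still covers all rows

rows = [{"backup_date": "2018-08-14"}] * 3 + [{"backup_date": "2018-08-15"}] * 7
print(aggregate_without_group_by(rows, "backup_date"))  # ('2018-08-14', 10)
```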

Of course, our goal is to understand the relationship between rowid and the primary key, so let us also review the hidden problems in primary key usage.

Question 2. What is the relationship between rowid and the primary key

When reviewing the index rules in MySQL development specifications, I emphasized one key point: every table should have a primary key. Let's briefly analyze why.

Beyond the specification itself, consider the storage structure: in the InnoDB storage engine, table data is stored in primary key order; such a table is called a clustered index table, or index-organized table (IOT). The primary key of a table is determined as follows:

An explicitly created primary key is used first. Otherwise, InnoDB checks whether the table has a non-null unique index and, if so, uses it as the primary key. If neither condition is met, a hidden 6-byte row id value is generated.
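The selection order can be sketched as a small decision function (a sketch only; the real logic lives inside InnoDB's data-dictionary code):

```python
# Sketch of how InnoDB picks the clustered index key for a table:
# 1) explicit primary key, 2) first non-null unique index, 3) hidden row_id.
def pick_clustered_key(has_primary_key, non_null_unique_indexes):
    if has_primary_key:
        return "PRIMARY"
    if non_null_unique_indexes:
        return non_null_unique_indexes[0]   # first qualifying unique index
    return "hidden 6-byte row_id"           # generated automatically

print(pick_clustered_key(False, []))        # hidden 6-byte row_id
print(pick_clustered_key(False, ["uk_a"]))  # uk_a
print(pick_clustered_key(True, ["uk_a"]))   # PRIMARY
```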

From the above we can see that MySQL maintains the primary key with a set of mechanisms, and common indexes are affected accordingly: unique indexes, non-unique indexes and covering indexes are all secondary indexes. From the storage point of view, secondary index entries contain the primary key columns by default, so if the primary key is too long, the secondary indexes will also take up a lot of space.
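The space impact of a long primary key can be estimated with simple arithmetic (the byte sizes below are illustrative assumptions, not measurements):

```python
# Every InnoDB secondary index entry stores the indexed columns
# plus the primary key columns, so a long PK inflates every index.
def secondary_index_bytes(index_col_bytes, pk_bytes, row_count):
    return (index_col_bytes + pk_bytes) * row_count

rows = 1_000_000
slim_pk = secondary_index_bytes(8, 8, rows)    # assumed 8-byte bigint PK
fat_pk = secondary_index_bytes(8, 64, rows)    # assumed 64-byte string PK
print(slim_pk, fat_pk)  # 16000000 72000000 -> 4.5x larger per index
```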

Question 3. What are the hidden dangers in the use of primary keys

This leads to a very common primary key performance problem in the industry, and it is not an isolated one; it calls for continuous improvement on the MySQL side, combining technical value with business value. Many businesses set up an auto-increment column, but most of the time this column has no real business meaning. Although it guarantees the uniqueness of the ID, business queries cannot filter directly on the auto-increment column, so new business attributes are introduced and a series of unique and non-unique indexes are added on top. In this way, the specification we insist on drifts away from how the business actually uses the table.

From another dimension, our understanding of primary keys is often biased: we should not assume that a primary key must be an integer starting from 1; it should be designed with the business scenario in mind. The ID-card number is actually a good example: it is divided into several segments, which helps retrieval and maintenance. The queue number you receive at a restaurant is another: it carries fixed business attributes. Both are good references for understanding primary keys from a business standpoint.

Question 4. How can we understand the potential bottleneck of rowid and verify it through debugging

We know that rowid is only 6 bytes wide, so its maximum range is 2^48 values. If row_id is incremented past this value, what happens? Is there a hidden danger in that case?
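The ceiling and the wrap-around can be checked with quick arithmetic (a model of the behavior, not InnoDB's code):

```python
# The hidden row_id counter is 6 bytes wide, so it holds 2**48 values.
ROW_ID_BITS = 48
MAX_VALUES = 2 ** ROW_ID_BITS
print(MAX_VALUES)  # 281474976710656 (~281 trillion)

# Once the counter passes the ceiling it wraps, so new rows reuse old
# row_id values and silently overwrite the rows that held them.
def next_row_id(counter):
    return counter % MAX_VALUES

print(next_row_id(MAX_VALUES))      # 0 -> collides with the first row_id
print(next_row_id(MAX_VALUES + 1))  # 1
```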

We can do a test to illustrate it.

1) We create a table test_inc that does not contain any indexes.


create table test_inc(id int) engine=innodb;

2) Get the corresponding process id with ps -ef | grep mysql, and use gdb to attach and set the debug value. Remember: this must be your own test environment!


[root@dev01 mysql]# gdb -p 3132 -ex 'p dict_sys->row_id=1' -batch
[New LWP 3192]
[New LWP 3160]
[New LWP 3159]
[New LWP 3158]
[New LWP 3157]
[New LWP 3156]
[New LWP 3155]
[New LWP 3154]
[New LWP 3153]
[New LWP 3152]
[New LWP 3151]
[New LWP 3150]
[New LWP 3149]
[New LWP 3148]
[New LWP 3147]
[New LWP 3144]
[New LWP 3143]
[New LWP 3142]
[New LWP 3141]
[New LWP 3140]
[New LWP 3139]
[New LWP 3138]
[New LWP 3137]
[New LWP 3136]
[New LWP 3135]
[New LWP 3134]
[New LWP 3133]
[Thread debugging using libthread_db enabled]
0x00000031ed8df283 in poll () from /lib64/libc.so.6
$1 = 1

3) We do a basic test: get the table creation statement and confirm the test environment is as expected.



4) Insert some data to make rowid increase continuously.



5) We reset rowid to 2^48, again via gdb.



6) Continue to write some data; for example, we insert 3 rows with the values 4, 5 and 6.


mysql> insert into test_inc values(4),(5),(6);
Query OK, 3 rows affected (0.07 sec)
Records: 3  Duplicates: 0  Warnings: 0

7) Looking at the data, we find that rows 1 and 2 have been overwritten.



From this we can see that there is still a bottleneck once rowid is incremented past its ceiling. Of course, the probability is very low: the counter would have to grow to about 281 trillion, a huge value. Functionally speaking, throwing a duplicate-value error would be more reasonable than silently overwriting data.

With an explicit primary key, the bottleneck above does not exist.

References

The rowid debugging method refers to Dinch's blog.

https://www.ofstack.com/article/172262.htm
