Detailed Explanation of count without Filter Condition in MySQL

  • 2021-11-13 03:00:15
  • OfStack

count(*)

Realization

1. MyISAM: The total number of rows of the table is stored on the disk, and the query without filtering conditions can be returned directly

If there are filtered count (*), MyISAM will not return quickly

2. InnoDB: Read data line by line from the storage engine 1, and then accumulate the count

Because of MVCC, it is uncertain how many rows InnoDB should return at the same time

Sample

Suppose the table t has 10,000 records

session A session B session C
BEGIN;
SELECT COUNT(*) FROM t;(返回10000)
INSERT INTO t;(插入1行)
BEGIN;
INSERT INTO t(插入1行);
SELECT COUNT(*) FROM t;(返回10000) SELECT COUNT(*) FROM t;(返回10002) SELECT COUNT(*) FROM T;(返回10001)

At the last moment, three sessions query the total number of rows of t at the same time, but the results are different

The default transaction isolation level for InnoDB is RR, implemented through MVCC

Every transaction needs to determine whether every 1 row of records is visible to itself

Optimization

1. InnoDB is the index organization table

Clustered index tree: Leaf nodes are data Level 2 index tree: Leaf node is primary key value

2, 2-level index trees take up much less space than clustered index trees

3. The optimizer will traverse the smallest index tree and minimize the amount of scanned data on the premise of ensuring correct logic

For count operation without filtering conditions, no matter which index tree is traversed, the effect is 1 The optimizer selects the optimal index tree for count (*)

show table status


mysql> SHOW TABLE STATUS\G;
*************************** 1. row ***************************
 Name: t
 Engine: InnoDB
 Version: 10
 Row_format: Dynamic
 Rows: 100256
 Avg_row_length: 47
 Data_length: 4734976
Max_data_length: 0
 Index_length: 5275648
 Data_free: 0
 Auto_increment: NULL
 Create_time: 2019-02-01 17:49:07
 Update_time: NULL
 Check_time: NULL
 Collation: utf8_general_ci
 Checksum: NULL
 Create_options:
 Comment:

SHOW TABLE STATUS is also estimated by sampling (very inaccurate), and the error can reach 40% ~ 50%

Maintenance count

Cache

Program

Save the total number of rows in the table with Redis (without filtering conditions) For every row inserted in this table, Redis counts +1, and for every row deleted, Redis counts-1

Disadvantages

Missing update

1. Redis may lose updates

2. Solution: After Redis is restarted abnormally, execute count (*) once in the database

Abnormal restart is not common, and the cost of full table scanning is acceptable at this time

Logic imprecision is fatal

1. Scenario: Display the total number of operation records and 100 records of recent operations

2. Redis and MySQL are two different storage systems, which do not support distributed transactions, so it is impossible to get an accurate 1-dimensional view

Time series A

session B at the time of T3, the 100 lines of results found have the latest inserted records, but Redis has not +1, and the logic is not 1

时刻 session A session B
T1
T2 插入1行数据R;
T3 读取Redis计数;
查询最近100条记录;
T4 Redis计数+1;

Time series B

session B at the time of T3, there is no newly inserted record in the 100 lines of results found, but Redis has +1, and the logic is not 1

时刻 session A session B
T1
T2 Redis计数+1;
T3 读取Redis计数;
查询最近100条记录;
T4 插入1行数据R;

Database

Put the count value in a separate count table C in the database Using the characteristics of crash-safe of InnoDB, the problem of crash loss is solved By using the transaction support of InnoDB, the problem of 1-dimensional view is solved session B at T3 time, the transaction of session A has not been committed, the count value of table C +1 is invisible to itself, and logic 1 causes

时刻 session A session B
T1
T2 BEGIN;
表C中的计数值+1;
T3 BEGIN;
读表C计数值;
查询最新100条记录;
COMMIT;
T4 插入1行数据R;
COMMIT;

Performance of count

Semantics

1. count () is an aggregate function, which judges the returned result set line by line

If the parameter value of count function is not NULL, the cumulative value is +1, otherwise, it is not added, and finally the cumulative value is returned

2. count (Field F)

The field F may be NULL Represents the total number of fields F not NULL in the result set that meet the conditions returned

3. count (primary key ID), count (1), count (*)

Can't be NULL Represents the total number of result sets that meet the conditions returned

4. The InnoDB engine returns whatever field the Server layer wants

With the exception of count (*), the whole line is not returned, only blank lines are returned

Performance comparison

count (Field F)

1. If the field F is not allowed to be NULL, read this field from the record line by line, and accumulate it line by line after passing the judgment

It is impossible to judge whether the field is NULL by the table structure

2. If the field F is defined as allowing NULL, the field is read out from the record line by line, and accumulated according to the line after the judgment is passed

It is possible to judge whether the field is NULL by the table structure Determine whether the field value is actually NULL

3. If there is no level 2 index on the field F, you can only traverse the whole table (clustered index)

4. Because InnoDB must return the field F, the optimizer can make fewer optimization decisions

For example, you can't choose the best index to traverse

count (primary key ID)

InnoDB traverses the whole table (clustered index), fetches out the id value of each row and returns it to the Server layer After id is acquired by Server layer, it is judged that it is impossible to be NULL, and then accumulated according to rows The optimizer may choose the best index to traverse

count(1)

The InnoDB engine traverses the entire table (clustered index), but does not take values The Server layer puts a number 1 into every row returned, judges that it is impossible to be NULL, and accumulates it according to rows count (1) is faster than count (primary key ID) because count (primary key ID) involves two-part operations Parsing data rows Copy field value

count(*)

count (*) doesn't take out all the values, but it is specially optimized to take no values, because "*" is definitely not NULL, and it is accumulated by rows No value: InnoDB returns a blank line, telling Server that the layer is not NULL and can be counted

Efficiency ranking

count (Field F) < count (primary key ID) < count (1) ≈ count (*) Try to use count (*)

Sample


mysql> SHOW CREATE TABLE prop_action_batch_reward\G;
*************************** 1. row ***************************
 Table: prop_action_batch_reward
Create Table: CREATE TABLE `prop_action_batch_reward` (
 `id` bigint(20) NOT NULL,
 `source` int(11) DEFAULT NULL,
 `serial_id` bigint(20) NOT NULL,
 `create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
 `user_ids` mediumtext,
 `serial_index` tinyint(4) DEFAULT '0',
 PRIMARY KEY (`id`),
 UNIQUE KEY `uniq_serial_id_source_index` (`serial_id`,`source`,`serial_index`),
 KEY `idx_create_time` (`create_time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

count (Field F)

No index

There is no index on user_ids, and InnoDB must return the user_ids field, only the clustered index can be traversed


mysql> EXPLAIN SELECT COUNT(user_ids) FROM prop_action_batch_reward;
+----+-------------+--------------------------+------+---------------+------+---------+------+----------+-------+
| id | select_type | table   | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------------------+------+---------------+------+---------+------+----------+-------+
| 1 | SIMPLE | prop_action_batch_reward | ALL | NULL  | NULL | NULL | NULL | 16435876 | NULL |
+----+-------------+--------------------------+------+---------------+------+---------+------+----------+-------+

mysql> SELECT COUNT(user_ids) FROM prop_action_batch_reward;
+-----------------+
| count(user_ids) |
+-----------------+
| 17689788 |
+-----------------+
1 row in set (10.93 sec)

Have an index

1. There is an index on serial_id, which can traverse uniq_serial_id_source_index

2. However, since InnoDB must return the serial_id field, it does not traverse the better choice idx_create_time with equivalent logical results

If you select idx_create_time and return the serial_id field, this means you must return to the table

mysql> EXPLAIN SELECT COUNT(serial_id) FROM prop_action_batch_reward;
+----+-------------+--------------------------+-------+---------------+-----------------------------+---------+------+----------+-------------+
| id | select_type | table   | type | possible_keys | key    | key_len | ref | rows | Extra |
+----+-------------+--------------------------+-------+---------------+-----------------------------+---------+------+----------+-------------+
| 1 | SIMPLE | prop_action_batch_reward | index | NULL  | uniq_serial_id_source_index | 15 | NULL | 16434890 | Using index |
+----+-------------+--------------------------+-------+---------------+-----------------------------+---------+------+----------+-------------+

mysql> SELECT COUNT(serial_id) FROM prop_action_batch_reward;
+------------------+
| count(serial_id) |
+------------------+
|  17705069 |
+------------------+
1 row in set (5.04 sec)

count (primary key ID)

The optimizer selects the best index idx_create_time to traverse rather than cluster the index


mysql> EXPLAIN SELECT COUNT(id) FROM prop_action_batch_reward;
+----+-------------+--------------------------+-------+---------------+-----------------+---------+------+----------+-------------+
| id | select_type | table   | type | possible_keys | key  | key_len | ref | rows | Extra |
+----+-------------+--------------------------+-------+---------------+-----------------+---------+------+----------+-------------+
| 1 | SIMPLE | prop_action_batch_reward | index | NULL  | idx_create_time | 5 | NULL | 16436797 | Using index |
+----+-------------+--------------------------+-------+---------------+-----------------+---------+------+----------+-------------+

mysql> SELECT COUNT(id) FROM prop_action_batch_reward;
+-----------+
| count(id) |
+-----------+
| 17705383 |
+-----------+
1 row in set (4.54 sec)

count(1)


mysql> EXPLAIN SELECT COUNT(1) FROM prop_action_batch_reward;
+----+-------------+--------------------------+-------+---------------+-----------------+---------+------+----------+-------------+
| id | select_type | table   | type | possible_keys | key  | key_len | ref | rows | Extra |
+----+-------------+--------------------------+-------+---------------+-----------------+---------+------+----------+-------------+
| 1 | SIMPLE | prop_action_batch_reward | index | NULL  | idx_create_time | 5 | NULL | 16437220 | Using index |
+----+-------------+--------------------------+-------+---------------+-----------------+---------+------+----------+-------------+

mysql> SELECT COUNT(1) FROM prop_action_batch_reward;
+----------+
| count(1) |
+----------+
| 17705808 |
+----------+
1 row in set (4.12 sec)

count(*)


mysql> EXPLAIN SELECT COUNT(*) FROM prop_action_batch_reward;
+----+-------------+--------------------------+-------+---------------+-----------------+---------+------+----------+-------------+
| id | select_type | table   | type | possible_keys | key  | key_len | ref | rows | Extra |
+----+-------------+--------------------------+-------+---------------+-----------------+---------+------+----------+-------------+
| 1 | SIMPLE | prop_action_batch_reward | index | NULL  | idx_create_time | 5 | NULL | 16437518 | Using index |
+----+-------------+--------------------------+-------+---------------+-----------------+---------+------+----------+-------------+

mysql> SELECT COUNT(*) FROM prop_action_batch_reward;
+----------+
| count(*) |
+----------+
| 17706074 |
+----------+
1 row in set (4.06 sec)

References

[MySQL Practical 45 Lectures]

Summarize


Related articles: