In MySQL indexes optimize distinct statements and distinct's multi field operations

2020-12-16 06:09:46
OfStack

MySQL typically uses GROUPBY(essentially a sort action) to perform DISTINCT operations, and if the DISTINCT and ORDERBY operations are combined, temporary tables are usually used. This can affect performance. In some cases,MySQL can use indexes to optimize DISTINCT operations, but you need to learn from them. This article covers an example of an DISTINCT operation that cannot be performed using an index.

Example 1 optimizes the DISTINCT operation with indexes


create table m11 (a int, b int, c int, d int, primary key(a)) engine=INNODB;

insert into m11 values (1,1,1,1),(2,2,2,2),(3,3,3,3),(4,4,4,4),(5,5,5,5),(6,6,6,6),(7,7,7,7),(8,8,8,8);

explain select distinct(a) from m11;


mysql> explain select distinct(a) from m11;


+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+| 1 | SIMPLE | m11 | NULL | index | PRIMARY | PRIMARY | 4 | NULL | 1 | 100.00 | Using index |+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+

Description:
There is a primary key index on the 1 'a' column, and MySQL can use the index (the value of the key column indicates the use of a primary key index) to complete the DISTINCT operation.

This is a typical example of using indexes to optimize DISTINCT operations.

Example 2 cannot optimize DISTINCT operations with indexes


create table m31 (a int, b int, c int, d int, primary key(a)) engine=MEMORY;

insert into m31 values (1,1,1,1),(2,2,2,2),(3,3,3,3),(4,4,4,4),(5,5,5,5),(6,6,6,6),(7,7,7,7),(8,8,8,8);

explain select distinct(a) from m31;


 mysql> explain select distinct(a) from m31;




+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------+| 1 | SIMPLE | m31 | NULL | ALL | NULL | NULL | NULL | NULL | 8 | 100.00 | NULL |+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------+

Description:
1 From the query execution plan, the index is not being used.

2 Compared with the construction sentence of example 1, only the storage engine is different.

3 Why primary key indexes don't work ? Is the index on the MEMORY storage engine not available ?

Example 3 uses indexes to optimize the Memory table for DISTINCT operations


create table m33 (a int, b int, c int, d int, INDEX USING BTREE (a)) engine=MEMORY;

insert into m33 values (1,1,1,1),(2,2,2,2),(3,3,3,3),(4,4,4,4),(5,5,5,5),(6,6,6,6),(7,7,7,7),(8,8,8,8);

explain select distinct(a) from m33;


 mysql> explain select distinct(a) from m33;

+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-------+| 1 | SIMPLE | m33 | NULL | index | NULL | a | 5 | NULL | 8 | 100.00 | NULL |+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-------+
Description:
There is a primary key index on the 1 'a' column, and MySQL can use the index (the value of the key column indicates the use of a primary key index) to complete the DISTINCT operation.

Comparison of example 2 shows that both use the Memory engine, but instance 3 uses the index of type Btree by name.

Example 2 does not specify what type of index to use and MySQL will use the default values. The MySQL manual says:

As indicated by the engine name, MEMORY tables are stored in memory. They use hash indexes by default, which makes them very fast for single-value lookups, and very useful for creating temporary tables.

Conclusion:

1 To see the impact of the index on the query, pay attention to the type of index.

2 The HASH index is good for equivalent lookup, but not for scenes that require order, whereas Btree is good for ordered scenes.

3. Look at the query execution plan and find that the index is not being used, so we need to take a step to investigate the type of index.

DISTINCT cannot select multiple field solutions
In practice, we often select duplicate data in a table in the database, usually using the DISTINCT function.

But DISTINCT is only valid for one field, such as:


sql="select DISTINCT title from Table where id>0"

When we need to list another column in the data, such as:


sql="select DISTINCT title,posttime from Table where id>0"

The result is not what we want, so we need to solve the problem in another way.

Here is my SQL statement. I don't know if it's good, but I wish someone better would share it with me.

Method 1:


sql = "Select DISTINCT(title),posttime From Table1 Where id>0"

Method 2:


sql = "Select title,posttime From Table1 Where id>0 group by title,posttime"

Method 3:


mysql> explain select distinct(a) from m11;