Eight Common Error Usage Examples of SQL in MySQL

  • 2021-12-11 19:15:56
  • OfStack

Preface

MySQL still maintained a strong trend of database popularity growth in 2016. More and more customers build their applications on MySQL database, and even migrate from Oracle to MySQL. However, some customers encounter some situations such as slow response time and full CPU in the process of using MySQL database.

Alibaba Cloud RDS expert service team has helped customers solve many urgent problems on the cloud. Now, some common SQL problems in ApsaraDB Expert Diagnosis Report are summarized as follows for your reference.

1. LIMIT statement

Paged queries are one of the most common scenarios, but they are also usually the most problematic places.

For example, for the following simple statement, the idea of DBA is to add a combined index to the type, name, create_time fields. In this way, conditional sorting can effectively use the index, and the performance is rapidly improved.


SELECT * 
FROM operation 
WHERE type = 'SQLStats' 
  AND name = 'SlowLog' 
ORDER BY create_time 
LIMIT 1000, 10;

Well, maybe more than 90% of DBA solves this problem.

But when the LIMIT clause becomes "LIMIT 1000000, 10", programmers still complain: Why is it still slow for me to take only 10 records?

To know that the database does not know where the 10000000 record starts, even if there is an index, it needs to be calculated from scratch. In most cases, programmers are lazy when this kind of performance problem occurs.

In front-end data browsing and page turning, or big data export in batches, the maximum value of the previous page can be taken as a parameter as a query condition. SQL is redesigned as follows:


SELECT * 
FROM  operation 
WHERE type = 'SQLStats' 
AND  name = 'SlowLog' 
AND  create_time > '2017-03-16 14:00:00' 
ORDER BY create_time limit 10;

Under the new design, the query time is basically fixed and will not change with the increase of data volume.

2. Implicit conversion

Another common error is the mismatch between query variables and field definition types in the SQL statement. For example, the following statement:


mysql> explain extended SELECT * 
  > FROM my_balance b 
  > WHERE b.bpn = 14000000123 
  >  AND b.isverified IS NULL ;
mysql> show warnings;
| Warning | 1739 | Cannot use ref access on index 'bpn' due to type or collation conversion on field 'bpn'

The field bpn is defined as varchar (20), and the strategy of MySQL is to convert strings into numbers and then compare them. The function acts on the table field, and the index is invalid.

The above situation may be the parameters automatically filled in by the application framework, rather than the programmer's original intention. Nowadays, there are many complicated application frameworks, which are convenient to use, and at the same time, be careful that it may dig holes for itself.

3. Update and delete the association

Although MySQL 5.6 introduces materialization, it is important to note that it is currently only for query statement optimization. You need to manually rewrite JOIN for updates or deletions.

For example, in the following UPDATE statement, MySQL actually executes a loop/nested subquery (DEPENDENT SUBQUERY), and its execution time can be imagined.


UPDATE operation o 
SET status = 'applying' 
WHERE o.id IN (SELECT id 
    FROM (SELECT o.id, 
        o.status 
      FROM operation o 
      WHERE o.group = 123 
        AND o.status NOT IN ( 'done' ) 
      ORDER BY o.parent, 
         o.id 
      LIMIT 1) t);

Implementation plan:


+----+--------------------+-------+-------+---------------+---------+---------+-------+------+-----------------------------------------------------+
| id | select_type  | table | type | possible_keys | key  | key_len | ref | rows | Extra            |
+----+--------------------+-------+-------+---------------+---------+---------+-------+------+-----------------------------------------------------+
| 1 | PRIMARY   | o  | index |    | PRIMARY | 8  |  | 24 | Using where; Using temporary      |
| 2 | DEPENDENT SUBQUERY |  |  |    |   |   |  |  | Impossible WHERE noticed after reading const tables |
| 3 | DERIVED   | o  | ref | idx_2,idx_5 | idx_5 | 8  | const | 1 | Using where; Using filesort       |
+----+--------------------+-------+-------+---------------+---------+---------+-------+------+-----------------------------------------------------+

After being rewritten to JOIN, the selection mode of subquery changed from DEPENDENT SUBQUERY to DERIVED, and the execution speed was greatly accelerated, from 7 seconds to 2 milliseconds.


UPDATE operation o 
  JOIN (SELECT o.id, 
       o.status 
      FROM operation o 
      WHERE o.group = 123 
       AND o.status NOT IN ( 'done' ) 
      ORDER BY o.parent, 
        o.id 
      LIMIT 1) t
   ON o.id = t.id 
SET status = 'applying' 

The execution plan is simplified to


+----+-------------+-------+------+---------------+-------+---------+-------+------+-----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra            |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-----------------------------------------------------+
| 1 | PRIMARY  |  |  |    |  |   |  |  | Impossible WHERE noticed after reading const tables |
| 2 | DERIVED  | o  | ref | idx_2,idx_5 | idx_5 | 8  | const | 1 | Using where; Using filesort       |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-----------------------------------------------------+

4. Mixed sorting

MySQL cannot use indexes for mixed sorting. However, in some scenarios, there are opportunities to use special methods to improve performance.


SELECT * 
FROM my_order o 
  INNER JOIN my_appraise a ON a.orderid = o.id 
ORDER BY a.is_reply ASC, 
   a.appraise_time DESC 
LIMIT 0, 20 

The execution plan is displayed as a full table scan:


+----+-------------+-------+--------+-------------+---------+---------+---------------+---------+-+
| id | select_type | table | type | possible_keys  | key  | key_len | ref  | rows | Extra 
+----+-------------+-------+--------+-------------+---------+---------+---------------+---------+-+
| 1 | SIMPLE  | a  | ALL | idx_orderid | NULL | NULL | NULL | 1967647 | Using filesort |
| 1 | SIMPLE  | o  | eq_ref | PRIMARY  | PRIMARY | 122  | a.orderid |  1 | NULL   |
+----+-------------+-------+--------+---------+---------+---------+-----------------+---------+-+

Since is_reply has only two states of 0 and 1, the execution time is reduced from 1.58 seconds to 2 milliseconds after we rewrite it as follows.


SELECT * 
FROM ((SELECT *
   FROM my_order o 
    INNER JOIN my_appraise a 
      ON a.orderid = o.id 
       AND is_reply = 0 
   ORDER BY appraise_time DESC 
   LIMIT 0, 20) 
  UNION ALL 
  (SELECT *
   FROM my_order o 
    INNER JOIN my_appraise a 
      ON a.orderid = o.id 
       AND is_reply = 1 
   ORDER BY appraise_time DESC 
   LIMIT 0, 20)) t 
ORDER BY is_reply ASC, 
   appraisetime DESC 
LIMIT 20;

5. EXISTS statement

MySQL still executes the EXISTS clause with nested subqueries. Such as the following SQL statement:


SELECT * 
FROM  operation 
WHERE type = 'SQLStats' 
AND  name = 'SlowLog' 
AND  create_time > '2017-03-16 14:00:00' 
ORDER BY create_time limit 10;
0

The implementation plan is:


+----+--------------------+-------+------+-----+------------------------------------------+---------+-------+---------+ -----+
| id | select_type  | table | type | possible_keys  | key | key_len | ref | rows | Extra |
+----+--------------------+-------+------+ -----+------------------------------------------+---------+-------+---------+ -----+
| 1 | PRIMARY   | n  | ALL | | NULL  | NULL | NULL | 1086041 | Using where     |
| 1 | PRIMARY   | sra | ref | | idx_user_id | 123  | const |  1 | Using where   |
| 2 | DEPENDENT SUBQUERY | m  | ref | | idx_message_info | 122  | const |  1 | Using index condition; Using where |
+----+--------------------+-------+------+ -----+------------------------------------------+---------+-------+---------+ -----+

Removing exists and changing it to join can avoid nesting subqueries and reduce the execution time from 1.93 seconds to 1 millisecond.


SELECT *
FROM my_neighbor n 
  INNER JOIN message_info m 
    ON n.id = m.neighbor_id 
     AND m.inuser = 'xxx' 
  LEFT JOIN my_neighbor_apply sra 
    ON n.id = sra.neighbor_id 
     AND sra.user_id = 'xxx' 
WHERE n.topic_status < 4 
  AND n.topic_type <> 5 

New implementation plan:


SELECT * 
FROM  operation 
WHERE type = 'SQLStats' 
AND  name = 'SlowLog' 
AND  create_time > '2017-03-16 14:00:00' 
ORDER BY create_time limit 10;
3

6. Conditional push-down

External query criteria cannot be pushed down to complex views or subqueries:

Aggregate subquery; Subquery with LIMIT; UNION or UNION ALL subquery; Subquery in the output field;

As in the following statement, you can see from the execution plan that its conditions act after aggregating subqueries:


SELECT * 
FROM  operation 
WHERE type = 'SQLStats' 
AND  name = 'SlowLog' 
AND  create_time > '2017-03-16 14:00:00' 
ORDER BY create_time limit 10;
4

SELECT * 
FROM  operation 
WHERE type = 'SQLStats' 
AND  name = 'SlowLog' 
AND  create_time > '2017-03-16 14:00:00' 
ORDER BY create_time limit 10;
5

Make sure that the query condition can be pushed down directly semantically, and rewrite as follows:


SELECT * 
FROM  operation 
WHERE type = 'SQLStats' 
AND  name = 'SlowLog' 
AND  create_time > '2017-03-16 14:00:00' 
ORDER BY create_time limit 10;
6

The implementation plan becomes:


+----+-------------+-----------+------+---------------+-------+---------+-------+------+--------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+---------------+-------+---------+-------+------+--------------------+
| 1 | SIMPLE | operation | ref | idx_4 | idx_4 | 514 | const | 1 | Using where; Using index |
+----+-------------+-----------+------+---------------+-------+---------+-------+------+--------------------+

For a detailed explanation that the external conditions of MySQL cannot be pushed down, please refer to the article: http://mysql.taobao.org/monthly/2016/07/08

7. Narrow the scope in advance

Start with the initial SQL statement:


SELECT * 
FROM  my_order o 
    LEFT JOIN my_userinfo u 
       ON o.uid = u.uid
    LEFT JOIN my_productinfo p 
       ON o.pid = p.pid 
WHERE ( o.display = 0 ) 
    AND ( o.ostaus = 1 ) 
ORDER BY o.selltime DESC 
LIMIT 0, 15 

The original meaning of the SQL statement is to make a series of left joins first, and then sort the first 15 records. It can also be seen from the execution plan that the last step estimated the number of sorting records to be 900,000, and the time consumption was 12 seconds.


SELECT * 
FROM  operation 
WHERE type = 'SQLStats' 
AND  name = 'SlowLog' 
AND  create_time > '2017-03-16 14:00:00' 
ORDER BY create_time limit 10;
9

Because the final WHERE condition and sort are for the leftmost main table, you can reduce the data amount in advance for my_order sort before making left join. After the SQL is rewritten as follows, the execution time is reduced to about 1 millisecond.


SELECT * 
FROM (
SELECT * 
FROM  my_order o 
WHERE ( o.display = 0 ) 
    AND ( o.ostaus = 1 ) 
ORDER BY o.selltime DESC 
LIMIT 0, 15
) o 
   LEFT JOIN my_userinfo u 
       ON o.uid = u.uid 
   LEFT JOIN my_productinfo p 
       ON o.pid = p.pid 
ORDER BY o.selltime DESC
limit 0, 15

Check the execution plan again: Subquery materialized (select_type=DERIVED) participates in JOIN. Although the estimated row scan is still 900,000, the actual execution time becomes very small after using the index and LIMIT clause.


+----+-------------+------------+--------+---------------+---------+---------+-------+--------+----------------------------------------------------+
| id | select_type | table   | type  | possible_keys | key   | key_len | ref  | rows  | Extra                       |
+----+-------------+------------+--------+---------------+---------+---------+-------+--------+----------------------------------------------------+
| 1 | PRIMARY   | <derived2> | ALL  | NULL     | NULL  | NULL  | NULL |   15 | Using temporary; Using filesort          |
| 1 | PRIMARY   | u     | eq_ref | PRIMARY    | PRIMARY | 4    | o.uid |   1 | NULL                        |
| 1 | PRIMARY   | p     | ALL  | PRIMARY    | NULL  | NULL  | NULL |   6 | Using where; Using join buffer (Block Nested Loop) |
| 2 | DERIVED   | o     | index | NULL     | idx_1  | 5    | NULL | 909112 | Using where                    |
+----+-------------+------------+--------+---------------+---------+---------+-------+--------+----------------------------------------------------+

8. Push down the intermediate result set

Let's look at the following example that has been preliminarily optimized (the main table in the left join takes precedence over the query criteria):


SELECT  a.*, 
     c.allocated 
FROM   ( 
       SELECT  resourceid 
       FROM   my_distribute d 
          WHERE  isdelete = 0 
          AND   cusmanagercode = '1234567' 
          ORDER BY salecode limit 20) a 
LEFT JOIN 
     ( 
       SELECT  resourcesid ,  sum(ifnull(allocation, 0) * 12345) allocated 
       FROM   my_resources 
          GROUP BY resourcesid) c 
ON    a.resourceid = c.resourcesid

So are there any other problems with this statement? It is not difficult to see that the subquery c is a full table aggregation query, which will lead to the performance degradation of the whole statement when the number of tables is particularly large.

In fact, for the subquery c, the final result set of the left join only cares about the data that can match the main table resourceid. Therefore, we can rewrite the statement as follows, and the execution time is reduced from 2 seconds to 2 milliseconds.


SELECT  a.*, 
     c.allocated 
FROM   ( 
          SELECT  resourceid 
          FROM   my_distribute d 
          WHERE  isdelete = 0 
          AND   cusmanagercode = '1234567' 
          ORDER BY salecode limit 20) a 
LEFT JOIN 
     ( 
          SELECT  resourcesid ,  sum(ifnull(allocation, 0) * 12345) allocated 
          FROM   my_resources r, 
              ( 
                   SELECT  resourceid 
                   FROM   my_distribute d 
                   WHERE  isdelete = 0 
                   AND   cusmanagercode = '1234567' 
                   ORDER BY salecode limit 20) a 
          WHERE  r.resourcesid = a.resourcesid 
          GROUP BY resourcesid) c 
ON    a.resourceid = c.resourcesid

But the subquery a appears several times in our SQL statement. This writing method not only has extra overhead, but also makes the whole statement complicated. Use the WITH statement to override again:


WITH a AS 
( 
     SELECT  resourceid 
     FROM   my_distribute d 
     WHERE  isdelete = 0 
     AND   cusmanagercode = '1234567' 
     ORDER BY salecode limit 20)
SELECT  a.*, 
     c.allocated 
FROM   a 
LEFT JOIN 
     ( 
          SELECT  resourcesid ,  sum(ifnull(allocation, 0) * 12345) allocated 
          FROM   my_resources r, 
              a 
          WHERE  r.resourcesid = a.resourcesid 
          GROUP BY resourcesid) c 
ON    a.resourceid = c.resourcesid

Summarize

The database compiler generates the execution plan, which determines the actual execution mode of SQL. But the compiler only tries its best to serve, and the compiler of all databases is not perfect.

Most of the scenarios mentioned above also have performance problems in other databases. Only by understanding the characteristics of database compiler can we avoid its shortcomings and write high-performance SQL statements.

Programmers should bring the idea or consciousness of algorithm in when designing data model and writing SQL statement.

When writing complex SQL statements, you should get into the habit of using WITH statements. Simple and clear SQL statements can also reduce the burden on the database.


Related articles: