Simple rewrite optimization for MySQL subquery

  • 2020-09-28 09:12:00
  • OfStack

DBA used oracle or other relational databases or developers have such experience, both in the subquery that optimized database already, can very good choice driver table, and then on the transplant the experience to mysql database, but unfortunately, mysql on the processing of the subquery may make you disappointed, in our production system is due to encounter this problem:


select i_id, sum(i_sell) as i_sell

from table_data

where i_id in (select i_id from table_data where Gmt_create >= '2011-10-07 00:00:00 ' )

group by i_id;

(Note: The business logic of sql can be likened to: first, query the sales volume of the new 100 books sold in 10-07, and then query the sales volume of the new 100 books sold in the whole year.)

The performance issue with this sql is due to the weakness of the mysql optimizer in handling subqueries, which the mysql optimizer will rewrite when processing subqueries. Usually, we want to finish the results of the subquery from the inside out, and then use the subquery to drive the table of the external query to complete the query; However, mysql will first scan all the data in the outer table, and each data will be passed to the subquery and associated with the subquery. If the appearance is large, then the performance will be a problem.

For the above query, as the data of table_data table contains 70W data, and there are many data in the subquery, which are repetitive, so nearly 70W need to be associated. A large number of associations lead to the execution of this sql for several hours, but the execution is not completed. Therefore, we need to rewrite sql:


SELECT t2.i_id, SUM(t2.i_sell) AS sold

FROM (SELECT distinct i_id FROM table_data

WHERE gmt_create >= '2011-10-07 00:00:00 ' ) t1, table_data t2

WHERE t1.i_id = t2.i_id GROUP BY t2.i_id;

We changed the subquery to association, and added distinct to the subquery to reduce the number of t1 associating t2.

After modification, the execution time of sql was reduced to less than 100ms.


Related articles: