SQL optimization techniques to improve database performance

  • 2020-05-12 06:21:49
  • OfStack

Optimization goals

1. Reduce the number of IO operations

IO is always where a database is most likely to hit a bottleneck, which follows from what a database is responsible for. In most database operations, more than 90% of the time is spent on IO. Reducing the number of IO operations is therefore the first priority in SQL optimization and, of course, also the most effective means of optimization.

2. Reduce CPU computation

Besides the IO bottleneck, SQL optimization also has to consider CPU computation. order by, group by, distinct ... are all big consumers of CPU (these operations are basically the CPU comparing data in memory). Once our IO optimization reaches a certain stage, reducing CPU computation becomes an important goal of SQL optimization.

Optimization method

1. Change the SQL execution plan

Having defined the optimization goals, we need to determine how to reach them. For a SQL statement there is really only one way to achieve the two goals above: change its execution plan so that it avoids detours as much as possible and finds the data we need through various shortcuts, thereby reducing the number of IO operations and reducing CPU computation.

Common misconceptions

1. count(1) and count(primary_key) are better than count(*)

Many people use count(1) or count(primary_key) instead of count(*) to count records because they think it performs better, but this is a myth. In some scenarios it may even perform worse, because the database applies special optimizations to the count(*) operation.

2. count(column) and count(*) are the same

This myth is common even among senior engineers and DBAs, and many people take it for granted. In fact, count(column) and count(*) are completely different operations with completely different meanings.

count(column) counts how many records in the result set have a non-NULL value in column

count(*) counts how many records there are in the entire result set
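The difference is easy to see on a small table. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made so the example is self-contained); the users table and its contents are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (id, email) VALUES (?, ?)",
    [(1, "a@example.com"), (2, None), (3, "c@example.com")],
)

# count(*) counts every row; count(email) skips rows where email IS NULL.
total_rows, non_null_emails = conn.execute(
    "SELECT count(*), count(email) FROM users"
).fetchone()

print(total_rows, non_null_emails)
```

With one NULL email among three rows, the two counts differ, so the two expressions cannot be swapped freely.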

3. select a,b from ... makes the database access less data than select a,b,c from ...

This myth is widespread among developers, mainly because they do not know much about how databases store data.

In fact, most relational databases store data by row, and data access is performed in fixed-size IO units (known as blocks or pages), usually 4KB, 8KB... Most of the time, each IO unit holds multiple rows, and each row stores all the fields of that row (except for fields of special types such as lob).

So, whether we select one field or several, the amount of data the database actually has to access in the table is the same.

Of course, there is an exception: the case where the query can be completed entirely within an index. If a and b are both in the index, selecting only a,b does not require going back to the table. If c is not in the index, selecting a,b,c requires going back to the table to fetch it. In this case, the amount of IO for the two queries can differ greatly.
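The index-only exception can be observed in a query plan. This is a minimal sketch that again assumes SQLite (through Python's sqlite3 module) as a stand-in for MySQL; the table t, the index idx_a_b and the plan helper are hypothetical. SQLite reports an index-only read as "COVERING INDEX", which plays roughly the role of "Using index" in MySQL's explain output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, c TEXT)"
)
# Hypothetical index that contains a and b, but not c.
conn.execute("CREATE INDEX idx_a_b ON t (a, b)")


def plan(sql):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# a and b are both in the index, so the table itself is never read.
plan_ab = plan("SELECT a, b FROM t WHERE a = 3")
# c is not in the index, so each match needs a lookup back into the table.
plan_abc = plan("SELECT a, b, c FROM t WHERE a = 3")

print(plan_ab)
print(plan_abc)
```

The exact wording of the plan text varies between SQLite versions, so only the presence or absence of "COVERING INDEX" should be relied on.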

4. order by always requires a sort operation

We know that index data is actually ordered. If the order we need matches the order of some index, and our query is executed through that index, the database will generally skip the sort operation and return the data directly, because it knows the data already satisfies our ordering requirement.

In fact, using indexes to optimize SQL that has sorting requirements is a very important optimization technique.
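As a rough illustration, the sketch below compares two order by queries, one on an indexed column and one on an unindexed column. It assumes SQLite (through Python's sqlite3 module) rather than MySQL, and the orders table, the idx_created index and the plan helper are hypothetical. SQLite marks an explicit sort step as "USE TEMP B-TREE FOR ORDER BY", which is loosely comparable to "Using filesort" in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT, amount INTEGER)"
)
# Hypothetical index; only created_at is indexed, amount is not.
conn.execute("CREATE INDEX idx_created ON orders (created_at)")


def plan(sql):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# The index already stores rows in created_at order, so no sort step is needed.
plan_indexed = plan("SELECT id, created_at FROM orders ORDER BY created_at")
# No index provides amount order, so the engine has to sort the rows itself.
plan_unindexed = plan("SELECT id, amount FROM orders ORDER BY amount")

print(plan_indexed)
print(plan_unindexed)
```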

Further reading: the implementation analysis of MySQL ORDER BY, the basic implementation principle of GROUP BY in MySQL and the basic implementation principle of MySQL DISTINCT

5. If filesort appears in the execution plan, the sort is performed on disk files

We are not really to blame for this myth; the wording chosen by the MySQL developers is the problem. filesort is the information that we may see in the "Extra" column when viewing the execution plan of a SQL statement with the explain command.

In fact, whenever a SQL statement requires an actual sort operation, "Using filesort" is displayed. It does not mean that the sort is performed on files.

Further reading: understanding filesort in the MySQL Explain command output, where I describe this in more detail

Basic principles

1. join as little as possible

The strength of MySQL lies in its simplicity, but in some ways this is also its weakness. The MySQL optimizer is efficient, but because it works with a limited amount of statistical information, its decisions are more likely to deviate from the best plan. For complex multi-table joins, partly because of the limited optimizer and partly because not enough effort has gone into join execution, MySQL's performance still lags behind that of its relational database predecessors such as Oracle. For simple single-table queries, however, the gap is small, and in some scenarios MySQL even outperforms those predecessors.

2. Minimize sorting

Sorting consumes a large amount of CPU resources, so when the cache hit ratio is high and IO capacity is sufficient, reducing sorting can greatly improve the response time of SQL.

For MySQL, there are several ways to reduce sorting, such as:

Use an index to satisfy the sort order, as described in the misconceptions above

Reduce the number of rows participating in the sort

Do not sort data unless it is necessary

...

3. Try to avoid select *

Many people find this point hard to understand. Didn't the misconceptions above say that the number of fields in the select clause does not affect the data read?

Yes, IO is not affected most of the time, but when the query also has an order by, the fields in the select clause largely determine the efficiency of the sort. This point is explained in more detail in my earlier article on the implementation analysis of MySQL ORDER BY.

In addition, as the misconception above also notes, IO is unaffected only most of the time: when our query can be answered entirely from the index, selecting only the needed fields greatly reduces the amount of IO.

4. Use join instead of subqueries

Although join performance is not good, it still has a significant performance advantage over MySQL's subqueries. The execution plan MySQL generates for subqueries has always had a big problem. Although this problem has existed for many years, it is present in all stable versions released so far. The official developers acknowledged the problem long ago and promised to fix it as soon as possible, but so far at least we have not seen a single release that fixes it well.

5. Use or as little as possible

When multiple conditions in the where clause are combined with "or", the MySQL optimizer does not handle optimization of the execution plan well. Combined with MySQL's layered architecture, which separates the SQL layer from the storage engine layer, this leads to poor performance. Most of the time, replacing "or" with union all (or union when necessary) gives better results.
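The rewrite can be sketched as follows. The example assumes SQLite (through Python's sqlite3 module) as a stand-in for MySQL, with a hypothetical users table, and it only demonstrates that the two forms return the same rows, not how much faster either one is. union is used here because a row may match both conditions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT NOT NULL)"
)
conn.execute("CREATE INDEX idx_name ON users (name)")
conn.execute("CREATE INDEX idx_city ON users (city)")
conn.executemany(
    "INSERT INTO users (name, city) VALUES (?, ?)",
    [("ann", "Oslo"), ("bob", "Rome"), ("ann", "Rome"), ("cid", "Lima")],
)

# Original form: two conditions on different columns combined with OR.
with_or = conn.execute(
    "SELECT id FROM users WHERE name = 'ann' OR city = 'Rome' ORDER BY id"
).fetchall()

# Rewritten form: one simple indexed lookup per branch, merged with UNION.
# UNION (not UNION ALL) is needed because row 3 matches both branches.
with_union = conn.execute(
    "SELECT id FROM users WHERE name = 'ann' "
    "UNION "
    "SELECT id FROM users WHERE city = 'Rome' "
    "ORDER BY id"
).fetchall()

print(with_or, with_union)
```

If the branches are known to be mutually exclusive, union all avoids the duplicate-removal step.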

6. Try to use union all instead of union

The main difference between union and union all is that union must merge the two (or more) result sets and then filter out duplicates, which involves sorting, adds a lot of CPU work, and increases resource consumption and latency. So when we can confirm that duplicates cannot occur, or we do not care about duplicates in the result set, try to use union all instead of union.
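A small sketch of the behavioural difference, assuming SQLite (through Python's sqlite3 module) in place of MySQL and two hypothetical tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders_2019 (customer_id INTEGER)")
conn.execute("CREATE TABLE orders_2020 (customer_id INTEGER)")
conn.executemany("INSERT INTO orders_2019 VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO orders_2020 VALUES (?)", [(2,), (3,)])

# UNION merges both result sets and then removes duplicates (extra work).
union_rows = conn.execute(
    "SELECT customer_id FROM orders_2019 "
    "UNION "
    "SELECT customer_id FROM orders_2020"
).fetchall()

# UNION ALL simply concatenates the result sets; customer 2 appears twice.
union_all_rows = conn.execute(
    "SELECT customer_id FROM orders_2019 "
    "UNION ALL "
    "SELECT customer_id FROM orders_2020"
).fetchall()

print(sorted(union_rows), sorted(union_all_rows))
```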

7. Filter as early as possible

This optimization strategy is most commonly seen in index design (placing the more selective fields closer to the front).

The same principle applies when writing SQL, for example to optimize a join. When we need to page through data queried from multiple tables, it is better to filter and paginate on one table first and then join the resulting page with the other table. This avoids as much unnecessary IO as possible and greatly reduces the time spent on IO.
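One way to express "paginate first, then join" is a derived table, as sketched below. The example assumes SQLite (through Python's sqlite3 module) as a stand-in for MySQL, and the posts and authors tables are hypothetical. The two queries return the same page only because every post has exactly one matching author, so the join neither drops nor duplicates rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT)"
)
conn.executemany("INSERT INTO authors VALUES (?, ?)", [(1, "ann"), (2, "bob")])
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?)",
    [(i, 1 + i % 2, f"post {i}") for i in range(1, 101)],
)

# Join everything first, then cut out the page: the join touches all rows.
join_then_page = """
SELECT p.id, p.title, a.name
FROM posts p JOIN authors a ON a.id = p.author_id
ORDER BY p.id LIMIT 10 OFFSET 20
"""

# Cut out the page on one table first, then join only those 10 rows.
page_then_join = """
SELECT p.id, p.title, a.name
FROM (SELECT id, author_id, title FROM posts ORDER BY id LIMIT 10 OFFSET 20) p
JOIN authors a ON a.id = p.author_id
ORDER BY p.id
"""

rows_join_first = conn.execute(join_then_page).fetchall()
rows_page_first = conn.execute(page_then_join).fetchall()
print(rows_page_first[0], rows_page_first[-1])
```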

8. Avoid type conversions

Here, "type conversion" refers to the conversion that occurs when the type of the column in the where clause differs from the type of the parameter passed in:

The conversion is performed explicitly on column_name with a conversion function

The direct result is that MySQL (and in fact other databases have the same problem) cannot use the index. If you must convert, do it on the parameter passed in.

The database performs the conversion itself

If the type of the data we pass in differs from the field type and we do no conversion ourselves, MySQL may convert our data on its own, or it may leave it unprocessed and hand it to the storage engine. In that situation the index may become unusable, which causes execution plan problems.
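The explicit case can be sketched with a query plan. The example assumes SQLite (through Python's sqlite3 module) rather than MySQL, and the accounts table, the idx_phone index and the plan helper are hypothetical. It covers only the explicit conversion: implicit conversion rules differ between database engines, so that case is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, phone TEXT, name TEXT)"
)
# Hypothetical index on the TEXT column used for lookups.
conn.execute("CREATE INDEX idx_phone ON accounts (phone)")


def plan(sql, params=()):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


phone_number = 5551234  # the caller holds an integer, the column is TEXT

# Conversion applied to the column: the index on phone cannot be used.
plan_cast_column = plan(
    "SELECT name FROM accounts WHERE CAST(phone AS INTEGER) = 5551234"
)

# Conversion applied to the parameter instead: the bare column stays indexable.
plan_cast_param = plan(
    "SELECT name FROM accounts WHERE phone = ?", (str(phone_number),)
)

print(plan_cast_column)
print(plan_cast_param)
```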

9. Prioritize high-concurrency SQL over some "big" SQL with low execution frequency

High-concurrency SQL is always more destructive than low-frequency SQL, because once a high-concurrency statement goes wrong it can bring down the system without even giving us a chance to catch our breath. A statement that consumes a large amount of IO and responds slowly but runs infrequently will, even when it does run, at most slow the whole system down a little; the system can usually hold out for a while, which gives us time to react.

10. Optimize from the perspective of the whole system rather than making one-sided adjustments

SQL optimization should not target a single statement in isolation; it should take all the SQL in the system into account. In particular, when optimizing execution plans by adjusting indexes, never fix one statement at the expense of another.

11. As far as possible, explain every SQL statement running in the database

To optimize SQL, you need to know its execution plan so that you can judge whether there is room for optimization and whether the plan has a problem. After the SQL running in a database has been optimized for some time, obviously problematic statements become rare, and most remaining problems have to be dug out. At that point you need to run explain on a large number of statements, collect their execution plans, and judge whether each needs optimization.

via IT168 technology
