Detailed explanation of 30 mysql tens of millions of big data SQL query optimization skills

  • 2021-10-16 05:18:08
  • OfStack

1. To optimize the query, try to avoid full table scanning. First, consider indexing the columns involved in where and order by.

2. Try to avoid judging the null value of the field in the where clause, otherwise it will cause the engine to give up using the index and scan the whole table. For example, select id from t where num is null can set the default value of 0 on num to ensure that there is no null value in the num column in the table, and then query like this: select id from t where num=0

3. Try to avoid using it in where clause! = or < > Operator, otherwise the engine will give up using indexes and scan the whole table.

4. Try to avoid using or to join conditions in the where clause, otherwise it will cause the engine to give up using indexes and scan the full table, for example: select id from t where num=10 or num=20 You can query this: select id t where num=10 union all select id from from from t t where num=20

5. in and not in should also be used with caution, otherwise it will lead to full table scanning, such as select id from t where num in (1, 2, 3) For continuous values, do not use in if you can use between: select id from t where num between 1 and 3

6. The following query will also result in a full table scan: select id from t where ES90like '% Li%' To improve efficiency, consider full-text retrieval.

7. Using parameters in the where clause also results in a full table scan. Because SQL resolves local variables only at runtime, the optimizer cannot defer access plan selection until runtime; It must be selected at compile time. However, if the access plan is established at compile time, the value of the variable is unknown and therefore cannot be used as an input to the index selection. A full table scan is performed as follows: select id from t where num = @ num can be changed to force the query to use an index: select id from t with (index (index name)) where num = @ num

8. Try to avoid expression operations on fields in the where clause, which will cause the engine to give up using indexes and scan the full table. For example: select id from t where num/2=100 should read: select id from t where num=100*2

9. Functional operations on fields in the where clause should be avoided as much as possible, which will cause the engine to give up using indexes and scan the full table. For example: select id from t where substring (name, 1, 3) = 'abc', name id beginning with abc should be changed to:
select id from t where name like 'abc%'

10. Do not perform functions, arithmetic operations, or other expression operations to the left of "=" in the where clause, or the system may not use indexes correctly.

11. When using index fields as conditions, if the index is a composite index, the first field in the index must be used as conditions to ensure that the system uses the index, otherwise the index will not be used, and the order of fields should be as consistent as possible with the order of indexes.

12. Don't write meaningless queries, such as generating an empty table structure: select col1, col2 into # t from t where 1=0
This type of code will not return any result set, but it will consume system resources, so it should be changed to this:
create table # t (…)

13. In many cases, replacing in with exists is a good choice: select num from a where num in (select num from b)
Replace with the following statement:
select num from a where exists(select 1 from b where num=a.num)

14. Not all indexes are effective for query. SQL is optimized according to the data in the table. When there are a lot of duplicate data in the index column, SQL query may not use the index. For example, there are almost 1.5 fields sex, male and female in table 1, so even if the index is built on sex, it will not play a role in query efficiency.

15. The more indexes, the better. Although indexes can improve the efficiency of the corresponding select, they also reduce the efficiency of insert and update. Because it is possible to rebuild indexes when insert or update, how to build indexes needs careful consideration, depending on the specific situation. The index number of 1 table should not exceed 6. If there are too many indexes, we should consider whether it is necessary to build indexes on columns that are not often used.

16. Updating clustered index data columns should be avoided as much as possible, because the order of clustered index data columns is the physical storage order of table records, and once the value of this column changes, the order of the whole table records will be adjusted, which will consume considerable resources. If the application system needs to update the data columns of clustered index frequently, it is necessary to consider whether the index should be built as clustered index.

17. Try to use numeric fields. If the fields containing only numeric information should not be designed as character fields, this will reduce the performance of queries and joins and increase the storage overhead. This is because the engine compares every 1 character in the string one by one when processing queries and joins, whereas for numeric types, only 1 comparison is enough.

18. Use varchar/nvarchar instead of char/nchar as much as possible, because firstly, the storage space of variable-length fields is small, which can save storage space. Secondly, for queries, the search efficiency in a relatively small field is obviously higher.

19. Do not use select * from t anywhere, replace "*" with a specific list of fields, and do not return any fields that are not needed.

20. Try to use table variables instead of temporary tables. If the table variable contains a large amount of data, please note that the index is very limited (only the primary key index).

21. Avoid frequent creation and deletion of temporary tables to reduce the consumption of system table resources.

22. Temporary tables are not unusable, and using them properly can make some routines more efficient, for example, when you need to repeatedly refer to a data set in a large table or a common table. However, for a single event, it is best to use an exported table.

23. When creating a new temporary table, if the amount of data inserted once is large, select into can be used instead of create table to avoid causing a large number of log to improve the speed; If the amount of data is small, create table should be followed by insert in order to ease the resources of the system tables.

24. If temporary tables are used, be sure to explicitly delete all temporary tables at the end of the stored procedure, first truncate table, then drop table, so as to avoid long-term locking of system tables.

25. Try to avoid cursors because they are inefficient. If the cursor manipulates more than 10,000 rows of data, you should consider overwriting them.

26. Before using a cursor-based approach or a temporary table approach, look for a set-based solution to solve the problem, which is usually more effective.

27. As in Temporary Table 1, cursors are not unusable. Using the FAST_FORWARD cursor for small datasets is generally superior to other row-by-row methods, especially when several tables must be referenced to get the data you need. Routines that include "totals" in the result set are usually faster than those that use cursors. If development time permits, both cursor-based and set-based approaches can be tried once to see which one works better.

28. Set SET NOCOUNT ON at the beginning and SET NOCOUNT OFF at the end of all stored procedures and triggers. There is no need to send an DONE_IN_PROC message to the client after each statement of the stored procedure and trigger is executed.

29. Try to avoid large transaction operations and improve system concurrency.

30. Try to avoid returning a large amount of data to the client. If the amount of data is too large, we should consider whether the corresponding requirements are reasonable.

If your program can meet all these 30 items, your program execution efficiency will be greatly improved


Related articles: