Improve data loading efficiency in MySQL

  • 2020-05-09 19:26:35
  • OfStack

Loading data into a database is relatively straightforward. However, there are also strategies that can be used to improve the efficiency of data loading operations, based on the following principles:

A batch load is faster than a single-row load because there is no need to refresh the index cache after each record is loaded. Can be refreshed after batch records are loaded.

Loading a table with no index is faster than loading after the index. If there is an index, not only must the records be added to the data file, but each index must be modified to reflect the new records added.

Shorter SQL statements are faster than longer SQL statements because they involve less analysis on the server side and because they are faster to send from the client to the server over the network. Some of these factors may seem trivial (especially the last one), but even small factors can make a big difference if a large amount of data is loaded. We can use the 1 principle above to derive several practical conclusions about how to load the data fastest:

LOAD DATA (in all its forms) is more efficient than INSERT because it loads the rows in batches. Index refreshes are rare, and the server only needs to analyze and interpret one statement instead of several.

LOAD DATA is more efficient than LOAD DATA LOCAL. With LOAD DATA, the file must be located on the server and must have FILE permissions, but the server can read the file directly from disk. With LOAD DATA LOCAL, the client reads the file and sends it to the server over the network, which is slow.

If you must use INSERT, you should use the form that allows you to specify multiple lines in a single statement, such as:

The more rows you can specify in a statement, the better. This reduces the number of statements required and the number of index refreshes. If you are using mysqldump to generate a database backup file, you should use the -- extended-insert option to make the dump file contain multiple lines of INSERT statements. You can also use -o p t (optimized), which enables the -extended-insert option. Instead, avoid the -- complete-insert option of mysqldump; This option results in single-line INSERT statements that take longer to execute and require more analysis than statements generated without the -- complete-insert option.

Use a compressed client/server protocol to reduce network data traffic. For most MySQL clients, you can specify it with the --compress command-line option. It is generally only used on slower networks because compression takes up a lot of processor time.

Let MySQL insert default values; Do not specify columns in the INSERT statement that you will assign default values in any way. On average, this statement is shorter and reduces the number of characters sent to the server over the network. In addition, statements contain fewer values and the server does less analysis and transformation.

If the table is indexed, you can use bulk inserts (LOAD DATA or multi-line INSERT statements) to reduce the index overhead. This minimizes the impact of index updates, since the index only needs to be refreshed when all rows are processed, rather than after each row is processed.

If you need to load a large amount of data into a new table, it is faster to create the table and load it when it is not indexed, and create the index after the data is loaded. It is faster to create the index once (instead of changing the index once per row).

If indexes are dropped or disabled before loading, re-creating or enabling indexes after loading the data may make loading faster. If you want to use a delete or disable strategy for data loading, do some experimentation to see if it's worth it (if you load a small amount of data into a large table, it can take longer to rebuild and index than to load the data).

Indexes can be deleted and rebuilt using DROP INDEX and CREATE INDEX. Another alternative is to disable and enable indexes using myisamchk or isamchk. This requires an account on the MySQL server host and write rights to the table files. To disable table indexing, enter the appropriate database directory and execute 1 of the following commands:

myisamchk is used for the MyISAM table for index files with the.MYI extension, and isamchk for the ISAM table for index files with the.ISM extension. After loading the data into the table, activate the index as follows:

If you decide to use index disabling and activation, you should use the table repair lock protocol described in chapter 13 to prevent the server from changing the lock at the same time (although the table is not being repaired at this point, it is modified like table repair process 1, so you need to use the same lock protocol).

The above data loading principle also applies to fixed queries related to clients that need to perform different operations. For example, 1 would like to avoid running SELECT queries for long periods of time on frequently updated tables. Running SELECT queries for long periods of time generates a lot of contention and degrades the performance of the writer. One possible solution is to store the records in a temporary table and then periodically add them to the main table if the writes are mostly INSERT operations. If new records need to be accessed immediately, this is not a viable method. But you can use this method as long as you don't have to access them for a short period of time. There are two benefits to using temporary tables. First, it reduces contention with SELECT queries on the main table and, as a result, performs faster. Secondly, the total time to load the records from the temporary table into the main table is less than the total time to load the records separately. The corresponding index cache only needs to be refreshed at the end of each batch load, not after each row is loaded. One application of this policy is to access the MySQL database by going to the Web page of the Web server. In this case, there may be no higher authority to guarantee that the record enters the main table immediately.

Another strategy to reduce index refreshes is to use the DELAYED_KEY_WRITE table creation option of MyISAM if the data is not exactly a single record inserted in the event of a system abnormal shutdown (which may occur if MySQL is used for some data entry work). This option allows the index cache to refresh only occasionally, rather than after each insert.

If you want to take advantage of delayed index refreshes across the server, simply start mysqld with the -- delayed-key-write option. In this case, block writes are delayed until the block must be refreshed to make room for other index values, or until an flush-tables command is executed, or until the index table is closed.

Related articles: