Optimize the storage efficiency of InnoDB table BLOB and TEXT columns

  • 2021-11-14 07:22:11
  • OfStack

First, introduce the following key points about the storage format of MySQL InnoDB engine:

1. InnoDB can choose to use shared tablespace or independent tablespace. It is recommended to use independent tablespace for easy management and maintenance. Enable innodb_file_per_table Option, after 5.5, it can be dynamically modified online and implemented ALTER TABLE xx ENGINE = InnoDB Convert an existing table to a standalone table space, earlier than version 5.5. After modifying this option, it needs to be restarted for it to take effect.

2. data page of InnoDB defaults to 16KB, and new options are added after version 5.6 innodb_page_size It can be modified. Before 5.6, you can only modify the source code and recompile, but it is not recommended to modify this configuration unless you are very clear about its advantages and disadvantages.

3. When new data is written in data page of InnoDB, 1/16 of the space will be reserved, and the reserved space can be used for subsequent new record writing, thus reducing the overhead of frequently adding data page.

4. Each data page needs to store at least 2 rows of records. Therefore, the maximum length of uplink record is 8KB in theory, but it should be smaller in fact, because there are still 1 internal data structure of InnoDB to store.

5. Limited by the storage mode of InnoDB, if the data is written sequentially, the filling rate of data page is 15/16 in the most ideal case, but there is no way to guarantee complete sequential writing, so the filling rate of data page is generally 1/2 to 15/16. Therefore, it is better for each InnoDB table to have one self-adding column as the primary key, so that new record writes are as sequential as possible.

6. When the filling rate of data and page is less than 1/2, InnoDB will shrink and free space will be released.

7. The InnoDB engine is currently supported for MySQL version 5.6 COMPACT , REDUNDANT , DYNAMIC , COMPRESSED Four formats, the default is COMPACT format, and COMPRESSED is rarely used and not recommended (see article 1 below). If compression features are needed, TokuDB engine can be directly considered.

8. Compared with REDUNDANT, COMPACT row format can save about 20% of storage space, and COMPRESSED can save about 50% of storage space compared with COMPACT, but it will cause TPS to drop by 90%. Therefore, the COMPRESSED line format is strongly discouraged.

9. When the row format is DYNAMIC or COMPRESSED, long columns such as TEXT/BLOB (long column, or other longer columns, but only TEXT/BLOB type, depending on the specific situation) will be completely stored in an independent data page, and only a 20-byte pointer is used in the clustered index page to point to the new page, which is called off-page. Row migration like ORACLE has serious waste of disk space and poor performance of I/O. Therefore, the use of BLOB, TEXT, VARCHAR column types longer than 255 is strongly discouraged.

10. When the file format of InnoDB ( innodb_file_format ) is set to Antelope, and the row format is COMPACT or REDUNDANT, the BLOB, TEXT, or Long VARCHAR columns store only the first 768 bytes of them in the aggregated cable page (the maximum 768 bytes are used to facilitate prefix indexing/prefix index), and the rest is stored in the extra page, even if only one byte is added. Therefore, the shorter the length of all columns, the better.

11. BLOB, TEXT or page with long VARCHAR column stored in off-page are exclusive and cannot be shared. Therefore, it is strongly not recommended to use multiple long columns in one table.

To sum up, if you really need to store BLOB, TEXT and long VARCHAR columns in the InnoDB table in actual business, you have the following suggestions:

1. Serialize and compress all data as much as possible and store them in the same column to avoid multiple off-page.

2. Columns with actual maximum storage length less than 255 are converted into VARCHAR Or CHAR Type (if it is variable-length data 2, it makes no difference; if it is fixed-length data, CHAR type is used).

3. If you can't integrate all columns into one column, you can settle for the next best thing. After arranging and combining according to the maximum length of each column, you can split it into multiple sub-tables. Try to make the total row length of each sub-table less than 8KB, and reduce the frequency of off-page.

4. The above suggestion is that data page is the default 16KB. If it is modified to 8KB or other sizes, please test according to the above theory to find the most suitable value.

5. When the character column length is less than 255, no matter whether CHAR or VARCHAR is used for storage, or VARCHAR column length is defined as 255, the actual tablespace will not increase.

Summarize


Related articles: