MySQL data type varchar in detail

  • 2020-06-12 10:48:24
  • OfStack

1. Logical meaning of varchar(N
Starting with MySQL4.1, N in varchar (N) refers to the maximum number of characters (characters) that can be stored in this field, not the number of bytes.
No matter a Chinese English character or number, or a Chinese character, it is treated as a character. Prior to 4.1, N represented the maximum number of bytes stored (bytes).
2, varchar(N) how long data can be stored
On mysql reference manual, varchar can store up to 65,535 bytes of data. The maximum length of varchar is limited by the maximum length (max row size, 65535bytes). 65535 is not a very precise upper limit. We can continue to reduce this upper limit. The 65535 bytes include the length of all fields, the length identification of variable-length fields (each variable-length field USES an additional 1 or 2 bytes to record the actual data length), and the accumulation of NULL identifying bits.
The NULL identity bit, if the default null allowed column is empty in the varchar field definition, then 1bit is required to identify, and every 8 bits identity forms one field. If there are N varchar fields in a table, you need (N+7) /8 (rounded) bytes to store all NULL identifier bits.
If the data table has only one varchar field and that field DEFAULT NULL, then the maximum length of the varchar field is 65532 bytes, i.e. 65535-2-1=65532 bytes.
CREATE TABLE `vchar1` (   `name` VARCHAR(65533)  NOT  NULL ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `vchar2` (   `name` VARCHAR(65533)  NOT  NULL ) ENGINE=MyISAM DEFAULT CHARSET=latin1;

If the data table has only one varchar field and that field NOT NULL, then the maximum length of the varchar field is 65533 bytes, i.e. 65535-2=65533bytes.
CREATE TABLE `vchar3` (   `name` VARCHAR(65532)  DEFAULT  NULL ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `vchar4` (   `name` VARCHAR(65532)  DEFAULT  NULL ) ENGINE=MyISAM DEFAULT CHARSET=latin1;

Let's do a slightly more complicated table structure, minus >
CREATE TABLE `tv` (
`a` VARCHAR(100) DEFAULT NULL,
`b` VARCHAR(100) DEFAULT NULL,
`c` VARCHAR(100) DEFAULT NULL,
`d` VARCHAR(100) DEFAULT NULL,
`e` VARCHAR(100) DEFAULT NULL,
`f` VARCHAR(100) DEFAULT NULL,
`g` VARCHAR(100) DEFAULT NULL,
`h` VARCHAR(100) DEFAULT NULL,
`i` VARCHAR(N) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1

What is the maximum value of N in 'i' varchar(N)?
The calculation is as follows: the determined field length is known to be 100*8 bytes, and the total length of the eight varchar (100) fields needs to be variable to indicate that the byte is 1*8=8 bytes. Each NULL field is identified by 1bit and the nine fields are default null, so (9+7)/8bit = 2 bytes needs to store the NULL identity bit. 65535-100 * * 8 8-1-2 = 64725 > 256, then the maximum length of field i is 64725-2 =64723 bytes, namely N=64723.
How many characters can varchar store? This is related to the character set used, latin1, gbk, and utf8 encodings require 1, 2, and 3 bytes to store 1 character, respectively.
3. varchar physical storage
On physical storage, varchar USES one or two extra bytes to represent the actual stored string length (bytes). If the maximum length of the column is less than 256 bytes, it is represented (identified) by 1 byte. If the maximum length is 256 or greater, use two bytes.
When the selected character set is latin1, 1 character takes up 1 byte
varchar(255) stores 1 character, 1 USES 2 bytes physical Spaces to store actual data length and data value.
varchar(256) stores 1 character, USES 2 bytes to represent the actual data length, 1 requires 3 bytes physical storage Spaces.
varchar for different RDBMS engines, there is no physical storage, although there is a logical sense of unity 1. For the different storage engines of mysql, the implementation method and the physical storage of data are also different.
4. varchar in InnoDB
The physical storage mode of varchar in InnoDB is related to innodb_file_format used by InnoDB. Early innodb_file_forma used the Antelope file format, which supported both redundant and compact row_format. Starting from 5.5 or InnoDB1.1, a new file format, Barracuda can be used. Barracuda is compatible with Redundant and also supports dynamic and compressed row_format.
When innodb_file_format=Antelope, ROW_FORMAT=REDUNDANT or COMPACT.
The clustered index of innodb (cluster index) stores only the first 768 bytes of the varchar, text, blob fields, and the remaining bytes are stored in a single overflow page, also known as ES149en-ES150en. The 768 byte prefix is followed by a 20-byte pointer to the location of overflow pages.
In addition, in the case of innodb_file_format=Antelope, up to 10 large fields can be stored in InnoDB (off-ES160en storage is required). The default of innodbd page size is 16KB, the length of InnoDB single line cannot exceed 16k/2=8k bytes, (768+20) *10 < 8 k.
When innodb_file_format=Barracuda, ROW_FORMAT=DYNAMIC or COMPRESSED
Whether all varchar, text, and blob fields in innodb are fully es184EN-page stored depends on the length of the field and the total length of the entire row. For columns stored in ES186en-ES187en, only a 20-byte pointer is stored in cluster index, pointing to the actual overflow page storage location. If the length of a single row is too large to fit cluster index page, innodb will select the longest column for off-ES197en storage until the length of the row fits cluster index page.
5. varchar in MyISAM
For the MyISAM engine, all data in the varchar field is stored in the data row (in-ES208en). The row_format of the myisam table also affects the physical storage behavior of varchar.
row_format of MyISAM can be set to fixed and dynamic via create or alter sql statements. It is also possible to generate the storage format of row_format=compresse from myisampack.
When there is no field of type text or blob in the myisam table, you can set row_format to fixed(or dynamic), otherwise only dynamic.
When the varchar field is present in the table, row_format can be set to fixed or dynamic. row_format=fixed is used to store varchar field data, which wastes storage space. varchar will store at fixed length. row_format is fixed and dynamic, and the physical implementation of varchar is also different (see the source code files field.h and field.cc), so the physical storage of the varchar field will also change when the conversion occurs between fixed and dynamic.

Related articles: