Introduction to Mysql varchar size and length problem

  • 2020-05-12 06:19:38
  • OfStack

Below version 4.0, varchar(20) refers to 20 bytes. If you store UTF8 Chinese characters, you can only store 6 (3 bytes for each Chinese character).
Above version 5.0, varchar(20) refers to 20 characters, whether stored in Numbers, letters or UTF8 Chinese characters (each Chinese character is 3 bytes), can be stored in 20, the maximum size is 65,532 bytes
The maximum size of Mysql4 is only 20 bytes, but the storage size of Mysql5 varies depending on the encoding.

1. Restrictive rules

The restrictions on fields are defined with the following rules:

a) storage limits

The varchar field stores the actual content separately from the clustered index, and the content begins with 1 to 2 bytes representing the actual length (2 bytes if the length exceeds 255), so the maximum length cannot exceed 65535.

b) encoding length limitation

If the character type is gbk, each character takes up to 2 bytes, and the maximum length cannot exceed 32766.

If the character type is utf8, each character takes up to 3 bytes and the maximum length cannot exceed 21845.

If the above limitation is exceeded, the varchar field will be forced to be of type text and warning will be generated.

c) length limit

What causes the varchar length limitation in practice is the length defined by one line. MySQL requires that the definition length of a row be no more than 65,535. If the defined table is longer than this value, it is prompted

ERROR 1118 (42000) : Row size too large. The maximum for the used type, not counting BLOBs, is es65535. You have to some TEXT TEXT or BLOBs

2. Calculation examples

Take two examples to illustrate the calculation of actual length 1.

a) if 1 table has only 1 varchar type, as defined by

create table t4(c varchar(N)) charset=gbk;

The maximum value of N is (65535-1-2)/2= 32766.

The reason for minus 1 is that the actual row storage starts at the second byte ';

The reason for the minus 2 is that the 2 bytes in the head of varchar represent the length;

The reason for dividing by 2 is that the character encoding is gbk.

b) if 1 table is defined as

create table t4(c int, c2 char(30), c3 varchar(N)) charset=utf8;

The maximum value of N here is (65535-1-2-4-30*3)/3=21812

Minus 1 and minus 2 are the same;

The reason for minus 4 is that int of type c takes 4 bytes;

The reason for minus 30*3 is that char(30) takes 90 bytes and the encoding is utf8.

If it is strongly converted to text by varchar, which exceeds the b rule above, then each field occupies a defined length of 11 bytes, which is no longer "varchar".

Related articles: