mysql Index Cardinality Concept and Usage Example

  • 2021-11-29 16:44:24
  • OfStack

This paper describes the concept and usage of mysql index cardinality with examples. Share it for your reference, as follows:

Cardinality (index cardinality) is an important concept of mysql index

Index cardinality is the number of different values contained in a data column. For example, if a data column contains values 1, 2, 3, 4, 5, 1, its cardinality is 5. An index works best when its cardinality is high relative to the number of rows in the table (that is, columns contain many different values and there are few duplicate values). If a data column contains many different ages, the index will quickly distinguish the data rows. If a data column is used to record gender (there are only two values, "M" and "F"), then the index is of little use. If the probability of occurrence of values is almost equal, it is possible to get 1.5 rows of data no matter which value is searched. In these cases, it is best not to use indexes at all, because when the query optimizer finds that a value appears in a high percentage of the data rows of the table, it will generally ignore the index and scan the whole table. The customary percentage threshold is "30%"

Another concept is called index selectivity

Index selectivity = index cardinality/total data. The cardinality can be viewed through "show index from table name".
The closer you get to 1, the more likely you are to use the index, and you can understand that 1 is 100%
The advantage of high index selectivity is that mysql can filter more rows when looking for matches, and only 1 index has the best selectivity, with a value of 1.

Here is my own table. The amount of data is not high, just to test the index selectivity


select * from articles
id  Title   Name 
15 Title 0  Big bear 
16 Title 1  Big bear 
17 Title 2  Big bear 
18 Title 3  Big bear 
19 Title 4  Big bear 
20 Title 5  Big bear 
21 Title 6  Big bear 
22 Title 7  Big bear 
23 Title 8 2 Bear 
24 Title 9 2 Bear 

id is the default integer self-increasing primary key

Now add the author column index and view all the indexes of the articles table


ALTER TABLE `articles` ADD INDEX (`author`)
SHOW INDEX FROM articles
table non_unique key_name seq_in_index column_name collation cardinality sub_part packed null index_type
articles 0 PRIMARY 1 id A 10 NULL NULL  BTREE
articles 1 author 1 author A 2 NULL NULL  BTREE

Tell me the meaning of each field

table table name non_unique 0 if the index cannot include repeating words. If you can, it is 1. key_name index name Column sequence number in seq_in_index index, starting with 1 column_name column name How the collation column is stored in the index. In the MySQLSHOW INDEX syntax, there is a value 'A' (ascending order) or NULL (unclassified) cardinality index cardinality sub_part If the column is only partially indexed, the number of characters indexed. NULL if the entire column is indexed. packed indicates how the keyword is compressed. If it is not compressed, it is NULL. null If the column contains NULL, it contains YES. If not, the column contains NO. Index storage method used by index_type (BTREE, FULLTEXT, HASH, RTREE)

As you can see, table articles already has two indexes

id index does not contain duplicate words primary key column name id ascending order index cardinality 10 no partial index no compression does not contain null storage mode is btree

author index contains repeating words index name author column name author ascending order index cardinality 2 no partial index no compression no null storage mode btree

According to the algorithm of index selectivity, the index selectivity of id is 10/10 = 1, and the index selectivity of author is 2/10 = 0.2. Let's test 1 and use explain to see the sentence analysis of 1


explain select * from articles where id = 15
id select_type table partitions type possible_keys key key_len ref rows filtered extra
1 SIMPLE articles NULL const PRIMARY PRIMARY 4 const 1 100.00 NULL

Say the command explain again here

explain shows how mysql uses indexes to process select statements and join tables. It can help you choose a better index and write a more optimized query statement. Say the meaning of each field under 1

1. id SELECT identifier. This is the SELECT query serial number. This is not important, the query sequence number is the order in which the sql statement is executed

2. select_type select type

2.1. When SIMPLE performs a simple select query that does not require Union operation or does not contain subqueries, the select_type responding to the query statement is simple. No matter how complex the query statement is, there must be only one unit query 1 in which select_type is simple in the execution plan
2.2. PRIMARY 1 select query execution plan that requires Union operation or contains subqueries, select_type at the outermost layer is primary. Like simple1, select_type is the unit of primary, and only one select query exists
2.3. union in the unit select query formed by the union of union operations, select_type of all unit select queries after the second is union except the first one. select_type of select, the first unit of union, is not union, but DERIVED. It is a temporary table for storing the query results after union (Union)
2.4. DEPENDENT UNION dependent is like UNION select_type1, and dependent union appears in the collection query formed by union or union all. Here, dependent means that the unit query formed by the combination of union or union and all is externally affected
2.5. union result union result is a data table containing union results

3. table table name

4. type connection type, which has multiple parameters, and the introduction from the best type to the worst type is also the focus of this article

4.1 const, where the table has at most one matching row, const is used to compare primary key or unique indexes. Because it only matches 1 row of data, it can be understood as optimized index and constant lookup very quickly
4.2 eq_ref Explanation of eq_ref, the mysql manual says: "For each combination of rows from the previous table, read 1 row from that table. This is probably the best join type except for the const type."
4.3 ref For each row combination from the previous table, all rows with matching index values are read from this table. Use ref if the join uses only the leftmost prefix of the key, or if the key is not UNIQUE or PRIMARY KEY (in other words, if the join cannot select a single row based on a keyword). This join type is good if the key used matches only a small number of rows
4.4 ref_or_null This join is of the same type as ref, but the addition of MySQL allows you to search specifically for rows that contain NULL values. Optimization of this join type is often used in solving subqueries
4.5 index_merge This join type represents the use of index merge optimization. In this case, the key column contains the list of indexes used, and key_len contains the longest key element of the index used
4.6 unique_subquery
4.7 index_subquery
4.8 range Retrieval within a given range, using 1 index to check rows
4.9 index This join type is the same as ALL except that only the index tree is scanned. This is usually faster than ALL, because index files are usually smaller than data files. (That is, while all and Index both read the full table, index is read from the index and all is read from the hard disk.)
4.10 ALL performs a complete table scan for each row combination from a previous table. If the table is the first one that is not marked with const, this is usually not good, and it is usually poor in its case. You can usually add more indexes instead of using ALL, so that rows can be retrieved based on constant values or column values in the previous table

5. possible_keys does not matter which index will find rows in the table

6. keys indicates the index used by the MYSQL query

7. Index length used by key_len MYSQL

8. ref shows which column or constant to use to select rows from the table from key1

9. rows shows the number of rows that MYSQL executes the query. The larger the value, the worse it is, indicating that the index is not used well

10. Extra This column contains details of the MySQL resolution query

You can see that the id query uses the id index. Simple query is the constant query with the best query type
Then let's try another query, this time using the author index


explain select * from articles where author = " Big bear "
1 SIMPLE articles NULL ALL author NULL NULL NULL 10 80.00 Using where

You can clearly see that the simple query using author index is the worst full table scan
Don't rush to explain, or this sentence, let's change to 2 bears


explain select * from articles where author = "2 Bear "
1 SIMPLE articles NULL ref author author 1022 const 2 100.00 NULL

You can see that the query type this time is ref

That is, mysql believes that full table scanning is faster than using indexes because the number of rows of data for the author is more than 30% of the total data
This is the meaning of the concept of index cardinality and index selectivity, so when building an index, pay attention to building the index on the column with high index cardinality

More readers interested in MySQL can check out the topics on this site: "Summary of MySQL Index Operation Skills", "Summary of MySQL Common Functions", "Encyclopedia of MySQL Log Operation Skills", "Encyclopedia of MySQL Transaction Operation Skills", "Encyclopedia of MySQL Stored Procedure Skills" and "Encyclopedia of MySQL Database Lock Related Skills"

I hope this article is helpful to everyone's MySQL database.


Related articles: