Summary and explanation of MySQL database and table sharding

2021-11-24 03:08:49
OfStack
 
As a project develops, the database keeps growing, and a single table ends up holding too much data. Queries slow down, and because of the table locking mechanism, application operations are seriously affected, so the database becomes a performance bottleneck.
 
When this happens, we can consider sharding, that is, splitting a single database or table into multiple databases and multiple tables. When users access the data, the application routes them to different databases and tables according to a certain algorithm and logic, so the data is spread over multiple tables, the access pressure on any single table is reduced, and database access performance improves.
 
The following is a summary of the database and table sharding used in the project:
 
Single-database single-table
 
Single-database single-table is the most common database design. For example, there is one user table in the database db, and all users can be found in the user table of db.
 
Single-database multi-table 
 
As the number of users grows, the user table holds more and more data. Once the data volume reaches a certain level, queries against the user table gradually slow down, which affects the performance of the whole DB. With MySQL there is a more serious problem: when you need to add a column, MySQL may lock the table (notably in older versions without online DDL), and all read and write operations can only wait.
 
The user table can be split horizontally in some way to produce several tables with exactly the same structure, such as user_0000 and user_0001. The data in user_0000 + user_0001 + … together makes up the complete data set.
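As a minimal sketch of this horizontal split: the helper names `shard_table_name` and `split_rows` are hypothetical, and the modulo rule is assumed here purely for illustration. The point is only that the shards share one structure and together hold every row exactly once.

```python
def shard_table_name(base: str, index: int) -> str:
    """Build a shard table name such as user_0000 (4-digit suffix, as in the text)."""
    return f"{base}_{index:04d}"


def split_rows(rows, shard_count, key="id"):
    """Distribute rows across shard tables by key modulo shard_count (assumed rule)."""
    shards = {shard_table_name("user", i): [] for i in range(shard_count)}
    for row in rows:
        shards[shard_table_name("user", row[key] % shard_count)].append(row)
    return shards


# Six illustrative users split over two shard tables.
users = [{"id": i, "name": f"user{i}"} for i in range(1, 7)]
shards = split_rows(users, 2)

# Merging all shards gives back the complete data set, with no row lost or duplicated.
merged = sorted((r for rows in shards.values() for r in rows), key=lambda r: r["id"])
```

With two shards, even ids land in user_0000 and odd ids in user_0001.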
 
Multi-database multi-table
 
As the data volume grows, the storage space of a single DB may no longer be enough, and as the query volume grows, a single database server can no longer keep up. At this point, the database itself can be split horizontally.
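One possible way to map a key to both a database and a table is sketched below. The function `locate`, the database names `db0`, `db1`, and the slot arithmetic are assumptions for illustration, not a scheme prescribed by this article.

```python
def locate(user_id: int, db_count: int, tables_per_db: int):
    """Return (database, table) for a user id.

    Assumed scheme: number all tables globally as slots 0..N-1,
    then place consecutive slots in the same database.
    """
    slot = user_id % (db_count * tables_per_db)
    db_index = slot // tables_per_db
    return f"db{db_index}", f"user_{slot:04d}"


# Two databases with two tables each: slots 0-1 live in db0, slots 2-3 in db1.
for uid in (4, 5, 6, 123):
    print(uid, locate(uid, db_count=2, tables_per_db=2))
```

Changing `db_count` or `tables_per_db` later changes where existing ids map to, so those numbers are best fixed up front or changed only together with a data migration.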
 
Rules for dividing databases and tables 
 
When designing a table, you must decide by what rule it is split across databases and tables. For example, when a new user registers, the program must determine which table to add the user's information to; similarly, at login, we have to find the corresponding record in the database through the user's account. All of this has to follow a certain rule.
 
Routing
 
Routing is the process of finding the corresponding table and database through the sharding rule. For example, suppose the rule is user_id mod 4. When a user registers a new account with id 123, id mod 4 tells us that this account should be saved in user_0003. When user 123 logs in, we compute 123 mod 4 = 3 and look up the record in user_0003.
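The registration and login flow for id 123 can be sketched as follows. To keep the example self-contained, an in-memory SQLite database stands in for MySQL; the `account` column and the `register` and `login` helpers are hypothetical.

```python
import sqlite3

TABLE_COUNT = 4  # rule from the text: user_id mod 4


def route(user_id: int) -> str:
    """Apply the sharding rule to get the table that holds this user."""
    return f"user_{user_id % TABLE_COUNT:04d}"


conn = sqlite3.connect(":memory:")  # SQLite is only a stand-in for MySQL here
for i in range(TABLE_COUNT):
    conn.execute(f"CREATE TABLE user_{i:04d} (id INTEGER PRIMARY KEY, account TEXT)")


def register(user_id: int, account: str) -> None:
    """Insert the new user into the table chosen by the routing rule."""
    conn.execute(
        f"INSERT INTO {route(user_id)} (id, account) VALUES (?, ?)",
        (user_id, account),
    )


def login(user_id: int):
    """Look the user up in the same table the rule chose at registration."""
    return conn.execute(
        f"SELECT id, account FROM {route(user_id)} WHERE id = ?", (user_id,)
    ).fetchone()


register(123, "alice")
print(route(123), login(123))
```

Because both writes and reads go through the same `route` function, the record is always looked up in the table it was written to.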
 
Problems arising from sharding, and points to note
 
1. The problem of the sharding dimension
 
Suppose a user buys a commodity and the transaction record needs to be saved. If the tables are split by the user dimension, every user's transaction records are stored in the same table, so it is easy to quickly look up the purchases of a given user, but the purchases of a given commodity are likely to be spread over multiple tables, which makes them troublesome to find. Conversely, if the tables are split by the commodity dimension, it is easy to look up the purchases of a commodity but troublesome to find a buyer's transaction records.
 
Common solutions are:
 
a. Scan all the tables. This is so inefficient that it is basically not feasible.
 
b. Record two copies of the data, one split by the user dimension and one by the commodity dimension.
 
c. Use a search engine. If the real-time requirements are very high, however, this also brings in real-time search.
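Option b can be sketched as a double write. In this illustration, in-memory dictionaries stand in for the shard tables; the table names `order_by_user_*` and `order_by_item_*` and the shard count of 4 are hypothetical.

```python
from collections import defaultdict

SHARDS = 4  # assumed shard count for both dimensions

orders_by_user = defaultdict(list)  # shard table name -> rows, split by user id
orders_by_item = defaultdict(list)  # shard table name -> rows, split by item id


def record_purchase(user_id: int, item_id: int, amount: int) -> None:
    """Write the same order twice: once per sharding dimension."""
    order = {"user_id": user_id, "item_id": item_id, "amount": amount}
    orders_by_user[f"order_by_user_{user_id % SHARDS:04d}"].append(order)
    orders_by_item[f"order_by_item_{item_id % SHARDS:04d}"].append(order)


def purchases_of_user(user_id: int):
    """Only one user-dimension shard has to be read."""
    table = f"order_by_user_{user_id % SHARDS:04d}"
    return [o for o in orders_by_user[table] if o["user_id"] == user_id]


def purchases_of_item(item_id: int):
    """Only one item-dimension shard has to be read."""
    table = f"order_by_item_{item_id % SHARDS:04d}"
    return [o for o in orders_by_item[table] if o["item_id"] == item_id]


record_purchase(1, 10, 5)
record_purchase(2, 10, 7)
record_purchase(1, 11, 3)
```

The price of this approach is doubled storage and the need to keep both copies consistent when a write to one of them fails.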
 
2. Problems with join queries
 
Join queries are almost impossible, because the associated tables may not be in the same database.
 
3. Avoid cross-library transactions 
 
Avoid modifying a table in db0 and a table in db1 within one transaction. Such a transaction is more complicated to handle and has a certain impact on efficiency.
 
4. Try to put the same group of data on the same DB server
 
For example, put the goods and transaction information of seller a into db0. Then when db1 goes down, everything related to seller a can still be used normally. In other words, avoid having data in one database depend on data in another database.
 
One master, multiple slaves
 
In practice, reads far outnumber writes in most cases. MySQL's master-slave replication makes read/write separation possible. All write operations must go to the Master, while read operations can be carried out on the Master and on the Slave machines. The Slaves have exactly the same structure as the Master. One Master can have multiple Slaves, and a Slave can even be attached under another Slave. In this way, the QPS of the DB cluster can be effectively improved.
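On the application side, read/write separation might look like the sketch below. The class `ReadWriteRouter` and the server names are hypothetical, and treating every statement that starts with SELECT as a read is a simplifying assumption.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the master and spread reads over the slaves (round-robin)."""

    def __init__(self, master, slaves):
        self.master = master
        # With no slaves configured, reads fall back to the master.
        self._readers = itertools.cycle(slaves or [master])

    def pick(self, sql: str) -> str:
        """Return the name of the server that should run this statement."""
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._readers) if is_read else self.master


router = ReadWriteRouter("master", ["slave1", "slave2"])
```

A read that must see a write just made by the same user may still need to go to the Master, since the Slaves can lag behind it.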
 
All write operations are performed on the Master first and then synchronized to the Slaves, so there is a certain delay between the Master and the Slave machines. When the system is busy, the delay becomes more serious, and adding more Slave machines makes it worse as well.
 
In addition, the Master is clearly the bottleneck of the cluster. When there are too many writes, the stability of the Master is seriously affected, and if the Master goes down, the whole cluster cannot work normally.
 
Therefore:
 
1. When the read pressure is very high, we can consider adding Slave machines as a partial solution, but once the number of Slave machines reaches a certain level, we have to consider splitting the database.

2. When the write pressure is high, the database must be split.
 
Why shard databases and tables when using MySQL
 
It can be said that wherever MySQL is used, once the data volume reaches a certain level, one problem comes up right away: the need to shard databases and tables.
 
This raises a question: why do we need to shard at all? Can't MySQL handle large tables?
 
In fact, it can handle large tables. In a project I worked on, the physical file of a single table was larger than 80 GB and the table held more than 500 million records. Moreover, it was a very core table: the friend relationship table.
 
However, this is not the best approach, because file systems such as Ext3 have many problems handling very large files. At that level, the XFS file system can be used instead. But once a single MySQL table grows too large, there is a problem that is not easy to solve: operations that adjust the table structure become basically impossible. Therefore, large projects all end up facing the need for sharding.
 
Looking at InnoDB itself, there are only two kinds of lock on the B-tree of the data file: the leaf node lock and the child node lock. As you can imagine, when a page splits or a new leaf is added, data cannot be written to the table. Sharding is therefore the better choice.
 
So how much data per database and table is appropriate?
 
In testing, read and write performance is relatively good for a single table below 10 million records. Leaving some headroom, keep a single table whose columns are all numeric below 8 million records, and a single table with character-type columns below 5 million.
 
If you plan for 100 databases with 100 tables each, for example for the user business:
 
5 million * 100 * 100 = 50 billion records.
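The capacity estimate is easy to check with a few lines of arithmetic. The constant names are illustrative; the figures are the per-table limit of 5 million rows and the 100 × 100 layout from the text.

```python
ROWS_PER_TABLE = 5_000_000   # upper bound per table for character-type data
DATABASES = 100
TABLES_PER_DATABASE = 100

total_tables = DATABASES * TABLES_PER_DATABASE
total_rows = ROWS_PER_TABLE * total_tables

print(f"{total_tables:,} tables")  # 10,000 tables
print(f"{total_rows:,} rows")      # 50,000,000,000 rows, i.e. 50 billion
```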
 
With this figure in mind, it is relatively easy to plan according to your own business.
 
Real interview question
 
Suppose a website has tens of millions of users, but only 1% of them are active. How can the access speed for active users be improved by optimizing the database?
 
A: 
 
MySQL partitioning can be used to put active users in one partition and inactive users in another. The active partition then holds relatively little data, so access for active users becomes faster.
 
The table can also be split horizontally, with active users in one table and inactive users in another, which likewise improves the access speed for active users.
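The two-table variant could be used roughly as sketched here. Dictionaries stand in for the two tables, and promoting a returning user to the active table is an assumed policy, not something this article specifies.

```python
active_users = {}    # stands in for the small "active" table
inactive_users = {}  # stands in for the large "inactive" table


def find_user(user_id: int):
    """Look in the small active table first, then fall back to the inactive one."""
    user = active_users.get(user_id)
    if user is None:
        user = inactive_users.pop(user_id, None)
        if user is not None:
            # Assumed policy: a user who shows up again is moved to the active table.
            active_users[user_id] = user
    return user
```

Most lookups are for active users and touch only the small table; the large table is read only on a miss.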
 
Summary