Efficiency comparison of several methods of MyBatis batch insertion

  • 2021-11-14 05:32:04
  • OfStack

Preface to the catalogue 1. Preliminary preparation
1.1 Table Structure 1.2 Project Configuration File
1.3 Entity Class
2. Repeatedly execute a single insert statement 2.1 corresponding to mapper interface
2.2 Test Methods 3. foreach Splice SQL
3.1 Corresponding to mapper interface
3.2 Test Methodology 3.3 Execution Times4. Batch 4.1 rewriteBatchedStatements Parameter4.2 Batch Prepared4.3 Execution Times4.4 If the data is larger
STEP 5 Summarize

Preface

There are three main ways to batch data:

Execute a single insert statement repeatedly foreach Splice sql Batch processing

1. Prepare

Based on Spring Boot + Mysql, and in order to omit get/set, lombok is used, see pom. xml for details.

1.1 Table structure

id uses database self-increment.


DROP TABLE IF EXISTS `user_info_batch`;
CREATE TABLE `user_info_batch` (
                           `id` bigint(11) unsigned NOT NULL AUTO_INCREMENT COMMENT ' Primary key id',
                           `user_name` varchar(100) NOT NULL COMMENT ' Account name ',
                           `pass_word` varchar(100) NOT NULL COMMENT ' Login password ',
                           `nick_name` varchar(30) NOT NULL COMMENT ' Nickname ',
                           `mobile` varchar(30) NOT NULL COMMENT ' Mobile phone number ',
                           `email` varchar(100) DEFAULT NULL COMMENT ' Email address ',
                           `gmt_create` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT ' Creation time ',
                           `gmt_update` timestamp NULL DEFAULT NULL COMMENT ' Update time ',
                           PRIMARY KEY (`id`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC COMMENT 'Mybatis Batch';

1.2 Project Profile

Careful you may have found that the database url is followed by a paragraph rewriteBatchedStatements=true. What's the use? Don't worry, it will be introduced later.


#  Database configuration 
spring:
  datasource:
    url: jdbc:mysql://47.111.118.152:3306/mybatis?rewriteBatchedStatements=true
    username: mybatis
    password: password
    driver-class-name: com.mysql.cj.jdbc.Driver
# mybatis
mybatis:
  mapper-locations: classpath:mapper/*.xml
  type-aliases-package: cn.van.mybatis.batch.entity

1.3 Entity Class


@Data
@Accessors(chain = true)
public class UserInfoBatchDO implements Serializable {
    private Long id;

    private String userName;

    private String passWord;

    private String nickName;

    private String mobile;

    private String email;

    private LocalDateTime gmtCreate;

    private LocalDateTime gmtUpdate;
}

1.4 UserInfoBatchMapper


public interface UserInfoBatchMapper {

    /**  Single insert 
     * @param info
     * @return
     */
    int insert(UserInfoBatchDO info);

    /**
     * foreach  Insert 
     * @param list
     * @return
     */
    int batchInsert(List<UserInfoBatchDO> list);
}

1.5 UserInfoBatchMapper.xml


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE mapper PUBLIC "-//mybatis.org//DTD Mapper 3.0//EN" "http://mybatis.org/dtd/mybatis-3-mapper.dtd">
<mapper namespace="cn.van.mybatis.batch.mapper.UserInfoBatchMapper">

  <insert id="insert" parameterType="cn.van.mybatis.batch.entity.UserInfoBatchDO">
    insert into user_info_batch (user_name, pass_word, nick_name, mobile, email, gmt_create, gmt_update)
    values (#{userName,jdbcType=VARCHAR}, #{passWord,jdbcType=VARCHAR},#{nickName,jdbcType=VARCHAR}, #{mobile,jdbcType=VARCHAR}, #{email,jdbcType=VARCHAR}, #{gmtCreate,jdbcType=TIMESTAMP}, #{gmtUpdate,jdbcType=TIMESTAMP})
  </insert>

  <insert id="batchInsert">
    insert into user_info_batch (user_name, pass_word, nick_name, mobile, email, gmt_create, gmt_update)
    values
    <foreach collection="list" item="item" separator=",">
      (#{item.userName,jdbcType=VARCHAR}, #{item.passWord,jdbcType=VARCHAR}, #{item.nickName,jdbcType=VARCHAR}, #{item.mobile,jdbcType=VARCHAR}, #{item.email,jdbcType=VARCHAR}, #{item.gmtCreate,jdbcType=TIMESTAMP}, #{item.gmtUpdate,jdbcType=TIMESTAMP})
    </foreach>
  </insert>
</mapper>

1.6 Preliminary data

For the convenience of testing, several variables are pulled away and loaded in advance.


    private List<UserInfoBatchDO> list = new ArrayList<>();
    private List<UserInfoBatchDO> lessList = new ArrayList<>();
    private List<UserInfoBatchDO> lageList = new ArrayList<>();
    private List<UserInfoBatchDO> warmList = new ArrayList<>();
    //  Counting tool 
    private StopWatch sw = new StopWatch();

In order to assemble data conveniently, a public method is extracted.


    private List<UserInfoBatchDO> assemblyData(int count){
        List<UserInfoBatchDO> list = new ArrayList<>();
        UserInfoBatchDO userInfoDO;
        for (int i = 0;i < count;i++){
            userInfoDO = new UserInfoBatchDO()
                    .setUserName("Van")
                    .setNickName(" Dust blog ")
                    .setMobile("17098705205")
                    .setPassWord("password")
                    .setGmtUpdate(LocalDateTime.now());
            list.add(userInfoDO);
        }
        return list;
    }

Preheating data


    @Before
    public void assemblyData() {
        list = assemblyData(200000);
        lessList = assemblyData(2000);
        lageList = assemblyData(1000000);
        warmList = assemblyData(5);
    }

2. Execute a single insert statement repeatedly

Maybe'lazy 'programmers will do this. It is very simple to nest an for loop directly on the original single insert statement.

2.1 Corresponding to mapper interface


int insert(UserInfoBatchDO info);

2.2 Test Methods

Because this method is too slow, the data is reduced to 2000


@Test
public void insert() {
    log.info(" [Program warm-up] ");
    for (UserInfoBatchDO userInfoBatchDO : warmList) {
        userInfoBatchMapper.insert(userInfoBatchDO);
    }
    log.info(" [Warm-up is over] ");
    sw.start(" Execute a single insert statement repeatedly ");
    //  Insert here  20w  The bar was too slow, so I only inserted it  2000  Article 
    for (UserInfoBatchDO userInfoBatchDO : lessList) {
        userInfoBatchMapper.insert(userInfoBatchDO);
    }
    sw.stop();
    log.info("all cost info:{}",sw.prettyPrint());
}

2.3 Execution time

1st time


#  Database configuration 
spring:
  datasource:
    url: jdbc:mysql://47.111.118.152:3306/mybatis?rewriteBatchedStatements=true
    username: mybatis
    password: password
    driver-class-name: com.mysql.cj.jdbc.Driver
# mybatis
mybatis:
  mapper-locations: classpath:mapper/*.xml
  type-aliases-package: cn.van.mybatis.batch.entity
0

Second time


-----------------------------------------
ms     %     Task name
-----------------------------------------
64853  100%   Execute a single insert statement repeatedly 

The 3rd time


-----------------------------------------
ms     %     Task name
-----------------------------------------
58235  100%   Execute a single insert statement repeatedly 

This method inserts 2000 pieces of data, and the average time of executing 3 times is 60991 ms.

3. foreach splicing SQL

3.1 Corresponding to mapper Interface


#  Database configuration 
spring:
  datasource:
    url: jdbc:mysql://47.111.118.152:3306/mybatis?rewriteBatchedStatements=true
    username: mybatis
    password: password
    driver-class-name: com.mysql.cj.jdbc.Driver
# mybatis
mybatis:
  mapper-locations: classpath:mapper/*.xml
  type-aliases-package: cn.van.mybatis.batch.entity
3

3.2 Test Methods

This method and the next one are tested with 20w data.


#  Database configuration 
spring:
  datasource:
    url: jdbc:mysql://47.111.118.152:3306/mybatis?rewriteBatchedStatements=true
    username: mybatis
    password: password
    driver-class-name: com.mysql.cj.jdbc.Driver
# mybatis
mybatis:
  mapper-locations: classpath:mapper/*.xml
  type-aliases-package: cn.van.mybatis.batch.entity
4

3.3 Execution Time

1st time


#  Database configuration 
spring:
  datasource:
    url: jdbc:mysql://47.111.118.152:3306/mybatis?rewriteBatchedStatements=true
    username: mybatis
    password: password
    driver-class-name: com.mysql.cj.jdbc.Driver
# mybatis
mybatis:
  mapper-locations: classpath:mapper/*.xml
  type-aliases-package: cn.van.mybatis.batch.entity
5

Second time


#  Database configuration 
spring:
  datasource:
    url: jdbc:mysql://47.111.118.152:3306/mybatis?rewriteBatchedStatements=true
    username: mybatis
    password: password
    driver-class-name: com.mysql.cj.jdbc.Driver
# mybatis
mybatis:
  mapper-locations: classpath:mapper/*.xml
  type-aliases-package: cn.van.mybatis.batch.entity
6

The 3rd time


-----------------------------------------
ms     %     Task name
-----------------------------------------
19827  100%  foreach  Splice  sql

This method inserts 20w pieces of data, and the average time of executing 3 times is 18852 ms.

4. Batch processing

In this way, mapper and xml reuse 2.1.

4.1 rewriteBatchedStatements Parameters

At the beginning of Test 1, I found that the submission method changed to Mybatis Batch didn't work. In fact, when inserting, it was still inserting one record, and the speed was far lower than the original method of splicing SQL, which was very unscientific.

Later, it was discovered that for batch execution, a new parameter needs to be added to the concatenation URL string: rewriteBatchedStatements=true

Introduction of rewriteBatchedStatements Parameters

rewriteBatchedStatements parameters should be added to JDBC of MySql and url connected, and drivers of version 5.1. 13 or above should be guaranteed to realize high-performance batch insertion. By default, MySql JDBC driver will ignore executeBatch () statement, break up a group of sql statements that we expect to execute in batches, and send them to MySql database one by one. Batch insertion is actually a single insertion, which directly causes low performance. Only when the rewriteBatchedStatements parameter is set to true, the driver will help you execute SQL in batches. This option works for both INSERT/UPDATE/DELETE.

4.2 Batch Preparation

Manual injection of SqlSessionFactory


    @Resource
    private SqlSessionFactory sqlSessionFactory;

Test code


#  Database configuration 
spring:
  datasource:
    url: jdbc:mysql://47.111.118.152:3306/mybatis?rewriteBatchedStatements=true
    username: mybatis
    password: password
    driver-class-name: com.mysql.cj.jdbc.Driver
# mybatis
mybatis:
  mapper-locations: classpath:mapper/*.xml
  type-aliases-package: cn.van.mybatis.batch.entity
9

4.3 Execution Time

1st time


-----------------------------------------
ms     %     Task name
-----------------------------------------
09346  100%   Batch execution   Insert 

Second time


-----------------------------------------
ms     %     Task name
-----------------------------------------
08890  100%   Batch execution   Insert 

The 3rd time


-----------------------------------------
ms     %     Task name
-----------------------------------------
09042  100%   Batch execution   Insert 

In this way, 20w pieces of data are inserted, and the average time of executing 3 times is 9092 ms.

4.4 If the data is larger

When I expanded the data to 100w, the way foreach spliced sql could no longer complete the insert, so I had to test the batch insert time.

When testing, you only need to cut list in the "4.2" test code into lageList test.

1st time


-----------------------------------------
ms     %     Task name
-----------------------------------------
32419  100%   Batch execution   Insert 

Second time


-----------------------------------------
ms     %     Task name
-----------------------------------------
31935  100%   Batch execution   Insert 

The 3rd time


-----------------------------------------
ms     %     Task name
-----------------------------------------
33048  100%   Batch execution   Insert 

This method inserts 100w pieces of data, and the average time of executing 3 times is 32467 ms.

STEP 5 Summarize

批量插入方式 数据量 执行3次的平均时间
循环插入单条数据 2000 60991 ms
foreach 拼接sql 20w 18852 ms
批处理 20w 9092 ms
批处理 100w 32467 ms

Although the efficiency of inserting a single piece of data circularly is extremely low, the amount of code is very small, and it can be used when the amount of data is small, but it is forbidden to use when the amount of data is large, which is too low in efficiency; The way foreach splices sql has a large section of xml and sql statements to write when using, which is easy to make mistakes. Although the efficiency is acceptable, it still cannot be used when dealing with a large amount of data, so it is not recommended; Batch execution is recommended when there is a large amount of data inserted, and it is also convenient to use.

"This article sample code"


Related articles: