Implementation process of java multi thread batch splitting List into database

  • 2021-12-05 06:03:13
  • OfStack

Catalogue 1. Preface 2. Put list directly into Mysql 3. Grouping list into Mysql 4. Multithreaded batch import Mysql 5. Summary

1. Preface

Two days ago, I made an import function, which was very slow at the beginning. It took more than one minute to import 2w data. Later, it was optimized at 1:00, from directly importing list into Mysql, to assigning list into Mysql, and to multithreading list into Mysql. The time is 1:00. 1:00. It's getting less. Very cool, and finally became within 10s. Let's show the following process.

2. Put list directly into Mysql

Bulk import operations using mybatis:


  @Transactional(rollbackFor = Exception.class)
    public int addFreshStudentsNew2(List<FreshStudentAndStudentModel> list, String schoolNo) {
        if (list == null || list.isEmpty()) {
            return 0;
        }
        List<StudentEntity> studentEntityList = new LinkedList<>();
        List<EnrollStudentEntity> enrollStudentEntityList = new LinkedList<>();
        List<AllusersEntity> allusersEntityList = new LinkedList<>();

        for (FreshStudentAndStudentModel freshStudentAndStudentModel : list) {

            EnrollStudentEntity enrollStudentEntity = new EnrollStudentEntity();
            StudentEntity studentEntity = new StudentEntity();
            BeanUtils.copyProperties(freshStudentAndStudentModel, studentEntity);
            BeanUtils.copyProperties(freshStudentAndStudentModel, enrollStudentEntity);
            String operator = TenancyContext.UserID.get();
            String studentId = BaseUuidUtils.base58Uuid();
            enrollStudentEntity.setId(BaseUuidUtils.base58Uuid());
            enrollStudentEntity.setStudentId(studentId);
            enrollStudentEntity.setIdentityCardId(freshStudentAndStudentModel.getIdCard());
            enrollStudentEntity.setOperator(operator);
            studentEntity.setId(studentId);
            studentEntity.setIdentityCardId(freshStudentAndStudentModel.getIdCard());
            studentEntity.setOperator(operator);
            studentEntityList.add(studentEntity);
            enrollStudentEntityList.add(enrollStudentEntity);

            AllusersEntity allusersEntity = new AllusersEntity();
            allusersEntity.setId(enrollStudentEntity.getId());
            allusersEntity.setUserCode(enrollStudentEntity.getNemtCode());
            allusersEntity.setUserName(enrollStudentEntity.getName());
            allusersEntity.setSchoolNo(schoolNo);
            allusersEntity.setTelNum(enrollStudentEntity.getTelNum());
            allusersEntity.setPassword(enrollStudentEntity.getNemtCode());  // The password is set to examinee number 
            allusersEntityList.add(allusersEntity);
        }
            enResult = enrollStudentDao.insertAll(enrollStudentEntityList);
            stuResult = studentDao.insertAll(studentEntityList);
            allResult = allusersFacade.insertUserList(allusersEntityList);

        if (enResult > 0 && stuResult > 0 && allResult) {
            return 10;
        }
        return -10;
    }

Mapper.xml


  <insert id="insertAll" parameterType="com.dmsdbj.itoo.basicInfo.entity.EnrollStudentEntity">
        insert into tb_enroll_student
        <trim prefix="(" suffix=")" suffixOverrides=",">
                id,  
                remark,  
                nEMT_aspiration,  
                nEMT_code,  
                nEMT_score,  
                student_id,  
                identity_card_id,  
                level,  
                major,  
                name,  
                nation,  
                secondary_college,  
                operator,  
                sex,  
                is_delete,  
                account_address,  
                native_place,  
                original_place,  
                used_name,  
                pictrue,  
                join_party_date,  
                political_status,  
                tel_num,  
                is_registry,  
                graduate_school,  
                create_time,  
                update_time        </trim>        
        values
        <foreach collection="list" item="item" index="index" separator=",">
        (
                #{item.id,jdbcType=VARCHAR},
                #{item.remark,jdbcType=VARCHAR},
                #{item.nemtAspiration,jdbcType=VARCHAR},
                #{item.nemtCode,jdbcType=VARCHAR},
                #{item.nemtScore,jdbcType=VARCHAR},
                #{item.studentId,jdbcType=VARCHAR},
                #{item.identityCardId,jdbcType=VARCHAR},
                #{item.level,jdbcType=VARCHAR},
                #{item.major,jdbcType=VARCHAR},
                #{item.name,jdbcType=VARCHAR},
                #{item.nation,jdbcType=VARCHAR},
                #{item.secondaryCollege,jdbcType=VARCHAR},
                #{item.operator,jdbcType=VARCHAR},
                #{item.sex,jdbcType=VARCHAR},
                0,
                #{item.accountAddress,jdbcType=VARCHAR},
                #{item.nativePlace,jdbcType=VARCHAR},
                #{item.originalPlace,jdbcType=VARCHAR},
                #{item.usedName,jdbcType=VARCHAR},
                #{item.pictrue,jdbcType=VARCHAR},
                #{item.joinPartyDate,jdbcType=VARCHAR},
                #{item.politicalStatus,jdbcType=VARCHAR},
                #{item.telNum,jdbcType=VARCHAR},
                #{item.isRegistry,jdbcType=TINYINT},
                #{item.graduateSchool,jdbcType=VARCHAR},
                now(),
                now()        
        )   
        </foreach>                
  </insert> 

Code description:

The underlying mapper is generated by reverse engineering, and the batch insertion is as follows, which is spliced to be similar: insert into tb_enroll_student () values (), () …. ();

The disadvantage of this is that Database 1 generally has a default setting, that is, the data of each sql operation cannot exceed 4M. In this way, when there is a lot of data, the database will report an error Packet for query is too large (6071393 > 4194304). You can change this value on the server by setting the max_allowed_packet 'variable., although we can pass

Similar modifications my. ini plus max_allowed_packet = 67108864

67108864=64M

The default size is 4194304, which is 4M

Restart the mysql service after the modification is completed. If you modify it through the command line, you don't need to restart the mysql service.

Complete this operation, but we can't guarantee the maximum size of the project at a time, which is drawback. So you can consider grouping imports.

3. Grouping list into Mysql

The same applies to mybatis batch insertion, except that each import is grouped and calculated, and then imported several times:


 @Transactional(rollbackFor = Exception.class)
    public int addFreshStudentsNew2(List<FreshStudentAndStudentModel> list, String schoolNo) {
        if (list == null || list.isEmpty()) {
            return 0;
        }
        List<StudentEntity> studentEntityList = new LinkedList<>();
        List<EnrollStudentEntity> enrollStudentEntityList = new LinkedList<>();
        List<AllusersEntity> allusersEntityList = new LinkedList<>();

        for (FreshStudentAndStudentModel freshStudentAndStudentModel : list) {

            EnrollStudentEntity enrollStudentEntity = new EnrollStudentEntity();
            StudentEntity studentEntity = new StudentEntity();
            BeanUtils.copyProperties(freshStudentAndStudentModel, studentEntity);
            BeanUtils.copyProperties(freshStudentAndStudentModel, enrollStudentEntity);
            String operator = TenancyContext.UserID.get();
            String studentId = BaseUuidUtils.base58Uuid();
            enrollStudentEntity.setId(BaseUuidUtils.base58Uuid());
            enrollStudentEntity.setStudentId(studentId);
            enrollStudentEntity.setIdentityCardId(freshStudentAndStudentModel.getIdCard());
            enrollStudentEntity.setOperator(operator);
            studentEntity.setId(studentId);
            studentEntity.setIdentityCardId(freshStudentAndStudentModel.getIdCard());
            studentEntity.setOperator(operator);
            studentEntityList.add(studentEntity);
            enrollStudentEntityList.add(enrollStudentEntity);

            AllusersEntity allusersEntity = new AllusersEntity();
            allusersEntity.setId(enrollStudentEntity.getId());
            allusersEntity.setUserCode(enrollStudentEntity.getNemtCode());
            allusersEntity.setUserName(enrollStudentEntity.getName());
            allusersEntity.setSchoolNo(schoolNo);
            allusersEntity.setTelNum(enrollStudentEntity.getTelNum());
            allusersEntity.setPassword(enrollStudentEntity.getNemtCode());  // The password is set to examinee number 
            allusersEntityList.add(allusersEntity);
        }

        int c = 100;
        int b = enrollStudentEntityList.size() / c;
        int d = enrollStudentEntityList.size() % c;

        int enResult = 0;
        int stuResult = 0;
        boolean allResult = false;

        for (int e = c; e <= c * b; e = e + c) {
            enResult = enrollStudentDao.insertAll(enrollStudentEntityList.subList(e - c, e));
            stuResult = studentDao.insertAll(studentEntityList.subList(e - c, e));
            allResult = allusersFacade.insertUserList(allusersEntityList.subList(e - c, e));
        }
        if (d != 0) {
            enResult = enrollStudentDao.insertAll(enrollStudentEntityList.subList(c * b, enrollStudentEntityList.size()));
            stuResult = studentDao.insertAll(studentEntityList.subList(c * b, studentEntityList.size()));
            allResult = allusersFacade.insertUserList(allusersEntityList.subList(c * b, allusersEntityList.size()));
        }

        if (enResult > 0 && stuResult > 0 && allResult) {
            return 10;
        }
        return -10;
    }

Code description:

This operation can avoid the above errors, but multiple insertions virtually increase the operation practice, and it is easy to time out. Therefore, this method is not worth advocating.

Improved again, using multithreaded batch import.

4. Multithreaded batch import Mysql

Still use mybatis batch import, the difference is, according to the number of threads grouping, and then set up a multi-thread pool, import.


  @Transactional(rollbackFor = Exception.class)
    public int addFreshStudentsNew(List<FreshStudentAndStudentModel> list, String schoolNo) {
        if (list == null || list.isEmpty()) {
            return 0;
        }
        List<StudentEntity> studentEntityList = new LinkedList<>();
        List<EnrollStudentEntity> enrollStudentEntityList = new LinkedList<>();
        List<AllusersEntity> allusersEntityList = new LinkedList<>();

        list.forEach(freshStudentAndStudentModel -> {
            EnrollStudentEntity enrollStudentEntity = new EnrollStudentEntity();
            StudentEntity studentEntity = new StudentEntity();
            BeanUtils.copyProperties(freshStudentAndStudentModel, studentEntity);
            BeanUtils.copyProperties(freshStudentAndStudentModel, enrollStudentEntity);
            String operator = TenancyContext.UserID.get();
            String studentId = BaseUuidUtils.base58Uuid();
            enrollStudentEntity.setId(BaseUuidUtils.base58Uuid());
            enrollStudentEntity.setStudentId(studentId);
            enrollStudentEntity.setIdentityCardId(freshStudentAndStudentModel.getIdCard());
            enrollStudentEntity.setOperator(operator);
            studentEntity.setId(studentId);
            studentEntity.setIdentityCardId(freshStudentAndStudentModel.getIdCard());
            studentEntity.setOperator(operator);
            studentEntityList.add(studentEntity);
            enrollStudentEntityList.add(enrollStudentEntity);

            AllusersEntity allusersEntity = new AllusersEntity();
            allusersEntity.setId(enrollStudentEntity.getId());
            allusersEntity.setUserCode(enrollStudentEntity.getNemtCode());
            allusersEntity.setUserName(enrollStudentEntity.getName());
            allusersEntity.setSchoolNo(schoolNo);
            allusersEntity.setTelNum(enrollStudentEntity.getTelNum());
            allusersEntity.setPassword(enrollStudentEntity.getNemtCode());  // The password is set to examinee number 
            allusersEntityList.add(allusersEntity);
        });


        int nThreads = 50;

        int size = enrollStudentEntityList.size();
        ExecutorService executorService = Executors.newFixedThreadPool(nThreads);
        List<Future<Integer>> futures = new ArrayList<Future<Integer>>(nThreads);

        for (int i = 0; i < nThreads; i++) {
            final List<EnrollStudentEntity> EnrollStudentEntityImputList = enrollStudentEntityList.subList(size / nThreads * i, size / nThreads * (i + 1));
            final List<StudentEntity> studentEntityImportList = studentEntityList.subList(size / nThreads * i, size / nThreads * (i + 1));
            final List<AllusersEntity> allusersEntityImportList = allusersEntityList.subList(size / nThreads * i, size / nThreads * (i + 1));

           Callable<Integer> task1 = () -> {
          studentSave.saveStudent(EnrollStudentEntityImputList,studentEntityImportList,allusersEntityImportList);
               return 1;
            };
          futures.add(executorService.submit(task1));
        }
        executorService.shutdown();
        if (!futures.isEmpty() && futures != null) {
            return 10;
        }
        return -10;
    }

Code description:

The above is through the application of ExecutorService to establish a fixed number of threads, and then according to the number of threads grouping, batch import in turn. 1 aspect can relieve the pressure of the database, another one has more threads, and 1 certain degree will improve the running time of the program. The disadvantage is that it depends on the configuration of the server. If the configuration is good, you can open multipoint threads, and if the configuration is poor, you can open small points.

5. Summary

By using this operation, it is constantly improved, and the project use skills are also good. Refueling ~ ~ multithreading ~ ~


Related articles: