Hibernate's approach to batch processing massive amounts of data

  • 2020-05-07 19:44:44
  • OfStack

This article illustrates, with examples, Hibernate's approaches to batch processing large amounts of data, shared for your reference, as follows:

Hibernate's default way of bulk-processing large amounts of data is actually very undesirable in terms of performance and wastes a lot of memory. By its mechanism, Hibernate first looks up the qualifying data and loads it into memory, then operates on it. The actual performance is far from ideal: before applying the three optimization schemes below, inserting 100,000 rows into the database took the author about 30 minutes. (I have inserted 1,000,000 rows in about 10 minutes when the fields are relatively small.)

In summary, there are three ways to solve performance problems:

1: Bypass the Hibernate API and work directly through the JDBC API. This method is the best in performance, and the fastest.

2: Use stored procedures.

3: Still use the Hibernate API for routine batch processing, but with a modification: after every fixed number of operations, promptly detach the processed data with session.flush(); session.evict(XX object set);. This saves some of the performance cost. The "fixed number" must be chosen quantitatively according to the actual situation, generally around 30-60, but the result is still not ideal.
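The session.evict(...) variant in point 3 never appears in the code below (method 3 uses session.clear() instead), so here is a hedged sketch of the evict form, assuming the Customer entity from the examples that follow:

Transaction tx = session.beginTransaction();
List batch = new ArrayList(); // objects saved since the last flush
for (int i = 0; i < 100000; i++) {
    Customer c = new Customer();
    c.setName("user" + i);
    session.save(c);
    batch.add(c);
    if (i % 50 == 0) { // the "fixed number": 30-60 is typical
        session.flush(); // push the pending inserts to the database
        for (Iterator it = batch.iterator(); it.hasNext();) {
            session.evict(it.next()); // detach each saved object, freeing session memory
        }
        batch.clear();
    }
}
tx.commit();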

1: Bypass the Hibernate API and work directly through the JDBC API, which gives the best and fastest performance. (This example is an update operation.)


Transaction tx = session.beginTransaction(); // note that the Hibernate transaction boundary is used
Connection conn = session.connection();
PreparedStatement stmt = conn.prepareStatement("update CUSTOMER c set c.salary = c.salary + 1 where c.salary > 1000");
stmt.executeUpdate();
tx.commit(); // note that the Hibernate transaction boundary is used

In this small program, the JDBC API is called directly to access the database, which is very efficient, and it avoids the performance problem of Hibernate first querying the data and loading it into memory before operating on it.
2: Use stored procedures. However, this approach is not recommended, as it hurts portability and makes application deployment less convenient. (This example is an update operation.)

If the underlying database (such as Oracle) supports stored procedures, batch updates can also be performed through a stored procedure. Stored procedures run directly in the database, which is faster. A stored procedure named batchUpdateCustomer() can be defined in an Oracle database with the following code:

create or replace procedure batchUpdateCustomer(p_age in number) as
begin
  update CUSTOMERS set AGE=AGE+1 where AGE>p_age;
end;

The stored procedure has one parameter, p_age, representing the customers' age. The application can invoke it as follows:


tx = session.beginTransaction();
Connection con = session.connection();
String procedure = "{call batchUpdateCustomer(?)}";
CallableStatement cstmt = con.prepareCall(procedure);
cstmt.setInt(1, 0); // set the age parameter to 0
cstmt.executeUpdate();
tx.commit();

As the program above shows, the application again bypasses the Hibernate API and invokes the stored procedure directly through the JDBC API.

3: Still use the Hibernate API for routine batch processing, but with the modification described above: after every fixed number of operations, promptly flush and detach the processed data with session.flush(); session.evict(XX object set);. This saves some of the performance cost. The "fixed number" must be chosen quantitatively according to the actual situation...
(This example is a save operation.)

The business logic: we want to insert 100,000 rows into the database.


tx = session.beginTransaction();
for (int i = 0; i < 100000; i++) {
    Customer custom = new Customer();
    custom.setName("user" + i);
    session.save(custom);
    if (i % 50 == 0) { // every 50 rows form one processing unit -- the "fixed number" mentioned above, to be tuned for the actual situation
        session.flush();
        session.clear();
    }
}
tx.commit();

This keeps the system's memory usage within a stable range...

During project development, requirements often force us to insert large batches of data into the database, on the order of tens of thousands, hundreds of thousands, millions, or even tens of millions of rows. When data of this magnitude is inserted with Hibernate, an exception may occur, most commonly an OutOfMemoryError (out-of-memory error).

First, let's briefly review the mechanism of Hibernate's insert operation. Hibernate maintains an internal cache; when we do an insert, every object to be operated on is placed into this internal cache for management.

Speaking of caches, Hibernate has an internal cache and a second-level cache, and it manages the two differently: the size of the second-level cache can be configured, while toward the internal cache Hibernate takes a "laissez-faire" attitude, placing no limit on its capacity. Now the problem is plain: during a mass insert, all the generated objects pile up in the internal cache (which lives in memory), so system memory gets eaten bit by bit, and it is no surprise if the system finally "blows up".
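To make the failure concrete, here is a sketch of the naive loop (using the Student entity from the later examples) that fills the unbounded internal cache:

// Anti-pattern: every saved object stays pinned in the session-level cache,
// so memory grows linearly and eventually an OutOfMemoryError is thrown.
Transaction tx = session.beginTransaction();
for (int i = 0; i < 1000000; i++) {
    Student st = new Student();
    st.setName("feifei");
    session.save(st); // cached internally; never flushed or cleared
}
tx.commit();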

Let's think about how to deal with this problem better. Some development situations require Hibernate; others are more flexible and can take other approaches.

I recommend two methods here:

(1): optimize Hibernate, inserting in segments in the program and clearing the cache in time.
(2): bypass the Hibernate API and do the batch insert directly through the JDBC API, which is the best and fastest method in performance.

For method 1, the basic idea is: optimize Hibernate by setting the hibernate.jdbc.batch_size parameter in the configuration file, specifying how many SQL statements are submitted at a time, and have the program insert in segments, clearing the cache in time (Session implements asynchronous write-behind, which lets Hibernate write batched operations explicitly). In other words, every time a fixed amount of data has been inserted, clear it from the internal cache promptly, freeing the memory it occupied.

To set the hibernate.jdbc.batch_size parameter, refer to the following configuration.


<hibernate-configuration>
 <session-factory>
 ...
 <property name="hibernate.jdbc.batch_size">50</property>
 ...
 </session-factory>
</hibernate-configuration>

The point of configuring the hibernate.jdbc.batch_size parameter is to round-trip to the database as rarely as possible. With the configuration above, Hibernate waits until the program has accumulated 50 SQL statements and then submits them as one batch.

The author also suspects that bigger is not necessarily better for hibernate.jdbc.batch_size; from a performance perspective that is questionable. It should be set appropriately for the actual situation; in general, 30 to 50 meets the need.
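As an aside, the same parameter can also be set in code when building the SessionFactory; a minimal sketch, assuming the standard Hibernate Configuration API:

Configuration cfg = new Configuration().configure(); // reads hibernate.cfg.xml
cfg.setProperty("hibernate.jdbc.batch_size", "50");  // 30-50 is usually enough
SessionFactory sessionFactory = cfg.buildSessionFactory();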

For the program implementation, the author takes inserting 10,000 rows as an example:


Session session = HibernateUtil.currentSession();
Transaction tx = session.beginTransaction();
for (int i = 0; i < 10000; i++) {
    Student st = new Student();
    st.setName("feifei");
    session.save(st);
    if (i % 50 == 0) { // every 50 rows form one processing unit
        session.flush(); // stay in sync with the database
        session.clear(); // clear all data from the internal cache, promptly freeing the memory it occupies
    }
}
tx.commit();
 ... 

With a fixed batch size, this approach keeps the system's memory usage within a relatively stable range.

Note: regarding the second-level cache mentioned above, it is worth a remark here. If the second-level cache is enabled, then in order to maintain it, Hibernate also populates it with the corresponding data whenever we insert, update, or delete, which carries a significant performance cost. The author therefore recommends disabling the second-level cache in batch scenarios.
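A minimal sketch of doing so, assuming the standard configuration property (alternatively, a single Session can be told to bypass the cache with session.setCacheMode(CacheMode.IGNORE)):

<!-- hibernate.cfg.xml: turn the second-level cache off for batch-heavy work -->
<property name="hibernate.cache.use_second_level_cache">false</property>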

For method 2, traditional JDBC batching is used, handled through the JDBC API.

Refer to "java batch self-executing SQL" for the underlying methods.
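Since that reference is not reproduced here, here is a minimal stand-in sketch of the traditional JDBC batch (the connection parameters are placeholders; only standard java.sql calls are used):

Connection conn = DriverManager.getConnection(url, user, password); // placeholder connection details
conn.setAutoCommit(false);
PreparedStatement stmt = conn.prepareStatement("insert into T_STUDENT(name) values(?)");
for (int i = 1; i <= 10000; i++) {
    stmt.setString(1, "feifei");
    stmt.addBatch();         // queue the statement
    if (i % 50 == 0) {
        stmt.executeBatch(); // send 50 statements in one round trip
    }
}
stmt.executeBatch();         // flush any remainder
conn.commit();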

Looking at the code above, doesn't something feel off? Right: it has no Hibernate flavor at all; this is traditional JDBC programming.

The above code can be modified as follows:


Transaction tx = session.beginTransaction(); // use the Hibernate transaction boundary
Connection conn = session.connection();
PreparedStatement stmt = conn.prepareStatement("insert into T_STUDENT(name) values(?)");
for (int j = 0; j < 200; j++) {
    for (int i = 0; i < 50; i++) {
        stmt.setString(1, "feifei");
        stmt.addBatch();
    }
    stmt.executeBatch(); // submit every 50 statements as one batch
}
tx.commit(); // use the Hibernate transaction boundary
 ... 

Now the code has a distinctly Hibernate touch. In the author's tests, batch processing through the JDBC API performed nearly 10 times better than through the Hibernate API; JDBC's superiority in performance is beyond doubt.

Batch update and delete in Hibernate 2: for a batch update, Hibernate first fetches the qualifying data and then performs the updates. Batch delete works the same way: first fetch the qualifying data, then delete it.

This has two big disadvantages:

(1): it occupies a large amount of memory.
(2): when processing massive data, a massive number of update/delete statements are executed, each operating on only one object; with the database accessed this frequently, low performance is predictable (see the sketch after this list).
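For illustration, a hedged sketch of what the Hibernate 2-style batch delete amounts to (query first, then delete object by object):

// Hibernate 2 style: load every qualifying object, then issue one delete per object.
Transaction tx = session.beginTransaction();
List students = session.createQuery("from Student").list(); // all rows loaded into memory
for (Iterator it = students.iterator(); it.hasNext();) {
    session.delete(it.next()); // one delete statement per object
}
tx.commit();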

With the release of Hibernate 3, bulk update/delete was introduced for batch update and delete operations. Its principle is to complete the batch update or delete with a single HQL statement, very similar to JDBC's batch update/delete. In performance terms, it is a big improvement over Hibernate 2's batch update/delete.


Transaction tx = session.beginTransaction();
String hql = "delete Student";
Query query = session.createQuery(hql);
int size = query.executeUpdate();
tx.commit();
 ... 

The console outputs only one delete statement: Hibernate: delete from T_STUDENT. With so few statements executed, performance is close to JDBC's, making this a good way to improve performance. Of course, for even better performance the author recommends doing batch updates and deletes through JDBC, using essentially the same method and basics as batch-insert method 2 above, so it is not repeated here.
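For completeness, the HQL bulk-update counterpart of the delete above looks much the same; a sketch assuming the Student entity has an age property:

Transaction tx = session.beginTransaction();
int updated = session.createQuery("update Student set age = age + 1 where age > :age")
    .setInteger("age", 20) // named parameter
    .executeUpdate();      // a single update statement runs on the database side
tx.commit();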

Here is one more method, which seeks the performance improvement on the database side: calling a stored procedure from the Hibernate program. Stored procedures run faster on the database side. Taking a batch update as an example, reference code follows.

First, create a stored procedure named batchUpdateStudent on the database side:


create or replace procedure batchUpdateStudent(a in number) as
begin
  update STUDENT set AGE=AGE+1 where AGE>a;
end;

The calling code is as follows:


Transaction tx = session.beginTransaction();
Connection conn = session.connection();
String pd = "{call batchUpdateStudent(?)}";
CallableStatement cstmt = conn.prepareCall(pd);
cstmt.setInt(1, 20); // set the age parameter to 20
cstmt.executeUpdate();
tx.commit();

Observe that the code above again bypasses the Hibernate API and invokes the stored procedure through the JDBC API, while still using Hibernate's transaction boundary. Stored procedures are undoubtedly a good way to improve batch performance: they run directly on the database side and, in a way, shift the pressure of batch processing onto the database.

This article has explored Hibernate's batch operations, all with a view to improving performance; it offers only one small aspect of performance tuning.

Whatever method is adopted to improve performance must be weighed against the actual situation; the focus is to provide users with a satisfying, efficient, and stable system.

I hope this article is helpful to you in designing Hibernate programs.
