Use and Extension of Oracle Hash Partitioned Tables

  • 2021-10-25 00:07:47
  • OfStack

Hash partitioning determines which partition a row belongs to by applying a hash function to the partition key. What are the advantages of using hash partitions?

Hash partitions share the general advantages of partitioned tables, such as greater data capacity, reduced administrative burden, and improved statement performance. In addition, because a hash partitioned table assigns each row to a partition based on the hash of its partition key, and the hash value of a given key is fixed, the data in a hash partitioned table is clustered by partition key value: rows with the same partition key are always in the same partition.
For example, in the securities industry we often query the K-line (candlestick) data of a particular stock.
Suppose the structure of the table is as follows:


create table equity
(
  id         number,
  trade_date date,
  ...
);

The equity table can be very large, and queries against it typically specify an id and retrieve transaction information for one date or for a range of dates. In this case, how should we choose a partitioning scheme for the equity table?
Looking at the table structure alone, the trade_date column seems a natural choice for range partitioning. However, if we partition that way, the query described above, which specifies an id and retrieves transaction information over a date range (for example, the K line over one year), will often have to cross partitions. As we know, cross-partition queries on partitioned tables often perform poorly, especially when they are likely to span many partitions.
You might ask: why not simply build an index on the id and trade_date columns? Think about it carefully. With range partitioning, the data in the equity table is clustered by trade_date value, so rows sharing a trade_date tend to sit together in the same data blocks. Even if the query described above is served through the index, the final table access usually reads scattered data blocks; in effect, every matching record requires reading a separate table block.
If instead we build a hash partitioned table, the data is clustered by the hash partition key, which suits the query above better: records with the same id are guaranteed to be in the same partition, and the probability that records with the same id value fall in the same data block also increases, thereby reducing I/O "to a certain extent".
The phrase "to a certain extent" is deliberate, because it is unrealistic to expect a large reduction in I/O from hash partitioning alone, especially when the equity table records a very large number of stocks: it is then difficult to physically cluster the records of the same stock across different trading days into the same data blocks. In fact, if on top of the hash partitioning we organize the equity table as an index-organized table (IOT), the query described above can be sped up dramatically. IOTs are beyond the scope of this article, so we will not discuss them further here.
Before deciding to use a hash partitioned table, we also need to make sure that the values of the chosen partition key are continuously distributed, or close to it. In addition, the number of partitions should be an integer power of 2, such as 2, 4, 8, and so on. These requirements stem from the characteristics of the hash function; following them keeps the amount of data in each partition of the table relatively even.
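Under these guidelines, a hash partitioned version of the equity table might look like the following sketch (the 8-partition count and the trimmed column list are illustrative assumptions, not taken from the original system):

```sql
-- Hypothetical example: hash partition the equity table on id,
-- using a power-of-2 partition count as Oracle recommends.
create table equity
(
  id         number,
  trade_date date
)
partition by hash (id)
partitions 8;
```

With this layout, all rows for a given id hash to the same partition, which is what makes the single-stock K-line query partition-local.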

Extensions to the Hash partition table:

New partitions are added to a hash partitioned table with the add partition command. Oracle recommends that the number of partitions be a power of 2 (2, 4, 8, and so on), which helps ensure that the data is distributed evenly among the partitions. And, as mentioned earlier, the partition key values should also be continuously, or nearly continuously, distributed.
When a new partition is added, some existing data must be moved from an old partition into it. Which partition does that data come from?
The rule is as follows: if the partition being added is the Nth partition, and M is the smallest integer power of 2 greater than or equal to N, then the data for the new Nth partition comes from partition N - M/2.
For example, consider a hash partitioned table with 100 partitions. If we add one partition, the table will have 101 partitions, so N in the formula above is 101. The smallest power of 2 greater than or equal to 101 is 128, so M is 128, and the data for partition 101 comes from partition 101 - 128/2 = 37.
Put another way, when we add partition 101, partition 37 must be locked, because some of its data will be moved into the new partition 101.
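The rule above is easy to check in a few lines of code. This is a minimal sketch (not Oracle's implementation) that computes the source partition for a newly added hash partition:

```python
def source_partition(n: int) -> int:
    """Given that the new partition is the n-th partition (1-based),
    return the partition whose data is split into it.
    M is the smallest power of 2 >= n; the source is n - M/2."""
    m = 1
    while m < n:
        m *= 2
    return n - m // 2

# The example from the text: adding partition 101 to a 100-partition table.
print(source_partition(101))  # -> 37
```

This also shows why, when the partition count is already a power of 2, a full doubling redistributes every old partition exactly once.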
Below, let's use an example to verify the above statement and see what should be paid attention to in actual operation:
The commodity table is a large table in our system. When it was created as a hash partitioned table a few years ago, the DBA of the time chose 100 partitions:


select TABLE_NAME,PARTITION_POSITION,PARTITION_NAME,NUM_ROWS from user_tab_partitions where table_name='COMMODITY' order by PARTITION_POSITION;
TABLE_NAME PARTITION_POSITION PARTITION_NAME NUM_ROWS
-------------- ------------------ ---------------------- ----------
COMMODITY 1 COT_IND01_P1 4405650
COMMODITY 2 COT_IND01_P2 5046650
COMMODITY 3 COT_IND01_P3 5107550
 ... 
COMMODITY 36 COT_IND01_P36 5718800
COMMODITY 37 COT_IND01_P37 9905200
COMMODITY 38 COT_IND01_P38 10118400
COMMODITY 39 COT_IND01_P39 10404950
COMMODITY 40 COT_IND01_P40 9730850
COMMODITY 41 COT_IND01_P41 9457300
COMMODITY 42 COT_IND01_P42 9717950
COMMODITY 43 COT_IND01_P43 9643900
COMMODITY 44 COT_IND01_P44 11138000
COMMODITY 45 COT_IND01_P45 9381300
COMMODITY 46 COT_IND01_P46 10101150
COMMODITY 47 COT_IND01_P47 8809950
COMMODITY 48 COT_IND01_P48 10611050
COMMODITY 49 COT_IND01_P49 10010600
COMMODITY 50 COT_IND01_P50 8252600
COMMODITY 51 COT_IND01_P51 9709900
COMMODITY 52 COT_IND01_P52 8983200
COMMODITY 53 COT_IND01_P53 9012750
COMMODITY 54 COT_IND01_P54 9310650
COMMODITY 55 COT_IND01_P55 8966450
COMMODITY 56 COT_IND01_P56 8832650
COMMODITY 57 COT_IND01_P57 9470600
COMMODITY 58 COT_IND01_P58 8932450
COMMODITY 59 COT_IND01_P59 9994850
COMMODITY 60 COT_IND01_P60 9617450
COMMODITY 61 COT_IND01_P61 10278850
COMMODITY 62 COT_IND01_P62 9277600
COMMODITY 63 COT_IND01_P63 8136300
COMMODITY 64 COT_IND01_P64 10064600
COMMODITY 65 COT_IND01_P65 3710900
 ... 
COMMODITY 99 COT_IND01_P99 5273800
COMMODITY 100 COT_IND01_P100 5293350
100 rows selected.

Looking at the data distribution across partitions, we can see that the 28 partitions from 37 to 64 each hold roughly twice as many records as the others. Since 100 is not a power of 2, Oracle's hash function cannot guarantee an even distribution. Now we add a new partition COT_IND01_P101 to the table:


alter table commodity add partition COT_IND01_P101;
Table altered.
Elapsed: 00:06:58.52

After collecting statistics, we query the record counts of the affected partitions:


select TABLE_NAME,PARTITION_POSITION,PARTITION_NAME,NUM_ROWS from user_tab_partitions where table_name='COMMODITY' and partition_name in ('COT_IND01_P37','COT_IND01_P101');

TABLE_NAME PARTITION_POSITION PARTITION_NAME NUM_ROWS
------------------ ------------------ --------------------- ----------
COMMODITY 37 COT_IND01_P37 4905200
COMMODITY 101 COT_IND01_P101 5107550

We can see that the data originally in partition 37 has been split almost evenly between partitions 37 and 101.
Monitoring session locks while the partition was being added, we found that two objects were locked in exclusive mode during the operation:


SQL> select * from v$lock where sid=1239 and type='TM' and LMODE=6 order by sid,lmode;
ADDR                KADDR          SID TY ID1    ID2 LMODE REQUEST CTIME BLOCK
---------------- ---------------- ---------- -- ---------- ---------- ---------- ---------- ---------- ----------
FFFFFFFF7D764828 FFFFFFFF7D764888 1239 TM 4004126 0  6 0 72 2
FFFFFFFF7D764828 FFFFFFFF7D764888 1239 TM 4004063 0  6 0 72 2
Which objects are these?
select OBJECT_NAME,SUBOBJECT_NAME,OBJECT_ID from user_objects where object_id in (4004126,4004063);
OBJECT_NAME SUBOBJECT_NAME OBJECT_ID 
--------------------- ------------------------------ ---------- 
COMMODITY COT_IND01_P100 4004126
COMMODITY COT_IND01_P37 4004063

As you can see, partitions 37 and 100 are both locked. Locking partition 37 is expected, since data is being moved out of it. But why lock partition 100, the last partition?
My understanding is this: the position of the newly added partition 101 is determined by the original partition count of 100. If, while the new partition is being added, DDL were allowed on the last partition of the original table (partition 100), for example a coalesce operation, then the new partition 101 would no longer necessarily receive its data from partition 37; the new partition might itself effectively become the new 100th partition, causing chaos. You might then ask: by this reasoning, shouldn't the other partitions be locked as well? In fact there is no need, because a hash partitioned table does not support the drop partition operation; it only supports coalesce to achieve a similar effect, and coalesce can only shrink the table starting from the last partition.
What is the practical significance of understanding the locking behavior during hash partition addition?
Continuing with the example above: because partition 37 and the last partition, 100, are locked exclusively, no DML can run against these two partitions while the new partition is being added, since DML must acquire a shared lock (mode 3) on the partition. In other words, applications that operate on these two partitions will be affected.
Adding a partition to a hash partitioned table is also not a quick operation the way it is for other partition types such as range partitioning, because adding a hash partition involves I/O to move data into the new partition. And that is not even the dominant cost: since a hash partitioned table assigns rows to partitions by the hash value of the partition key, most of the time is actually spent computing hash values. In the test above, adding the new partition took 6 minutes 58 seconds, and the 10046 trace statistics below show that about 6 minutes of that was CPU time, which I believe is mainly due to the hash computation.
OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS
 call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse      328      0.17       0.27          0          0        148           0
Execute   1520    360.14     396.30     456820   11416202      26357    11565252
Fetch     1767      5.42      21.18      21421      26540          0        2862
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     3615    365.73     417.76     478241   11442742      26505    11568114

In this test, partition COT_IND01_P37 held nearly 10 million rows and the operation took almost 7 minutes; with 100 million rows per partition it would take over an hour. And if the number of hash partitions follows Oracle's power-of-2 recommendation, an expansion means doubling the partition count: with 128 original partitions we would need to add another 128, each taking roughly the time measured above, which makes adding partitions to a large hash partitioned table a frightening operation.
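A rough back-of-the-envelope calculation shows why doubling is so expensive. The 7-minute figure comes from the test above; the 128-partition starting point is an assumption for illustration:

```python
# Rough cost estimate for doubling a hash partitioned table.
# Assumes each added partition costs about as much as the test above
# (~7 minutes to redistribute ~10 million rows).
minutes_per_partition = 7
current_partitions = 128                 # hypothetical starting point
partitions_to_add = current_partitions   # doubling: 128 -> 256

total_minutes = partitions_to_add * minutes_per_partition
print(f"{total_minutes} minutes (~{total_minutes / 60:.1f} hours)")
```

Even with these optimistic per-partition numbers, the whole expansion runs for the better part of a day, during which pairs of partitions are locked in turn.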
All in all, hash partitioning has its advantages, but also serious drawbacks, such as the partition expansion problem described here. We therefore need to choose the number of partitions carefully at design time. As data volumes grow, however, adding partitions to the table is hard to avoid; it is a resource-intensive operation, and the locks it takes will disrupt operations on some of the existing partitions. Yet if we postpone the expansion out of fear of these problems, it only becomes harder to carry out later as the data volume keeps growing.

