Parsing Parallel Query Instances for Oracle Parallel Operations

2021-11-13 18:41:10
OfStack

Parallel operation of Oracle database, in essence, is to squeeze out idle resources of database server (mainly CPU resources), and divide and conquer some high-load and large-volume data. Parallel operation is a non-deterministic optimization strategy, which needs to be treated carefully when choosing. At present, the parallel operation features are mainly used in the following aspects:

Parallel Query: Parallel query, using multiple operating system-level Server Process to complete one SQL query at the same time;
Parallel DML: Parallel DML operations. Similar to Parallel Query. When you want to operate DML on large data scales, such as insert, update and delete, you can consider using it.
Parallel DDL: Parallel DDL operations. Such as large-capacity data table construction, index rebuild and other operations;
Parallel Recovery, parallel recovery. When the database instance crashes and restarts, or when the storage medium is restored, the parallel recovery technology can be started. Thereby achieving the purpose of reducing recovery time;
Procedural Parallel, procedure code parallelization. For the code fragments, stored procedures or functions we write, we can realize the parallelization of execution, thus speeding up the execution efficiency;

1. Parallel query Parallel Query

Parallel query of Oracle database is a basic technology, and it is also a parallel technology often used by OLAP, Oracle, Data and Warehouse. Like the elements highlighted in the previous 1 in this series, there are 1 prerequisites for software and hardware before deciding to use parallel technology:

Task task requirements. The alternative task for parallel operations, task, must be a large task job, such as a long query. Task time can usually be counted in minutes and hours. Only such tasks and needs are worth risking the use of parallel operation schemes;
Resource idle condition. Parallel processing can only be considered when the database server resources are idle. If you are busy frequently, rashly using parallelism can only aggravate the contention for resources.

The biggest risk of parallel operation is that the efficiency caused by parallel contention does not increase but decreases. Therefore, after determining two premises, parallel planning processing should be carried out.

2. Environmental preparation

First, prepare the experimental environment. Because the author uses a home PC virtual machine, the parallelism and storage can not reflect the real conditions and requirements, forgive me.


SQL> select * from v$version where rownum<2;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
SQL> select count(*) from t;
 COUNT(*)
----------
1160704

Select the 11gR2 server environment, and the total data volume of the data table T exceeds 1 million.

First, let's look at the execution without parallelism in 1.


// Extracting the information of using cursors; 
SQL> select sql_text, sql_id, version_count from v$sqlarea where sql_text like 'select count(*) from t%';
SQL_TEXT            SQL_ID    VERSION_COUNT
------------------------------ ------------- -------------
select count(*) from t     2jkn7rpsbj64t       2
SQL> select * from table(dbms_xplan.display_cursor('2jkn7rpsbj64t',format => 'advanced', cursor_child_no => 0));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID 2jkn7rpsbj64t, child number 0
-------------------------------------
select count(*) from t
Plan hash value: 2966233522
-------------------------------------------------------------------
| Id | Operation     | Name | Rows | Cost (%CPU)| Time   |
-------------------------------------------------------------------
|  0 | SELECT STATEMENT  |   |    | 4464 (100)|     |
|  1 | SORT AGGREGATE  |   |   1 |      |     |
|  2 |  TABLE ACCESS FULL| T  | 1160K| 4464  (1)| 00:00:54 |
-------------------------------------------------------------------

Parallelism is not used in this execution plan, and full table scanning is performed. The execution time is 54s.

3. Parallel query plan

First, we set the corresponding degree of parallelism. There are two ways to set the degree of parallelism. One is to use hint to add it to a specific SQL statement. The other is to set the parallelism attribute on large objects.

The advantage of the former is that it is mandatory and targeted. Is to specify a specific SQL statement for parallel processing. The advantage is that it is easy to control the degree of parallelism, but the disadvantage is that it has a strong coercive force. When the amount of data is small, it has little advantage to use parallelism. Moreover, if the specified parallelism is displayed, it will bring the disadvantage of poor migration scalability.

The latter specifies the degree of parallelism through the attributes of the object. Parallelism is provided to the optimizer for selection as a means of execution. In this way, CBO will generate the execution plan according to the resources in the system and the actual data. Parallel may or may not be in the plan. This has the advantage of leaving parallelism to the optimizer CBO, but the disadvantage is the abuse risk of parallelism.

Here, the author sets the way to automatically determine the parallelism.


SQL> alter table t parallel;
Table altered
SQL> select count(*) from t;
 COUNT(*)
----------
1160704

When a query is launched, the pool of parallel server processes in Oracle determines the number of parallel processes allocated based on load and actual factors in the system. At this point, we can view the connection information in the process pool through the view v $px_process.


SQL> select * from v$px_process;
SERVER_NAME STATUS      PID SPID       SID  SERIAL#
----------- --------- ---------- ------------------------ ---------- ----------
P000    AVAILABLE     25 5776                
P001    AVAILABLE     26 5778

Note that the parallel server process is a special Server Process, which is essentially a shareable slave process. In the dedicated connection mode, the general Server Process and Client Process are "living and dying together", and serve one Client Process for life. The slave process is managed through the process pool. Once initialization is started, it will reside in the system within a certain time, waiting for the next parallel processing.

At this point, we examine the v $process view, and we can also find the corresponding information.


SQL> select * from v$process;
PID SPID     PNAME USERNAME      SERIAL# PROGRAM    
-------- ---------- ------------------------ ----- --------------- ---------- -------------------------------
25 5776      P000 oracle         13 oracle@oracle11g (P000)   
26 5778      P001 oracle          6 oracle@oracle11g (P001)   
 (The space is omitted for reasons...) 
32 rows selected

In the corresponding OS, there is also a corresponding real process service.


[oracle@oracle11g ~]$ ps -ef | grep oracle
 (The space is omitted for reasons...) 
oracle  5700   1 0 17:29 ?    00:00:02 oraclewilson (LOCAL=NO)
oracle  5723   1 0 17:33 ?    00:00:00 ora_smco_wilson
oracle  5764   1 2 17:40 ?    00:00:05 oraclewilson (LOCAL=NO)
oracle  5774   1 0 17:42 ?    00:00:00 oraclewilson (LOCAL=NO)
oracle  5776   1 0 17:43 ?    00:00:00 ora_p000_wilson
oracle  5778   1 0 17:43 ?    00:00:00 ora_p001_wilson
oracle  5820   1 1 17:44 ?    00:00:00 ora_w000_wilson

Since the query has ended at this time, the corresponding parallel session information has disappeared and become invisible.


SQL> select * from v$px_session;
SADDR      SID  SERIAL#   QCSID QCSERIAL# 
-------- ---------- ---------- ---------- ----------

However, if the task takes a long time, the corresponding information can be captured.

From the above situation, when we perform a parallel operation, Oracle will get the corresponding parallel process from the server process pool to operate. When the operation is completed, the server process will wait for 1 fixed time before recycling.

Resource consumption of parallel manipulation processes, viewed through the v $px_sysstat view.


SQL> col statistic for a30;
SQL> select * from v$px_process_sysstat;
STATISTIC              VALUE
------------------------------ ----------
Servers In Use             0
Servers Available            0
Servers Started             2
Servers Shutdown            2
Servers Highwater            2
Servers Cleaned Up           0
Server Sessions             6
Memory Chunks Allocated         4
Memory Chunks Freed           0
Memory Chunks Current          4
Memory Chunks HWM            4
Buffers Allocated           30
Buffers Freed             30
Buffers Current             0
Buffers HWM               8
15 rows selected

Next, let's check the execution plan information.


SQL> set pagesize 10000;
SQL> select * from table(dbms_xplan.display_cursor('2jkn7rpsbj64t',format => 'advanced',cursor_child
_no => 1));
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
SQL_ID 2jkn7rpsbj64t, child number 1
-------------------------------------
select count(*) from t
Plan hash value: 3126468333
----------------------------------------------------------------------------------------------------
| Id | Operation       | Name   | Rows | Cost (%CPU)| Time   |  TQ |IN-OUT| PQ Distrib
----------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT    |     |    | 2478 (100)|     |    |   |      
|  1 | SORT AGGREGATE    |     |   1 |      |     |    |   |      
|  2 |  PX COORDINATOR    |     |    |      |     |    |   |      
|  3 |  PX SEND QC (RANDOM) | :TQ10000 |   1 |      |     | Q1,00 | P->S | QC (RAND) 
|  4 |   SORT AGGREGATE   |     |   1 |      |     | Q1,00 | PCWP |      
|  5 |   PX BLOCK ITERATOR |     | 1160K| 2478  (1)| 00:00:30 | Q1,00 | PCWC |      
|* 6 |    TABLE ACCESS FULL| T    | 1160K| 2478  (1)| 00:00:30 | Q1,00 | PCWP |      
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
6 - access(:Z>=:Z AND :Z<=:Z)

Condition from execution plan (6-access (: Z) > =:Z AND :Z < =: Z), we can see the task assignments, followed by full table scans, and finally sort the calculation of count, merging the results.

4. Conclusion

Oracle Parallel Query is a commonly used parallel operation technology. Compared with DDL, DML and other types of operations, parallel query can be carried out as one step of system function.

The biggest risk of parallel query is the emergence of parallel abuse and out of control. This is also the problem that Oracle1 is trying to solve. In Oracle11gR2, the technical characteristics of Parallel Statement Queuing (PSQ) are introduced. Typically, Oracle introduces as many parallel processes as possible to operate as the parallel server pool allows. The emergence of PSQ technology is to add parallel control from the perspective of resources.

When the system is busy, PSQ will put one of the parallel operations to be carried out into a waiting state to prevent the deterioration of the parallel environment. When the environment improves, the parallel statements in the wait queue enter the execution state. This feature can effectively prevent the occurrence of parallel abuse.