Detailed Explanation of Semi connection and Anti connection in Oracle

  • 2021-12-13 10:07:52
  • OfStack

When two tables are joined, if the data rows in Table 1 appear in the result set, it needs to be judged according to whether at least one matching data row appears or not in Table 2, this situation will occur semi-join; Anti-joins are complementary sets of semi-joins, and they will appear as options of common join methods in databases, such as NESTED LOOPS, MERGE SORT JOIN, HASH JOIN.

In fact, semi-connection and anti-connection themselves can also be recognized as two connection methods; In CBO optimization mode, The optimizer can flexibly transform and execute statements according to the actual situation, so as to realize semi-join and anti-join methods, After all, there is no SQL syntax that explicitly invokes semi-joins and anti-joins. They are only options that the optimizer can choose when the SQL statement meets certain conditions, but it is still necessary to delve into the performance advantages of these two options in specific situations.

Semi-connection

Semi-joins usually occur when using related subqueries containing IN and EXISTS, = ANY is used the same as IN, so semi-joins can also occur; However, there are exceptions. In the 11gR2 version, the optimizer will not select a semi-join for any subquery contained in the OR branch, which is also the only one clearly identified restriction in the official file. Let's look at several scenarios:


-- Use IN Related subqueries of keywords => Occurrence NESTED LOOPS Semi-connection
SQL> select department_name
  2  from hr.departments dept
  3  where department_id in (select department_id from hr.employees emp); 11 rows selected. Execution Plan
----------------------------------------------------------
Plan hash value: 2605691773 ----------------------------------------------------------------------------------------
| Id  | Operation          | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                   |    10 |   190 |     3   (0)| 00:00:01 |
|   1 |  NESTED LOOPS SEMI |                   |    10 |   190 |     3   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| DEPARTMENTS       |    27 |   432 |     3   (0)| 00:00:01 |
|*  3 |   INDEX RANGE SCAN | EMP_DEPARTMENT_IX |    41 |   123 |     0   (0)| 00:00:01 |
---------------------------------------------------------------------------------------- Predicate Information (identified by operation id):
---------------------------------------------------    3 - access("DEPARTMENT_ID"="DEPARTMENT_ID") Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         11  consistent gets
          0  physical reads
          0  redo size
        742  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         11  rows processed -- Use EXISTS Related subqueries of keywords => Occurrence NESTED LOOPS Semi-connection
SQL> select department_name
  2  from hr.departments dept where exists
  3  (select null from hr.employees emp where emp.department_id = dept.department_id); 11 rows selected. Execution Plan
----------------------------------------------------------
Plan hash value: 2605691773 ----------------------------------------------------------------------------------------
| Id  | Operation          | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                   |    10 |   190 |     3   (0)| 00:00:01 |
|   1 |  NESTED LOOPS SEMI |                   |    10 |   190 |     3   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| DEPARTMENTS       |    27 |   432 |     3   (0)| 00:00:01 |
|*  3 |   INDEX RANGE SCAN | EMP_DEPARTMENT_IX |    41 |   123 |     0   (0)| 00:00:01 |
---------------------------------------------------------------------------------------- Predicate Information (identified by operation id):
---------------------------------------------------    3 - access("EMP"."DEPARTMENT_ID"="DEPT"."DEPARTMENT_ID") Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         11  consistent gets
          0  physical reads
          0  redo size
        742  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         11  rows processed -- The predicate uses the OR In the branch EXISTS Subquery => Disable semi-join
SQL> select department_name
  2  from hr.departments dept
  3  where 1=2 OR exists
  4  (select null from hr.employees emp where emp.department_id = dept.department_id); 11 rows selected. Execution Plan
----------------------------------------------------------
Plan hash value: 440241596 ----------------------------------------------------------------------------------------
| Id  | Operation          | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                   |    27 |   432 |     4   (0)| 00:00:01 |
|*  1 |  FILTER            |                   |       |       |            |          |
|   2 |   TABLE ACCESS FULL| DEPARTMENTS       |    27 |   432 |     3   (0)| 00:00:01 |
|*  3 |   INDEX RANGE SCAN | EMP_DEPARTMENT_IX |     2 |     6 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------- Predicate Information (identified by operation id):
---------------------------------------------------    1 - filter( EXISTS (SELECT 0 FROM "HR"."EMPLOYEES" "EMP" WHERE
              "EMP"."DEPARTMENT_ID"=:B1))
   3 - access("EMP"."DEPARTMENT_ID"=:B1) Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         35  consistent gets
          0  physical reads
          0  redo size
        742  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         11  rows processed

From the result set point of view, it is easy to think of inner joins, so why do semi-joins usually get higher performance? This is actually the key to semi-join optimization. Take NESTED LOOPS as an example. In NESTED LOOPS join, after the drive table is read, it needs to enter the inner loop one by one for matching work, and only when the data rows of the outer loop are matched with every row in the inner loop will the acquisition of a result set be completed; In contrast, half-joins differ in that every record in dataset 1 is returned only once, regardless of how many matching records there are in dataset 2. Therefore, half-joins end processing immediately after finding the first matching data in subqueries, thus improving performance.

For some scenarios where semi-joins are needed to improve performance, you can manually control the execution plan of the semi-joins, using the SEMIJOIN and NO_SEMIJOIN hints to specify that the optimizer uses and disables semi-joins, respectively.


-- Use NO_SEMIJOIN Prompt to disable semi-join
SQL> select department_name
  2  from hr.departments dept
  3  where department_id in (select /*+ no_semijoin */department_id from hr.employees emp); 11 rows selected. Execution Plan
----------------------------------------------------------
Plan hash value: 3372191744 ------------------------------------------------------------------------------------------
| Id  | Operation            | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                   |   106 |  1802 |     4  (25)| 00:00:01 |
|   1 |  VIEW                | VM_NWVW_2         |   106 |  1802 |     4  (25)| 00:00:01 |
|   2 |   HASH UNIQUE        |                   |   106 |  2544 |     4  (25)| 00:00:01 |
|   3 |    NESTED LOOPS      |                   |   106 |  2544 |     3   (0)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| DEPARTMENTS       |    27 |   567 |     3   (0)| 00:00:01 |
|*  5 |     INDEX RANGE SCAN | EMP_DEPARTMENT_IX |     4 |    12 |     0   (0)| 00:00:01 |
------------------------------------------------------------------------------------------ Predicate Information (identified by operation id):
---------------------------------------------------    5 - access("DEPARTMENT_ID"="DEPARTMENT_ID") Statistics
----------------------------------------------------------
        506  recursive calls
          0  db block gets
        188  consistent gets
          7  physical reads
          0  redo size
        742  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
         10  sorts (memory)
          0  sorts (disk)
         11  rows processed

In addition, we can select the join type of a semi-join using the _always_semi_join hidden parameter, for optional values for the _always_semi_join parameter:


SQL> SELECT
  2           PARNO_KSPVLD_VALUES     pvalid_par#,
  3           NAME_KSPVLD_VALUES      pvalid_name,
  4           VALUE_KSPVLD_VALUES     pvalid_value,
  5           DECODE(ISDEFAULT_KSPVLD_VALUES, 'FALSE', '', 'DEFAULT' ) pvalid_default
  6   FROM
  7           X$KSPVLD_VALUES
  8   WHERE
  9           LOWER(NAME_KSPVLD_VALUES) LIKE '%'||LOWER(nvl('&pname',name_kspvld_values))||'%'
 10   ORDER BY
 11           pvalid_par#,
 12           pvalid_default,
 13           pvalid_Value
 14   /   PAR# PARAMETER                                          VALUE                          DEFAULT
------ -------------------------------------------------- ------------------------------ -------
  1705 _always_semi_join                                  CHOOSE
       _always_semi_join                                  HASH
       _always_semi_join                                  MERGE
       _always_semi_join                                  NESTED_LOOPS
       _always_semi_join                                  OFF

The default value of this parameter is choose, which means that the type of semi-join selected is determined by the optimizer. Let's use the _always_semi_join parameter to change the above NESTED LOOPS semi-join to HASH JOIN semi-join:


-- Occurs by default NESTED LOOPS SEMI
SQL> select department_name
  2  from hr.departments dept
  3  where department_id in (select department_id from hr.employees emp); 11 rows selected. Execution Plan
----------------------------------------------------------
Plan hash value: 2605691773 ----------------------------------------------------------------------------------------
| Id  | Operation          | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                   |    10 |   190 |     3   (0)| 00:00:01 |
|   1 |  NESTED LOOPS SEMI |                   |    10 |   190 |     3   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| DEPARTMENTS       |    27 |   432 |     3   (0)| 00:00:01 |
|*  3 |   INDEX RANGE SCAN | EMP_DEPARTMENT_IX |    41 |   123 |     0   (0)| 00:00:01 |
---------------------------------------------------------------------------------------- Predicate Information (identified by operation id):
---------------------------------------------------    3 - access("DEPARTMENT_ID"="DEPARTMENT_ID") Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         11  consistent gets
          0  physical reads
          0  redo size
        742  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         11  rows processed -- session Level modification parameter
SQL> alter session set "_always_semi_join"=merge; Session altered. -- Occurrence MERGE JOIN SEMI
SQL> select department_name
  2  from hr.departments dept
  3  where department_id in (select department_id from hr.employees emp); 11 rows selected. Execution Plan
----------------------------------------------------------
Plan hash value: 954076352 --------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                   |    10 |   190 |     4  (25)| 00:00:01 |
|   1 |  MERGE JOIN SEMI             |                   |    10 |   190 |     4  (25)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| DEPARTMENTS       |    27 |   432 |     2   (0)| 00:00:01 |
|   3 |    INDEX FULL SCAN           | DEPT_ID_PK        |    27 |       |     1   (0)| 00:00:01 |
|*  4 |   SORT UNIQUE                |                   |   107 |   321 |     2  (50)| 00:00:01 |
|   5 |    INDEX FULL SCAN           | EMP_DEPARTMENT_IX |   107 |   321 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id):
---------------------------------------------------    4 - access("DEPARTMENT_ID"="DEPARTMENT_ID")
       filter("DEPARTMENT_ID"="DEPARTMENT_ID") Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          5  consistent gets
          1  physical reads
          0  redo size
        742  bytes sent via SQL*Net to client
        523  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
         11  rows processed
-- From trace Looking at the choice of optimizer, it is very reliable. .

Anti-connection

In essence, Anti-connection and semi-connection have many similar factors, Anti-coupling usually occurs when using NOT IN, NOT EXISTS, Similarly, if the subquery predicate OR branch, Anti-join will also be disabled. The main difference between it and semi-join is in the matching mode of returning data. It will return data rows that are not matched in the subquery, but its optimization principle is 1. By finding the first matching record in the subquery and immediately stopping processing, the efficiency will be improved. 1 is the concentrated scene:


SQL> set autotrace traceonly
-- NOT IN Triggered reverse connection
SQL> select department_name
  2  from hr.departments
  3  where department_id not in
  4  (select department_id from hr.employees where department_id is not null); 16 rows selected. Execution Plan
----------------------------------------------------------
Plan hash value: 3082375452 ----------------------------------------------------------------------------------------
| Id  | Operation          | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                   |    17 |   323 |     3   (0)| 00:00:01 |
|   1 |  NESTED LOOPS ANTI |                   |    17 |   323 |     3   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| DEPARTMENTS       |    27 |   432 |     3   (0)| 00:00:01 |
|*  3 |   INDEX RANGE SCAN | EMP_DEPARTMENT_IX |    41 |   123 |     0   (0)| 00:00:01 |
---------------------------------------------------------------------------------------- Predicate Information (identified by operation id):
---------------------------------------------------    3 - access("DEPARTMENT_ID"="DEPARTMENT_ID")
       filter("DEPARTMENT_ID" IS NOT NULL) Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         12  consistent gets
          6  physical reads
          0  redo size
        985  bytes sent via SQL*Net to client
        535  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         16  rows processed -- EXISTS Triggered reverse connection
SQL> select department_name
  2  from hr.departments dept
  3  where not exists
  4  (select null from hr.employees emp where emp.department_id = dept.department_id); 16 rows selected. Execution Plan
----------------------------------------------------------
Plan hash value: 3082375452 ----------------------------------------------------------------------------------------
| Id  | Operation          | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                   |    17 |   323 |     3   (0)| 00:00:01 |
|   1 |  NESTED LOOPS ANTI |                   |    17 |   323 |     3   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| DEPARTMENTS       |    27 |   432 |     3   (0)| 00:00:01 |
|*  3 |   INDEX RANGE SCAN | EMP_DEPARTMENT_IX |    41 |   123 |     0   (0)| 00:00:01 |
---------------------------------------------------------------------------------------- Predicate Information (identified by operation id):
---------------------------------------------------    3 - access("EMP"."DEPARTMENT_ID"="DEPT"."DEPARTMENT_ID") Statistics
----------------------------------------------------------
          3  recursive calls
          0  db block gets
         13  consistent gets
          0  physical reads
          0  redo size
        985  bytes sent via SQL*Net to client
        535  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         16  rows processed


From the above example, it can be seen that minus and outer join operations can skillfully achieve the same result. However, from the execution plan, minus operation obviously does not have anti-join operation optimization, while outer join has anti-join optimization, but it is not recommended to use it in practice because it uses virtual records with null values to match data rows.

If you want to manually control the execution plan of the anti-join, there are also 1 hint and parameters that can be used. The commonly used hint are:

1. ANTIJOIN-Performs a reverse join, and the optimizer determines the join type
2. USE_ANTI-Tips for older versions, and ANTIJOIN functionality 1
3. [NL_AJ] [HASH_AJ] [MERGE_AJ]-Specifies the type in which the reverse join occurs (10g begins to be discarded, but still works)

In terms of parameter control, there is also a parameter _ always_anti_join which is very similar to _ always_semi_join, and its usage is completely 1; There are also parameters _optimizer_null_aware_antijoin, and _optimizer_outer_to_anti_enable are used to control the inverse join conversion with null values and outer joins.


-- Use hint Explicitly specifying the reverse join type
SQL> select department_name
  2  from hr.departments dept
  3  where not exists (select /*+ hash_aj */ null from hr.employees emp
  4  where emp.department_id = dept.department_id); 16 rows selected. Execution Plan
----------------------------------------------------------
Plan hash value: 3587451639 ----------------------------------------------------------------------------------------
| Id  | Operation          | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                   |    17 |   323 |     5  (20)| 00:00:01 |
|*  1 |  HASH JOIN ANTI    |                   |    17 |   323 |     5  (20)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| DEPARTMENTS       |    27 |   432 |     3   (0)| 00:00:01 |
|   3 |   INDEX FULL SCAN  | EMP_DEPARTMENT_IX |   107 |   321 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------- Predicate Information (identified by operation id):
---------------------------------------------------    1 - access("EMP"."DEPARTMENT_ID"="DEPT"."DEPARTMENT_ID") Statistics
----------------------------------------------------------
        566  recursive calls
          0  db block gets
        193  consistent gets
          0  physical reads
          0  redo size
        985  bytes sent via SQL*Net to client
        535  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
         12  sorts (memory)
          0  sorts (disk)
         16  rows processed -- Use _optimizer_null_antijoin Parameter to turn off the null value consideration option in reverse join - That is, if a null value is returned, no reverse join is used
SQL> alter session set "_optimizer_null_aware_antijoin"=false; Session altered. SQL> select department_name
  2  from hr.departments
  3  where department_id not in (select department_id from hr.employees); no rows selected Execution Plan
----------------------------------------------------------
Plan hash value: 3416340233 ----------------------------------------------------------------------------------
| Id  | Operation          | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |             |    26 |   416 |    30   (0)| 00:00:01 |
|*  1 |  FILTER            |             |       |       |            |          |
|   2 |   TABLE ACCESS FULL| DEPARTMENTS |    27 |   432 |     3   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| EMPLOYEES   |     2 |     6 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------- Predicate Information (identified by operation id):
---------------------------------------------------    1 - filter( NOT EXISTS (SELECT 0 FROM "HR"."EMPLOYEES" "EMPLOYEES"
              WHERE LNNVL("DEPARTMENT_ID"<>:B1)))
   3 - filter(LNNVL("DEPARTMENT_ID"<>:B1)) Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        172  consistent gets
          0  physical reads
          0  redo size
        343  bytes sent via SQL*Net to client
        513  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed


Related articles: