Introduction to Oracle Table Three Connection Modes of sql Optimization

  • 2021-12-19 07:12:33
  • OfStack

1. NESTED LOOP

The nested loop connection is a good choice for small data subsets being connected. nested loop is to scan a table, and every time a record is read, it will be searched in another table according to the index. Without index 1, it will not be nested loops. In nested loop, the result set of driving table satisfying conditions is not large, and the connection field of driven table should have index, so nstedloop is taken. If the driver table returns too many records, it is not suitable for nested loops. If the join field does not have an index, it is appropriate to go hash join because no index is required.

The ordered hint can be used to change the default driver table for CBO, and the USE_NL (table_name1) hint can be used to force nested loop.

The main points are as follows:

1) Nested loop joins are a good choice when the data subset is small
2) Using USE_NL (table_name1) table_name2) forces CBO to perform a nested loop connection
3) Nested loop1 is used when there are indexes in the joined table and the index selectivity is good
4) The order of OIN is very important. Recordset 1 of driving table must be small, and the response time of returning result set is the fastest.
5) The working mode of Nested loops is to read data from one table and access another table (usually index) for matching. nested loops is applicable when a correlation table is relatively small, and its efficiency will be higher.

2. HASH JOIN

hash join is a common way for CBO to connect large data sets. The optimizer scans the small table (data source), builds the hash table in memory using the join key (that is, calculates the hash value according to the join field), then scans the large table, probes the hash table once every time a record is read, and finds the rows that match the hash table.

When small tables can all be put into memory, the cost is close to the sum of the costs of scanning two tables in full table. If the table is too large to fit completely into memory, the optimizer will divide it into several different partitions. If it cannot fit into memory, the partition will be written to the temporary segment of disk. At this time, there should be a larger temporary segment to maximize the performance of I/O. Partitions in the temporary segment need to be swapped into memory for hash join. At this time, the cost is close to the sum of the costs of scanning small tables with full tables + number of partitions * scanning large tables with full tables.

As for partitioning both tables, the advantage is that you can use parallel query, where multiple processes perform join on different partitions at the same time and then merge them. But complicated.

When using hash join, the initialization parameters of HASH_AREA_SIZE must be large enough. If it is 9i and Oracle, it is recommended to use SQL workspace automatic management, set WORKAREA_SIZE_POLICY to AUTO, and then adjust PGA_AGGREGATE_TARGET.

hash join may have advantages under the following conditions:
1) Join between two huge tables.
2) Join between a large table and a small table.

The main points are as follows:
1) Hash join is a common way for CBO to connect large data sets.
2) Hash joins can also be forced with the USE_HASH (table_name1) hint
3) Hash join when the amount of data in the two tables is very different.
4) The working mode of Hash join is to do hash operation on one table (usually the table with one point smaller) and store it in hash list, extract records from another table, do hash operation, find corresponding values in hash list and match them.

The ordered hint can be used to change the default driver table for CBO, and the USE_HASH (table_name1) hint can be used to force hash join.

3. SORT MERGE JOIN

a) Do table access full for each join table;
b) to sort the results of table access full;
c) merge join to merge the sorting results.

sort merge join performance overhead is almost all in the first two steps. In general, 9i has rarely appeared in the absence of index, because of its high sorting cost, and most of them are replaced by hash join.
Generally, hash join works better than sort merge join, but if the row source has been ordered, the performance of sort merge join will be better than hash join if no further ordering is needed when executing sort merge join.
sort merge join performs better than nested loops when full table scanning is preferable to "index range scanning followed by table access through rowid".

The main points are as follows:

1) Use USE_MERGE (table_name1) to force a sort merge connection.
2) Sort Merge join is used when there is no index and the data has been sorted.
3) Join step: Sort the two tables, and then merge the two tables.
4) Normally, this JOIN is used only if:
a) RBO mode
b) Unequal correlation ( > , < , > =, < =, < > )
c) bHASH_JOIN_ENABLED=false
d) Data source sorted
e) Merge Join first sorts the associated columns of the associated table, then extracts data from their respective sorting tables and matches them in another sorting table, because merge join needs to do more sorting, so it consumes more resources.

f) like ,not like
Generally speaking, where merge join can be used, hash join can give better performance

You can use USE_MERGE (table_name1) to prompt the mandatory use of sort merge join.


Related articles: