SQL Optimization for count, Table Join Order, Condition Order, and in/exists

  • 2021-12-12 06:12:34
  • OfStack

This article explains SQL optimization for count, table join order, condition order, and in versus exists, all of which have real practical value. Details are as follows:

1. About count

Many articles online compare count(*) with count(column). Is count(column) really always more efficient than count(*)?

In fact, I personally think count(*) and count(column) are not comparable at all: count(*) counts all rows in the table, while count(column) counts only the rows in which that column is not null.
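The difference is easy to see on a table that contains nulls. The sketch below uses SQLite through Python's built-in sqlite3 module as a stand-in for Oracle. That substitution is an assumption made so the example runs anywhere: it demonstrates only the standard SQL null semantics of the two functions, not Oracle's performance. The table t and the function count_star_vs_column are hypothetical names.

```python
import sqlite3


def count_star_vs_column(values):
    """Return (count(*), count(val)) for a one-column table holding `values`.

    Python's None is stored as SQL NULL.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (val INTEGER)")  # nullable column
        conn.executemany("INSERT INTO t (val) VALUES (?)", [(v,) for v in values])
        star, col = conn.execute("SELECT COUNT(*), COUNT(val) FROM t").fetchone()
        return star, col
    finally:
        conn.close()


if __name__ == "__main__":
    # Five rows, two of which hold NULL in val.
    print(count_star_vs_column([1, 2, None, 4, None]))  # (5, 3)
```

The two results only coincide when the column holds no nulls, which is exactly the case the not null constraint later in this section guarantees.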

Still, we can compare them with an experiment:

First, create the test table:


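-- Build a test table from the data dictionary view dba_objects
-- (the drop fails harmlessly if the table does not exist yet)
drop table test purge;
create table test as select * from dba_objects;

-- Give object_id simple sequential values
update test set object_id = rownum;

-- SQL*Plus settings: show elapsed time, widen output, show plan and statistics
set timing on
set linesize 1000
set autotrace on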

Then execute:


select count(*) from test;
select count(object_id) from test;

The two queries take about the same time. Does that mean they are equally efficient?

Now let's create an index on the object_id column:


create index idx_object_id on test(object_id);

Then execute them again:


select count(*) from test;
select count(object_id) from test;

This time count(object_id) is noticeably faster than count(*). Is that because count(object_id) can use the index?

Next, change the object_id column to not null:


alter table test modify object_id not null;

Then execute them again:


select count(*) from test;
select count(object_id) from test;

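Now the two are equally fast, because count(*) can also use the index. An Oracle B-tree index stores no entry for a row whose indexed columns are all null, so an index on a nullable column may miss rows and cannot be used for count(*); with a not null constraint, every row is guaranteed an index entry.
In fact, comparing efficiency only makes sense when the two statements are equivalent. count(*) and count(column) are not equivalent (they return different results whenever the column contains nulls), so they are not really comparable.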
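Why the not null constraint lets count(*) use the index can be modeled in a few lines. This is a toy model, not Oracle's implementation: the Table class, build_index, count_object_id, and count_star are hypothetical, and the "index" is just a sorted list of (key, rowid) pairs that, like an Oracle B-tree index, skips rows whose key is null.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical table reduced to its object_id column (None means null)."""
    object_ids: list = field(default_factory=list)
    object_id_not_null: bool = False  # mirrors "alter table ... modify object_id not null"

    def __post_init__(self):
        if self.object_id_not_null and any(v is None for v in self.object_ids):
            raise ValueError("not null constraint violated")


def build_index(table):
    """Toy single-column index: sorted (key, rowid) pairs, with no entry for null keys."""
    return sorted(
        (key, rowid) for rowid, key in enumerate(table.object_ids) if key is not None
    )


def count_object_id(index):
    """count(object_id) counts only non-null keys, which is exactly what the index holds."""
    return len(index), "index"


def count_star(table, index):
    """count(*) must count every row, so the index is usable only if no row can be missing."""
    if table.object_id_not_null:
        return len(index), "index"
    return len(table.object_ids), "full table scan"


if __name__ == "__main__":
    nullable = Table([3, None, 1, 2])
    idx = build_index(nullable)
    print(count_object_id(idx))        # (3, 'index')
    print(count_star(nullable, idx))   # (4, 'full table scan')

    not_null = Table([3, 4, 1, 2], object_id_not_null=True)
    print(count_star(not_null, build_index(not_null)))  # (4, 'index')
```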

With the Oracle optimizer, experiments also show that count(column) takes different amounts of time for different columns. The general trend is that the further back a column sits in the table, the higher the access cost, because the column's offset within the row determines its access performance. The cost of count(*) does not depend on any column offset, so in some cases count(*) is the fastest.
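The offset effect follows from how rows are stored: in Oracle's row format each column value is preceded by its length, so reaching a column means stepping over every column stored before it. The sketch below is a deliberately simplified model of that idea (one length byte per column, no row header, no nulls); encode_row, read_column, and count_column_work are hypothetical helpers, not Oracle internals, and the sample values are made up.

```python
def encode_row(values):
    """Encode a row as length-prefixed columns: one length byte, then the bytes."""
    out = bytearray()
    for value in values:
        data = value.encode("utf-8")
        if len(data) > 250:
            raise ValueError("this toy format only supports short column values")
        out.append(len(data))
        out.extend(data)
    return bytes(out)


def read_column(row, n):
    """Return (value of 0-based column n, number of length headers read to reach it)."""
    if n < 0:
        raise ValueError("column number must be non-negative")
    pos = 0
    for i in range(n + 1):
        if pos >= len(row):
            raise IndexError(f"row has fewer than {n + 1} columns")
        length = row[pos]
        if i == n:
            return row[pos + 1 : pos + 1 + length].decode("utf-8"), i + 1
        pos += 1 + length  # skip this column's header and data


def count_column_work(rows, n):
    """Total length headers read to evaluate count(column n) over all rows."""
    return sum(read_column(row, n)[1] for row in rows)


if __name__ == "__main__":
    rows = [encode_row(["SYS", "TABLE", "VALID"]) for _ in range(1000)]
    print(count_column_work(rows, 0))  # 1000: the first column is found immediately
    print(count_column_work(rows, 2))  # 3000: each row walks past two earlier columns
    # count(*) needs no column parsing in this model: it is just the number of rows.
    print(len(rows))
```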

2. About in and exists

Many articles about in and exists claim that exists is more efficient than in, so every in should be rewritten as exists. But is this really the case?

Let's do an experiment:

In Oracle 10g:


select * from dept where deptno NOT IN ( select deptno from emp ) ;
select * from dept where not exists ( select deptno from emp where emp.deptno=dept.deptno) ;

Here not exists is indeed more efficient than not in, so the claim seems to hold.

But now execute the following statement:


select * from dept where deptno NOT IN ( select deptno from emp where deptno is not null) and deptno is not null;

You will find that, once the not-null conditions are added, not in is just as efficient as not exists.

Looking at the execution plans of the three statements, you will find that the not in statement with the not-null conditions and the not exists statement both use an anti-join (ANTI), so they are equally efficient, while the not in statement without the not-null conditions uses a FILTER operation instead of the anti-join, so it is less efficient.

Therefore, in Oracle 10g, if both sides can be guaranteed non-null, not in can use the anti-join and is then as efficient as not exists.
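The null conditions matter for a semantic reason, not just an optimizer one: if the subquery returns even one null, deptno not in (...) evaluates to false or unknown for every department and the query returns no rows, whereas not exists simply ignores the null. The sketch below shows this with SQLite through Python's sqlite3 module standing in for Oracle (an assumption: only the standard SQL semantics carry over, not the 10g plans), on a hypothetical cut-down dept/emp schema with just a deptno column.

```python
import sqlite3

NOT_IN = """
    SELECT deptno FROM dept
    WHERE deptno NOT IN (SELECT deptno FROM emp)
    ORDER BY deptno"""

NOT_EXISTS = """
    SELECT deptno FROM dept
    WHERE NOT EXISTS (SELECT deptno FROM emp WHERE emp.deptno = dept.deptno)
    ORDER BY deptno"""

NOT_IN_NON_NULL = """
    SELECT deptno FROM dept
    WHERE deptno NOT IN (SELECT deptno FROM emp WHERE deptno IS NOT NULL)
      AND deptno IS NOT NULL
    ORDER BY deptno"""


def run(sql, emp_deptnos, dept_deptnos=(10, 20, 30, 40)):
    """Run `sql` against a fresh in-memory dept/emp pair and return the deptno values."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE dept (deptno INTEGER)")
        conn.execute("CREATE TABLE emp (deptno INTEGER)")  # nullable, like an unassigned employee
        conn.executemany("INSERT INTO dept VALUES (?)", [(d,) for d in dept_deptnos])
        conn.executemany("INSERT INTO emp VALUES (?)", [(d,) for d in emp_deptnos])
        return [row[0] for row in conn.execute(sql)]
    finally:
        conn.close()


if __name__ == "__main__":
    emp = [10, 20, None]  # one employee has no department
    print(run(NOT_IN, emp))           # []: the null makes NOT IN unknown for 30 and 40
    print(run(NOT_EXISTS, emp))       # [30, 40]
    print(run(NOT_IN_NON_NULL, emp))  # [30, 40]: matches NOT EXISTS once nulls are excluded
```

Without a guarantee that nulls are absent, an optimizer cannot simply treat not in as the same anti-join it uses for not exists, because the two would return different rows.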

In Oracle 11g:


select * from dept where deptno NOT IN ( select deptno from emp ) ;
select * from dept where not exists ( select deptno from emp where emp.deptno=dept.deptno) ;

This time the two statements are equally efficient, and their execution plans are essentially the same as well. Oracle improved this in 11g: the optimizer can use a null-aware anti-join for not in, so in and exists perform the same.

So in 11g, in and exists are equally efficient, because both use the more efficient anti-join.
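To see why the anti-join beats the FILTER plan that 10g used for not in, here is a rough sketch of both strategies. It is a simplified model under stated assumptions: the inputs contain no nulls, the filter version rescans the subquery result for every outer row and ignores any caching Oracle's real FILTER operation may do, and the function names are hypothetical.

```python
def filter_not_in(outer_keys, inner_keys):
    """FILTER-style evaluation: for every outer row, rescan the subquery result.
    Returns (rows kept, comparisons made). Assumes no nulls on either side."""
    kept, comparisons = [], 0
    for key in outer_keys:
        for inner in inner_keys:
            comparisons += 1
            if inner == key:
                break  # match found: this outer row is rejected
        else:
            kept.append(key)  # no inner row matched
    return kept, comparisons


def hash_anti_join(outer_keys, inner_keys):
    """Hash anti-join: build a hash set of the inner keys once, then probe it
    once per outer row and keep the rows that find no match.
    Returns (rows kept, probes made). Assumes no nulls on either side."""
    build = set(inner_keys)
    kept, probes = [], 0
    for key in outer_keys:
        probes += 1
        if key not in build:
            kept.append(key)
    return kept, probes


if __name__ == "__main__":
    dept = [10, 20, 30, 40]
    emp = [10, 20, 20, 30, 10]
    print(filter_not_in(dept, emp))   # ([40], 12)
    print(hash_anti_join(dept, emp))  # ([40], 4)
```

The hash version also pays once to build the set, but that cost does not grow with the number of outer rows, while the filter version's work grows with outer rows times inner rows.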

3. About the join order of large and small tables

Many articles online say that when querying multiple tables, the small table (or an intersection table) should serve as the driving table and be listed last in the from clause, with the large tables listed before it, because tables are accessed from right to left.

But is this really the case?

Let's verify this with an experiment (test environment: Oracle 11g):


create table tab_big as select * from dba_objects where rownum<=30000;
create table tab_small as select * from dba_objects where rownum<=10;
set autotrace traceonly
set linesize 1000
set timing on 
select count(*) from tab_big,tab_small ; 
select count(*) from tab_small,tab_big ;

The execution plans show that the two statements are equally efficient. So does table order have nothing to do with the efficiency of multi-table queries?

Now execute the following statements:


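-- The rule hint makes Oracle use the old rule-based optimizer for these queries
select /*+ rule */ count(*) from tab_big,tab_small;
select /*+ rule */ count(*) from tab_small,tab_big;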

The statement with the small table on the right and the large table on the left is clearly much faster.

In fact, in the rule-based era, query efficiency did depend on the join order of tables: putting the small table (or intersection table) on the right and the large tables on the left gave better performance. Today, however, Oracle is essentially cost-based, so the order of large and small tables in the from clause no longer affects efficiency; the Oracle optimizer works out the best order itself.
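The contrast between the two eras can be sketched with a toy model. Under the rule-based premise described above, the driving table is simply the last table in the from list; a cost-based optimizer instead estimates the cost of each possible order and picks the cheapest. The cost formula (row visits for a naive nested-loop join with no indexes) and the helper names are assumptions for illustration only; Oracle's real optimizer considers far more, including hash and sort-merge joins.

```python
def nested_loop_cost(outer_rows, inner_rows):
    """Row visits for a naive nested-loop join: read each driving (outer) row once,
    then scan the whole inner table once per driving row."""
    return outer_rows + outer_rows * inner_rows


def rule_based_driver(from_list):
    """Toy rule-based choice: tables are accessed right to left, so the last one drives."""
    return from_list[-1]


def cost_based_driver(from_list, row_counts):
    """Toy cost-based choice for two tables: cost both orders, keep the cheaper driver."""
    first, second = from_list
    cost_first = nested_loop_cost(row_counts[first], row_counts[second])
    cost_second = nested_loop_cost(row_counts[second], row_counts[first])
    return first if cost_first <= cost_second else second


if __name__ == "__main__":
    counts = {"tab_big": 30000, "tab_small": 10}  # row counts used in the experiment
    for order in (["tab_big", "tab_small"], ["tab_small", "tab_big"]):
        print(order, "rule:", rule_based_driver(order),
              "cost:", cost_based_driver(order, counts))
    print(nested_loop_cost(10, 30000), nested_loop_cost(30000, 10))  # 300010 330000
```

The rule-based choice flips with the from order, while the cost-based choice is tab_small either way, which is why the order written in the query stops mattering.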

4. Order of conditions in the where clause

In the rule-based era, Oracle parsed the where clause from the bottom up. Following this principle, we generally put the conditions expected to return the fewest rows, that is, the most selective filter conditions, at the end of the where clause.

In today's cost-based era, however, the Oracle optimizer takes care of this, so the order of tables and conditions does not affect query efficiency.
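Why condition order once mattered, and why an optimizer can take it over, is easy to model: if and-ed conditions are evaluated one at a time and evaluation stops at the first false one, checking the most selective condition first means fewer evaluations. The sketch below counts evaluations for both orders and then reorders conditions by their measured pass rate, a crude stand-in for the selectivity estimates an optimizer takes from statistics. The short-circuit evaluation model, the helpers, and the sample predicates are assumptions for illustration, not a description of how Oracle evaluates predicates internally.

```python
def evaluate(rows, predicates):
    """Apply and-ed predicates in the given order, stopping at the first False.
    Returns (matching rows, total predicate evaluations)."""
    matches, evaluations = [], 0
    for row in rows:
        for pred in predicates:
            evaluations += 1
            if not pred(row):
                break
        else:  # no predicate failed
            matches.append(row)
    return matches, evaluations


def order_by_selectivity(rows, predicates):
    """Toy cost-based step: put the predicate that passes the fewest rows first.
    A real optimizer estimates this from statistics instead of scanning the data."""
    def pass_rate(pred):
        return sum(1 for row in rows if pred(row)) / len(rows) if rows else 0.0
    return sorted(predicates, key=pass_rate)


if __name__ == "__main__":
    rows = list(range(1000))

    def is_even(r):   # passes 500 rows
        return r % 2 == 0

    def is_small(r):  # passes 10 rows
        return r < 10

    print(evaluate(rows, [is_even, is_small]))  # ([0, 2, 4, 6, 8], 1500)
    print(evaluate(rows, [is_small, is_even]))  # ([0, 2, 4, 6, 8], 1010)
    print(order_by_selectivity(rows, [is_even, is_small]) == [is_small, is_even])  # True
```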

