Mysql 8.0. 18 hash join Test of Recommendation

2021-12-13 10:04:05
OfStack

Hash Join

Hash Join does not require any indexes to execute and in most cases is more efficient than the current block nested loop algorithm.

The following example code shows you the Mysql 8.0. 18 hash join test as follows:


CREATE TABLE COLUMNS_hj as select * from information_schema.`COLUMNS`;
INSERT INTO COLUMNS SELECT * FROM COLUMNS; --  Finally 1 Secondary insertion 25 Wan Hang 

CREATE TABLE COLUMNS_hj2 as select * from information_schema.`COLUMNS`;


explain format=tree
SELECT 
 COUNT(c1. PRIVILEGES),
 SUM(c1.ordinal_position)
FROM
 COLUMNS_hj c1,
 COLUMNS_hj2 c2
WHERE
 c1.table_name = c2.table_name
AND c1.column_name = c2.column_name
GROUP BY
 c1.table_name,
 c1.column_name
ORDER BY
 c1.table_name,
 c1.column_name;

You must use format=tree (new feature of 8.0. 16) to view the execution plan of hash join:


-> Sort: <temporary>.TABLE_NAME, <temporary>.COLUMN_NAME
 -> Table scan on <temporary>
  -> Aggregate using temporary table
   -> Inner hash join (c1.`COLUMN_NAME` = c2.`COLUMN_NAME`), (c1.`TABLE_NAME` = c2.`TABLE_NAME`) (cost=134217298.97 rows=13421218)
    -> Table scan on c1 (cost=1.60 rows=414619)
    -> Hash
     -> Table scan on c2 (cost=347.95 rows=3237)


set join_buffer_size=1048576000;

SELECT 
 COUNT(c1. PRIVILEGES),
 SUM(c1.ordinal_position)
FROM
 COLUMNS_hj c1,
 COLUMNS_hj2 c2
WHERE
 c1.table_name = c2.table_name
AND c1.column_name = c2.column_name
GROUP BY
 c1.table_name,
 c1.column_name
ORDER BY
 c1.table_name,
 c1.column_name;

About 1.5 seconds.

Looking at BNL again, create indexes first (optimize them separately, and then compare the results fairly).


alter table columns_hj drop index idx_columns_hj;
alter table columns_hj2 drop index idx_columns_hj2;
create index idx_columns_hj on columns_hj(table_name,column_name);
create index idx_columns_hj2 on columns_hj2(table_name,column_name);

-> Sort: <temporary>.TABLE_NAME, <temporary>.COLUMN_NAME
 -> Table scan on <temporary>
  -> Aggregate using temporary table
   -> Nested loop inner join (cost=454325.17 rows=412707)
    -> Filter: ((c2.`TABLE_NAME` is not null) and (c2.`COLUMN_NAME` is not null)) (cost=347.95 rows=3237)
     -> Table scan on c2 (cost=347.95 rows=3237)
    -> Index lookup on c1 using idx_COLUMNS_hj (TABLE_NAME=c2.`TABLE_NAME`, COLUMN_NAME=c2.`COLUMN_NAME`) (cost=127.50 rows=127)

About 4.5 seconds. It can be seen that hash join effect is still lever.

I have to spit out the optimizer prompt of mysql, and it seems that HASH_JOIN/NO_HASH_JOIN is not effective.

In addition to hash_join, the SET_VAR optimizer hint introduced by mysql 8.0. 3 is useful for setting statement-level parameters (supported by oracle and remembered by mariadb), as follows:


mysql> select /*+ set_var(optimizer_switch='index_merge=off') set_var(join_buffer_size=4M) */ c_id from customer limit 1;

List of variables supported by SET_VAR:


auto_increment_increment
auto_increment_offset
big_tables
bulk_insert_buffer_size
default_tmp_storage_engine
div_precision_increment
end_markers_in_json
eq_range_index_dive_limit
foreign_key_checks
group_concat_max_len
insert_id
internal_tmp_mem_storage_engine
join_buffer_size
lock_wait_timeout
max_error_count
max_execution_time
max_heap_table_size
max_join_size
max_length_for_sort_data
max_points_in_geometry
max_seeks_for_key
max_sort_length
optimizer_prune_level
optimizer_search_depth variables
optimizer_switch
range_alloc_block_size
range_optimizer_max_mem_size
read_buffer_size
read_rnd_buffer_size
sort_buffer_size
sql_auto_is_null
sql_big_selects
sql_buffer_result
sql_mode
sql_safe_updates
sql_select_limit
timestamp
tmp_table_size
updatable_views_with_limit
unique_checks
windowing_use_high_precision

Summarize