Preliminary study on query efficiency calculation of Postgresql
- 2020-05-06 11:55:53
- OfStack
the
One important aspect of relational databases is query speed. The quality of query speed directly affects the quality of a system.
Query speed generally requires a peek into the execution process through query planning.
The query path selects the least expensive path to execute. And how did we figure out that cost?
focuses on the parameters and the table
parameter: is from the postgresql.conf file,
can be viewed through show
seq_page_cost = 1.0 # measured on an arbitrary scale
random_page_cost = 4.0 # same scale as above
cpu_tuple_cost = 0.01 # same scale as above
cpu_index_tuple_cost = 0.005 # same scale as above
cpu_operator_cost = 0.0025 # same scale as above
parallel_tuple_cost = 0.1 # same scale as above
parallel_setup_cost = 1000.0 # same scale as above
Table (view): pg_class (focus on relpages, reltuples), pg_stats
analyzes the costing process for simple queries
Set up the simulation data and insert 100,000 pieces of data into a table
create table test(id int, info text);
insert into test(id, info) select i, md5(i::text) from generate_series(1, 100000) t(i);
no index case
Analyze the full table query costing process
postgres=# analyze test; # In case there is no analysis
postgres=# explain select * from test;
QUERY PLAN
-------------------------------------------------------------
Seq Scan on test (cost=0.00..1834.00 rows=100000 width=37)
1. Query pg_class table to see page number and row number
of test table
postgres=# select t.relpages, t.reltuples from pg_class t where t.relname = 'test';
relpages | reltuples
----------+-----------
834 | 100000
How was the cost of 1834.00 calculated?
2. In this process, 834 page were actually scanned sequentially, and 100,000 lines of
were emitted from the node
3. View the configuration parameter
seq_page_cost = 1.0
cpu_tuple_cost = 0.01
4. The result is
postgres=# select 834 * 1.0 + 100000 * 0.01;
?column?
----------
1834.00
5. The query cost is 1834.00. It is consistent with the above query plan.
Full list of cost - adding conditions for the costing process
postgres=# explain select * from test where id = 100;
QUERY PLAN
--------------------------------------------------------
Seq Scan on test (cost=0.00..2084.00 rows=1 width=37)
Filter: (id = 100)
How is the cost 2084.00 calculated?
1. Query pg_class table, pages,tuples as in the above example
2. This process is the order test table, which emits 100,000 rows, and then filters 100,000 rows
through cloud storage
3. View the cost of the filter operation line
cpu_operator_cost = 0.0025
4. The result is
postgres=# select 834 * 1.0 + 100000 * 0.01 + 100000 * 0.0025;
?column?
-----------
2084.0000
Join index case
```
create index on test(id);
```
compares the following four situations:
Index Only Scan
create table test(id int, info text);
insert into test(id, info) select i, md5(i::text) from generate_series(1, 100000) t(i);
0
Index Scan
create table test(id int, info text);
insert into test(id, info) select i, md5(i::text) from generate_series(1, 100000) t(i);
1
Index Scan
create table test(id int, info text);
insert into test(id, info) select i, md5(i::text) from generate_series(1, 100000) t(i);
2
Insert the data into
out of order
truncate table test;
insert into test(id, info) select i, md5(i::text) from generate_series(1, 1000000) t(i) order by random();
create table test(id int, info text);
insert into test(id, info) select i, md5(i::text) from generate_series(1, 100000) t(i);
4
conclusion
summary