PostgreSQL Tutorial XIV: Database Maintenance

  • 2020-05-06 11:53:45
  • OfStack

I. Reclaim disk space:

In PostgreSQL, rows removed by DELETE or UPDATE statements are not physically deleted; the older version of the row is simply marked as deleted or expired at its physical location. As a result, when the data in a table changes very frequently, the space occupied by the table grows over time even though the amount of live data may not change much. To address this, tables whose data changes frequently need periodic VACUUM operations.

The VACUUM command comes in two forms, VACUUM and VACUUM FULL. The comparison below covers running no VACUUM at all, plain VACUUM, and VACUUM FULL:

After deleting a large amount of data:

  • No VACUUM: the deleted rows are merely marked as deleted; their space cannot be reused.
  • VACUUM: if the deleted records sit at the end of the table, their space is physically released and returned to the operating system. Otherwise, the command marks the space occupied by deleted data in the specified table or index as reusable; future inserts use that space first, and new disk pages are allocated only once all reusable space is exhausted.
  • VACUUM FULL: regardless of whether the deleted data sits at the end of the table, its space is physically released and returned to the operating system; subsequent inserts are allocated new disk pages.

Execution efficiency:

  • VACUUM: since it only changes row status, it is relatively fast.
  • VACUUM FULL: in the then-current version of PostgreSQL (v9.1), this command generates a new data file for the specified table or index, copies the live data from the old file into the new one, and then deletes the old file. Extra disk space is therefore required while it runs, and the command is comparatively slow.

Is the physical space of deleted data returned to the operating system?

  • No VACUUM: no.
  • VACUUM: no.
  • VACUUM FULL: yes, as described above.

Can other operations on the table run concurrently during execution?

  • VACUUM: it takes only a shared lock, so it can run in parallel with other operations.
  • VACUUM FULL: it requires an exclusive lock on the table, so any operation on that table is blocked until it completes.

Recommended usage:

  • No VACUUM: when emptying a table, use TRUNCATE instead, which physically clears the table and returns its space directly to the operating system.
  • VACUUM: to keep a table's disk page count relatively stable, run it periodically, e.g. daily or weekly during periods of low activity.
  • VACUUM FULL: given its cost and its exclusion of other operations, the recommended approach is to monitor tables whose data volume changes heavily and run it only when their disk page usage approaches a critical value, and even then during periods of low activity.

Efficiency of other operations afterwards:

  • No VACUUM: query efficiency gradually degrades because of the large number of fragmented disk pages.
  • VACUUM: queries are more efficient than with no VACUUM at all, though insert efficiency drops somewhat.
  • VACUUM FULL: after the operation, all operations on the table become much more efficient.
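As a quick recap of the comparison above, the basic invocations look like this; testtable stands in for any frequently updated table:


    -- Mark dead row versions as reusable (shared lock; can run
    -- alongside normal reads and writes on the table):
    VACUUM testtable;
     
    -- Rewrite the table into a new data file and return freed space
    -- to the operating system (exclusive lock; blocks other operations):
    VACUUM FULL testtable;
     
    -- When a table should simply be emptied, TRUNCATE releases its
    -- space directly instead of leaving dead rows behind:
    TRUNCATE TABLE testtable;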

II. Update planner statistics:

When choosing the optimal path, the PostgreSQL query planner refers to statistics about the relevant tables in order to generate the most reasonable plan for a query. These statistics are gathered by the ANALYZE command, which you can call directly or as an optional step of the VACUUM command, as in VACUUM ANALYZE table_name, which executes VACUUM before ANALYZE. As with reclaiming space (VACUUM), tables with frequent data updates should be ANALYZEd at a certain frequency so that their statistics stay reasonably fresh, which greatly benefits queries planned against them. Infrequently updated tables, by contrast, do not need this operation.

      We can run ANALYZE against a specific table, or even against specific columns of a table, so that, depending on the actual workload, we ANALYZE only the frequently updated parts. This saves the space occupied by statistics and makes the ANALYZE operation itself faster. Note, however, that ANALYZE is a fairly fast operation even on large tables, because it samples rows at random rather than reading and analyzing every row; it is therefore also reasonable to run it periodically against the entire database.
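For example, restricting ANALYZE to one table or to a single column of it looks like this (the table and column names are illustrative):


    -- Analyze one table only:
    ANALYZE testtable;
     
    -- Analyze only one column of that table:
    ANALYZE testtable (test_col);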

In fact, we can even adjust the sampling rate of a specified column with the following command:


    ALTER TABLE testtable ALTER COLUMN test_col SET STATISTICS 200;
 

Note: this value ranges from 0 to 1000. The lower the value, the smaller the sampling ratio and the less accurate the analysis results, but the faster the ANALYZE command runs. If the value is set to -1, the column's sampling ratio reverts to the system's current default, which can be viewed with the following command:
 

    postgres=# show default_statistics_target;
     default_statistics_target
    ---------------------------
     100
    (1 row)
 

As the above result shows, the default statistics target for this database is 100.
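To put a column back on the system default shown above, the per-column target can be reset to -1; the names follow the earlier illustrative example:


    -- Revert test_col to the system-wide default_statistics_target:
    ALTER TABLE testtable ALTER COLUMN test_col SET STATISTICS -1;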

III. Examples of VACUUM and ANALYZE:
     


    #1. Create the test data table.
    postgres=# CREATE TABLE testtable (i integer);
    CREATE TABLE
    #2. Create an index for the test table.
    postgres=# CREATE INDEX testtable_idx ON testtable(i);
    CREATE INDEX
    #3. Create a function to batch-insert test data.
    postgres=# CREATE OR REPLACE FUNCTION test_insert() returns integer AS $$
    DECLARE
        min integer;
        max integer;
    BEGIN
        SELECT COUNT(*) INTO min from testtable;
        max := min + 10000;
        FOR i IN min..max LOOP
            INSERT INTO testtable VALUES(i);
        END LOOP;
        RETURN 0;
    END;
    $$ LANGUAGE plpgsql;
    CREATE FUNCTION
    #4. Batch-insert data into the test table (perform this four times).
    postgres=# SELECT test_insert();
     test_insert
    -------------
               0
    (1 row)
    #5. Make sure all four batch inserts are successful.
    postgres=# SELECT COUNT(*) FROM testtable;
     count
    -------
     40004
    (1 row)
    #6. Analyze the test table so that its statistics are updated in the PostgreSQL system tables.
    postgres=# ANALYZE testtable;
    ANALYZE
    #7. View the number of pages currently occupied by the test table and index (typically each page is 8K).
    postgres=# SELECT relname,relfilenode, relpages FROM pg_class WHERE relname = 'testtable' or relname = 'testtable_idx';
        relname       | relfilenode    | relpages
    ---------------+-------------+----------
     testtable        |       17601   |      157
     testtable_idx  |       17604   |       90
    #8. Batch deletion of data.
    postgres=# DELETE FROM testtable WHERE i < 30000;
    DELETE 30003
    #9. Perform VACUUM ANALYZE to update the system tables and record the high-water mark of the table and index.
    #10. Note that the data deleted above sits at the front of the table. If the deleted data were at the end,
    #      e.g. WHERE i > 10000, then VACUUM ANALYZE would physically shrink the table.
    postgres=# VACUUM ANALYZE testtable;
    VACUUM
    #11. View the table and index after the delete; VACUUM ANALYZE has updated the system statistics (the page counts remain the same).
    postgres=# SELECT relname,relfilenode, relpages FROM pg_class WHERE relname = 'testtable' or relname = 'testtable_idx';
        relname      | relfilenode     | relpages
    ---------------+-------------+----------
     testtable        |       17601   |      157
     testtable_idx  |       17604   |       90
    (2 rows)
    #12. Batch-insert twice more, then analyze the table to update its statistics.
    postgres=# SELECT test_insert(); -- Execute it twice.
     test_insert
    -------------
               0
    (1 row)
    postgres=# ANALYZE testtable;
    ANALYZE
    #13. At this point the table's page count still matches the high-water mark recorded earlier. The index page count grows
    #      due to its internal implementation, but it will not keep growing on subsequent inserts.
    postgres=# SELECT relname,relfilenode, relpages FROM pg_class WHERE relname = 'testtable' or relname = 'testtable_idx';
        relname       | relfilenode    | relpages
    ---------------+-------------+----------
     testtable        |       17601   |      157
     testtable_idx  |       17604   |      173
    (2 rows)
    postgres=# SELECT test_insert();
     test_insert
    -------------
               0
    (1 row)
    postgres=# ANALYZE testtable;
    ANALYZE
    #14. You can see that the number of index pages indeed does not increase.
    postgres=# SELECT relname,relfilenode, relpages FROM pg_class WHERE relname = 'testtable' or relname = 'testtable_idx';
        relname      | relfilenode    | relpages
    ---------------+-------------+----------
     testtable        |       17601   |      157
     testtable_idx  |       17604   |      173
    (2 rows)
    #15. Batch-delete data again.
    postgres=# DELETE FROM testtable WHERE i < 30000;
    DELETE 19996
    #16. As the following query shows, after the VACUUM FULL command both the table and the index occupy fewer pages,
    #      meaning they take up less physical space.
    postgres=# VACUUM FULL testtable;
    VACUUM
    postgres=# SELECT relname,relfilenode, relpages FROM pg_class WHERE relname = 'testtable' or relname = 'testtable_idx';
        relname      | relfilenode     | relpages
    ---------------+-------------+----------
     testtable        |       17602   |      118
     testtable_idx  |       17605   |       68
    (2 rows)

IV. Rebuild indexes regularly:

In PostgreSQL, it is necessary to periodically rebuild indexes (REINDEX INDEX) on frequently updated tables. For B-Tree indexes, only index pages that have been completely emptied are reused; pages that are only partially free are not. In the extreme case, where per-page utilization is very low, a large increase in data volume can produce extremely large index files, which not only reduces query efficiency but also threatens to fill the entire disk.
      A rebuilt index also has a performance advantage: in a newly created index, logically adjacent pages tend to be physically adjacent as well, which raises the probability that disk pages can be read sequentially and thereby improves the I/O efficiency of the whole operation. See the following example:
      #1. At this point roughly 60,000 rows have been inserted into the table. The following SQL statement queries the disk space occupied by the index.
 


    postgres=# SELECT relname, pg_relation_size(oid)/1024 || 'K' AS size FROM pg_class WHERE relkind='i' AND relname = 'testtable_idx';
        relname     | size
    ----------------+------
     testtable_idx | 1240K
    (1 row)
    #2. Delete most of the data in the data table.
    postgres=# DELETE FROM testtable WHERE i > 20000;
    DELETE 50006
    #3. Analyze the table; the next SQL statement then checks the space occupied by the index again.
    postgres=# ANALYZE testtable;
    ANALYZE
    #4. As the query result shows, the index has not shrunk; it occupies exactly as much space as before.
    postgres=# SELECT pg_relation_size('testtable_idx')/1024 || 'K' AS size;
     size
    ------
     1240K
    (1 row)
    #5. Rebuild the index.
    postgres=# REINDEX INDEX testtable_idx;
    REINDEX
    #6. View the actual space occupied by the rebuilt index; the result shows that the index has shrunk.
    postgres=# SELECT pg_relation_size('testtable_idx')/1024 || 'K' AS size;
     size
    ------
     368K
    (1 row)
    #7. Finally, remember that it is important to analyze the table again after rebuilding the index.
    postgres=# ANALYZE testtable;
    ANALYZE

V. Observe disk usage:

1. View the number of disk pages consumed by the data table.
 


    # relpages is updated only by VACUUM, ANALYZE, and a few DDL commands such as CREATE INDEX. A page is normally 8K bytes.
    postgres=# SELECT relfilenode, relpages FROM pg_class WHERE relname = 'testtable';
     relfilenode | relpages
    -------------+----------
           16412 |       79
    (1 row)
   

2. View the index names of a given table and the number of disk pages occupied by each index.
 

    postgres=# SELECT c2.relname, c2.relpages FROM pg_class c, pg_class c2, pg_index i
        WHERE c.relname = 'testtable' AND c.oid = i.indrelid AND c2.oid = i.indexrelid
        ORDER BY c2.relname;
        relname    | relpages
    ---------------+----------
     testtable_idx |       46
    (1 row)
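Beyond page counts, on-disk sizes can also be checked directly with PostgreSQL's size functions; this small sketch reuses the illustrative table name from earlier:


    -- Size in bytes of the table's main data file alone:
    SELECT pg_relation_size('testtable');
     
    -- Size including all of the table's indexes and TOAST data:
    SELECT pg_total_relation_size('testtable');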
 

