Postgresql backup and incremental recovery solution

  • 2020-05-06 11:56:21
  • OfStack

Preface to

The most recent working database has been Postgresql, an open source database that any individual can use for commercial purposes. When using Postgresql, the most obvious feeling to me was that the database was really well done. Although the installation package of the database was really small, the performance and convenience of operation were not inferior to other large commercial databases. In addition, it was great to operate the database directly from the command line interface. When using the database, we as the database administrator of a small company have a job that is impossible to avoid, that is the data backup and restore problems. Although PostgreSQL has a lot of aspects, it is a pity that it does not support incremental backup when it comes to database backup. Still, it is a good database software in general.

Previously, we deployed Postgresql's master-slave asynchronous flow replication environment in the Postgresql master-slave asynchronous flow replication scenario section. The purpose of master-slave replication is to backup data, achieve high availability of data and fault-tolerant rows. The following is a brief introduction to the backup and recovery scenario when we operate and maintain Postgresql database.

incremental backup

Any changes to the data file that PostgreSQL makes to the write operation are written to the WAL log (pre-write log) before the physical changes are made to the data file. When the database server fails to restart, PostgreSQL starts by reading the WAL logs to restore the data files. So, in theory, if we have a base backup (also known as full backup) of the database, coupled with the WAL logs, we can restore the database to any point in time.

The above is important because the incremental backup of our scenario is essentially a redo recovery with the base backup + incremental WAL log.

incremental backup set

To demonstrate the functionality, we created the

administrative directory on the pghost1 server, based on the Postgresql master-slave asynchronous flow replication solution section

Switch to postgres user under


mkdir -p /data/pg10/backups
mkdir -p /data/pg10/archive_wals

The backups directory can be used to store the underlying

backup

The archive_wals directory is naturally used to store the

archive

Next we modify the

Settings for our postgresql.conf file

wal_level = replica

archive_mode = on

archive_command = '/usr/bin/lz4 -q -z %p /data/pg10/archive_wals/%f.lz4'

The default value for the archive_command parameter is an empty string, which can be an shell command or a complex shell script.

In the shell command or script of archive_command, you can use %p to represent the file name of the WAL file to be archived with full path information, and %f to represent the file name of the WAL file without path information.

Changes to wal_level and archive_mode require a database restart to take effect. Changes to archive_command do not require a restart, just reload, e.g.


postgres=# SELECT pg_reload_conf();

postgres=# show archive_command ; 

creates a base backup

We use the pg_basebackup command introduced before to create the basic backup. The basic backup is very important, and our data recovery cannot be done without it. It is suggested that we periodically generate our basic backup according to relevant business policies.


$ pg_basebackup -Ft -Pv -Xf -z -Z5 -p 25432 -D /data/pg10/backups/

In this way, we successfully generated

backup of our base data

sets the restore point

In general, we need to create a restore point based on the important event, and restore to the state before the event by basic backup and archiving.

The system function that creates the restore point is pg_create_restore_point, which is defined as


postgres=# SELECT pg_create_restore_point('domac-201810141800');

returns to the specified restore point

Next, we use an example to restore our data to

at the restore point we set

First, we create a test sheet:


CREATE TABLE test_restore(
 id SERIAL PRIMARY KEY,
 ival INT NOT NULL DEFAULT 0,
 description TEXT,
 created_time TIMESTAMPTZ NOT NULL DEFAULT now()
);

Initialize some test data as the underlying data, as shown below:


postgres=# INSERT INTO test_restore (ival) VALUES (1);
INSERT 0 1
postgres=# INSERT INTO test_restore (ival) VALUES (2);
INSERT 0 1
postgres=# INSERT INTO test_restore (ival) VALUES (3);
INSERT 0 1
postgres=# INSERT INTO test_restore (ival) VALUES (4);
INSERT 0 1

postgres=# select * from test_restore;
 id | ival | description |   created_time
----+------+-------------+-------------------------------
 1 | 1 |    | 2018-10-14 11:13:41.57154+00
 2 | 2 |    | 2018-10-14 11:13:44.250221+00
 3 | 3 |    | 2018-10-14 11:13:46.311291+00
 4 | 4 |    | 2018-10-14 11:13:48.820479+00
(4 rows)

And create a base backup as described above. If it is a test, it is important to note that since the WAL file is archived only when it is full of 16MB, there may be very few writes during the test phase, and you can manually switch to WAL after performing the underlying backup. For example:


postgres=# select pg_switch_wal();
 pg_switch_wal
---------------
 0/1D01B858
(1 row)

Or by setting the archive_timeout parameter, you can forcibly switch to the new WAL segment when the timeout threshold is reached.

Next, create a restore point, as shown below:


postgres=# select pg_create_restore_point('domac-1014');
 pg_create_restore_point
-------------------------
 0/1E0001A8
(1 row)

Next we make some changes to the data, we delete all the data for test_restore:


postgres=# delete from test_restore;
DELETE 4

Following is an experiment to restore to the point named "domac-1014", as shown below:

Stop database


wal_level = replica

archive_mode = on

archive_command = '/usr/bin/lz4 -q -z %p /data/pg10/archive_wals/%f.lz4'
0

Remove the old data directory


wal_level = replica

archive_mode = on

archive_command = '/usr/bin/lz4 -q -z %p /data/pg10/archive_wals/%f.lz4'
1

Modify recovery.conf to modify the following configuration information:


wal_level = replica

archive_mode = on

archive_command = '/usr/bin/lz4 -q -z %p /data/pg10/archive_wals/%f.lz4'
2

Then start the database into a restore state and observe the log as shown below:


bash-4.2$ pg_ctl start -D /data/pg10/db
waiting for server to start....2018-10-14 11:26:56.949 UTC [8397] LOG: listening on IPv4 address "0.0.0.0", port 25432
2018-10-14 11:26:56.949 UTC [8397] LOG: listening on IPv6 address "::", port 25432
2018-10-14 11:26:56.952 UTC [8397] LOG: listening on Unix socket "/tmp/.s.PGSQL.25432"
2018-10-14 11:26:56.968 UTC [8398] LOG: database system was interrupted; last known up at 2018-10-14 09:26:59 UTC
2018-10-14 11:26:57.049 UTC [8398] LOG: starting point-in-time recovery to "domac-1014"
/data/pg10/archive_wals/00000002.history.lz4: No such file or directory
2018-10-14 11:26:57.052 UTC [8398] LOG: restored log file "00000002.history" from archive
/data/pg10/archive_w : decoded 16777216 bytes
2018-10-14 11:26:57.077 UTC [8398] LOG: restored log file "000000020000000000000016" from archive
2018-10-14 11:26:57.191 UTC [8398] LOG: redo starts at 0/16000060
2018-10-14 11:26:57.193 UTC [8398] LOG: consistent recovery state reached at 0/16000130
2018-10-14 11:26:57.193 UTC [8397] LOG: database system is ready to accept read only connections
/data/pg10/archive_w : decoded 16777216 bytes
2018-10-14 11:26:57.217 UTC [8398] LOG: restored log file "000000020000000000000017" from archive
 done
server started
/data/pg10/archive_w : decoded 16777216 bytes
2018-10-14 11:26:57.384 UTC [8398] LOG: restored log file "000000020000000000000018" from archive
/data/pg10/archive_w : decoded 16777216 bytes
2018-10-14 11:26:57.513 UTC [8398] LOG: restored log file "000000020000000000000019" from archive
/data/pg10/archive_w : decoded 16777216 bytes
2018-10-14 11:26:57.699 UTC [8398] LOG: restored log file "00000002000000000000001A" from archive
/data/pg10/archive_w : decoded 16777216 bytes
2018-10-14 11:26:57.805 UTC [8398] LOG: restored log file "00000002000000000000001B" from archive
/data/pg10/archive_w : decoded 16777216 bytes
2018-10-14 11:26:57.982 UTC [8398] LOG: restored log file "00000002000000000000001C" from archive
/data/pg10/archive_w : decoded 16777216 bytes
2018-10-14 11:26:58.116 UTC [8398] LOG: restored log file "00000002000000000000001D" from archive
/data/pg10/archive_w : decoded 16777216 bytes
2018-10-14 11:26:58.310 UTC [8398] LOG: restored log file "00000002000000000000001E" from archive
2018-10-14 11:26:58.379 UTC [8398] LOG: recovery stopping at restore point "domac-1014", time 2018-10-14 11:17:20.680941+00
2018-10-14 11:26:58.379 UTC [8398] LOG: recovery has paused
2018-10-14 11:26:58.379 UTC [8398] HINT: Execute pg_wal_replay_resume() to continue.

After rebooting, we query the test_restore table to see if the data recovers properly:


wal_level = replica

archive_mode = on

archive_command = '/usr/bin/lz4 -q -z %p /data/pg10/archive_wals/%f.lz4'
4

You can see that the data has been restored to the specified restore point: domac-1014.

At this point, recovery.conf can be removed to avoid the next data restart, data will be restored to the restore point

summary

Backup and recovery is a very important work in database management. In daily operation and maintenance, we need to carry out backup of relevant policies according to the needs, and carry out recovery tests periodically to ensure the security of data.


Related articles: