PostgreSQL tutorial of iii: table inheritance and partition table details
- 2020-05-06 11:54:27
- OfStack
1. Inheritance of tables:
The concept of is somewhat unfamiliar to many developers who are already familiar with other database programming, but its implementation and design principles are straightforward, so let's start with a simple example.
1. First inheritance table:
CREATE TABLE cities ( -- The parent table
name text,
population float,
altitude int
);
CREATE TABLE capitals ( -- Child table
state char(2)
) INHERITS (cities);
The capitals table inherits all the attributes of the cities table. In PostgreSQL, a table can inherit attributes from zero or more other tables, and a query can refer to either all rows in the parent table or all rows in the parent table plus all rows in its child tables, where the latter is the default behavior.
MyTest=# INSERT INTO cities values('Las Vegas', 1.53, 2174); -- Insert the parent table
INSERT 0 1
MyTest=# INSERT INTO cities values('Mariposa',3.30,1953); -- Insert the parent table
INSERT 0 1
MyTest=# INSERT INTO capitals values('Madison',4.34,845,'WI');-- Insert the child table
INSERT 0 1
MyTest=# SELECT name, altitude FROM cities WHERE altitude > 500; -- Both parent and child tables are fetched.
name | altitude
-----------+----------
Las Vegas | 2174
Mariposa | 1953
Madison | 845
(3 rows)
MyTest=# SELECT name, altitude FROM capitals WHERE altitude > 500; -- Only the data of the child table is retrieved.
name | altitude
---------+----------
Madison | 845
(1 row)
If you want to extract data only from the parent table, you need to include the ONLY keyword in SQL, such as
MyTest=# SELECT name,altitude FROM ONLY cities WHERE altitude > 500;
name | altitude
-----------+----------
Las Vegas | 2174
Mariposa | 1953
(2 rows)
The "ONLY" keyword before cities in the example above indicates that the query should only look up cities and not include tables with inheritance levels lower than cities. Many of the commands we've already discussed --SELECT, UPDATE, and DELETE-- support this "ONLY" symbol.
When performs the deletion of the whole table data, if truncate parent table directly, then the data of the parent table and all of its children table are deleted; if truncate child table only, the data of the parent table will not change, but the data in the child table is cleared.
MyTest=# TRUNCATE TABLE cities; -- Data from both parent and child tables is deleted.
TRUNCATE TABLE
MyTest=# SELECT * FROM capitals;
name | population | altitude | state
------+------------+----------+-------
(0 rows)
2. Determine the source of the data:
Sometimes you might wonder which table a particular record comes from. In each table we have a system implication field, tableoid, which tells you the source of the table:
MyTest=# SELECT tableoid, name, altitude FROM cities WHERE altitude > 500;
tableoid | name | altitude
----------+-----------+----------
16532 | Las Vegas | 2174
16532 | Mariposa | 1953
16538 | Madison | 845
(3 rows)
The results above only give tableoid, and we still can't see the actual table name from this value alone. To do this, we need to associate with the system table pg_class to extract the actual table name from the table by the tableoid field, see the following query:
MyTest=# SELECT p.relname, c.name, c.altitude FROM cities c,pg_class p WHERE c.altitude > 500 and c.tableoid = p.oid;
relname | name | altitude
----------+-----------+----------
cities | Las Vegas | 2174
cities | Mariposa | 1953
capitals | Madison | 845
(3 rows)
3. Notes for data insertion:
inheritance does not automatically populate other tables in the inheritance level from INSERT or COPY. In our example, the following INSERT statement will not succeed:
INSERT INTO cities (name, population, altitude, state) VALUES ('New York', NULL, NULL, 'NY');
we might want the data to be passed into the capitals table, but that doesn't happen: INSERT always inserts the explicitly declared table.
4. Multitable inheritance:
A table can inherit from more than one parent table, in which case it has the sum of the parent table's fields. Any fields defined in the child table are also added. If the same field name appears in more than one parent table, or in both parent and child table definitions, the fields are "merged" so that there is only one such field in the child table. To merge, the fields must be of the same data type, or an error will be thrown. The merged field will have all the constraints of the field it inherits.
CREATE TABLE parent1 (FirstCol integer);
CREATE TABLE parent2 (FirstCol integer, SecondCol varchar(20));
CREATE TABLE parent3 (FirstCol varchar(200));
-- Child table child1 Will inherit from both parent1 and parent2 Table, which is contained in both parent tables integer The type of FirstCol Field, therefore child1 Can be created successfully.
CREATE TABLE child1 (MyCol timestamp) INHERITS (parent1,parent2);
-- Child table child2 It will not be created successfully because it is contained in both of its parent tables FirstCol Fields, but they are of different types.
CREATE TABLE child2 (MyCol timestamp) INHERITS (parent1,parent3);
-- Child table child3 Again, it will not succeed because it and its parent table are both included FirstCol Fields, but they are of different types.
CREATE TABLE child3 (FirstCol varchar(20)) INHERITS(parent1);
5. Inheritance and permissions:
table access is not automatically inherited. Therefore, a user attempting to access the parent table must also have access to all of its child tables, or extract data only from the parent table using the ONLY keyword. When adding a new child table to an existing inheritance hierarchy, be careful to give it all permissions.
A serious limitation of the inheritance feature is that indexes (including unique constraints) and foreign key constraints are applied to a single table and not to their inherited children. This is true for both the referenced table and the referenced table, so in the above example, if we declare cities.name to be UNIQUE or an PRIMARY KEY, it will not prevent the capitals table from having rows of cities with duplicate names. And these duplicate rows are shown by default when querying the cities table. In fact, by default capitals will have no unique constraint at all, so it may contain multiple rows with the same name. You should add a unique constraint to capitals, but doing so will not avoid duplication with cities. Similarly, if we declare cities.name REFERENCES some other table, this constraint does not automatically broadcast to capitals. In this case, you can do this by manually adding the same REFERENCES constraint to capitals.
Ii. Partition table:
1. Overview of partition tables:
The partition means to divide a large logical table into physical chunks, and partitions provide several benefits:
1). The performance of some types of queries can be greatly improved.
2). The performance of updates can also be improved because each index in a table is smaller than the index on the entire dataset. If the indexes are not all in memory, then both reads and writes on the indexes result in more disk access.
3). Bulk deletion can be achieved by simply deleting a partition.
4). Data that is rarely used can be moved to a cheaper, slower storage medium.
assumes that the current database does not support partitioned tables, and our application needs to process a large amount of data. For this application scenario, we have to manually break the large table into several small tables according to certain rules, so that each small table contains different data intervals. In this way, we have to calculate the small table that the instruction needs to operate on before data is inserted, updated, deleted, and queried. For some queries, because the query interval may span multiple small tables, we have to perform union operations on the query results from multiple small tables to merge the data from multiple tables and return a result set to the client. As you can see, if we are using a database that does not support partitioned tables, we need to do a lot of extra programming to make up for this in a scenario that is appropriate for its application. It should be noted, however, that while the functionality is passable, the performance is not comparable to that of partitioned tables.
currently supports two main types of partitioning:
Range partitioning: the table is partitioned into "ranges" by one or more key fields, and values that do not overlap between these ranges are distributed in different partitions. For example, we can partition specific business objects by data scope, or by identifier scope.
List partitioning: tables are implemented by explicitly listing the key values that should appear in each partition.
2
1). Create the "main table" from which all partitions inherit.
CREATE TABLE measurement ( -- The main table
city_id int NOT NULL,
logdate date NOT NULL,
peaktemp int,
);
2). Create several "child" tables, each inheriting from the main table. Typically, these "child" tables will not add any more fields. We will call the subtables partitions, although they are just plain PostgreSQL tables.
CREATE TABLE measurement_yy04mm02 ( ) INHERITS (measurement);
CREATE TABLE measurement_yy04mm03 ( ) INHERITS (measurement);
...
CREATE TABLE measurement_yy05mm11 ( ) INHERITS (measurement);
CREATE TABLE measurement_yy05mm12 ( ) INHERITS (measurement);
CREATE TABLE measurement_yy06mm01 ( ) INHERITS (measurement);
The subtables created above are all scoped in the form of months and years, and data from different months will be assigned to different subtables. Such an implementation would be very convenient and efficient for clearing partition data by directly executing the DROP TABLE statement to delete the corresponding subtable, and then considering whether to rebuild the subtable (partition) based on the actual application. PostgreSQL also provides a more convenient way to manage subtables than direct DROP subtables:
ALTER TABLE measurement_yy06mm01 NO INHERIT measurement;
In contrast to direct DROP, only separates the child table from the original main table, while the data stored in the child table can still be accessed because the table has been restored to a normal data table. In this way, for DBA of the database, necessary maintenance operations such as data cleaning and archiving can be performed on the table at this time. After many routine operations, you can consider whether to delete the table directly (DROP TABLE) or clear the data of the table first (TRUNCATE TABLE), and then let the table inherit the main table again, such as
ALTER TABLE measurement_yy06mm01 INHERIT measurement;
3). Add a constraint to the partition table to define the allowable keys for each partition. It is also important to note that the constraints you define ensure that you do not have the same key values in different partitions. Therefore, we need to change the definition of the "child" table above to the following form:
CREATE TABLE measurement_yy04mm02 (
CHECK ( logdate >= DATE '2004-02-01' AND logdate < DATE '2004-03-01')
) INHERITS (measurement);
CREATE TABLE measurement_yy04mm03 (
CHECK (logdate >= DATE '2004-03-01' AND logdate < DATE '2004-04-01')
) INHERITS (measurement);
...
CREATE TABLE measurement_yy05mm11 (
CHECK (logdate >= DATE '2005-11-01' AND logdate < DATE '2005-12-01')
) INHERITS (measurement);
CREATE TABLE measurement_yy05mm12 (
CHECK (logdate >= DATE '2005-12-01' AND logdate < DATE '2006-01-01')
) INHERITS (measurement);
CREATE TABLE measurement_yy06mm01 (
CHECK (logdate >= DATE '2006-01-01' AND logdate < DATE '2006-02-01')
) INHERITS (measurement);
4). Create indexes based on key values whenever possible. If necessary, we can also create indexes for other fields in the child table.
CREATE INDEX measurement_yy04mm02_logdate ON measurement_yy04mm02 (logdate);
CREATE INDEX measurement_yy04mm03_logdate ON measurement_yy04mm03 (logdate);
...
CREATE INDEX measurement_yy05mm11_logdate ON measurement_yy05mm11 (logdate);
CREATE INDEX measurement_yy05mm12_logdate ON measurement_yy05mm12 (logdate);
CREATE INDEX measurement_yy06mm01_logdate ON measurement_yy06mm01 (logdate);
5). Define a rule or trigger to redirect changes to the main table to the appropriate partitioned table.
If the data only enters the latest partition, we can set a very simple rule to insert the data. We must redefine this rule every month to change the name of the child table redirected into so that it always points to the current partition.
CREATE OR REPLACE RULE measurement_current_partition AS
ON INSERT TO measurement
DO INSTEAD
INSERT INTO measurement_yy06mm01 VALUES (NEW.city_id, NEW.logdate, NEW.peaktemp);
NEW is a keyword that represents a collection of new data fields. Here you can get each field in the collection by the dot (.) operator.
we might want to insert data and want the server to automatically figure out which partition to insert data into. We can do this with a more complex set of rules like the following.
CREATE RULE measurement_insert_yy04mm02 AS
ON INSERT TO measurement WHERE (logdate >= DATE '2004-02-01' AND logdate < DATE '2004-03-01')
DO INSTEAD
INSERT INTO measurement_yy04mm02 VALUES (NEW.city_id, NEW.logdate, NEW.peaktemp);
...
CREATE RULE measurement_insert_yy05mm12 AS
ON INSERT TO measurement WHERE (logdate >= DATE '2005-12-01' AND logdate < DATE '2006-01-01')
DO INSTEAD
INSERT INTO measurement_yy05mm12 VALUES (NEW.city_id, NEW.logdate, NEW.peaktemp);
CREATE RULE measurement_insert_yy06mm01 AS
ON INSERT TO measurement WHERE (logdate >= DATE '2006-01-01' AND logdate < DATE '2006-02-01')
DO INSTEAD
INSERT INTO measurement_yy06mm01 VALUES (NEW.city_id, NEW.logdate, NEW.peaktemp);
Notice that the WHERE clause in each rule matches the CHECK constraint of its partition.
As you can see, a complex partitioning scheme might require quite a bit of DDL. In the above example we need to create a new partition once a month, so it makes sense to write a script that automatically generates the required DDL. In addition, it is not difficult to infer that partitioned tables are somewhat resistant to bulk inserts of new data, as is the case with Oracle.
In addition to redirecting the data from the main table to each subtable in the Rule way described above, we can also do this by means of triggers, which may lead to better insertion efficiency, especially for non-bulk inserts, than the Rule based redirection method. For bulk inserts, however, Rule works better than the trigger approach because the extra overhead is based on tables, not rows. Another thing to note is that the copy operation will ignore Rules. If we want to insert data through the COPY method, you can only directly copy the data to the correct subtable, not the main table. This limitation does not cause any problems for triggers. Another problem with Rule based redirection is that PostgreSQL does not report an error when the inserted data is not constrained by any child table, but instead keeps the data directly in the main table.
6). Add a new partition:
here are two ways to add a new partition. The first is simple and intuitive. We simply create a new subtable and define a new check constraint for it, such as
CREATE TABLE measurement_y2008m02 (
CHECK ( logdate >= DATE '2008-02-01' AND logdate < DATE '2008-03-01' )
) INHERITS (measurement);
The second method, , is relatively cumbersome to create, but more flexible and practical. See the following four steps:
/* Create a separate data table (measurement_y2008m02) , the table is created as the primary table to be (measurement) Is the template and contains the default values for the template table (DEFAULTS) And consistency constraints (CONSTRAINTS) . */
CREATE TABLE measurement_y2008m02
(LIKE measurement INCLUDING DEFAULTS INCLUDING CONSTRAINTS);
/* Create a check constraint for this table that you will need to use in the future as a child table. */
ALTER TABLE measurement_y2008m02 ADD CONSTRAINT y2008m02
CHECK (logdate >= DATE '2008-02-01' AND logdate < DATE '2008-03-01');
/* Import data into the table. The following is just an example of a way to import data. After importing data, if possible, you can do further data processing, such as data conversion, filtering, and so on. */
\copy measurement_y2008m02 from 'measurement_y2008m02'
/* Let the table inherit from the main table when appropriate, or when needed. */
ALTER TABLE measurement_y2008m02 INHERIT measurement;
7). Make sure that the configuration parameter constraint_exclusion in postgresql.conf is on. Without this parameter, the query will not be optimized as needed. What we need to do here is make sure that the option is not commented out in the configuration file.
MyTest=# INSERT INTO cities values('Las Vegas', 1.53, 2174); -- Insert the parent table
INSERT 0 1
MyTest=# INSERT INTO cities values('Mariposa',3.30,1953); -- Insert the parent table
INSERT 0 1
MyTest=# INSERT INTO capitals values('Madison',4.34,845,'WI');-- Insert the child table
INSERT 0 1
MyTest=# SELECT name, altitude FROM cities WHERE altitude > 500; -- Both parent and child tables are fetched.
name | altitude
-----------+----------
Las Vegas | 2174
Mariposa | 1953
Madison | 845
(3 rows)
MyTest=# SELECT name, altitude FROM capitals WHERE altitude > 500; -- Only the data of the child table is retrieved.
name | altitude
---------+----------
Madison | 845
(1 row)
8
Partition and constraint exclusion:
constraint exclusion (Constraint exclusion) is a query optimization technique that improves the performance of table partitions defined in the above method. For example:
MyTest=# INSERT INTO cities values('Las Vegas', 1.53, 2174); -- Insert the parent table
INSERT 0 1
MyTest=# INSERT INTO cities values('Mariposa',3.30,1953); -- Insert the parent table
INSERT 0 1
MyTest=# INSERT INTO capitals values('Madison',4.34,845,'WI');-- Insert the child table
INSERT 0 1
MyTest=# SELECT name, altitude FROM cities WHERE altitude > 500; -- Both parent and child tables are fetched.
name | altitude
-----------+----------
Las Vegas | 2174
Mariposa | 1953
Madison | 845
(3 rows)
MyTest=# SELECT name, altitude FROM capitals WHERE altitude > 500; -- Only the data of the child table is retrieved.
name | altitude
---------+----------
Madison | 845
(1 row)
9
If there is no constraint exclusion, the above query scans every partition in the measurement table. With constraint exclusion turned on, the planner checks the constraints on each partition and then proves to the view that the partition does not need to be scanned because it cannot contain any rows of data that meet the WHERE clause condition. If the planner can prove this, it excludes the partition from the query plan.
you can use the EXPLAIN command to show the difference between a plan with constraint_exclusion on and off. The typical default plan for a table set in the above method is
MyTest=# SELECT name,altitude FROM ONLY cities WHERE altitude > 500;
name | altitude
-----------+----------
Las Vegas | 2174
Mariposa | 1953
(2 rows)
0
As you can see from the query plan above, scans all partitions. Now let's take a look at the query plan with constraint exclusion turned on:
SET constraint_exclusion = on;
EXPLAIN SELECT count(*) FROM measurement WHERE logdate >= DATE '2006-01-01';
QUERY PLAN
-----------------------------------------------------------------------------------------------
Aggregate (cost=63.47..63.48 rows=1 width=0)
-> Append (cost=0.00..60.75 rows=1086 width=0)
-> Seq Scan on measurement (cost=0.00..30.38 rows=543 width=0)
Filter: (logdate >= '2006-01-01'::date)
-> Seq Scan on measurement_yy06mm01 measurement (cost=0.00..30.38 rows=543 width=0)
Filter: (logdate >= '2006-01-01'::date)
Note that constraint exclusion is driven only by the CHECK constraint, not by the index.
current version of the configuration in the PostgreSQL partition, is the default value of this value is a kind of behavior between on and off, namely the planner will only applies constraint ruled out based on the partition table query, while on Settings will be for all the queries are constraint rule out, so for common data table, will also have to bear by the mechanism of extra overhead.
The constraint excludes the following considerations when used:
1). Constraint exclusion only takes effect if the WHERE clause of the query contains the constraint. A parameterized query is not optimized because at run time the planner does not know which partition the parameter will select. So functions like CURRENT_DATE must be avoided. Nor is it optimized to concatenate partition key values with fields in another table.
2). Avoid cross-data type comparisons within the CHECK constraint, as the current planner will not be able to prove such conditions false. For example, the following constraint is available if x is an integer field, but not if x is an bigint:
CHECK (x = 1)
For the field, we must use a constraint similar to the following:
CHECK (x = 1::bigint)
this problem is not limited to bigint data types only; it can occur where the default data type of any constraint does not match the data type of the field it is comparing to. Cross-data type comparisons in submitted queries are usually OK, but not in CHECK conditions.
3). The UPDATE and DELETE commands on the main table do not perform constraint exclusion.
4). When the planner performs constraint exclusion, all constraints for all partitions on the main table are checked, so a large number of partitions significantly increase the query planning time.
5). When you execute the ANALYZE statement, execute the command for each partition, not just for the main table.