Examples of recursive queries for PostgreSQL tree structures

  • 2020-05-06 11:55:48
  • OfStack

background

For hierarchies dealing with depth of uncertainty, such as organizations, a common design is to keep ID and Parent_ID in a single table, and to construct a tree by self-joining. This approach is friendly to the process of writing data, but the query process becomes relatively complex. Without introducing the MPTT model, a recursive algorithm must be used to query a node and its children.

Oracle provides connect by extension syntax, simple and easy to use. But the other RDBMS are not so personal (or I don't know). I recently used PostgreSQL in my project to query tree data.

constructs sample data


drop table if exists demo.tree_data;
create table demo.tree_data (
 id integer,
 code text,
 pid integer,
 sort integer
);

insert into demo.tree_data values(1, ' China ', null, 1);
insert into demo.tree_data values(2, ' sichuan ', 1, 1);
insert into demo.tree_data values(3, ' yunnan ', 1, 2);
insert into demo.tree_data values(4, ' chengdu ', 2, 1);
insert into demo.tree_data values(5, ' mianyang ', 2, 2);	
insert into demo.tree_data values(6, ' Wuhou district ', 4, 1);
insert into demo.tree_data values(7, ' kunming ', 3, 1);	

connectby function

If you have the tablefunc extension installed, you can use the PG version of the connectby function. This isn't as powerful as Oracle, but it meets the basic requirements.


-- API  The following 
connectby(text relname, 			--  The name of the table 
  text keyid_fld, 			-- id field 
  text parent_keyid_fld		--  The father id field 	
  [, text orderby_fld ], 	--  Sort field 
  text start_with, 			--  Starting line id value 
  int max_depth				--  Depth of the tree, 0 Said the infinite 
  [, text branch_delim ])	--  Path separator 

--  The basic usage is as follows and must be passed AS The clause defines the returned field name and type 
select * 
	from connectby('demo.tree_data', 'id', 'pid', 'sort', '1', 0, '~')
	as (id int, pid int, lvl int, branch text, sort int);
	
--  The query results 
id | pid | lvl | branch | sort
----+-----+-----+---------+------
 1 | | 0 | 1 | 1
 2 | 1 | 1 | 1~2 | 2
 4 | 2 | 2 | 1~2~4 | 3
 6 | 4 | 3 | 1~2~4~6 | 4
 5 | 2 | 2 | 1~2~5 | 5
 3 | 1 | 1 | 1~3 | 6
 7 | 3 | 2 | 1~3~7 | 7
(7 rows)

--  Using only basic usage, you can only query out id Relevant information if you want to query code And so on other fields that need to go through extra join Operation to implement. 
select 
	t.id, n.code, t.pid, p.code as pcode, lvl, branch
from (
	select * from connectby('demo.tree_data', 'id', 'pid', 'sort', '1', 0, '~')
		as (id int, pid int, lvl int, branch text, sort int)
) as t
	left join demo.tree_data as n on (t.id = n.id)
	left join demo.tree_data as p on (t.pid = p.id)
order by t.sort ;	

 id | code | pid | pcode | lvl | branch
----+--------+-----+-------+-----+---------
 1 |  China  | | | 0 | 1
 2 |  sichuan  | 1 |  China  | 1 | 1~2
 4 |  chengdu  | 2 |  sichuan  | 2 | 1~2~4
 6 |  Wuhou district  | 4 |  chengdu  | 3 | 1~2~4~6
 5 |  mianyang  | 2 |  sichuan  | 2 | 1~2~5
 3 |  yunnan  | 1 |  China  | 1 | 1~3
 7 |  kunming  | 3 |  yunnan  | 2 | 1~3~7
(7 rows)

PS: although the code of a node can be queried through join, the branch part cannot be directly converted to the corresponding code, which is still not very convenient to use.

CTE syntax

Using CTE syntax, the tree data is recursively queried by with recursive. This approach is not as straightforward as connectby, but it is more flexible and displays better.


-- 
with recursive cte as
(
 --  First, the query root node  
 select
 id, code, pid, '' as pcode,
 code as branch
 from demo.tree_data where id = 1
 union all
 --  through cte Recursive query root The immediate child of a node  
 select
 origin.id, origin.code, cte.id as pid, cte.code as pcode,
 cte.branch || '~' || origin.code
 from cte
 join demo.tree_data as origin on origin.pid = cte.id
)
select
 id,code, pid, pcode, branch, 
 --  The depth of the tree is calculated by calculating the number of separators 
 (length(branch)-length(replace(branch, '~', ''))) as lvl
from cte;

-- 
 id | code | pid | pcode | branch  | lvl
----+--------+-----+-------+-----------------------+-----
 1 |  China  | | |  China    | 0
 2 |  sichuan  | 1 |  China  |  China ~ sichuan   | 1
 3 |  yunnan  | 1 |  China  |  China ~ yunnan   | 1
 4 |  chengdu  | 2 |  sichuan  |  China ~ sichuan ~ chengdu  | 2
 5 |  mianyang  | 2 |  sichuan  |  China ~ sichuan ~ mianyang  | 2
 7 |  kunming  | 3 |  yunnan  |  China ~ yunnan ~ kunming  | 2
 6 |  Wuhou district  | 4 |  chengdu  |  China ~ sichuan ~ chengdu ~ Wuhou district  | 3
(7 rows)

execution process description

As you can see from the above example, the WITH RECURSIVE statement contains two parts

Es115en-recursive term (non-recursive part), union all, recursive term (recursive part), union all,

The steps are as follows:

Execute non-recursive term. (if union is used instead of union all, the result will be deduplicated.) the result is a reference to result in recursive term, and this part of the result is put into temporary working table, repeat the following steps until working table is empty: Replace the recursive self-reference with the content of working table, perform recursive term, (remove duplicate data if union is used instead of union all), and replace working table with the result (if union is used instead of union union, the result of de-duplication)

Taking query above as an example, let's look at the specific process

Perform non - recursive query


-- step 1  perform 
 select
 id, code, pid, '' as pcode,
 code as branch
 from demo.tree_data where id = 1
 
--  The result set and working table for 
 id | code | pid | pcode | branch
----+------+-----+-------+--------
 1 |  China  | | |  China 

Perform recursive query


-- step 2  Execute the recursion, at which point the self - reference cte The data in step 1 The results of the 
 select
 origin.id, origin.code, cte.id as pid, cte.code as pcode,
 cte.branch || '~' || origin.code
 from cte
 join demo.tree_data as origin on origin.pid = cte.id
 
 --  The result set and working table for 
 id | code | pid | pcode | branch 
----+--------+-----+-------+---------------------
 2 |  sichuan  | 1 |  China  |  China ~ sichuan   
 3 |  yunnan  | 1 |  China  |  China ~ yunnan   

3. Continue to execute recursive query until the result set and working table are empty

4. End the recursion, and combine the result set of the first three steps to get the final WITH RECURSIVE result set.

Strictly speaking, the implementation is iterative rather than recursive, but the RECURSIVE keyword was determined by the SQL standards committee, so PostgreSQL also USES RECURSIVE.

Summary


Related articles: