Syntax and use of window functions in PostgreSQL database

  • 2020-05-06 11:56:07
  • OfStack

What is a window function?

A window function performs a calculation on a series of table rows that are somehow associated with the current row. This is comparable to what an aggregate function does. But a window function does not aggregate multiple rows into a single output line, unlike a typical non-window aggregation function. Instead, rows keep their individual identifiers. Behind these phenomena, window functions can access more than just the current state of the query results.

Access to multi-line records related to the current record; It doesn't aggregate multiple rows into one row, as opposed to an aggregate function;

window function syntax

The window function follows an OVER clause, which determines which rows in the query are separated and processed by the window function.

You can include partitioning (PARTITION BY) and sorting (ORDER BY) instructions, both of which are optional.

window_func() OVER([PARTITION BY field] [ORDER BY field])

If the PARTITION BY and ORDER BY instructions are not specified, they are equivalent to the aggregate function, which evaluates the entire data.

The PARTITION BY clause groups the rows of the query into partitions, which the window function handles independently. PARTITION BY works like a query-level GROUP BY clause, except that its expression is always just an expression and cannot be the name or number of the output column. Without PARTITION BY, all rows generated by the query are treated as a single partition.

The ORDER BY clause determines the order of rows in a partition processed by the window function. It works like an ORDER BY clause for a query level, but again cannot use the name or number of the output column. If there is no ORDER BY, the rows are processed in an unspecified order.

The aggregate function in PostgreSQL can also use
as a window function

In addition to these built-in window functions, any built-in or user-defined general or statistical aggregation (that is, ordered set or hypothetical set aggregation) can be used as a window function. Only when the call is followed by the OVER clause will the aggregation function be used as a window function; Otherwise they are clustered as non-windows and return a single row for the rest of the collection.

window function example

The staff salary (emp_salary) table is structured as follows:


SELECT emp_no, dep_name, salary
FROM public.emp_salary
order by dep_name, emp_no;

emp_id dep_name salary
7 develop 4200
8 develop 6000
9 develop 4500
10 develop 5200
11 develop 5200
2 personnel 3900
5 personnel 3500
1 sales 5000
3 sales 4800
4 sales 4800

If you want to compare the average salary per employee with that of their department, you need this result:

emp_id dep_name salary avg
7 develop 4200 5020
8 develop 6000 5020
9 develop 4500 5020
10 develop 5200 5020
11 develop 5200 5020
2 personnel 3900 3700
5 personnel 3500 3700
1 sales 5000 4866.66666666667
3 sales 4800 4866.66666666667
4 sales 4800 4866.66666666667

If you do not use the window function to query, it is more complex, of course, you can do it, as follows:


SELECT e0.emp_no, e0.dep_name, e0.salary, e2.avg_salary
FROM public.emp_salary e0
join (
 select e1.dep_name, avg(e1.salary) as avg_salary
 from public.emp_salary e1
 group by e1.dep_name
) e2 on e2.dep_name = e0.dep_name
order by e0.dep_name, e0.emp_no;

This is easy to do if you query using the window function, sql statement is as follows:


SELECT emp_no, dep_name, salary,
  avg(salary) over(partition by dep_name)
FROM public.emp_salary
order by dep_name, emp_no;

However, if you want to query the change of the average salary of each department as the number of employees increases, as shown in the following table, it is difficult to do without the query of window function.

emp_id dep_name salary avg
7 develop 4200 4200
8 develop 6000 5100
9 develop 4500 4900
10 develop 5200 4975
11 develop 5200 5020
2 personnel 3900 3900
5 personnel 3500 3700
1 sales 5000 5000
3 sales 4800 4900
4 sales 4800 4866.66666666667

If you use the window function, you can still do this easily. Here is the statement:


SELECT emp_no, dep_name, salary,
  avg(salary) over(partition by dep_name order by emp_no)
FROM public.emp_salary
order by dep_name, emp_no;

As you can see, the window function has a great advantage when you need to compute the relevant rows in the query result.

summary


Related articles: