Syntax and use of window functions in PostgreSQL database
- 2020-05-06 11:56:07
- OfStack
What is a window function?
A window function performs a calculation on a series of table rows that are somehow associated with the current row. This is comparable to what an aggregate function does. But a window function does not aggregate multiple rows into a single output line, unlike a typical non-window aggregation function. Instead, rows keep their individual identifiers. Behind these phenomena, window functions can access more than just the current state of the query results.
Access to multi-line records related to the current record; It doesn't aggregate multiple rows into one row, as opposed to an aggregate function;
window function syntax
The window function follows an OVER clause, which determines which rows in the query are separated and processed by the window function.
You can include partitioning (PARTITION BY) and sorting (ORDER BY) instructions, both of which are optional.
window_func() OVER([PARTITION BY field] [ORDER BY field])
If the PARTITION BY and ORDER BY instructions are not specified, they are equivalent to the aggregate function, which evaluates the entire data.
The PARTITION BY clause groups the rows of the query into partitions, which the window function handles independently. PARTITION BY works like a query-level GROUP BY clause, except that its expression is always just an expression and cannot be the name or number of the output column. Without PARTITION BY, all rows generated by the query are treated as a single partition.
The ORDER BY clause determines the order of rows in a partition processed by the window function. It works like an ORDER BY clause for a query level, but again cannot use the name or number of the output column. If there is no ORDER BY, the rows are processed in an unspecified order.
The aggregate function in PostgreSQL can also use
as a window function
In addition to these built-in window functions, any built-in or user-defined general or statistical aggregation (that is, ordered set or hypothetical set aggregation) can be used as a window function. Only when the call is followed by the OVER clause will the aggregation function be used as a window function; Otherwise they are clustered as non-windows and return a single row for the rest of the collection.
window function example
The staff salary (emp_salary) table is structured as follows:
SELECT emp_no, dep_name, salary
FROM public.emp_salary
order by dep_name, emp_no;
emp_id | dep_name | salary |
---|---|---|
7 | develop | 4200 |
8 | develop | 6000 |
9 | develop | 4500 |
10 | develop | 5200 |
11 | develop | 5200 |
2 | personnel | 3900 |
5 | personnel | 3500 |
1 | sales | 5000 |
3 | sales | 4800 |
4 | sales | 4800 |
If you want to compare the average salary per employee with that of their department, you need this result:
emp_id | dep_name | salary | avg |
---|---|---|---|
7 | develop | 4200 | 5020 |
8 | develop | 6000 | 5020 |
9 | develop | 4500 | 5020 |
10 | develop | 5200 | 5020 |
11 | develop | 5200 | 5020 |
2 | personnel | 3900 | 3700 |
5 | personnel | 3500 | 3700 |
1 | sales | 5000 | 4866.66666666667 |
3 | sales | 4800 | 4866.66666666667 |
4 | sales | 4800 | 4866.66666666667 |
If you do not use the window function to query, it is more complex, of course, you can do it, as follows:
SELECT e0.emp_no, e0.dep_name, e0.salary, e2.avg_salary
FROM public.emp_salary e0
join (
select e1.dep_name, avg(e1.salary) as avg_salary
from public.emp_salary e1
group by e1.dep_name
) e2 on e2.dep_name = e0.dep_name
order by e0.dep_name, e0.emp_no;
This is easy to do if you query using the window function, sql statement is as follows:
SELECT emp_no, dep_name, salary,
avg(salary) over(partition by dep_name)
FROM public.emp_salary
order by dep_name, emp_no;
However, if you want to query the change of the average salary of each department as the number of employees increases, as shown in the following table, it is difficult to do without the query of window function.
emp_id | dep_name | salary | avg |
---|---|---|---|
7 | develop | 4200 | 4200 |
8 | develop | 6000 | 5100 |
9 | develop | 4500 | 4900 |
10 | develop | 5200 | 4975 |
11 | develop | 5200 | 5020 |
2 | personnel | 3900 | 3900 |
5 | personnel | 3500 | 3700 |
1 | sales | 5000 | 5000 |
3 | sales | 4800 | 4900 |
4 | sales | 4800 | 4866.66666666667 |
If you use the window function, you can still do this easily. Here is the statement:
SELECT emp_no, dep_name, salary,
avg(salary) over(partition by dep_name order by emp_no)
FROM public.emp_salary
order by dep_name, emp_no;
As you can see, the window function has a great advantage when you need to compute the relevant rows in the query result.
summary