开窗函数-优雅的计算每个雇员所在部门薪资最高的员工姓名

题目

有雇员表t_employees,包含员工姓名、部门、薪资和年龄,请计算出每个雇员所在部门薪资最高的员工姓名

样例数据

+--------+--------------+---------+------+
|  name  |     dept     | salary  | age  |
+--------+--------------+---------+------+
| Lisa   | Sales        | 10000   | 35   |
| Evan   | Sales        | 32000   | 38   |
| Fred   | Engineering  | 21000   | 28   |
| Alex   | Sales        | 30000   | 33   |
| Tom    | Engineering  | 23000   | 33   |
| Jane   | Marketing    | 29000   | 28   |
| Helen  | Marketing    | 29000   | 40   |
| Jeff   | Marketing    | 35000   | 38   |
| Paul   | Engineering  | 29000   | 23   |
| Chloe  | Engineering  | 23000   | 25   |
+--------+--------------+---------+------+

期望结果

+--------+--------------+---------+------+----------------------+
|  name  |     dept     | salary  | age  | dept_maxsalary_name  |
+--------+--------------+---------+------+----------------------+
| Chloe  | Engineering  | 23000   | 25   | Paul                 |
| Paul   | Engineering  | 29000   | 23   | Paul                 |
| Tom    | Engineering  | 23000   | 33   | Paul                 |
| Fred   | Engineering  | 21000   | 28   | Paul                 |
| Jeff   | Marketing    | 35000   | 38   | Jeff                 |
| Helen  | Marketing    | 29000   | 40   | Jeff                 |
| Jane   | Marketing    | 29000   | 28   | Jeff                 |
| Alex   | Sales        | 30000   | 33   | Evan                 |
| Evan   | Sales        | 32000   | 38   | Evan                 |
| Lisa   | Sales        | 10000   | 35   | Evan                 |
+--------+--------------+---------+------+----------------------+

解法一

使用排序函数取出薪资最高纪录,然后筛选姓名,通过部门与原始雇员表进行关联计算出最终结果

执行SQL

select t1.name,
       t1.dept,
       t1.salary,
       t1.age,
       t2.name as dept_maxsalary_name
from t_employees t1
         join
     (
     --优先计算出每个部门薪水最高的员工姓名
     select name,
             dept,
             salary,
             age
      from (select name,
                   dept,
                   salary,
                   age,
                   row_number() over (partition by dept order by salary desc) as rn
            from t_employees) t
      where t.rn = 1) t2
     on t1.dept = t2.dept

解法二

直接使用max_by函数开窗,取出结果。

max_by(x, y) - 返回与 y 的最大值相关联的 x 值。(https://sparkfunctions.com/max_by)

执行SQL

select name,
       dept,
       salary,
       age,
       max_by(name,salary) over (partition by dept) as dept_maxsalary_name
from t_employees

为了方便查看对比,我们增加一列雇员所在部门薪资最高同事的薪资(与max函数进行对比) 执行SQL

select name,
       dept,
       salary,
       age,
       max(salary)over (partition by dept) as dept_maxsalary,
       max_by(name,salary) over (partition by dept) as dept_maxsalary_name
from t_employees

执行结果

+--------+--------------+---------+------+-----------------+----------------------+
|  name  |     dept     | salary  | age  | dept_maxsalary  | dept_maxsalary_name  |
+--------+--------------+---------+------+-----------------+----------------------+
| Fred   | Engineering  | 21000   | 28   | 29000           | Paul                 |
| Tom    | Engineering  | 23000   | 33   | 29000           | Paul                 |
| Paul   | Engineering  | 29000   | 23   | 29000           | Paul                 |
| Chloe  | Engineering  | 23000   | 25   | 29000           | Paul                 |
| Jane   | Marketing    | 29000   | 28   | 35000           | Jeff                 |
| Helen  | Marketing    | 29000   | 40   | 35000           | Jeff                 |
| Jeff   | Marketing    | 35000   | 38   | 35000           | Jeff                 |
| Lisa   | Sales        | 10000   | 35   | 32000           | Evan                 |
| Evan   | Sales        | 32000   | 38   | 32000           | Evan                 |
| Alex   | Sales        | 30000   | 33   | 32000           | Evan                 |
+--------+--------------+---------+------+-----------------+----------------------+