如何处理缺失值

日常工作遇到数据缺失的问题,为保证数据质量或者后续使用方便,我们需要对缺失数据进行处理,今天先聊一下字段值的缺失及填充方式。

字段缺失指在数据表中数据行存在(主键存在),但是某个字段在某一行或者某些行为null的情况。

涉及函数

coalesce()

round()

avg()

lag()/lead()

last_value()/first_value()

sum()

row_number()

1.1 使用默认值/规则补充

这里的默认值/规则指:可能是0,某个特定的值,或者根据本行其他字段可以计算的结果。

样例数据

+-------------+-------+---------+---------+--------+----------------------+
|    riqi     | name  | weight  | height  |  bmi   | weekly_training_cnt  |
+-------------+-------+---------+---------+--------+----------------------+
| 2025-07-01  | 张三    | 91.31   | 183.50  | 27.12  | 2                    |
| 2025-07-02  | 张三    | 91.29   | 183.50  | 27.11  | 3                    |
| 2025-07-03  | 张三    | NULL    | 183.50  | 27.11  | 4                    |
| 2025-07-04  | 张三    | 91.29   | 183.50  | 27.11  | 4                    |
| 2025-07-05  | 张三    | 91.30   | 183.50  | 27.11  | 5                    |
| 2025-07-06  | 张三    | 91.22   | 183.50  | 27.09  | 6                    |
| 2025-07-07  | 张三    | 91.18   | 183.50  | 27.08  | 1                    |
| 2025-07-08  | 张三    | 91.09   | 183.50  | 27.05  | 2                    |
| 2025-07-09  | 张三    | 91.01   | 183.50  | 27.03  | 3                    |
| 2025-07-10  | 张三    | 91.01   | 183.50  | 27.03  | 4                    |
+-------------+-------+---------+---------+--------+----------------------+

在上面样例数据中,张三2025-07-03日的体重数据缺失,可以进行直接处理,填入一个默认值,例如-999,0 等给定默认值;

执行语句

select riqi,
       name,
       coalesce(weight, -999) as weight,
       height,
       bmi,
       weekly_training_cnt
from t_user_info1

结果

+-------------+-------+----------+---------+--------+----------------------+
|    riqi     | name  |  weight  | height  |  bmi   | weekly_training_cnt  |
+-------------+-------+----------+---------+--------+----------------------+
| 2025-07-01  | 张三    | 91.31    | 183.50  | 27.12  | 2                    |
| 2025-07-02  | 张三    | 91.29    | 183.50  | 27.11  | 3                    |
| 2025-07-03  | 张三    | -999.00  | 183.50  | 27.11  | 4                    |
| 2025-07-04  | 张三    | 91.29    | 183.50  | 27.11  | 4                    |
| 2025-07-05  | 张三    | 91.30    | 183.50  | 27.11  | 5                    |
| 2025-07-06  | 张三    | 91.22    | 183.50  | 27.09  | 6                    |
| 2025-07-07  | 张三    | 91.18    | 183.50  | 27.08  | 1                    |
| 2025-07-08  | 张三    | 91.09    | 183.50  | 27.05  | 2                    |
| 2025-07-09  | 张三    | 91.01    | 183.50  | 27.03  | 3                    |
| 2025-07-10  | 张三    | 91.01    | 183.50  | 27.03  | 4                    |
+-------------+-------+----------+---------+--------+----------------------+

如此补充数据,能够解决空值问题,但是这些数据无实际意义,补数太过暴力。我们知道体重、身高和身高质量指数之间存在勾稽关系,如果有另外两个值,是可以计算出缺失值的。所以我们可以根据这个进行数据补充,计算公式为: BMI=体重÷身高的平方。

执行语句

select riqi,
       name,
       coalesce(weight, round(bmi*power(height/100,2),2)) as weight,
       height,
       bmi,
       weekly_training_cnt
from t_user_info1

执行结果

+-------------+-------+---------+---------+--------+----------------------+
|    riqi     | name  | weight  | height  |  bmi   | weekly_training_cnt  |
+-------------+-------+---------+---------+--------+----------------------+
| 2025-07-01  | 张三    | 91.31   | 183.50  | 27.12  | 2                    |
| 2025-07-02  | 张三    | 91.29   | 183.50  | 27.11  | 3                    |
| 2025-07-03  | 张三    | 91.29   | 183.50  | 27.11  | 4                    |
| 2025-07-04  | 张三    | 91.29   | 183.50  | 27.11  | 4                    |
| 2025-07-05  | 张三    | 91.3    | 183.50  | 27.11  | 5                    |
| 2025-07-06  | 张三    | 91.22   | 183.50  | 27.09  | 6                    |
| 2025-07-07  | 张三    | 91.18   | 183.50  | 27.08  | 1                    |
| 2025-07-08  | 张三    | 91.09   | 183.50  | 27.05  | 2                    |
| 2025-07-09  | 张三    | 91.01   | 183.50  | 27.03  | 3                    |
| 2025-07-10  | 张三    | 91.01   | 183.50  | 27.03  | 4                    |
+-------------+-------+---------+---------+--------+----------------------+

1.2 使用平均值(中位数、众数)补充

指使用该列的平均值进行补充,当然也可以是组内平均值,或者使用中位数、众数等各类统计值进行补充。

样例数据

+-------------+-------+---------+---------+--------+----------------------+
|    riqi     | name  | weight  | height  |  bmi   | weekly_training_cnt  |
+-------------+-------+---------+---------+--------+----------------------+
| 2025-07-01  | 张三    | 91.31   | 183.50  | 27.12  | 2                    |
| 2025-07-02  | 张三    | 91.29   | 183.50  | 27.11  | 3                    |
| 2025-07-03  | 张三    | NULL    | 183.50  | 27.11  | 4                    |
| 2025-07-04  | 张三    | 91.29   | 183.50  | 27.11  | 4                    |
| 2025-07-05  | 张三    | 91.30   | 183.50  | 27.11  | 5                    |
| 2025-07-06  | 张三    | 91.22   | 183.50  | 27.09  | 6                    |
| 2025-07-07  | 张三    | 91.18   | 183.50  | 27.08  | 1                    |
| 2025-07-08  | 张三    | 91.09   | 183.50  | 27.05  | 2                    |
| 2025-07-09  | 张三    | 91.01   | 183.50  | 27.03  | 3                    |
| 2025-07-10  | 张三    | 91.01   | 183.50  | 27.03  | 4                    |
| 2025-07-01  | 李四    | 77.57   | 175.20  | 25.27  | 2                    |
| 2025-07-02  | 李四    | 77.49   | 175.20  | 25.25  | 3                    |
| 2025-07-03  | 李四    | 77.34   | 175.20  | 25.20  | 4                    |
| 2025-07-04  | 李四    | 77.30   | 175.20  | 25.18  | 5                    |
| 2025-07-05  | 李四    | 77.23   | 175.20  | 25.16  | 6                    |
| 2025-07-06  | 李四    | 77.05   | 175.20  | 25.10  | 7                    |
| 2025-07-07  | 李四    | 77.01   | 175.20  | 25.09  | 1                    |
| 2025-07-08  | 李四    | 76.98   | 175.20  | 25.08  | 2                    |
| 2025-07-09  | 李四    | 77.13   | 175.20  | 25.13  | 3                    |
| 2025-07-10  | 李四    | 76.97   | 175.20  | 25.08  | 4                    |
+-------------+-------+---------+---------+--------+----------------------+

1.2.1 平均值

我们可以取整列的平均值进行数据填充,这里需要进行开窗处理。

执行语句

select riqi,
       name,
       round(coalesce(weight, avg(weight)over()),2) as weight,
       height,
       bmi,
       weekly_training_cnt
from t_user_info2

执行结果

+-------------+-------+---------+---------+--------+----------------------+
|    riqi     | name  | weight  | height  |  bmi   | weekly_training_cnt  |
+-------------+-------+---------+---------+--------+----------------------+
| 2025-07-01  | 张三    | 91.31   | 183.50  | 27.12  | 2                    |
| 2025-07-02  | 张三    | 91.29   | 183.50  | 27.11  | 3                    |
| 2025-07-03  | 张三    | 83.83   | 183.50  | 27.11  | 4                    |
| 2025-07-04  | 张三    | 91.29   | 183.50  | 27.11  | 4                    |
| 2025-07-05  | 张三    | 91.30   | 183.50  | 27.11  | 5                    |
| 2025-07-06  | 张三    | 91.22   | 183.50  | 27.09  | 6                    |
| 2025-07-07  | 张三    | 91.18   | 183.50  | 27.08  | 1                    |
| 2025-07-08  | 张三    | 91.09   | 183.50  | 27.05  | 2                    |
| 2025-07-09  | 张三    | 91.01   | 183.50  | 27.03  | 3                    |
| 2025-07-10  | 张三    | 91.01   | 183.50  | 27.03  | 4                    |
| 2025-07-01  | 李四    | 77.57   | 175.20  | 25.27  | 2                    |
| 2025-07-02  | 李四    | 77.49   | 175.20  | 25.25  | 3                    |
| 2025-07-03  | 李四    | 77.34   | 175.20  | 25.20  | 4                    |
| 2025-07-04  | 李四    | 77.30   | 175.20  | 25.18  | 5                    |
| 2025-07-05  | 李四    | 77.23   | 175.20  | 25.16  | 6                    |
| 2025-07-06  | 李四    | 77.05   | 175.20  | 25.10  | 7                    |
| 2025-07-07  | 李四    | 77.01   | 175.20  | 25.09  | 1                    |
| 2025-07-08  | 李四    | 76.98   | 175.20  | 25.08  | 2                    |
| 2025-07-09  | 李四    | 77.13   | 175.20  | 25.13  | 3                    |
| 2025-07-10  | 李四    | 76.97   | 175.20  | 25.08  | 4                    |
+-------------+-------+---------+---------+--------+----------------------+

1.2.2 组内平均值

直接整体数据取平均值,有时候并不能满足业务需求,例如上面的例子里面,我们希望是能拿到对应用户的体重平均值来填充,似乎这样更加合理

执行语句

select riqi,
       name,
       round(coalesce(weight, avg(weight)over(partition by name)),2) as weight,
       height,
       bmi,
       weekly_training_cnt
from t_user_info2

执行结果

+-------------+-------+---------+---------+--------+----------------------+
|    riqi     | name  | weight  | height  |  bmi   | weekly_training_cnt  |
+-------------+-------+---------+---------+--------+----------------------+
| 2025-07-01  | 张三    | 91.31   | 183.50  | 27.12  | 2                    |
| 2025-07-02  | 张三    | 91.29   | 183.50  | 27.11  | 3                    |
| 2025-07-03  | 张三    | 91.19   | 183.50  | 27.11  | 4                    |
| 2025-07-04  | 张三    | 91.29   | 183.50  | 27.11  | 4                    |
| 2025-07-05  | 张三    | 91.30   | 183.50  | 27.11  | 5                    |
| 2025-07-06  | 张三    | 91.22   | 183.50  | 27.09  | 6                    |
| 2025-07-07  | 张三    | 91.18   | 183.50  | 27.08  | 1                    |
| 2025-07-08  | 张三    | 91.09   | 183.50  | 27.05  | 2                    |
| 2025-07-09  | 张三    | 91.01   | 183.50  | 27.03  | 3                    |
| 2025-07-10  | 张三    | 91.01   | 183.50  | 27.03  | 4                    |
| 2025-07-01  | 李四    | 77.57   | 175.20  | 25.27  | 2                    |
| 2025-07-02  | 李四    | 77.49   | 175.20  | 25.25  | 3                    |
| 2025-07-03  | 李四    | 77.34   | 175.20  | 25.20  | 4                    |
| 2025-07-04  | 李四    | 77.30   | 175.20  | 25.18  | 5                    |
| 2025-07-05  | 李四    | 77.23   | 175.20  | 25.16  | 6                    |
| 2025-07-06  | 李四    | 77.05   | 175.20  | 25.10  | 7                    |
| 2025-07-07  | 李四    | 77.01   | 175.20  | 25.09  | 1                    |
| 2025-07-08  | 李四    | 76.98   | 175.20  | 25.08  | 2                    |
| 2025-07-09  | 李四    | 77.13   | 175.20  | 25.13  | 3                    |
| 2025-07-10  | 李四    | 76.97   | 175.20  | 25.08  | 4                    |
+-------------+-------+---------+---------+--------+----------------------+

1.3 使用上一行数据补充,上一行值也为空,使用默认值补充

 +-------------+-------+---------+---------+--------+----------------------+
|    riqi     | name  | weight  | height  |  bmi   | weekly_training_cnt  |
+-------------+-------+---------+---------+--------+----------------------+
| 2025-07-01  | 张三    | 91.31   | 183.50  | 27.12  | 2                    |
| 2025-07-02  | 张三    | NULL    | 183.50  | 27.11  | 3                    |
| 2025-07-03  | 张三    | NULL    | 183.50  | 27.11  | 4                    |
| 2025-07-04  | 张三    | 91.29   | 183.50  | 27.11  | 4                    |
| 2025-07-05  | 张三    | 91.30   | 183.50  | 27.11  | 5                    |
| 2025-07-06  | 张三    | 91.22   | 183.50  | 27.09  | 6                    |
| 2025-07-07  | 张三    | 91.18   | 183.50  | 27.08  | 1                    |
| 2025-07-08  | 张三    | 91.09   | 183.50  | 27.05  | 2                    |
| 2025-07-09  | 张三    | 91.01   | 183.50  | 27.03  | 3                    |
| 2025-07-10  | 张三    | 91.01   | 183.50  | 27.03  | 4                    |
| 2025-07-01  | 李四    | NULL    | 175.20  | 25.27  | 2                    |
| 2025-07-02  | 李四    | 77.49   | 175.20  | 25.25  | 3                    |
| 2025-07-03  | 李四    | 77.34   | 175.20  | 25.20  | 4                    |
| 2025-07-04  | 李四    | 77.30   | 175.20  | 25.18  | 5                    |
| 2025-07-05  | 李四    | 77.23   | 175.20  | 25.16  | 6                    |
| 2025-07-06  | 李四    | 77.05   | 175.20  | 25.10  | 7                    |
| 2025-07-07  | 李四    | 77.01   | 175.20  | 25.09  | 1                    |
| 2025-07-08  | 李四    | 76.98   | 175.20  | 25.08  | 2                    |
| 2025-07-09  | 李四    | 77.13   | 175.20  | 25.13  | 3                    |
| 2025-07-10  | 李四    | 76.97   | 175.20  | 25.08  | 4                    |
+-------------+-------+---------+---------+--------+----------------------+

执行语句

select riqi,
       name,
       coalesce(weight, lag(weight) over (partition by name order by riqi asc), 0) as weight,
       height,
       bmi,
       weekly_training_cnt
from t_user_info3

执行结果

+-------------+-------+---------+---------+--------+----------------------+
|    riqi     | name  | weight  | height  |  bmi   | weekly_training_cnt  |
+-------------+-------+---------+---------+--------+----------------------+
| 2025-07-01  | 张三    | 91.31   | 183.50  | 27.12  | 2                    |
| 2025-07-02  | 张三    | 91.31   | 183.50  | 27.11  | 3                    |
| 2025-07-03  | 张三    | 0.00    | 183.50  | 27.11  | 4                    |
| 2025-07-04  | 张三    | 91.29   | 183.50  | 27.11  | 4                    |
| 2025-07-05  | 张三    | 91.30   | 183.50  | 27.11  | 5                    |
| 2025-07-06  | 张三    | 91.22   | 183.50  | 27.09  | 6                    |
| 2025-07-07  | 张三    | 91.18   | 183.50  | 27.08  | 1                    |
| 2025-07-08  | 张三    | 91.09   | 183.50  | 27.05  | 2                    |
| 2025-07-09  | 张三    | 91.01   | 183.50  | 27.03  | 3                    |
| 2025-07-10  | 张三    | 91.01   | 183.50  | 27.03  | 4                    |
| 2025-07-01  | 李四    | 0.00    | 175.20  | 25.27  | 2                    |
| 2025-07-02  | 李四    | 77.49   | 175.20  | 25.25  | 3                    |
| 2025-07-03  | 李四    | 77.34   | 175.20  | 25.20  | 4                    |
| 2025-07-04  | 李四    | 77.30   | 175.20  | 25.18  | 5                    |
| 2025-07-05  | 李四    | 77.23   | 175.20  | 25.16  | 6                    |
| 2025-07-06  | 李四    | 77.05   | 175.20  | 25.10  | 7                    |
| 2025-07-07  | 李四    | 77.01   | 175.20  | 25.09  | 1                    |
| 2025-07-08  | 李四    | 76.98   | 175.20  | 25.08  | 2                    |
| 2025-07-09  | 李四    | 77.13   | 175.20  | 25.13  | 3                    |
| 2025-07-10  | 李四    | 76.97   | 175.20  | 25.08  | 4                    |
+-------------+-------+---------+---------+--------+----------------------+

1.4 使用上一行数据补充,上一行值也为空,则用上2行补充(指定规则),依旧为空使用默认值补充

样例数据

还是使用t_user_info3 中的数据

执行语句

select riqi,
       name,
       coalesce(weight, lag(weight) over (partition by name order by riqi asc), lag(weight,2) over (partition by name order by riqi asc),0) as weight,
       height,
       bmi,
       weekly_training_cnt
from t_user_info3

执行结果

+-------------+-------+---------+---------+--------+----------------------+
|    riqi     | name  | weight  | height  |  bmi   | weekly_training_cnt  |
+-------------+-------+---------+---------+--------+----------------------+
| 2025-07-01  | 张三    | 91.31   | 183.50  | 27.12  | 2                    |
| 2025-07-02  | 张三    | 91.31   | 183.50  | 27.11  | 3                    |
| 2025-07-03  | 张三    | 91.31   | 183.50  | 27.11  | 4                    |
| 2025-07-04  | 张三    | 91.29   | 183.50  | 27.11  | 4                    |
| 2025-07-05  | 张三    | 91.30   | 183.50  | 27.11  | 5                    |
| 2025-07-06  | 张三    | 91.22   | 183.50  | 27.09  | 6                    |
| 2025-07-07  | 张三    | 91.18   | 183.50  | 27.08  | 1                    |
| 2025-07-08  | 张三    | 91.09   | 183.50  | 27.05  | 2                    |
| 2025-07-09  | 张三    | 91.01   | 183.50  | 27.03  | 3                    |
| 2025-07-10  | 张三    | 91.01   | 183.50  | 27.03  | 4                    |
| 2025-07-01  | 李四    | 0.00    | 175.20  | 25.27  | 2                    |
| 2025-07-02  | 李四    | 77.49   | 175.20  | 25.25  | 3                    |
| 2025-07-03  | 李四    | 77.34   | 175.20  | 25.20  | 4                    |
| 2025-07-04  | 李四    | 77.30   | 175.20  | 25.18  | 5                    |
| 2025-07-05  | 李四    | 77.23   | 175.20  | 25.16  | 6                    |
| 2025-07-06  | 李四    | 77.05   | 175.20  | 25.10  | 7                    |
| 2025-07-07  | 李四    | 77.01   | 175.20  | 25.09  | 1                    |
| 2025-07-08  | 李四    | 76.98   | 175.20  | 25.08  | 2                    |
| 2025-07-09  | 李四    | 77.13   | 175.20  | 25.13  | 3                    |
| 2025-07-10  | 李四    | 76.97   | 175.20  | 25.08  | 4                    |
+-------------+-------+---------+---------+--------+----------------------+

1.5 使用上一行数据补充,上一行值也为空,则上溯至非空行,如果至开始行也为空,则默认值补充

很多公司的面试题目

  • 知识点1:需要将空值的数据与上面非空的数据进行分组(连续问题的变种)
  • 知识点2:使用组内第一行数据填充;

执行语句

select riqi,
       name,
       coalesce(last_value(weight, true) over (partition by group_id order by riqi asc),0) weight,
       height,
       bmi,
       weekly_training_cnt
from (select riqi,
             name,
             weight,
             height,
             bmi,
             weekly_training_cnt,
             sum(case when weight is null then 0 else 1 end) over (partition by name order by riqi asc) as group_id
      from t_user_info3) t
order by name,riqi;

执行结果

+-------------+-------+---------+---------+--------+----------------------+
|    riqi     | name  | weight  | height  |  bmi   | weekly_training_cnt  |
+-------------+-------+---------+---------+--------+----------------------+
| 2025-07-01  | 张三    | 91.31   | 183.50  | 27.12  | 2                    |
| 2025-07-02  | 张三    | 77.49   | 183.50  | 27.11  | 3                    |
| 2025-07-03  | 张三    | 77.49   | 183.50  | 27.11  | 4                    |
| 2025-07-04  | 张三    | 91.29   | 183.50  | 27.11  | 4                    |
| 2025-07-05  | 张三    | 91.30   | 183.50  | 27.11  | 5                    |
| 2025-07-06  | 张三    | 91.22   | 183.50  | 27.09  | 6                    |
| 2025-07-07  | 张三    | 91.18   | 183.50  | 27.08  | 1                    |
| 2025-07-08  | 张三    | 91.09   | 183.50  | 27.05  | 2                    |
| 2025-07-09  | 张三    | 91.01   | 183.50  | 27.03  | 3                    |
| 2025-07-10  | 张三    | 91.01   | 183.50  | 27.03  | 4                    |
| 2025-07-01  | 李四    | 0.00    | 175.20  | 25.27  | 2                    |
| 2025-07-02  | 李四    | 77.49   | 175.20  | 25.25  | 3                    |
| 2025-07-03  | 李四    | 77.34   | 175.20  | 25.20  | 4                    |
| 2025-07-04  | 李四    | 77.30   | 175.20  | 25.18  | 5                    |
| 2025-07-05  | 李四    | 77.23   | 175.20  | 25.16  | 6                    |
| 2025-07-06  | 李四    | 77.05   | 175.20  | 25.10  | 7                    |
| 2025-07-07  | 李四    | 77.01   | 175.20  | 25.09  | 1                    |
| 2025-07-08  | 李四    | 76.98   | 175.20  | 25.08  | 2                    |
| 2025-07-09  | 李四    | 77.13   | 175.20  | 25.13  | 3                    |
| 2025-07-10  | 李四    | 76.97   | 175.20  | 25.08  | 4                    |
+-------------+-------+---------+---------+--------+----------------------+

1.6 使用上一行数据补充,上一行值也为空,则上溯至非空行,如果至开始行也为空,使用向下至非空行补充。

还是使用t_user_info3中的数据,本题需要将空行除了与上面行放到一组,还需要与下面行放到一组。

为了方便查看,先看分组ID,注意查看空值对应的组ID与哪个有值数据行相同。

执行语句

select riqi,
       name,
       weight,
       height,
       bmi,
       weekly_training_cnt,
       sum(case when weight is null then 0 else 1 end) over (partition by name order by riqi asc)  as group_id1,
       sum(case when weight is null then 0 else 1 end) over (partition by name order by riqi desc) as group_id2
from t_user_info3
order by name,riqi

执行结果

+-------------+-------+---------+---------+--------+----------------------+------------+------------+
|    riqi     | name  | weight  | height  |  bmi   | weekly_training_cnt  | group_id1  | group_id2  |
+-------------+-------+---------+---------+--------+----------------------+------------+------------+
| 2025-07-01  | 张三    | 91.31   | 183.50  | 27.12  | 2                    | 1          | 8          |
| 2025-07-02  | 张三    | NULL    | 183.50  | 27.11  | 3                    | 1          | 7          |
| 2025-07-03  | 张三    | NULL    | 183.50  | 27.11  | 4                    | 1          | 7          |
| 2025-07-04  | 张三    | 91.29   | 183.50  | 27.11  | 4                    | 2          | 7          |
| 2025-07-05  | 张三    | 91.30   | 183.50  | 27.11  | 5                    | 3          | 6          |
| 2025-07-06  | 张三    | 91.22   | 183.50  | 27.09  | 6                    | 4          | 5          |
| 2025-07-07  | 张三    | 91.18   | 183.50  | 27.08  | 1                    | 5          | 4          |
| 2025-07-08  | 张三    | 91.09   | 183.50  | 27.05  | 2                    | 6          | 3          |
| 2025-07-09  | 张三    | 91.01   | 183.50  | 27.03  | 3                    | 7          | 2          |
| 2025-07-10  | 张三    | 91.01   | 183.50  | 27.03  | 4                    | 8          | 1          |
| 2025-07-01  | 李四    | NULL    | 175.20  | 25.27  | 2                    | 0          | 9          |
| 2025-07-02  | 李四    | 77.49   | 175.20  | 25.25  | 3                    | 1          | 9          |
| 2025-07-03  | 李四    | 77.34   | 175.20  | 25.20  | 4                    | 2          | 8          |
| 2025-07-04  | 李四    | 77.30   | 175.20  | 25.18  | 5                    | 3          | 7          |
| 2025-07-05  | 李四    | 77.23   | 175.20  | 25.16  | 6                    | 4          | 6          |
| 2025-07-06  | 李四    | 77.05   | 175.20  | 25.10  | 7                    | 5          | 5          |
| 2025-07-07  | 李四    | 77.01   | 175.20  | 25.09  | 1                    | 6          | 4          |
| 2025-07-08  | 李四    | 76.98   | 175.20  | 25.08  | 2                    | 7          | 3          |
| 2025-07-09  | 李四    | 77.13   | 175.20  | 25.13  | 3                    | 8          | 2          |
| 2025-07-10  | 李四    | 76.97   | 175.20  | 25.08  | 4                    | 9          | 1          |
+-------------+-------+---------+---------+--------+----------------------+------------+------------+

分别根据group_id1和group_id2进行分组,使用last_value函数向上取值和向下取值。然后使用coalesce完成取值。

执行语句

select riqi,
       name,
       coalesce(last_value(weight, true) over (partition by name,group_id1 order by riqi asc),
                last_value(weight, true) over (partition by name,group_id2 order by riqi desc),
                1) as weight,

       height,
       bmi,
       weekly_training_cnt
from (select riqi,
             name,
             weight,
             height,
             bmi,
             weekly_training_cnt,
             sum(case when weight is null then 0 else 1 end) over (partition by name order by riqi asc)  as group_id1,
             sum(case when weight is null then 0 else 1 end) over (partition by name order by riqi desc) as group_id2
      from t_user_info3) t
order by name, riqi

执行结果

+-------------+-------+---------+---------+--------+----------------------+
|    riqi     | name  | weight  | height  |  bmi   | weekly_training_cnt  |
+-------------+-------+---------+---------+--------+----------------------+
| 2025-07-01  | 张三    | 91.31   | 183.50  | 27.12  | 2                    |
| 2025-07-02  | 张三    | 91.31   | 183.50  | 27.11  | 3                    |
| 2025-07-03  | 张三    | 91.31   | 183.50  | 27.11  | 4                    |
| 2025-07-04  | 张三    | 91.29   | 183.50  | 27.11  | 4                    |
| 2025-07-05  | 张三    | 91.30   | 183.50  | 27.11  | 5                    |
| 2025-07-06  | 张三    | 91.22   | 183.50  | 27.09  | 6                    |
| 2025-07-07  | 张三    | 91.18   | 183.50  | 27.08  | 1                    |
| 2025-07-08  | 张三    | 91.09   | 183.50  | 27.05  | 2                    |
| 2025-07-09  | 张三    | 91.01   | 183.50  | 27.03  | 3                    |
| 2025-07-10  | 张三    | 91.01   | 183.50  | 27.03  | 4                    |
| 2025-07-01  | 李四    | 77.49   | 175.20  | 25.27  | 2                    |
| 2025-07-02  | 李四    | 77.49   | 175.20  | 25.25  | 3                    |
| 2025-07-03  | 李四    | 77.34   | 175.20  | 25.20  | 4                    |
| 2025-07-04  | 李四    | 77.30   | 175.20  | 25.18  | 5                    |
| 2025-07-05  | 李四    | 77.23   | 175.20  | 25.16  | 6                    |
| 2025-07-06  | 李四    | 77.05   | 175.20  | 25.10  | 7                    |
| 2025-07-07  | 李四    | 77.01   | 175.20  | 25.09  | 1                    |
| 2025-07-08  | 李四    | 76.98   | 175.20  | 25.08  | 2                    |
| 2025-07-09  | 李四    | 77.13   | 175.20  | 25.13  | 3                    |
| 2025-07-10  | 李四    | 76.97   | 175.20  | 25.08  | 4                    |
+-------------+-------+---------+---------+--------+----------------------+

1.7 使用上一非空行和下一非空行平均值进行补充

这里有几个问题需要解决: 1.找出上一非空行和下一非空行,计算二者的均值; 2.如果向上或者向下,不存在非空行,则需要取只取其中一个值。

在1.6中我们没有展示,但是已经计算出上一非空行和下一非空行的值。我们先看一下结果

执行语句

select riqi,
       name,
       weight,
       last_value(weight, true) over (partition by name,group_id1 order by riqi asc) as last_weight,
       last_value(weight, true) over (partition by name,group_id2 order by riqi desc) as next_weight,
       height,
       bmi,
       weekly_training_cnt
from (select riqi,
             name,
             weight,
             height,
             bmi,
             weekly_training_cnt,
             sum(case when weight is null then 0 else 1 end) over (partition by name order by riqi asc)  as group_id1,
             sum(case when weight is null then 0 else 1 end) over (partition by name order by riqi desc) as group_id2
      from t_user_info3) t
order by name, riqi

执行结果

+-------------+-------+---------+--------------+--------------+---------+--------+----------------------+
|    riqi     | name  | weight  | last_weight  | next_weight  | height  |  bmi   | weekly_training_cnt  |
+-------------+-------+---------+--------------+--------------+---------+--------+----------------------+
| 2025-07-01  | 张三    | 91.31   | 91.31        | 91.31        | 183.50  | 27.12  | 2                    |
| 2025-07-02  | 张三    | NULL    | 91.31        | 91.29        | 183.50  | 27.11  | 3                    |
| 2025-07-03  | 张三    | NULL    | 91.31        | 91.29        | 183.50  | 27.11  | 4                    |
| 2025-07-04  | 张三    | 91.29   | 91.29        | 91.29        | 183.50  | 27.11  | 4                    |
| 2025-07-05  | 张三    | 91.30   | 91.30        | 91.30        | 183.50  | 27.11  | 5                    |
| 2025-07-06  | 张三    | 91.22   | 91.22        | 91.22        | 183.50  | 27.09  | 6                    |
| 2025-07-07  | 张三    | 91.18   | 91.18        | 91.18        | 183.50  | 27.08  | 1                    |
| 2025-07-08  | 张三    | 91.09   | 91.09        | 91.09        | 183.50  | 27.05  | 2                    |
| 2025-07-09  | 张三    | 91.01   | 91.01        | 91.01        | 183.50  | 27.03  | 3                    |
| 2025-07-10  | 张三    | 91.01   | 91.01        | 91.01        | 183.50  | 27.03  | 4                    |
| 2025-07-01  | 李四    | NULL    | NULL         | 77.49        | 175.20  | 25.27  | 2                    |
| 2025-07-02  | 李四    | 77.49   | 77.49        | 77.49        | 175.20  | 25.25  | 3                    |
| 2025-07-03  | 李四    | 77.34   | 77.34        | 77.34        | 175.20  | 25.20  | 4                    |
| 2025-07-04  | 李四    | 77.30   | 77.30        | 77.30        | 175.20  | 25.18  | 5                    |
| 2025-07-05  | 李四    | 77.23   | 77.23        | 77.23        | 175.20  | 25.16  | 6                    |
| 2025-07-06  | 李四    | 77.05   | 77.05        | 77.05        | 175.20  | 25.10  | 7                    |
| 2025-07-07  | 李四    | 77.01   | 77.01        | 77.01        | 175.20  | 25.09  | 1                    |
| 2025-07-08  | 李四    | 76.98   | 76.98        | 76.98        | 175.20  | 25.08  | 2                    |
| 2025-07-09  | 李四    | 77.13   | 77.13        | 77.13        | 175.20  | 25.13  | 3                    |
| 2025-07-10  | 李四    | 76.97   | 76.97        | 76.97        | 175.20  | 25.08  | 4                    |
+-------------+-------+---------+--------------+--------------+---------+--------+----------------------+

计算平均值,(last_weight+next_weight)/2 则为结果,但是需要处理其中存在空值的情况。

执行语句

select riqi,
       name,
       coalesce(weight,
                case
                    when last_weight is null and next_weight is null then null
                    when last_weight is null or next_weight is null then coalesce(last_weight, next_weight)
                    when last_weight is not null and next_weight is not null
                        then round((last_weight + next_weight) / 2, 2)
                    end,
                0
       ) as weight,
       height,
       bmi,
       weekly_training_cnt
from (select riqi,
             name,
             weight,
             last_value(weight, true) over (partition by name,group_id1 order by riqi asc)  as last_weight,
             last_value(weight, true) over (partition by name,group_id2 order by riqi desc) as next_weight,
             height,
             bmi,
             weekly_training_cnt
      from (select riqi,
                   name,
                   weight,
                   height,
                   bmi,
                   weekly_training_cnt,
                   sum(case when weight is null then 0 else 1 end)
                       over (partition by name order by riqi asc)  as group_id1,
                   sum(case when weight is null then 0 else 1 end)
                       over (partition by name order by riqi desc) as group_id2
            from t_user_info3) t) tt
order by name, riqi

执行结果

+-------------+-------+---------+---------+--------+----------------------+
|    riqi     | name  | weight  | height  |  bmi   | weekly_training_cnt  |
+-------------+-------+---------+---------+--------+----------------------+
| 2025-07-01  | 张三    | 91.31   | 183.50  | 27.12  | 2                    |
| 2025-07-02  | 张三    | 91.30   | 183.50  | 27.11  | 3                    |
| 2025-07-03  | 张三    | 91.30   | 183.50  | 27.11  | 4                    |
| 2025-07-04  | 张三    | 91.29   | 183.50  | 27.11  | 4                    |
| 2025-07-05  | 张三    | 91.30   | 183.50  | 27.11  | 5                    |
| 2025-07-06  | 张三    | 91.22   | 183.50  | 27.09  | 6                    |
| 2025-07-07  | 张三    | 91.18   | 183.50  | 27.08  | 1                    |
| 2025-07-08  | 张三    | 91.09   | 183.50  | 27.05  | 2                    |
| 2025-07-09  | 张三    | 91.01   | 183.50  | 27.03  | 3                    |
| 2025-07-10  | 张三    | 91.01   | 183.50  | 27.03  | 4                    |
| 2025-07-01  | 李四    | 77.49   | 175.20  | 25.27  | 2                    |
| 2025-07-02  | 李四    | 77.49   | 175.20  | 25.25  | 3                    |
| 2025-07-03  | 李四    | 77.34   | 175.20  | 25.20  | 4                    |
| 2025-07-04  | 李四    | 77.30   | 175.20  | 25.18  | 5                    |
| 2025-07-05  | 李四    | 77.23   | 175.20  | 25.16  | 6                    |
| 2025-07-06  | 李四    | 77.05   | 175.20  | 25.10  | 7                    |
| 2025-07-07  | 李四    | 77.01   | 175.20  | 25.09  | 1                    |
| 2025-07-08  | 李四    | 76.98   | 175.20  | 25.08  | 2                    |
| 2025-07-09  | 李四    | 77.13   | 175.20  | 25.13  | 3                    |
| 2025-07-10  | 李四    | 76.97   | 175.20  | 25.08  | 4                    |
+-------------+-------+---------+---------+--------+----------------------+

1.8 使用最近非空值进行补充

先定义一下,如果距离上下非空值相同时,取上非空值。

解法涉及连续问题组内排序,即当前行距离分组首行的距离。为了更加直观,我们调整一下测试数据,增加一些空值数据。

+-------------+-------+---------+---------+--------+----------------------+
|    riqi     | name  | weight  | height  |  bmi   | weekly_training_cnt  |
+-------------+-------+---------+---------+--------+----------------------+
| 2025-07-01  | 张三    | 91.31   | 183.50  | 27.12  | 2                    |
| 2025-07-02  | 张三    | NULL    | 183.50  | 27.11  | 3                    |
| 2025-07-03  | 张三    | NULL    | 183.50  | 27.11  | 4                    |
| 2025-07-04  | 张三    | NULL    | 183.50  | 27.11  | 4                    |
| 2025-07-05  | 张三    | 91.30   | 183.50  | 27.11  | 5                    |
| 2025-07-06  | 张三    | 91.22   | 183.50  | 27.09  | 6                    |
| 2025-07-07  | 张三    | 91.18   | 183.50  | 27.08  | 1                    |
| 2025-07-08  | 张三    | 91.09   | 183.50  | 27.05  | 2                    |
| 2025-07-09  | 张三    | 91.01   | 183.50  | 27.03  | 3                    |
| 2025-07-10  | 张三    | 91.01   | 183.50  | 27.03  | 4                    |
| 2025-07-01  | 李四    | NULL    | 175.20  | 25.27  | 2                    |
| 2025-07-02  | 李四    | 77.49   | 175.20  | 25.25  | 3                    |
| 2025-07-03  | 李四    | 77.34   | 175.20  | 25.20  | 4                    |
| 2025-07-04  | 李四    | 77.30   | 175.20  | 25.18  | 5                    |
| 2025-07-05  | 李四    | NULL    | 175.20  | 25.16  | 6                    |
| 2025-07-06  | 李四    | NULL    | 175.20  | 25.10  | 7                    |
| 2025-07-07  | 李四    | 77.01   | 175.20  | 25.09  | 1                    |
| 2025-07-08  | 李四    | 76.98   | 175.20  | 25.08  | 2                    |
| 2025-07-09  | 李四    | 77.13   | 175.20  | 25.13  | 3                    |
| 2025-07-10  | 李四    | NULL    | 175.20  | 25.08  | 4                    |
+-------------+-------+---------+---------+--------+----------------------+

我们还是需要从1.6的group_id1,group_id2开始,使用row_number(),对组内进行排序,然后-1就是当前行距离初始有值行的距离。我们先看下中间结果

执行语句

select riqi,
       name,
       weight,
       height,
       bmi,
       weekly_training_cnt,
       group_id1,
       group_id2,
       row_number() over (partition by name,group_id1 order by riqi asc) - 1 as distance_1,
       row_number() over (partition by name,group_id2 order by riqi desc) - 1 as distance_2
from (select riqi,
             name,
             weight,
             height,
             bmi,
             weekly_training_cnt,
             sum(case when weight is null then 0 else 1 end)
                 over (partition by name order by riqi asc)  as group_id1,
             sum(case when weight is null then 0 else 1 end)
                 over (partition by name order by riqi desc) as group_id2
      from t_user_info4) t
order by name,riqi

执行结果

+-------------+-------+---------+---------+--------+----------------------+------------+------------+-------------+-------------+
|    riqi     | name  | weight  | height  |  bmi   | weekly_training_cnt  | group_id1  | group_id2  | distance_1  | distance_2  |
+-------------+-------+---------+---------+--------+----------------------+------------+------------+-------------+-------------+
| 2025-07-01  | 张三    | 91.31   | 183.50  | 27.12  | 2                    | 1          | 7          | 0           | 0           |
| 2025-07-02  | 张三    | NULL    | 183.50  | 27.11  | 3                    | 1          | 6          | 1           | 3           |
| 2025-07-03  | 张三    | NULL    | 183.50  | 27.11  | 4                    | 1          | 6          | 2           | 2           |
| 2025-07-04  | 张三    | NULL    | 183.50  | 27.11  | 4                    | 1          | 6          | 3           | 1           |
| 2025-07-05  | 张三    | 91.30   | 183.50  | 27.11  | 5                    | 2          | 6          | 0           | 0           |
| 2025-07-06  | 张三    | 91.22   | 183.50  | 27.09  | 6                    | 3          | 5          | 0           | 0           |
| 2025-07-07  | 张三    | 91.18   | 183.50  | 27.08  | 1                    | 4          | 4          | 0           | 0           |
| 2025-07-08  | 张三    | 91.09   | 183.50  | 27.05  | 2                    | 5          | 3          | 0           | 0           |
| 2025-07-09  | 张三    | 91.01   | 183.50  | 27.03  | 3                    | 6          | 2          | 0           | 0           |
| 2025-07-10  | 张三    | 91.01   | 183.50  | 27.03  | 4                    | 7          | 1          | 0           | 0           |
| 2025-07-01  | 李四    | NULL    | 175.20  | 25.27  | 2                    | 0          | 6          | 0           | 1           |
| 2025-07-02  | 李四    | 77.49   | 175.20  | 25.25  | 3                    | 1          | 6          | 0           | 0           |
| 2025-07-03  | 李四    | 77.34   | 175.20  | 25.20  | 4                    | 2          | 5          | 0           | 0           |
| 2025-07-04  | 李四    | 77.30   | 175.20  | 25.18  | 5                    | 3          | 4          | 0           | 0           |
| 2025-07-05  | 李四    | NULL    | 175.20  | 25.16  | 6                    | 3          | 3          | 1           | 2           |
| 2025-07-06  | 李四    | NULL    | 175.20  | 25.10  | 7                    | 3          | 3          | 2           | 1           |
| 2025-07-07  | 李四    | 77.01   | 175.20  | 25.09  | 1                    | 4          | 3          | 0           | 0           |
| 2025-07-08  | 李四    | 76.98   | 175.20  | 25.08  | 2                    | 5          | 2          | 0           | 0           |
| 2025-07-09  | 李四    | 77.13   | 175.20  | 25.13  | 3                    | 6          | 1          | 0           | 0           |
| 2025-07-10  | 李四    | NULL    | 175.20  | 25.08  | 4                    | 6          | 0          | 1           | 0           |
+-------------+-------+---------+---------+--------+----------------------+------------+------------+-------------+-------------+

接下来我们根据name与group_id1和group_id2进行分组,得到上一非空值 last_weight,和下一非空值next_weight。

执行语句

select riqi,
       name,
       weight,
       height,
       bmi,
       weekly_training_cnt,
       group_id1,
       group_id2,
       row_number() over (partition by name,group_id1 order by riqi asc) - 1          as distance_1,
       row_number() over (partition by name,group_id2 order by riqi desc) - 1         as distance_2,
       last_value(weight, true) over (partition by name,group_id1 order by riqi asc)  as last_weight,
       last_value(weight, true) over (partition by name,group_id2 order by riqi desc) as next_weight
from (select riqi,
             name,
             weight,
             height,
             bmi,
             weekly_training_cnt,
             sum(case when weight is null then 0 else 1 end)
                 over (partition by name order by riqi asc)  as group_id1,
             sum(case when weight is null then 0 else 1 end)
                 over (partition by name order by riqi desc) as group_id2
      from t_user_info4) t
      order by name,riqi

执行结果

+-------------+-------+---------+---------+--------+----------------------+------------+------------+-------------+-------------+--------------+--------------+
|    riqi     | name  | weight  | height  |  bmi   | weekly_training_cnt  | group_id1  | group_id2  | distance_1  | distance_2  | last_weight  | next_weight  |
+-------------+-------+---------+---------+--------+----------------------+------------+------------+-------------+-------------+--------------+--------------+
| 2025-07-01  | 张三    | 91.31   | 183.50  | 27.12  | 2                    | 1          | 7          | 0           | 0           | 91.31        | 91.31        |
| 2025-07-02  | 张三    | NULL    | 183.50  | 27.11  | 3                    | 1          | 6          | 1           | 3           | 91.31        | 91.30        |
| 2025-07-03  | 张三    | NULL    | 183.50  | 27.11  | 4                    | 1          | 6          | 2           | 2           | 91.31        | 91.30        |
| 2025-07-04  | 张三    | NULL    | 183.50  | 27.11  | 4                    | 1          | 6          | 3           | 1           | 91.31        | 91.30        |
| 2025-07-05  | 张三    | 91.30   | 183.50  | 27.11  | 5                    | 2          | 6          | 0           | 0           | 91.30        | 91.30        |
| 2025-07-06  | 张三    | 91.22   | 183.50  | 27.09  | 6                    | 3          | 5          | 0           | 0           | 91.22        | 91.22        |
| 2025-07-07  | 张三    | 91.18   | 183.50  | 27.08  | 1                    | 4          | 4          | 0           | 0           | 91.18        | 91.18        |
| 2025-07-08  | 张三    | 91.09   | 183.50  | 27.05  | 2                    | 5          | 3          | 0           | 0           | 91.09        | 91.09        |
| 2025-07-09  | 张三    | 91.01   | 183.50  | 27.03  | 3                    | 6          | 2          | 0           | 0           | 91.01        | 91.01        |
| 2025-07-10  | 张三    | 91.01   | 183.50  | 27.03  | 4                    | 7          | 1          | 0           | 0           | 91.01        | 91.01        |
| 2025-07-01  | 李四    | NULL    | 175.20  | 25.27  | 2                    | 0          | 6          | 0           | 1           | NULL         | 77.49        |
| 2025-07-02  | 李四    | 77.49   | 175.20  | 25.25  | 3                    | 1          | 6          | 0           | 0           | 77.49        | 77.49        |
| 2025-07-03  | 李四    | 77.34   | 175.20  | 25.20  | 4                    | 2          | 5          | 0           | 0           | 77.34        | 77.34        |
| 2025-07-04  | 李四    | 77.30   | 175.20  | 25.18  | 5                    | 3          | 4          | 0           | 0           | 77.30        | 77.30        |
| 2025-07-05  | 李四    | NULL    | 175.20  | 25.16  | 6                    | 3          | 3          | 1           | 2           | 77.30        | 77.01        |
| 2025-07-06  | 李四    | NULL    | 175.20  | 25.10  | 7                    | 3          | 3          | 2           | 1           | 77.30        | 77.01        |
| 2025-07-07  | 李四    | 77.01   | 175.20  | 25.09  | 1                    | 4          | 3          | 0           | 0           | 77.01        | 77.01        |
| 2025-07-08  | 李四    | 76.98   | 175.20  | 25.08  | 2                    | 5          | 2          | 0           | 0           | 76.98        | 76.98        |
| 2025-07-09  | 李四    | 77.13   | 175.20  | 25.13  | 3                    | 6          | 1          | 0           | 0           | 77.13        | 77.13        |
| 2025-07-10  | 李四    | NULL    | 175.20  | 25.08  | 4                    | 6          | 0          | 1           | 0           | 77.13        | NULL         |
+-------------+-------+---------+---------+--------+----------------------+------------+------------+-------------+-------------+--------------+--------------+

我们根据distance_1distance_2的大小,来取last_weight还是next_weight,但是我们看李四 7月1日和7月10的数据,不能简单计算,否则结果返回结果为空值,所以需要进行处理。

执行语句

select riqi,
       name,
       coalesce(weight,
                if(distance_1 <= distance_2, coalesce(last_weight, next_weight), coalesce(next_weight, last_weight)), 0
       ) as weight,
       height,
       bmi,
       weekly_training_cnt
from (select riqi,
             name,
             weight,
             height,
             bmi,
             weekly_training_cnt,
             group_id1,
             group_id2,
             row_number() over (partition by name,group_id1 order by riqi asc) - 1          as distance_1,
             row_number() over (partition by name,group_id2 order by riqi desc) - 1         as distance_2,
             last_value(weight, true) over (partition by name,group_id1 order by riqi asc)  as last_weight,
             last_value(weight, true) over (partition by name,group_id2 order by riqi desc) as next_weight
      from (select riqi,
                   name,
                   weight,
                   height,
                   bmi,
                   weekly_training_cnt,
                   sum(case when weight is null then 0 else 1 end)
                       over (partition by name order by riqi asc)  as group_id1,
                   sum(case when weight is null then 0 else 1 end)
                       over (partition by name order by riqi desc) as group_id2
            from t_user_info4) t) tt
order by name, riqi

执行结果

+-------------+-------+---------+---------+--------+----------------------+
|    riqi     | name  | weight  | height  |  bmi   | weekly_training_cnt  |
+-------------+-------+---------+---------+--------+----------------------+
| 2025-07-01  | 张三    | 91.31   | 183.50  | 27.12  | 2                    |
| 2025-07-02  | 张三    | 91.31   | 183.50  | 27.11  | 3                    |
| 2025-07-03  | 张三    | 91.31   | 183.50  | 27.11  | 4                    |
| 2025-07-04  | 张三    | 91.30   | 183.50  | 27.11  | 4                    |
| 2025-07-05  | 张三    | 91.30   | 183.50  | 27.11  | 5                    |
| 2025-07-06  | 张三    | 91.22   | 183.50  | 27.09  | 6                    |
| 2025-07-07  | 张三    | 91.18   | 183.50  | 27.08  | 1                    |
| 2025-07-08  | 张三    | 91.09   | 183.50  | 27.05  | 2                    |
| 2025-07-09  | 张三    | 91.01   | 183.50  | 27.03  | 3                    |
| 2025-07-10  | 张三    | 91.01   | 183.50  | 27.03  | 4                    |
| 2025-07-01  | 李四    | 77.49   | 175.20  | 25.27  | 2                    |
| 2025-07-02  | 李四    | 77.49   | 175.20  | 25.25  | 3                    |
| 2025-07-03  | 李四    | 77.34   | 175.20  | 25.20  | 4                    |
| 2025-07-04  | 李四    | 77.30   | 175.20  | 25.18  | 5                    |
| 2025-07-05  | 李四    | 77.30   | 175.20  | 25.16  | 6                    |
| 2025-07-06  | 李四    | 77.01   | 175.20  | 25.10  | 7                    |
| 2025-07-07  | 李四    | 77.01   | 175.20  | 25.09  | 1                    |
| 2025-07-08  | 李四    | 76.98   | 175.20  | 25.08  | 2                    |
| 2025-07-09  | 李四    | 77.13   | 175.20  | 25.13  | 3                    |
| 2025-07-10  | 李四    | 77.13   | 175.20  | 25.08  | 4                    |
+-------------+-------+---------+---------+--------+----------------------+

1.9 使用上一非空行和下一非空行进行梯度均值补充

但看这个需求很难,但是我们在1.8中分别计算出了distance_1、distance_2、last_weight、next_weight 那么进行均值填充就变得容易很多。 计算方式如下 : (distance_1 next_weight +distance_2 last_weight) /(distance_1+distance_2) 这里仅仅是非首尾为空值的,李四的首尾空行数据,依旧需要单独处理。

执行语句

select riqi,
       name,
       coalesce(weight,
                round((distance_1 * next_weight +distance_2 *last_weight) /(distance_1+distance_2),2),
       coalesce(last_weight,next_weight)
       ) as weight,
       height,
       bmi,
       weekly_training_cnt
from (select riqi,
             name,
             weight,
             height,
             bmi,
             weekly_training_cnt,
             group_id1,
             group_id2,
             row_number() over (partition by name,group_id1 order by riqi asc) - 1          as distance_1,
             row_number() over (partition by name,group_id2 order by riqi desc) - 1         as distance_2,
             last_value(weight, true) over (partition by name,group_id1 order by riqi asc)  as last_weight,
             last_value(weight, true) over (partition by name,group_id2 order by riqi desc) as next_weight
      from (select riqi,
                   name,
                   weight,
                   height,
                   bmi,
                   weekly_training_cnt,
                   sum(case when weight is null then 0 else 1 end)
                       over (partition by name order by riqi asc)  as group_id1,
                   sum(case when weight is null then 0 else 1 end)
                       over (partition by name order by riqi desc) as group_id2
            from t_user_info4) t) tt
order by name, riqi

执行结果

+-------------+-------+---------+---------+--------+----------------------+
|    riqi     | name  | weight  | height  |  bmi   | weekly_training_cnt  |
+-------------+-------+---------+---------+--------+----------------------+
| 2025-07-01  | 张三    | 91.31   | 183.50  | 27.12  | 2                    |
| 2025-07-02  | 张三    | 91.31   | 183.50  | 27.11  | 3                    |
| 2025-07-03  | 张三    | 91.31   | 183.50  | 27.11  | 4                    |
| 2025-07-04  | 张三    | 91.30   | 183.50  | 27.11  | 4                    |
| 2025-07-05  | 张三    | 91.30   | 183.50  | 27.11  | 5                    |
| 2025-07-06  | 张三    | 91.22   | 183.50  | 27.09  | 6                    |
| 2025-07-07  | 张三    | 91.18   | 183.50  | 27.08  | 1                    |
| 2025-07-08  | 张三    | 91.09   | 183.50  | 27.05  | 2                    |
| 2025-07-09  | 张三    | 91.01   | 183.50  | 27.03  | 3                    |
| 2025-07-10  | 张三    | 91.01   | 183.50  | 27.03  | 4                    |
| 2025-07-01  | 李四    | 77.49   | 175.20  | 25.27  | 2                    |
| 2025-07-02  | 李四    | 77.49   | 175.20  | 25.25  | 3                    |
| 2025-07-03  | 李四    | 77.34   | 175.20  | 25.20  | 4                    |
| 2025-07-04  | 李四    | 77.30   | 175.20  | 25.18  | 5                    |
| 2025-07-05  | 李四    | 77.20   | 175.20  | 25.16  | 6                    |
| 2025-07-06  | 李四    | 77.11   | 175.20  | 25.10  | 7                    |
| 2025-07-07  | 李四    | 77.01   | 175.20  | 25.09  | 1                    |
| 2025-07-08  | 李四    | 76.98   | 175.20  | 25.08  | 2                    |
| 2025-07-09  | 李四    | 77.13   | 175.20  | 25.13  | 3                    |
| 2025-07-10  | 李四    | 77.13   | 175.20  | 25.08  | 4                    |
+-------------+-------+---------+---------+--------+----------------------+