如何处理缺失值
- 缺失值处理
- 3天前
- 27热度
- 0评论
日常工作遇到数据缺失的问题,为保证数据质量或者后续使用方便,我们需要对缺失数据进行处理,今天先聊一下字段值的缺失及填充方式。
字段缺失指在数据表中数据行存在(主键存在),但是某个字段在某一行或者某些行为null的情况。
涉及函数
coalesce()
round()
avg()
lag()/lead()
last_value()/first_value()
sum()
row_number()
1.1 使用默认值/规则补充
这里的默认值/规则指:可能是0,某个特定的值,或者根据本行其他字段可以计算的结果。
样例数据
+-------------+-------+---------+---------+--------+----------------------+
| riqi | name | weight | height | bmi | weekly_training_cnt |
+-------------+-------+---------+---------+--------+----------------------+
| 2025-07-01 | 张三 | 91.31 | 183.50 | 27.12 | 2 |
| 2025-07-02 | 张三 | 91.29 | 183.50 | 27.11 | 3 |
| 2025-07-03 | 张三 | NULL | 183.50 | 27.11 | 4 |
| 2025-07-04 | 张三 | 91.29 | 183.50 | 27.11 | 4 |
| 2025-07-05 | 张三 | 91.30 | 183.50 | 27.11 | 5 |
| 2025-07-06 | 张三 | 91.22 | 183.50 | 27.09 | 6 |
| 2025-07-07 | 张三 | 91.18 | 183.50 | 27.08 | 1 |
| 2025-07-08 | 张三 | 91.09 | 183.50 | 27.05 | 2 |
| 2025-07-09 | 张三 | 91.01 | 183.50 | 27.03 | 3 |
| 2025-07-10 | 张三 | 91.01 | 183.50 | 27.03 | 4 |
+-------------+-------+---------+---------+--------+----------------------+
在上面样例数据中,张三2025-07-03日的体重数据缺失,可以进行直接处理,填入一个默认值,例如-999,0 等给定默认值;
执行语句
select riqi,
name,
coalesce(weight, -999) as weight,
height,
bmi,
weekly_training_cnt
from t_user_info1
结果
+-------------+-------+----------+---------+--------+----------------------+
| riqi | name | weight | height | bmi | weekly_training_cnt |
+-------------+-------+----------+---------+--------+----------------------+
| 2025-07-01 | 张三 | 91.31 | 183.50 | 27.12 | 2 |
| 2025-07-02 | 张三 | 91.29 | 183.50 | 27.11 | 3 |
| 2025-07-03 | 张三 | -999.00 | 183.50 | 27.11 | 4 |
| 2025-07-04 | 张三 | 91.29 | 183.50 | 27.11 | 4 |
| 2025-07-05 | 张三 | 91.30 | 183.50 | 27.11 | 5 |
| 2025-07-06 | 张三 | 91.22 | 183.50 | 27.09 | 6 |
| 2025-07-07 | 张三 | 91.18 | 183.50 | 27.08 | 1 |
| 2025-07-08 | 张三 | 91.09 | 183.50 | 27.05 | 2 |
| 2025-07-09 | 张三 | 91.01 | 183.50 | 27.03 | 3 |
| 2025-07-10 | 张三 | 91.01 | 183.50 | 27.03 | 4 |
+-------------+-------+----------+---------+--------+----------------------+
如此补充数据,能够解决空值问题,但是这些数据无实际意义,补数太过暴力。我们知道体重、身高和身高质量指数之间存在勾稽关系,如果有另外两个值,是可以计算出缺失值的。所以我们可以根据这个进行数据补充,计算公式为: BMI=体重÷身高的平方。
执行语句
select riqi,
name,
coalesce(weight, round(bmi*power(height/100,2),2)) as weight,
height,
bmi,
weekly_training_cnt
from t_user_info1
执行结果
+-------------+-------+---------+---------+--------+----------------------+
| riqi | name | weight | height | bmi | weekly_training_cnt |
+-------------+-------+---------+---------+--------+----------------------+
| 2025-07-01 | 张三 | 91.31 | 183.50 | 27.12 | 2 |
| 2025-07-02 | 张三 | 91.29 | 183.50 | 27.11 | 3 |
| 2025-07-03 | 张三 | 91.29 | 183.50 | 27.11 | 4 |
| 2025-07-04 | 张三 | 91.29 | 183.50 | 27.11 | 4 |
| 2025-07-05 | 张三 | 91.3 | 183.50 | 27.11 | 5 |
| 2025-07-06 | 张三 | 91.22 | 183.50 | 27.09 | 6 |
| 2025-07-07 | 张三 | 91.18 | 183.50 | 27.08 | 1 |
| 2025-07-08 | 张三 | 91.09 | 183.50 | 27.05 | 2 |
| 2025-07-09 | 张三 | 91.01 | 183.50 | 27.03 | 3 |
| 2025-07-10 | 张三 | 91.01 | 183.50 | 27.03 | 4 |
+-------------+-------+---------+---------+--------+----------------------+
1.2 使用平均值(中位数、众数)补充
指使用该列的平均值进行补充,当然也可以是组内平均值,或者使用中位数、众数等各类统计值进行补充。
样例数据
+-------------+-------+---------+---------+--------+----------------------+
| riqi | name | weight | height | bmi | weekly_training_cnt |
+-------------+-------+---------+---------+--------+----------------------+
| 2025-07-01 | 张三 | 91.31 | 183.50 | 27.12 | 2 |
| 2025-07-02 | 张三 | 91.29 | 183.50 | 27.11 | 3 |
| 2025-07-03 | 张三 | NULL | 183.50 | 27.11 | 4 |
| 2025-07-04 | 张三 | 91.29 | 183.50 | 27.11 | 4 |
| 2025-07-05 | 张三 | 91.30 | 183.50 | 27.11 | 5 |
| 2025-07-06 | 张三 | 91.22 | 183.50 | 27.09 | 6 |
| 2025-07-07 | 张三 | 91.18 | 183.50 | 27.08 | 1 |
| 2025-07-08 | 张三 | 91.09 | 183.50 | 27.05 | 2 |
| 2025-07-09 | 张三 | 91.01 | 183.50 | 27.03 | 3 |
| 2025-07-10 | 张三 | 91.01 | 183.50 | 27.03 | 4 |
| 2025-07-01 | 李四 | 77.57 | 175.20 | 25.27 | 2 |
| 2025-07-02 | 李四 | 77.49 | 175.20 | 25.25 | 3 |
| 2025-07-03 | 李四 | 77.34 | 175.20 | 25.20 | 4 |
| 2025-07-04 | 李四 | 77.30 | 175.20 | 25.18 | 5 |
| 2025-07-05 | 李四 | 77.23 | 175.20 | 25.16 | 6 |
| 2025-07-06 | 李四 | 77.05 | 175.20 | 25.10 | 7 |
| 2025-07-07 | 李四 | 77.01 | 175.20 | 25.09 | 1 |
| 2025-07-08 | 李四 | 76.98 | 175.20 | 25.08 | 2 |
| 2025-07-09 | 李四 | 77.13 | 175.20 | 25.13 | 3 |
| 2025-07-10 | 李四 | 76.97 | 175.20 | 25.08 | 4 |
+-------------+-------+---------+---------+--------+----------------------+
1.2.1 平均值
我们可以取整列的平均值进行数据填充,这里需要进行开窗处理。
执行语句
select riqi,
name,
round(coalesce(weight, avg(weight)over()),2) as weight,
height,
bmi,
weekly_training_cnt
from t_user_info2
执行结果
+-------------+-------+---------+---------+--------+----------------------+
| riqi | name | weight | height | bmi | weekly_training_cnt |
+-------------+-------+---------+---------+--------+----------------------+
| 2025-07-01 | 张三 | 91.31 | 183.50 | 27.12 | 2 |
| 2025-07-02 | 张三 | 91.29 | 183.50 | 27.11 | 3 |
| 2025-07-03 | 张三 | 83.83 | 183.50 | 27.11 | 4 |
| 2025-07-04 | 张三 | 91.29 | 183.50 | 27.11 | 4 |
| 2025-07-05 | 张三 | 91.30 | 183.50 | 27.11 | 5 |
| 2025-07-06 | 张三 | 91.22 | 183.50 | 27.09 | 6 |
| 2025-07-07 | 张三 | 91.18 | 183.50 | 27.08 | 1 |
| 2025-07-08 | 张三 | 91.09 | 183.50 | 27.05 | 2 |
| 2025-07-09 | 张三 | 91.01 | 183.50 | 27.03 | 3 |
| 2025-07-10 | 张三 | 91.01 | 183.50 | 27.03 | 4 |
| 2025-07-01 | 李四 | 77.57 | 175.20 | 25.27 | 2 |
| 2025-07-02 | 李四 | 77.49 | 175.20 | 25.25 | 3 |
| 2025-07-03 | 李四 | 77.34 | 175.20 | 25.20 | 4 |
| 2025-07-04 | 李四 | 77.30 | 175.20 | 25.18 | 5 |
| 2025-07-05 | 李四 | 77.23 | 175.20 | 25.16 | 6 |
| 2025-07-06 | 李四 | 77.05 | 175.20 | 25.10 | 7 |
| 2025-07-07 | 李四 | 77.01 | 175.20 | 25.09 | 1 |
| 2025-07-08 | 李四 | 76.98 | 175.20 | 25.08 | 2 |
| 2025-07-09 | 李四 | 77.13 | 175.20 | 25.13 | 3 |
| 2025-07-10 | 李四 | 76.97 | 175.20 | 25.08 | 4 |
+-------------+-------+---------+---------+--------+----------------------+
1.2.2 组内平均值
直接整体数据取平均值,有时候并不能满足业务需求,例如上面的例子里面,我们希望是能拿到对应用户的体重平均值来填充,似乎这样更加合理
执行语句
select riqi,
name,
round(coalesce(weight, avg(weight)over(partition by name)),2) as weight,
height,
bmi,
weekly_training_cnt
from t_user_info2
执行结果
+-------------+-------+---------+---------+--------+----------------------+
| riqi | name | weight | height | bmi | weekly_training_cnt |
+-------------+-------+---------+---------+--------+----------------------+
| 2025-07-01 | 张三 | 91.31 | 183.50 | 27.12 | 2 |
| 2025-07-02 | 张三 | 91.29 | 183.50 | 27.11 | 3 |
| 2025-07-03 | 张三 | 91.19 | 183.50 | 27.11 | 4 |
| 2025-07-04 | 张三 | 91.29 | 183.50 | 27.11 | 4 |
| 2025-07-05 | 张三 | 91.30 | 183.50 | 27.11 | 5 |
| 2025-07-06 | 张三 | 91.22 | 183.50 | 27.09 | 6 |
| 2025-07-07 | 张三 | 91.18 | 183.50 | 27.08 | 1 |
| 2025-07-08 | 张三 | 91.09 | 183.50 | 27.05 | 2 |
| 2025-07-09 | 张三 | 91.01 | 183.50 | 27.03 | 3 |
| 2025-07-10 | 张三 | 91.01 | 183.50 | 27.03 | 4 |
| 2025-07-01 | 李四 | 77.57 | 175.20 | 25.27 | 2 |
| 2025-07-02 | 李四 | 77.49 | 175.20 | 25.25 | 3 |
| 2025-07-03 | 李四 | 77.34 | 175.20 | 25.20 | 4 |
| 2025-07-04 | 李四 | 77.30 | 175.20 | 25.18 | 5 |
| 2025-07-05 | 李四 | 77.23 | 175.20 | 25.16 | 6 |
| 2025-07-06 | 李四 | 77.05 | 175.20 | 25.10 | 7 |
| 2025-07-07 | 李四 | 77.01 | 175.20 | 25.09 | 1 |
| 2025-07-08 | 李四 | 76.98 | 175.20 | 25.08 | 2 |
| 2025-07-09 | 李四 | 77.13 | 175.20 | 25.13 | 3 |
| 2025-07-10 | 李四 | 76.97 | 175.20 | 25.08 | 4 |
+-------------+-------+---------+---------+--------+----------------------+
1.3 使用上一行数据补充,上一行值也为空,使用默认值补充
+-------------+-------+---------+---------+--------+----------------------+
| riqi | name | weight | height | bmi | weekly_training_cnt |
+-------------+-------+---------+---------+--------+----------------------+
| 2025-07-01 | 张三 | 91.31 | 183.50 | 27.12 | 2 |
| 2025-07-02 | 张三 | NULL | 183.50 | 27.11 | 3 |
| 2025-07-03 | 张三 | NULL | 183.50 | 27.11 | 4 |
| 2025-07-04 | 张三 | 91.29 | 183.50 | 27.11 | 4 |
| 2025-07-05 | 张三 | 91.30 | 183.50 | 27.11 | 5 |
| 2025-07-06 | 张三 | 91.22 | 183.50 | 27.09 | 6 |
| 2025-07-07 | 张三 | 91.18 | 183.50 | 27.08 | 1 |
| 2025-07-08 | 张三 | 91.09 | 183.50 | 27.05 | 2 |
| 2025-07-09 | 张三 | 91.01 | 183.50 | 27.03 | 3 |
| 2025-07-10 | 张三 | 91.01 | 183.50 | 27.03 | 4 |
| 2025-07-01 | 李四 | NULL | 175.20 | 25.27 | 2 |
| 2025-07-02 | 李四 | 77.49 | 175.20 | 25.25 | 3 |
| 2025-07-03 | 李四 | 77.34 | 175.20 | 25.20 | 4 |
| 2025-07-04 | 李四 | 77.30 | 175.20 | 25.18 | 5 |
| 2025-07-05 | 李四 | 77.23 | 175.20 | 25.16 | 6 |
| 2025-07-06 | 李四 | 77.05 | 175.20 | 25.10 | 7 |
| 2025-07-07 | 李四 | 77.01 | 175.20 | 25.09 | 1 |
| 2025-07-08 | 李四 | 76.98 | 175.20 | 25.08 | 2 |
| 2025-07-09 | 李四 | 77.13 | 175.20 | 25.13 | 3 |
| 2025-07-10 | 李四 | 76.97 | 175.20 | 25.08 | 4 |
+-------------+-------+---------+---------+--------+----------------------+
执行语句
select riqi,
name,
coalesce(weight, lag(weight) over (partition by name order by riqi asc), 0) as weight,
height,
bmi,
weekly_training_cnt
from t_user_info3
执行结果
+-------------+-------+---------+---------+--------+----------------------+
| riqi | name | weight | height | bmi | weekly_training_cnt |
+-------------+-------+---------+---------+--------+----------------------+
| 2025-07-01 | 张三 | 91.31 | 183.50 | 27.12 | 2 |
| 2025-07-02 | 张三 | 91.31 | 183.50 | 27.11 | 3 |
| 2025-07-03 | 张三 | 0.00 | 183.50 | 27.11 | 4 |
| 2025-07-04 | 张三 | 91.29 | 183.50 | 27.11 | 4 |
| 2025-07-05 | 张三 | 91.30 | 183.50 | 27.11 | 5 |
| 2025-07-06 | 张三 | 91.22 | 183.50 | 27.09 | 6 |
| 2025-07-07 | 张三 | 91.18 | 183.50 | 27.08 | 1 |
| 2025-07-08 | 张三 | 91.09 | 183.50 | 27.05 | 2 |
| 2025-07-09 | 张三 | 91.01 | 183.50 | 27.03 | 3 |
| 2025-07-10 | 张三 | 91.01 | 183.50 | 27.03 | 4 |
| 2025-07-01 | 李四 | 0.00 | 175.20 | 25.27 | 2 |
| 2025-07-02 | 李四 | 77.49 | 175.20 | 25.25 | 3 |
| 2025-07-03 | 李四 | 77.34 | 175.20 | 25.20 | 4 |
| 2025-07-04 | 李四 | 77.30 | 175.20 | 25.18 | 5 |
| 2025-07-05 | 李四 | 77.23 | 175.20 | 25.16 | 6 |
| 2025-07-06 | 李四 | 77.05 | 175.20 | 25.10 | 7 |
| 2025-07-07 | 李四 | 77.01 | 175.20 | 25.09 | 1 |
| 2025-07-08 | 李四 | 76.98 | 175.20 | 25.08 | 2 |
| 2025-07-09 | 李四 | 77.13 | 175.20 | 25.13 | 3 |
| 2025-07-10 | 李四 | 76.97 | 175.20 | 25.08 | 4 |
+-------------+-------+---------+---------+--------+----------------------+
1.4 使用上一行数据补充,上一行值也为空,则用上2行补充(指定规则),依旧为空使用默认值补充
样例数据
还是使用t_user_info3 中的数据
执行语句
select riqi,
name,
coalesce(weight, lag(weight) over (partition by name order by riqi asc), lag(weight,2) over (partition by name order by riqi asc),0) as weight,
height,
bmi,
weekly_training_cnt
from t_user_info3
执行结果
+-------------+-------+---------+---------+--------+----------------------+
| riqi | name | weight | height | bmi | weekly_training_cnt |
+-------------+-------+---------+---------+--------+----------------------+
| 2025-07-01 | 张三 | 91.31 | 183.50 | 27.12 | 2 |
| 2025-07-02 | 张三 | 91.31 | 183.50 | 27.11 | 3 |
| 2025-07-03 | 张三 | 91.31 | 183.50 | 27.11 | 4 |
| 2025-07-04 | 张三 | 91.29 | 183.50 | 27.11 | 4 |
| 2025-07-05 | 张三 | 91.30 | 183.50 | 27.11 | 5 |
| 2025-07-06 | 张三 | 91.22 | 183.50 | 27.09 | 6 |
| 2025-07-07 | 张三 | 91.18 | 183.50 | 27.08 | 1 |
| 2025-07-08 | 张三 | 91.09 | 183.50 | 27.05 | 2 |
| 2025-07-09 | 张三 | 91.01 | 183.50 | 27.03 | 3 |
| 2025-07-10 | 张三 | 91.01 | 183.50 | 27.03 | 4 |
| 2025-07-01 | 李四 | 0.00 | 175.20 | 25.27 | 2 |
| 2025-07-02 | 李四 | 77.49 | 175.20 | 25.25 | 3 |
| 2025-07-03 | 李四 | 77.34 | 175.20 | 25.20 | 4 |
| 2025-07-04 | 李四 | 77.30 | 175.20 | 25.18 | 5 |
| 2025-07-05 | 李四 | 77.23 | 175.20 | 25.16 | 6 |
| 2025-07-06 | 李四 | 77.05 | 175.20 | 25.10 | 7 |
| 2025-07-07 | 李四 | 77.01 | 175.20 | 25.09 | 1 |
| 2025-07-08 | 李四 | 76.98 | 175.20 | 25.08 | 2 |
| 2025-07-09 | 李四 | 77.13 | 175.20 | 25.13 | 3 |
| 2025-07-10 | 李四 | 76.97 | 175.20 | 25.08 | 4 |
+-------------+-------+---------+---------+--------+----------------------+
1.5 使用上一行数据补充,上一行值也为空,则上溯至非空行,如果至开始行也为空,则默认值补充
很多公司的面试题目
- 知识点1:需要将空值的数据与上面非空的数据进行分组(连续问题的变种)
- 知识点2:使用组内第一行数据填充;
执行语句
select riqi,
name,
coalesce(last_value(weight, true) over (partition by group_id order by riqi asc),0) weight,
height,
bmi,
weekly_training_cnt
from (select riqi,
name,
weight,
height,
bmi,
weekly_training_cnt,
sum(case when weight is null then 0 else 1 end) over (partition by name order by riqi asc) as group_id
from t_user_info3) t
order by name,riqi;
执行结果
+-------------+-------+---------+---------+--------+----------------------+
| riqi | name | weight | height | bmi | weekly_training_cnt |
+-------------+-------+---------+---------+--------+----------------------+
| 2025-07-01 | 张三 | 91.31 | 183.50 | 27.12 | 2 |
| 2025-07-02 | 张三 | 77.49 | 183.50 | 27.11 | 3 |
| 2025-07-03 | 张三 | 77.49 | 183.50 | 27.11 | 4 |
| 2025-07-04 | 张三 | 91.29 | 183.50 | 27.11 | 4 |
| 2025-07-05 | 张三 | 91.30 | 183.50 | 27.11 | 5 |
| 2025-07-06 | 张三 | 91.22 | 183.50 | 27.09 | 6 |
| 2025-07-07 | 张三 | 91.18 | 183.50 | 27.08 | 1 |
| 2025-07-08 | 张三 | 91.09 | 183.50 | 27.05 | 2 |
| 2025-07-09 | 张三 | 91.01 | 183.50 | 27.03 | 3 |
| 2025-07-10 | 张三 | 91.01 | 183.50 | 27.03 | 4 |
| 2025-07-01 | 李四 | 0.00 | 175.20 | 25.27 | 2 |
| 2025-07-02 | 李四 | 77.49 | 175.20 | 25.25 | 3 |
| 2025-07-03 | 李四 | 77.34 | 175.20 | 25.20 | 4 |
| 2025-07-04 | 李四 | 77.30 | 175.20 | 25.18 | 5 |
| 2025-07-05 | 李四 | 77.23 | 175.20 | 25.16 | 6 |
| 2025-07-06 | 李四 | 77.05 | 175.20 | 25.10 | 7 |
| 2025-07-07 | 李四 | 77.01 | 175.20 | 25.09 | 1 |
| 2025-07-08 | 李四 | 76.98 | 175.20 | 25.08 | 2 |
| 2025-07-09 | 李四 | 77.13 | 175.20 | 25.13 | 3 |
| 2025-07-10 | 李四 | 76.97 | 175.20 | 25.08 | 4 |
+-------------+-------+---------+---------+--------+----------------------+
1.6 使用上一行数据补充,上一行值也为空,则上溯至非空行,如果至开始行也为空,使用向下至非空行补充。
还是使用t_user_info3中的数据,本题需要将空行除了与上面行放到一组,还需要与下面行放到一组。
为了方便查看,先看分组ID,注意查看空值对应的组ID与哪个有值数据行相同。
执行语句
select riqi,
name,
weight,
height,
bmi,
weekly_training_cnt,
sum(case when weight is null then 0 else 1 end) over (partition by name order by riqi asc) as group_id1,
sum(case when weight is null then 0 else 1 end) over (partition by name order by riqi desc) as group_id2
from t_user_info3
order by name,riqi
执行结果
+-------------+-------+---------+---------+--------+----------------------+------------+------------+
| riqi | name | weight | height | bmi | weekly_training_cnt | group_id1 | group_id2 |
+-------------+-------+---------+---------+--------+----------------------+------------+------------+
| 2025-07-01 | 张三 | 91.31 | 183.50 | 27.12 | 2 | 1 | 8 |
| 2025-07-02 | 张三 | NULL | 183.50 | 27.11 | 3 | 1 | 7 |
| 2025-07-03 | 张三 | NULL | 183.50 | 27.11 | 4 | 1 | 7 |
| 2025-07-04 | 张三 | 91.29 | 183.50 | 27.11 | 4 | 2 | 7 |
| 2025-07-05 | 张三 | 91.30 | 183.50 | 27.11 | 5 | 3 | 6 |
| 2025-07-06 | 张三 | 91.22 | 183.50 | 27.09 | 6 | 4 | 5 |
| 2025-07-07 | 张三 | 91.18 | 183.50 | 27.08 | 1 | 5 | 4 |
| 2025-07-08 | 张三 | 91.09 | 183.50 | 27.05 | 2 | 6 | 3 |
| 2025-07-09 | 张三 | 91.01 | 183.50 | 27.03 | 3 | 7 | 2 |
| 2025-07-10 | 张三 | 91.01 | 183.50 | 27.03 | 4 | 8 | 1 |
| 2025-07-01 | 李四 | NULL | 175.20 | 25.27 | 2 | 0 | 9 |
| 2025-07-02 | 李四 | 77.49 | 175.20 | 25.25 | 3 | 1 | 9 |
| 2025-07-03 | 李四 | 77.34 | 175.20 | 25.20 | 4 | 2 | 8 |
| 2025-07-04 | 李四 | 77.30 | 175.20 | 25.18 | 5 | 3 | 7 |
| 2025-07-05 | 李四 | 77.23 | 175.20 | 25.16 | 6 | 4 | 6 |
| 2025-07-06 | 李四 | 77.05 | 175.20 | 25.10 | 7 | 5 | 5 |
| 2025-07-07 | 李四 | 77.01 | 175.20 | 25.09 | 1 | 6 | 4 |
| 2025-07-08 | 李四 | 76.98 | 175.20 | 25.08 | 2 | 7 | 3 |
| 2025-07-09 | 李四 | 77.13 | 175.20 | 25.13 | 3 | 8 | 2 |
| 2025-07-10 | 李四 | 76.97 | 175.20 | 25.08 | 4 | 9 | 1 |
+-------------+-------+---------+---------+--------+----------------------+------------+------------+
分别根据group_id1和group_id2进行分组,使用last_value函数向上取值和向下取值。然后使用coalesce完成取值。
执行语句
select riqi,
name,
coalesce(last_value(weight, true) over (partition by name,group_id1 order by riqi asc),
last_value(weight, true) over (partition by name,group_id2 order by riqi desc),
1) as weight,
height,
bmi,
weekly_training_cnt
from (select riqi,
name,
weight,
height,
bmi,
weekly_training_cnt,
sum(case when weight is null then 0 else 1 end) over (partition by name order by riqi asc) as group_id1,
sum(case when weight is null then 0 else 1 end) over (partition by name order by riqi desc) as group_id2
from t_user_info3) t
order by name, riqi
执行结果
+-------------+-------+---------+---------+--------+----------------------+
| riqi | name | weight | height | bmi | weekly_training_cnt |
+-------------+-------+---------+---------+--------+----------------------+
| 2025-07-01 | 张三 | 91.31 | 183.50 | 27.12 | 2 |
| 2025-07-02 | 张三 | 91.31 | 183.50 | 27.11 | 3 |
| 2025-07-03 | 张三 | 91.31 | 183.50 | 27.11 | 4 |
| 2025-07-04 | 张三 | 91.29 | 183.50 | 27.11 | 4 |
| 2025-07-05 | 张三 | 91.30 | 183.50 | 27.11 | 5 |
| 2025-07-06 | 张三 | 91.22 | 183.50 | 27.09 | 6 |
| 2025-07-07 | 张三 | 91.18 | 183.50 | 27.08 | 1 |
| 2025-07-08 | 张三 | 91.09 | 183.50 | 27.05 | 2 |
| 2025-07-09 | 张三 | 91.01 | 183.50 | 27.03 | 3 |
| 2025-07-10 | 张三 | 91.01 | 183.50 | 27.03 | 4 |
| 2025-07-01 | 李四 | 77.49 | 175.20 | 25.27 | 2 |
| 2025-07-02 | 李四 | 77.49 | 175.20 | 25.25 | 3 |
| 2025-07-03 | 李四 | 77.34 | 175.20 | 25.20 | 4 |
| 2025-07-04 | 李四 | 77.30 | 175.20 | 25.18 | 5 |
| 2025-07-05 | 李四 | 77.23 | 175.20 | 25.16 | 6 |
| 2025-07-06 | 李四 | 77.05 | 175.20 | 25.10 | 7 |
| 2025-07-07 | 李四 | 77.01 | 175.20 | 25.09 | 1 |
| 2025-07-08 | 李四 | 76.98 | 175.20 | 25.08 | 2 |
| 2025-07-09 | 李四 | 77.13 | 175.20 | 25.13 | 3 |
| 2025-07-10 | 李四 | 76.97 | 175.20 | 25.08 | 4 |
+-------------+-------+---------+---------+--------+----------------------+
1.7 使用上一非空行和下一非空行平均值进行补充
这里有几个问题需要解决: 1.找出上一非空行和下一非空行,计算二者的均值; 2.如果向上或者向下,不存在非空行,则需要取只取其中一个值。
在1.6中我们没有展示,但是已经计算出上一非空行和下一非空行的值。我们先看一下结果
执行语句
select riqi,
name,
weight,
last_value(weight, true) over (partition by name,group_id1 order by riqi asc) as last_weight,
last_value(weight, true) over (partition by name,group_id2 order by riqi desc) as next_weight,
height,
bmi,
weekly_training_cnt
from (select riqi,
name,
weight,
height,
bmi,
weekly_training_cnt,
sum(case when weight is null then 0 else 1 end) over (partition by name order by riqi asc) as group_id1,
sum(case when weight is null then 0 else 1 end) over (partition by name order by riqi desc) as group_id2
from t_user_info3) t
order by name, riqi
执行结果
+-------------+-------+---------+--------------+--------------+---------+--------+----------------------+
| riqi | name | weight | last_weight | next_weight | height | bmi | weekly_training_cnt |
+-------------+-------+---------+--------------+--------------+---------+--------+----------------------+
| 2025-07-01 | 张三 | 91.31 | 91.31 | 91.31 | 183.50 | 27.12 | 2 |
| 2025-07-02 | 张三 | NULL | 91.31 | 91.29 | 183.50 | 27.11 | 3 |
| 2025-07-03 | 张三 | NULL | 91.31 | 91.29 | 183.50 | 27.11 | 4 |
| 2025-07-04 | 张三 | 91.29 | 91.29 | 91.29 | 183.50 | 27.11 | 4 |
| 2025-07-05 | 张三 | 91.30 | 91.30 | 91.30 | 183.50 | 27.11 | 5 |
| 2025-07-06 | 张三 | 91.22 | 91.22 | 91.22 | 183.50 | 27.09 | 6 |
| 2025-07-07 | 张三 | 91.18 | 91.18 | 91.18 | 183.50 | 27.08 | 1 |
| 2025-07-08 | 张三 | 91.09 | 91.09 | 91.09 | 183.50 | 27.05 | 2 |
| 2025-07-09 | 张三 | 91.01 | 91.01 | 91.01 | 183.50 | 27.03 | 3 |
| 2025-07-10 | 张三 | 91.01 | 91.01 | 91.01 | 183.50 | 27.03 | 4 |
| 2025-07-01 | 李四 | NULL | NULL | 77.49 | 175.20 | 25.27 | 2 |
| 2025-07-02 | 李四 | 77.49 | 77.49 | 77.49 | 175.20 | 25.25 | 3 |
| 2025-07-03 | 李四 | 77.34 | 77.34 | 77.34 | 175.20 | 25.20 | 4 |
| 2025-07-04 | 李四 | 77.30 | 77.30 | 77.30 | 175.20 | 25.18 | 5 |
| 2025-07-05 | 李四 | 77.23 | 77.23 | 77.23 | 175.20 | 25.16 | 6 |
| 2025-07-06 | 李四 | 77.05 | 77.05 | 77.05 | 175.20 | 25.10 | 7 |
| 2025-07-07 | 李四 | 77.01 | 77.01 | 77.01 | 175.20 | 25.09 | 1 |
| 2025-07-08 | 李四 | 76.98 | 76.98 | 76.98 | 175.20 | 25.08 | 2 |
| 2025-07-09 | 李四 | 77.13 | 77.13 | 77.13 | 175.20 | 25.13 | 3 |
| 2025-07-10 | 李四 | 76.97 | 76.97 | 76.97 | 175.20 | 25.08 | 4 |
+-------------+-------+---------+--------------+--------------+---------+--------+----------------------+
计算平均值,(last_weight+next_weight)/2 则为结果,但是需要处理其中存在空值的情况。
执行语句
select riqi,
name,
coalesce(weight,
case
when last_weight is null and next_weight is null then null
when last_weight is null or next_weight is null then coalesce(last_weight, next_weight)
when last_weight is not null and next_weight is not null
then round((last_weight + next_weight) / 2, 2)
end,
0
) as weight,
height,
bmi,
weekly_training_cnt
from (select riqi,
name,
weight,
last_value(weight, true) over (partition by name,group_id1 order by riqi asc) as last_weight,
last_value(weight, true) over (partition by name,group_id2 order by riqi desc) as next_weight,
height,
bmi,
weekly_training_cnt
from (select riqi,
name,
weight,
height,
bmi,
weekly_training_cnt,
sum(case when weight is null then 0 else 1 end)
over (partition by name order by riqi asc) as group_id1,
sum(case when weight is null then 0 else 1 end)
over (partition by name order by riqi desc) as group_id2
from t_user_info3) t) tt
order by name, riqi
执行结果
+-------------+-------+---------+---------+--------+----------------------+
| riqi | name | weight | height | bmi | weekly_training_cnt |
+-------------+-------+---------+---------+--------+----------------------+
| 2025-07-01 | 张三 | 91.31 | 183.50 | 27.12 | 2 |
| 2025-07-02 | 张三 | 91.30 | 183.50 | 27.11 | 3 |
| 2025-07-03 | 张三 | 91.30 | 183.50 | 27.11 | 4 |
| 2025-07-04 | 张三 | 91.29 | 183.50 | 27.11 | 4 |
| 2025-07-05 | 张三 | 91.30 | 183.50 | 27.11 | 5 |
| 2025-07-06 | 张三 | 91.22 | 183.50 | 27.09 | 6 |
| 2025-07-07 | 张三 | 91.18 | 183.50 | 27.08 | 1 |
| 2025-07-08 | 张三 | 91.09 | 183.50 | 27.05 | 2 |
| 2025-07-09 | 张三 | 91.01 | 183.50 | 27.03 | 3 |
| 2025-07-10 | 张三 | 91.01 | 183.50 | 27.03 | 4 |
| 2025-07-01 | 李四 | 77.49 | 175.20 | 25.27 | 2 |
| 2025-07-02 | 李四 | 77.49 | 175.20 | 25.25 | 3 |
| 2025-07-03 | 李四 | 77.34 | 175.20 | 25.20 | 4 |
| 2025-07-04 | 李四 | 77.30 | 175.20 | 25.18 | 5 |
| 2025-07-05 | 李四 | 77.23 | 175.20 | 25.16 | 6 |
| 2025-07-06 | 李四 | 77.05 | 175.20 | 25.10 | 7 |
| 2025-07-07 | 李四 | 77.01 | 175.20 | 25.09 | 1 |
| 2025-07-08 | 李四 | 76.98 | 175.20 | 25.08 | 2 |
| 2025-07-09 | 李四 | 77.13 | 175.20 | 25.13 | 3 |
| 2025-07-10 | 李四 | 76.97 | 175.20 | 25.08 | 4 |
+-------------+-------+---------+---------+--------+----------------------+
1.8 使用最近非空值进行补充
先定义一下,如果距离上下非空值相同时,取上非空值。
解法涉及连续问题组内排序,即当前行距离分组首行的距离。为了更加直观,我们调整一下测试数据,增加一些空值数据。
+-------------+-------+---------+---------+--------+----------------------+
| riqi | name | weight | height | bmi | weekly_training_cnt |
+-------------+-------+---------+---------+--------+----------------------+
| 2025-07-01 | 张三 | 91.31 | 183.50 | 27.12 | 2 |
| 2025-07-02 | 张三 | NULL | 183.50 | 27.11 | 3 |
| 2025-07-03 | 张三 | NULL | 183.50 | 27.11 | 4 |
| 2025-07-04 | 张三 | NULL | 183.50 | 27.11 | 4 |
| 2025-07-05 | 张三 | 91.30 | 183.50 | 27.11 | 5 |
| 2025-07-06 | 张三 | 91.22 | 183.50 | 27.09 | 6 |
| 2025-07-07 | 张三 | 91.18 | 183.50 | 27.08 | 1 |
| 2025-07-08 | 张三 | 91.09 | 183.50 | 27.05 | 2 |
| 2025-07-09 | 张三 | 91.01 | 183.50 | 27.03 | 3 |
| 2025-07-10 | 张三 | 91.01 | 183.50 | 27.03 | 4 |
| 2025-07-01 | 李四 | NULL | 175.20 | 25.27 | 2 |
| 2025-07-02 | 李四 | 77.49 | 175.20 | 25.25 | 3 |
| 2025-07-03 | 李四 | 77.34 | 175.20 | 25.20 | 4 |
| 2025-07-04 | 李四 | 77.30 | 175.20 | 25.18 | 5 |
| 2025-07-05 | 李四 | NULL | 175.20 | 25.16 | 6 |
| 2025-07-06 | 李四 | NULL | 175.20 | 25.10 | 7 |
| 2025-07-07 | 李四 | 77.01 | 175.20 | 25.09 | 1 |
| 2025-07-08 | 李四 | 76.98 | 175.20 | 25.08 | 2 |
| 2025-07-09 | 李四 | 77.13 | 175.20 | 25.13 | 3 |
| 2025-07-10 | 李四 | NULL | 175.20 | 25.08 | 4 |
+-------------+-------+---------+---------+--------+----------------------+
我们还是需要从1.6的group_id1,group_id2开始,使用row_number(),对组内进行排序,然后-1就是当前行距离初始有值行的距离。我们先看下中间结果
执行语句
select riqi,
name,
weight,
height,
bmi,
weekly_training_cnt,
group_id1,
group_id2,
row_number() over (partition by name,group_id1 order by riqi asc) - 1 as distance_1,
row_number() over (partition by name,group_id2 order by riqi desc) - 1 as distance_2
from (select riqi,
name,
weight,
height,
bmi,
weekly_training_cnt,
sum(case when weight is null then 0 else 1 end)
over (partition by name order by riqi asc) as group_id1,
sum(case when weight is null then 0 else 1 end)
over (partition by name order by riqi desc) as group_id2
from t_user_info4) t
order by name,riqi
执行结果
+-------------+-------+---------+---------+--------+----------------------+------------+------------+-------------+-------------+
| riqi | name | weight | height | bmi | weekly_training_cnt | group_id1 | group_id2 | distance_1 | distance_2 |
+-------------+-------+---------+---------+--------+----------------------+------------+------------+-------------+-------------+
| 2025-07-01 | 张三 | 91.31 | 183.50 | 27.12 | 2 | 1 | 7 | 0 | 0 |
| 2025-07-02 | 张三 | NULL | 183.50 | 27.11 | 3 | 1 | 6 | 1 | 3 |
| 2025-07-03 | 张三 | NULL | 183.50 | 27.11 | 4 | 1 | 6 | 2 | 2 |
| 2025-07-04 | 张三 | NULL | 183.50 | 27.11 | 4 | 1 | 6 | 3 | 1 |
| 2025-07-05 | 张三 | 91.30 | 183.50 | 27.11 | 5 | 2 | 6 | 0 | 0 |
| 2025-07-06 | 张三 | 91.22 | 183.50 | 27.09 | 6 | 3 | 5 | 0 | 0 |
| 2025-07-07 | 张三 | 91.18 | 183.50 | 27.08 | 1 | 4 | 4 | 0 | 0 |
| 2025-07-08 | 张三 | 91.09 | 183.50 | 27.05 | 2 | 5 | 3 | 0 | 0 |
| 2025-07-09 | 张三 | 91.01 | 183.50 | 27.03 | 3 | 6 | 2 | 0 | 0 |
| 2025-07-10 | 张三 | 91.01 | 183.50 | 27.03 | 4 | 7 | 1 | 0 | 0 |
| 2025-07-01 | 李四 | NULL | 175.20 | 25.27 | 2 | 0 | 6 | 0 | 1 |
| 2025-07-02 | 李四 | 77.49 | 175.20 | 25.25 | 3 | 1 | 6 | 0 | 0 |
| 2025-07-03 | 李四 | 77.34 | 175.20 | 25.20 | 4 | 2 | 5 | 0 | 0 |
| 2025-07-04 | 李四 | 77.30 | 175.20 | 25.18 | 5 | 3 | 4 | 0 | 0 |
| 2025-07-05 | 李四 | NULL | 175.20 | 25.16 | 6 | 3 | 3 | 1 | 2 |
| 2025-07-06 | 李四 | NULL | 175.20 | 25.10 | 7 | 3 | 3 | 2 | 1 |
| 2025-07-07 | 李四 | 77.01 | 175.20 | 25.09 | 1 | 4 | 3 | 0 | 0 |
| 2025-07-08 | 李四 | 76.98 | 175.20 | 25.08 | 2 | 5 | 2 | 0 | 0 |
| 2025-07-09 | 李四 | 77.13 | 175.20 | 25.13 | 3 | 6 | 1 | 0 | 0 |
| 2025-07-10 | 李四 | NULL | 175.20 | 25.08 | 4 | 6 | 0 | 1 | 0 |
+-------------+-------+---------+---------+--------+----------------------+------------+------------+-------------+-------------+
接下来我们根据name与group_id1和group_id2进行分组,得到上一非空值 last_weight,和下一非空值next_weight。
执行语句
select riqi,
name,
weight,
height,
bmi,
weekly_training_cnt,
group_id1,
group_id2,
row_number() over (partition by name,group_id1 order by riqi asc) - 1 as distance_1,
row_number() over (partition by name,group_id2 order by riqi desc) - 1 as distance_2,
last_value(weight, true) over (partition by name,group_id1 order by riqi asc) as last_weight,
last_value(weight, true) over (partition by name,group_id2 order by riqi desc) as next_weight
from (select riqi,
name,
weight,
height,
bmi,
weekly_training_cnt,
sum(case when weight is null then 0 else 1 end)
over (partition by name order by riqi asc) as group_id1,
sum(case when weight is null then 0 else 1 end)
over (partition by name order by riqi desc) as group_id2
from t_user_info4) t
order by name,riqi
执行结果
+-------------+-------+---------+---------+--------+----------------------+------------+------------+-------------+-------------+--------------+--------------+
| riqi | name | weight | height | bmi | weekly_training_cnt | group_id1 | group_id2 | distance_1 | distance_2 | last_weight | next_weight |
+-------------+-------+---------+---------+--------+----------------------+------------+------------+-------------+-------------+--------------+--------------+
| 2025-07-01 | 张三 | 91.31 | 183.50 | 27.12 | 2 | 1 | 7 | 0 | 0 | 91.31 | 91.31 |
| 2025-07-02 | 张三 | NULL | 183.50 | 27.11 | 3 | 1 | 6 | 1 | 3 | 91.31 | 91.30 |
| 2025-07-03 | 张三 | NULL | 183.50 | 27.11 | 4 | 1 | 6 | 2 | 2 | 91.31 | 91.30 |
| 2025-07-04 | 张三 | NULL | 183.50 | 27.11 | 4 | 1 | 6 | 3 | 1 | 91.31 | 91.30 |
| 2025-07-05 | 张三 | 91.30 | 183.50 | 27.11 | 5 | 2 | 6 | 0 | 0 | 91.30 | 91.30 |
| 2025-07-06 | 张三 | 91.22 | 183.50 | 27.09 | 6 | 3 | 5 | 0 | 0 | 91.22 | 91.22 |
| 2025-07-07 | 张三 | 91.18 | 183.50 | 27.08 | 1 | 4 | 4 | 0 | 0 | 91.18 | 91.18 |
| 2025-07-08 | 张三 | 91.09 | 183.50 | 27.05 | 2 | 5 | 3 | 0 | 0 | 91.09 | 91.09 |
| 2025-07-09 | 张三 | 91.01 | 183.50 | 27.03 | 3 | 6 | 2 | 0 | 0 | 91.01 | 91.01 |
| 2025-07-10 | 张三 | 91.01 | 183.50 | 27.03 | 4 | 7 | 1 | 0 | 0 | 91.01 | 91.01 |
| 2025-07-01 | 李四 | NULL | 175.20 | 25.27 | 2 | 0 | 6 | 0 | 1 | NULL | 77.49 |
| 2025-07-02 | 李四 | 77.49 | 175.20 | 25.25 | 3 | 1 | 6 | 0 | 0 | 77.49 | 77.49 |
| 2025-07-03 | 李四 | 77.34 | 175.20 | 25.20 | 4 | 2 | 5 | 0 | 0 | 77.34 | 77.34 |
| 2025-07-04 | 李四 | 77.30 | 175.20 | 25.18 | 5 | 3 | 4 | 0 | 0 | 77.30 | 77.30 |
| 2025-07-05 | 李四 | NULL | 175.20 | 25.16 | 6 | 3 | 3 | 1 | 2 | 77.30 | 77.01 |
| 2025-07-06 | 李四 | NULL | 175.20 | 25.10 | 7 | 3 | 3 | 2 | 1 | 77.30 | 77.01 |
| 2025-07-07 | 李四 | 77.01 | 175.20 | 25.09 | 1 | 4 | 3 | 0 | 0 | 77.01 | 77.01 |
| 2025-07-08 | 李四 | 76.98 | 175.20 | 25.08 | 2 | 5 | 2 | 0 | 0 | 76.98 | 76.98 |
| 2025-07-09 | 李四 | 77.13 | 175.20 | 25.13 | 3 | 6 | 1 | 0 | 0 | 77.13 | 77.13 |
| 2025-07-10 | 李四 | NULL | 175.20 | 25.08 | 4 | 6 | 0 | 1 | 0 | 77.13 | NULL |
+-------------+-------+---------+---------+--------+----------------------+------------+------------+-------------+-------------+--------------+--------------+
我们根据distance_1与distance_2的大小,来取last_weight还是next_weight,但是我们看李四 7月1日和7月10的数据,不能简单计算,否则结果返回结果为空值,所以需要进行处理。
执行语句
select riqi,
name,
coalesce(weight,
if(distance_1 <= distance_2, coalesce(last_weight, next_weight), coalesce(next_weight, last_weight)), 0
) as weight,
height,
bmi,
weekly_training_cnt
from (select riqi,
name,
weight,
height,
bmi,
weekly_training_cnt,
group_id1,
group_id2,
row_number() over (partition by name,group_id1 order by riqi asc) - 1 as distance_1,
row_number() over (partition by name,group_id2 order by riqi desc) - 1 as distance_2,
last_value(weight, true) over (partition by name,group_id1 order by riqi asc) as last_weight,
last_value(weight, true) over (partition by name,group_id2 order by riqi desc) as next_weight
from (select riqi,
name,
weight,
height,
bmi,
weekly_training_cnt,
sum(case when weight is null then 0 else 1 end)
over (partition by name order by riqi asc) as group_id1,
sum(case when weight is null then 0 else 1 end)
over (partition by name order by riqi desc) as group_id2
from t_user_info4) t) tt
order by name, riqi
执行结果
+-------------+-------+---------+---------+--------+----------------------+
| riqi | name | weight | height | bmi | weekly_training_cnt |
+-------------+-------+---------+---------+--------+----------------------+
| 2025-07-01 | 张三 | 91.31 | 183.50 | 27.12 | 2 |
| 2025-07-02 | 张三 | 91.31 | 183.50 | 27.11 | 3 |
| 2025-07-03 | 张三 | 91.31 | 183.50 | 27.11 | 4 |
| 2025-07-04 | 张三 | 91.30 | 183.50 | 27.11 | 4 |
| 2025-07-05 | 张三 | 91.30 | 183.50 | 27.11 | 5 |
| 2025-07-06 | 张三 | 91.22 | 183.50 | 27.09 | 6 |
| 2025-07-07 | 张三 | 91.18 | 183.50 | 27.08 | 1 |
| 2025-07-08 | 张三 | 91.09 | 183.50 | 27.05 | 2 |
| 2025-07-09 | 张三 | 91.01 | 183.50 | 27.03 | 3 |
| 2025-07-10 | 张三 | 91.01 | 183.50 | 27.03 | 4 |
| 2025-07-01 | 李四 | 77.49 | 175.20 | 25.27 | 2 |
| 2025-07-02 | 李四 | 77.49 | 175.20 | 25.25 | 3 |
| 2025-07-03 | 李四 | 77.34 | 175.20 | 25.20 | 4 |
| 2025-07-04 | 李四 | 77.30 | 175.20 | 25.18 | 5 |
| 2025-07-05 | 李四 | 77.30 | 175.20 | 25.16 | 6 |
| 2025-07-06 | 李四 | 77.01 | 175.20 | 25.10 | 7 |
| 2025-07-07 | 李四 | 77.01 | 175.20 | 25.09 | 1 |
| 2025-07-08 | 李四 | 76.98 | 175.20 | 25.08 | 2 |
| 2025-07-09 | 李四 | 77.13 | 175.20 | 25.13 | 3 |
| 2025-07-10 | 李四 | 77.13 | 175.20 | 25.08 | 4 |
+-------------+-------+---------+---------+--------+----------------------+
1.9 使用上一非空行和下一非空行进行梯度均值补充
但看这个需求很难,但是我们在1.8中分别计算出了distance_1、distance_2、last_weight、next_weight 那么进行均值填充就变得容易很多。 计算方式如下 : (distance_1 next_weight +distance_2 last_weight) /(distance_1+distance_2) 这里仅仅是非首尾为空值的,李四的首尾空行数据,依旧需要单独处理。
执行语句
select riqi,
name,
coalesce(weight,
round((distance_1 * next_weight +distance_2 *last_weight) /(distance_1+distance_2),2),
coalesce(last_weight,next_weight)
) as weight,
height,
bmi,
weekly_training_cnt
from (select riqi,
name,
weight,
height,
bmi,
weekly_training_cnt,
group_id1,
group_id2,
row_number() over (partition by name,group_id1 order by riqi asc) - 1 as distance_1,
row_number() over (partition by name,group_id2 order by riqi desc) - 1 as distance_2,
last_value(weight, true) over (partition by name,group_id1 order by riqi asc) as last_weight,
last_value(weight, true) over (partition by name,group_id2 order by riqi desc) as next_weight
from (select riqi,
name,
weight,
height,
bmi,
weekly_training_cnt,
sum(case when weight is null then 0 else 1 end)
over (partition by name order by riqi asc) as group_id1,
sum(case when weight is null then 0 else 1 end)
over (partition by name order by riqi desc) as group_id2
from t_user_info4) t) tt
order by name, riqi
执行结果
+-------------+-------+---------+---------+--------+----------------------+
| riqi | name | weight | height | bmi | weekly_training_cnt |
+-------------+-------+---------+---------+--------+----------------------+
| 2025-07-01 | 张三 | 91.31 | 183.50 | 27.12 | 2 |
| 2025-07-02 | 张三 | 91.31 | 183.50 | 27.11 | 3 |
| 2025-07-03 | 张三 | 91.31 | 183.50 | 27.11 | 4 |
| 2025-07-04 | 张三 | 91.30 | 183.50 | 27.11 | 4 |
| 2025-07-05 | 张三 | 91.30 | 183.50 | 27.11 | 5 |
| 2025-07-06 | 张三 | 91.22 | 183.50 | 27.09 | 6 |
| 2025-07-07 | 张三 | 91.18 | 183.50 | 27.08 | 1 |
| 2025-07-08 | 张三 | 91.09 | 183.50 | 27.05 | 2 |
| 2025-07-09 | 张三 | 91.01 | 183.50 | 27.03 | 3 |
| 2025-07-10 | 张三 | 91.01 | 183.50 | 27.03 | 4 |
| 2025-07-01 | 李四 | 77.49 | 175.20 | 25.27 | 2 |
| 2025-07-02 | 李四 | 77.49 | 175.20 | 25.25 | 3 |
| 2025-07-03 | 李四 | 77.34 | 175.20 | 25.20 | 4 |
| 2025-07-04 | 李四 | 77.30 | 175.20 | 25.18 | 5 |
| 2025-07-05 | 李四 | 77.20 | 175.20 | 25.16 | 6 |
| 2025-07-06 | 李四 | 77.11 | 175.20 | 25.10 | 7 |
| 2025-07-07 | 李四 | 77.01 | 175.20 | 25.09 | 1 |
| 2025-07-08 | 李四 | 76.98 | 175.20 | 25.08 | 2 |
| 2025-07-09 | 李四 | 77.13 | 175.20 | 25.13 | 3 |
| 2025-07-10 | 李四 | 77.13 | 175.20 | 25.08 | 4 |
+-------------+-------+---------+---------+--------+----------------------+ 