常见大数据面试SQL-有序行转列
一、题目
有学生各学科分数表,记录了学生的各科分数,请按照学生粒度,生成两列数据分别为学科和分数,要求学科内的顺序与分数顺序一致。
样例数据
+----------+----------+--------+
| student | subject | score |
+----------+----------+--------+
| 张三 | 数学 | 80 |
| 张三 | 英语 | 82 |
| 张三 | 语文 | 95 |
| 李四 | 数学 | 90 |
| 李四 | 英语 | 93 |
| 李四 | 语文 | 90 |
| 王五 | 数学 | 92 |
| 王五 | 英语 | 88 |
| 王五 | 语文 | 88 |
| 赵六 | 数学 | 84 |
| 赵六 | 英语 | 68 |
| 赵六 | 语文 | 77 |
+----------+----------+--------+
期望结果
+----------+-----------+-----------+
| student | subjects | scores |
+----------+-----------+-----------+
| 张三 | 语文,数学,英语 | 95,80,82 |
| 李四 | 语文,数学,英语 | 90,90,93 |
| 王五 | 语文,数学,英语 | 88,92,88 |
| 赵六 | 语文,数学,英语 | 77,84,68 |
+----------+-----------+-----------+
二、分析
题目首先考察行转列,但是难度要比简单行转列要复杂很多。要求学科与成绩一一对应。 这里并没有要求每个学生的学科顺序一致,即张三的subjects是语文,数学,英语,李四的subjects可以是语文,英语,数学。 但是要求scores的顺序与subjects中的顺序一致。
维度 | 评分 |
---|---|
题目难度 | ⭐️⭐️⭐️⭐️ |
题目清晰度 | ⭐️⭐️⭐️⭐️⭐️ |
业务常见度 | ⭐️⭐️⭐️⭐️ |
三、SQL
1.拼接学科和成绩
使用struct将学科和成绩转换为结构体,这样绑定了学科和成绩。
执行SQL
select student,
struct(subject, score) as subject_score
from t17_student_score
查询结果
+----------+------------------------------+
| student | subject_score |
+----------+------------------------------+
| 张三 | {"subject":"语文","score":95} |
| 张三 | {"subject":"数学","score":80} |
| 张三 | {"subject":"英语","score":82} |
| 李四 | {"subject":"语文","score":90} |
| 李四 | {"subject":"数学","score":90} |
| 李四 | {"subject":"英语","score":93} |
| 王五 | {"subject":"语文","score":88} |
| 王五 | {"subject":"数学","score":92} |
| 王五 | {"subject":"英语","score":88} |
| 赵六 | {"subject":"语文","score":77} |
| 赵六 | {"subject":"数学","score":84} |
| 赵六 | {"subject":"英语","score":68} |
+----------+------------------------------+
2.使用collect_list将结构体转换为数组
使用collect_list函数,将数据进行行转列,得到数组。注意:不同学科顺序是不能保证有序的,有序是巧合。
执行SQL
select student,
collect_list(struct(subject, score)) as subject_score
from t17_student_score
group by student
查询结果
+----------+----------------------------------------------------+
| student | subject_score |
+----------+----------------------------------------------------+
| 张三 | [{"subject":"语文","score":95},{"subject":"数学","score":80},{"subject":"英语","score":82}] |
| 李四 | [{"subject":"语文","score":90},{"subject":"数学","score":90},{"subject":"英语","score":93}] |
| 王五 | [{"subject":"语文","score":88},{"subject":"数学","score":92},{"subject":"英语","score":88}] |
| 赵六 | [{"subject":"语文","score":77},{"subject":"数学","score":84},{"subject":"英语","score":68}] |
+----------+----------------------------------------------------+
3.将结构体数组转换为map
执行SQL
select student,
map_from_entries(collect_list(struct(subject, score))) as subject_score
from t17_student_score
group by student
查询结果
+----------+----------------------------+
| student | subject_score |
+----------+----------------------------+
| 张三 | {"数学":80,"英语":82,"语文":95} |
| 李四 | {"数学":90,"英语":93,"语文":90} |
| 王五 | {"数学":92,"英语":88,"语文":88} |
| 赵六 | {"数学":84,"英语":68,"语文":77} |
+----------+----------------------------+
4.从map中分别取出keys和values作为subjects和scores,得到最终结果。
执行SQL
select student,
concat_ws(',', map_keys(subject_score)) as subjects,
concat_ws(',', map_values(subject_score)) as scores
from (select student,
map_from_entries(collect_list(struct(subject, score))) as subject_score
from t17_student_score
group by student) t
查询结果
+----------+-----------+-----------+
| student | subjects | scores |
+----------+-----------+-----------+
| 张三 | 语文,数学,英语 | 95,80,82 |
| 李四 | 语文,数学,英语 | 90,90,93 |
| 王五 | 语文,数学,英语 | 88,92,88 |
| 赵六 | 语文,数学,英语 | 77,84,68 |
+----------+-----------+-----------+
四、建表语句和数据插入
--建表语句
create table if not exists t17_student_score
(
student string,
subject string,
score bigint
);
--插入数据
insert into t17_student_score(student, subject, score)
values ('张三', '语文', 95),
('张三', '数学', 80),
('张三', '英语', 82),
('李四', '语文', 90),
('李四', '数学', 90),
('李四', '英语', 93),
('王五', '语文', 88),
('王五', '数学', 92),
('王五', '英语', 88),
('赵六', '语文', 77),
('赵六', '数学', 84),
('赵六', '英语', 68);
本文同步在微信公众号”数据仓库技术“和个人博客”数据仓库技术 (opens in a new tab) 发表;