13. [实际案例]-行列转化-拆分有key但是没有分隔符且无序字符串
- 专题
- 6天前
- 22热度
- 0评论
一、题目
当前有以下数据 t_detai,
+-----+------------------+
| id | detail |
+-----+------------------+
| 1 | A:1,2,3,B:2,3,4 |
| 2 | A:3,4 |
| 3 | B:1,3 |
| 4 | B:1,2,A:2,3,4 |
| 5 | |
+-----+------------------+
期望结果:
+-----+-----------+-----------+
| id | A_detail | B_detail |
+-----+-----------+-----------+
| 1 | 1,2,3 | 2,3,4 |
| 2 | 3,4 | |
| 3 | | 1,3 |
| 4 | 2,3,4 | 1,2 |
| 5 | | |
+-----+-----------+-----------+
说明
- 题目是实际工作问题的一步,进行了抽象和简化;
- A:和B:可以理解为detail中key,不会重复,是复杂字符串;
- 在A内容和B内容之间无分隔符;
- detail字符串中A、B的先后顺序不定,例如:第一行和第四行;
二、解题思路
因为A:和B:顺序不确定,所以不能用A:或者B:充当分隔符,因为这样处理会比较麻烦,不好处理A在前或者B在前的逻辑,并且我们需要保留A和B作为后续作为不同列内容的区分。没有Key之间的分隔符,所以能想到的是给他加一个分隔符; 使用分隔符将内容进行列转行(炸裂); 当AB内容清洗出来之后,对A和其内容进行字符串拆分,拆分成A和具体内容,B和具体内容; 将内容进行行转列,得到最终结果;
三、执行SQL
1.添加分隔符
需要分别给A:,B:添加一个;当道做分割,添加之后,我们把整个字符串开始添加的;去掉。
执行SQL
select
id,
detail,
substr(regexp_replace(regexp_replace(detail,'A:',';A:'),'B:',';B:'),2) as new_detail
from t_detail;
执行结果
+-----+------------------+-------------------+
| id | detail | new_detail |
+-----+------------------+-------------------+
| 1 | A:1,2,3,B:2,3,4 | A:1,2,3,;B:2,3,4 |
| 2 | A:3,4 | A:3,4 |
| 3 | B:1,3 | B:1,3 |
| 4 | B:1,2,A:2,3,4 | B:1,2,;A:2,3,4 |
| 5 | | |
+-----+------------------+-------------------+
2.列转行
使用添加的分隔符;将new_detail进行列转行
执行SQL
select
t.id,
t.detail,
t.new_detail,
t1.detail_info
from (select id,
detail,
substr(regexp_replace(regexp_replace(detail, 'A:', ';A:'), 'B:', ';B:'), 2) as new_detail
from t_detail) t
lateral view explode(split(new_detail,';')) t1 as detail_info
执行结果
+-----+------------------+-------------------+--------------+
| id | detail | new_detail | detail_info |
+-----+------------------+-------------------+--------------+
| 1 | A:1,2,3,B:2,3,4 | A:1,2,3,;B:2,3,4 | A:1,2,3, |
| 1 | A:1,2,3,B:2,3,4 | A:1,2,3,;B:2,3,4 | B:2,3,4 |
| 2 | A:3,4 | A:3,4 | A:3,4 |
| 3 | B:1,3 | B:1,3 | B:1,3 |
| 4 | B:1,2,A:2,3,4 | B:1,2,;A:2,3,4 | B:1,2, |
| 4 | B:1,2,A:2,3,4 | B:1,2,;A:2,3,4 | A:2,3,4 |
| 5 | | | |
+-----+------------------+-------------------+--------------+
3.拆分key与内容
detail_info就是我们要的内容,现在需要将A与A对应内容,B与B对应内容拆开,因为在前面的内容会多一个,,给它去掉
执行SQL
select
t.id,
t.detail,
t.new_detail,
t1.detail_info,
split(t1.detail_info,':')[0] as detail_info_type,
regexp_replace(split(t1.detail_info,':')[1],',$','') as detail_info_str
from (select id,
detail,
substr(regexp_replace(regexp_replace(detail, 'A:', ';A:'), 'B:', ';B:'), 2) as new_detail
from t_detail) t
lateral view explode(split(new_detail,';')) t1 as detail_info
执行结果
+-----+------------------+-------------------+--------------+-------------------+------------------+
| id | detail | new_detail | detail_info | detail_info_type | detail_info_str |
+-----+------------------+-------------------+--------------+-------------------+------------------+
| 1 | A:1,2,3,B:2,3,4 | A:1,2,3,;B:2,3,4 | A:1,2,3, | A | 1,2,3 |
| 1 | A:1,2,3,B:2,3,4 | A:1,2,3,;B:2,3,4 | B:2,3,4 | B | 2,3,4 |
| 2 | A:3,4 | A:3,4 | A:3,4 | A | 3,4 |
| 3 | B:1,3 | B:1,3 | B:1,3 | B | 1,3 |
| 4 | B:1,2,A:2,3,4 | B:1,2,;A:2,3,4 | B:1,2, | B | 1,2 |
| 4 | B:1,2,A:2,3,4 | B:1,2,;A:2,3,4 | A:2,3,4 | A | 2,3,4 |
| 5 | | | | | NULL |
+-----+------------------+-------------------+--------------+-------------------+------------------+
4.行转列
根据detail_info_type进行分组,对detail_info_str进行行转列,得到最终结果 执行SQL
select
id,
concat_ws(',',collect_list(case when detail_info_type = 'A' then detail_info_str end)) as A_detail,
concat_ws(',',collect_list(case when detail_info_type = 'B' then detail_info_str end)) as B_detail
from (select t.id,
t.detail,
t.new_detail,
t1.detail_info,
split(t1.detail_info, ':')[0] as detail_info_type,
regexp_replace(split(t1.detail_info, ':')[1], ',$', '') as detail_info_str
from (select id,
detail,
substr(regexp_replace(regexp_replace(detail, 'A:', ';A:'), 'B:', ';B:'), 2) as new_detail
from t_detail) t
lateral view explode(split(new_detail, ';')) t1 as detail_info) tt
group by id
执行结果
+-----+-----------+-----------+
| id | A_detail | B_detail |
+-----+-----------+-----------+
| 1 | 1,2,3 | 2,3,4 |
| 2 | 3,4 | |
| 3 | | 1,3 |
| 4 | 2,3,4 | 1,2 |
| 5 | | |
+-----+-----------+-----------+ 