13. [实际案例]-行列转化-拆分有key但是没有分隔符且无序字符串

一、题目

当前有以下数据 t_detai,

+-----+------------------+
| id  |      detail      |
+-----+------------------+
| 1   | A:1,2,3,B:2,3,4  |
| 2   | A:3,4            |
| 3   | B:1,3            |
| 4   | B:1,2,A:2,3,4    |
| 5   |                  |
+-----+------------------+

期望结果:

+-----+-----------+-----------+
| id  | A_detail  | B_detail  |
+-----+-----------+-----------+
| 1   | 1,2,3     | 2,3,4     |
| 2   | 3,4       |           |
| 3   |           | 1,3       |
| 4   | 2,3,4     | 1,2       |
| 5   |           |           |
+-----+-----------+-----------+

说明

  1. 题目是实际工作问题的一步,进行了抽象和简化;
  2. A:和B:可以理解为detail中key,不会重复,是复杂字符串;
  3. 在A内容和B内容之间无分隔符;
  4. detail字符串中A、B的先后顺序不定,例如:第一行和第四行;

二、解题思路

因为A:和B:顺序不确定,所以不能用A:或者B:充当分隔符,因为这样处理会比较麻烦,不好处理A在前或者B在前的逻辑,并且我们需要保留A和B作为后续作为不同列内容的区分。没有Key之间的分隔符,所以能想到的是给他加一个分隔符; 使用分隔符将内容进行列转行(炸裂); 当AB内容清洗出来之后,对A和其内容进行字符串拆分,拆分成A和具体内容,B和具体内容; 将内容进行行转列,得到最终结果;

三、执行SQL

1.添加分隔符

需要分别给A:,B:添加一个;当道做分割,添加之后,我们把整个字符串开始添加的;去掉。

执行SQL

select
        id,
        detail,
        substr(regexp_replace(regexp_replace(detail,'A:',';A:'),'B:',';B:'),2) as new_detail
from t_detail;

执行结果

+-----+------------------+-------------------+
| id  |      detail      |    new_detail     |
+-----+------------------+-------------------+
| 1   | A:1,2,3,B:2,3,4  | A:1,2,3,;B:2,3,4  |
| 2   | A:3,4            | A:3,4             |
| 3   | B:1,3            | B:1,3             |
| 4   | B:1,2,A:2,3,4    | B:1,2,;A:2,3,4    |
| 5   |                  |                   |
+-----+------------------+-------------------+

2.列转行

使用添加的分隔符;将new_detail进行列转行

执行SQL

select
        t.id,
        t.detail,
        t.new_detail,
        t1.detail_info
from (select id,
             detail,
             substr(regexp_replace(regexp_replace(detail, 'A:', ';A:'), 'B:', ';B:'), 2) as new_detail
      from t_detail) t
lateral view explode(split(new_detail,';')) t1 as detail_info

执行结果

+-----+------------------+-------------------+--------------+
| id  |      detail      |    new_detail     | detail_info  |
+-----+------------------+-------------------+--------------+
| 1   | A:1,2,3,B:2,3,4  | A:1,2,3,;B:2,3,4  | A:1,2,3,     |
| 1   | A:1,2,3,B:2,3,4  | A:1,2,3,;B:2,3,4  | B:2,3,4      |
| 2   | A:3,4            | A:3,4             | A:3,4        |
| 3   | B:1,3            | B:1,3             | B:1,3        |
| 4   | B:1,2,A:2,3,4    | B:1,2,;A:2,3,4    | B:1,2,       |
| 4   | B:1,2,A:2,3,4    | B:1,2,;A:2,3,4    | A:2,3,4      |
| 5   |                  |                   |              |
+-----+------------------+-------------------+--------------+

3.拆分key与内容

detail_info就是我们要的内容,现在需要将A与A对应内容,B与B对应内容拆开,因为在前面的内容会多一个,,给它去掉

执行SQL

select
        t.id,
        t.detail,
        t.new_detail,
        t1.detail_info,
        split(t1.detail_info,':')[0] as detail_info_type,
        regexp_replace(split(t1.detail_info,':')[1],',$','') as detail_info_str
from (select id,
             detail,
             substr(regexp_replace(regexp_replace(detail, 'A:', ';A:'), 'B:', ';B:'), 2) as new_detail
      from t_detail) t
lateral view explode(split(new_detail,';')) t1 as detail_info

执行结果

+-----+------------------+-------------------+--------------+-------------------+------------------+
| id  |      detail      |    new_detail     | detail_info  | detail_info_type  | detail_info_str  |
+-----+------------------+-------------------+--------------+-------------------+------------------+
| 1   | A:1,2,3,B:2,3,4  | A:1,2,3,;B:2,3,4  | A:1,2,3,     | A                 | 1,2,3            |
| 1   | A:1,2,3,B:2,3,4  | A:1,2,3,;B:2,3,4  | B:2,3,4      | B                 | 2,3,4            |
| 2   | A:3,4            | A:3,4             | A:3,4        | A                 | 3,4              |
| 3   | B:1,3            | B:1,3             | B:1,3        | B                 | 1,3              |
| 4   | B:1,2,A:2,3,4    | B:1,2,;A:2,3,4    | B:1,2,       | B                 | 1,2              |
| 4   | B:1,2,A:2,3,4    | B:1,2,;A:2,3,4    | A:2,3,4      | A                 | 2,3,4            |
| 5   |                  |                   |              |                   | NULL             |
+-----+------------------+-------------------+--------------+-------------------+------------------+

4.行转列

根据detail_info_type进行分组,对detail_info_str进行行转列,得到最终结果 执行SQL

select
    id,
    concat_ws(',',collect_list(case when detail_info_type = 'A' then detail_info_str end)) as A_detail,
    concat_ws(',',collect_list(case when detail_info_type = 'B' then detail_info_str end)) as B_detail
from (select t.id,
             t.detail,
             t.new_detail,
             t1.detail_info,
             split(t1.detail_info, ':')[0]                           as detail_info_type,
             regexp_replace(split(t1.detail_info, ':')[1], ',$', '') as detail_info_str
      from (select id,
                   detail,
                   substr(regexp_replace(regexp_replace(detail, 'A:', ';A:'), 'B:', ';B:'), 2) as new_detail
            from t_detail) t
               lateral view explode(split(new_detail, ';')) t1 as detail_info) tt
group by id

执行结果

+-----+-----------+-----------+
| id  | A_detail  | B_detail  |
+-----+-----------+-----------+
| 1   | 1,2,3     | 2,3,4     |
| 2   | 3,4       |           |
| 3   |           | 1,3       |
| 4   | 2,3,4     | 1,2       |
| 5   |           |           |
+-----+-----------+-----------+