Skip to main content

常见大数据面试SQL-连续点击三次用户

一、题目

有用户点击日志记录表 t2_click_log,包含user_id(用户ID),click_time(点击时间),请查询出连续点击三次的用户数,

连续点击三次:指点击记录中同一用户连续点击,中间无其他用户点击;

+----------+--------------+
| user_id | click_time |
+----------+--------------+
| 1 | 1736337600 |
| 2 | 1736337670 |
| 1 | 1736337710 |
| 1 | 1736337715 |
| 1 | 1736337750 |
| 2 | 1736337760 |
| 3 | 1736337820 |
| 3 | 1736337840 |
| 3 | 1736337850 |
| 3 | 1736337910 |
| 4 | 1736337915 |
+----------+--------------+

二、分析

1.连续问题,最大连续次数的变种问题;

2.思路一:累积求和分组法(此种方式比连续N天登录略难一些)

​ (2.1)按照时间排序之后,使用lag()函数可以判断出当前行用户与上一行用户,是否是同一个用户;

​ (2.2)与上一行是同一个用户的日志,则给该行打标0,不同打标1;(属于经验:要累积求和,0和不变,1和+1.)

​ (2.3)对打标完成的标签,进行累积求和,和相同代表属于同一用户连续(聚合函数开窗);

​ (2.4)对累积求和的值进行统计,相同值的个数>=3 表示连续登录;

3.思路二:双重排序差值法(难度同连续N天登录)

​ (3.1)与连续登录天数类似。按照点击时间(click_time) 进行全排序,按照用户ID(user_id)分组,按照点击时间排序;

​ (3.2)对两次排序计算差值,按照用户和差值进行分组,相同用户,差值相同说明连续;

​ (3.3)计算属于同一分组的数量>=3则是连续,统计用户数得出结果;

维度评分
题目难度⭐️⭐️⭐️
题目清晰度⭐️⭐️⭐️⭐️
业务常见度⭐️⭐️⭐️⭐️

三、SQL

3.1 累积求和法

1.增加一列is_same_user,判断是否与上一行是同一用户点击,是取0,否取1,第一行默认为0;

执行SQL

select user_id,
click_time,
case
when lag(user_id) over (order by click_time asc) is null then 0
when user_id = lag(user_id) over (order by click_time asc) then 0
else 1 end as is_same_user
from t2_click_log

查询结果

+----------+-------------+---------------+
| user_id | click_time | is_same_user |
+----------+-------------+---------------+
| 1 | 1736337600 | 0 |
| 2 | 1736337670 | 1 |
| 1 | 1736337710 | 1 |
| 1 | 1736337715 | 0 |
| 1 | 1736337750 | 0 |
| 2 | 1736337760 | 1 |
| 3 | 1736337820 | 1 |
| 3 | 1736337840 | 0 |
| 3 | 1736337850 | 0 |
| 3 | 1736337910 | 0 |
| 4 | 1736337915 | 1 |
+----------+-------------+---------------+

2.对是否是同一用户分组进行累积求和

执行SQL

select user_id,
click_time,
sum(is_same_user) over (order by click_time asc) as sum_order
from (select user_id,
click_time,
case
when lag(user_id) over (order by click_time asc) is null then 0
when user_id = lag(user_id) over (order by click_time asc) then 0
else 1 end as is_same_user
from t2_click_log) t

查询结果

+----------+-------------+------------+
| user_id | click_time | sum_order |
+----------+-------------+------------+
| 1 | 1736337600 | 0 |
| 2 | 1736337670 | 1 |
| 1 | 1736337710 | 2 |
| 1 | 1736337715 | 2 |
| 1 | 1736337750 | 2 |
| 2 | 1736337760 | 3 |
| 3 | 1736337820 | 4 |
| 3 | 1736337840 | 4 |
| 3 | 1736337850 | 4 |
| 3 | 1736337910 | 4 |
| 4 | 1736337915 | 5 |
+----------+-------------+------------+

3.查询sum_order值个数>3的用户

执行SQL

select user_id,
sum_order,
count(1)
from (select user_id,
click_time,
sum(is_same_user) over (order by click_time asc) as sum_order
from (select user_id,
click_time,
case
when lag(user_id) over (order by click_time asc) is null then 0
when user_id = lag(user_id) over (order by click_time asc) then 0
else 1 end as is_same_user
from t2_click_log) t) tt
group by user_id, sum_order
having count(1) >= 3

查询结果

+----------+------------+------+
| user_id | sum_order | _c2 |
+----------+------------+------+
| 1 | 2 | 3 |
| 3 | 4 | 4 |
+----------+------------+------+

4.查询最终用户数量

执行SQL

select count(distinct user_id) as user_num
from (select user_id,
sum_order,
count(1)
from (select user_id,
click_time,
sum(is_same_user) over (order by click_time asc) as sum_order
from (select user_id,
click_time,
case
when lag(user_id) over (order by click_time asc) is null then 0
when user_id = lag(user_id) over (order by click_time asc) then 0
else 1 end as is_same_user
from t2_click_log) t) tt
group by user_id, sum_order
having count(1) >= 3) ttt

查询结果

+-----------+
| user_num |
+-----------+
| 2 |
+-----------+

3.2 双重排序差值法

1.分别按照时间,按照不分组和按照用户分组进行排序;

执行SQL

select user_id,
click_time,
row_number() over (order by click_time asc) as row_num1,
row_number() over (partition by user_id order by click_time asc) as row_num2
from t2_click_log

查询结果

+----------+-------------+-----------+-----------+
| user_id | click_time | row_num1 | row_num2 |
+----------+-------------+-----------+-----------+
| 1 | 1736337600 | 1 | 1 |
| 1 | 1736337710 | 3 | 2 |
| 1 | 1736337715 | 4 | 3 |
| 1 | 1736337750 | 5 | 4 |
| 2 | 1736337670 | 2 | 1 |
| 2 | 1736337760 | 6 | 2 |
| 3 | 1736337820 | 7 | 1 |
| 3 | 1736337840 | 8 | 2 |
| 3 | 1736337850 | 9 | 3 |
| 3 | 1736337910 | 10 | 4 |
| 4 | 1736337915 | 11 | 1 |
+----------+-------------+-----------+-----------+

2.计算差值并按照用户和差值进行分组

执行SQL

select user_id,
diff,
count(1) as aa
from (select user_id,
click_time,
row_number() over (order by click_time asc) -
row_number() over (partition by user_id order by click_time asc) as diff
from t2_click_log) t
group by user_id, diff

查询结果

+----------+-------+-----+
| user_id | diff | aa |
+----------+-------+-----+
| 1 | 0 | 1 |
| 1 | 1 | 3 |
| 2 | 1 | 1 |
| 2 | 4 | 1 |
| 3 | 6 | 4 |
| 4 | 10 | 1 |
+----------+-------+-----+

3.查询分组行数>=3的用户差值分组

执行SQL

select user_id,
diff
from (select user_id,
diff,
count(1) aa
from (select user_id,
click_time,
row_number() over (order by click_time asc) -
row_number() over (partition by user_id order by click_time asc) as diff
from t2_click_log) t
group by user_id, diff) tt
where aa >= 3

查询结果

+----------+-------+
| user_id | diff |
+----------+-------+
| 1 | 1 |
| 3 | 6 |
+----------+-------+

4.计算用户数

执行SQL

select count(distinct user_id) as user_num
from (select user_id,
diff,
count(1) aa
from (select user_id,
click_time,
row_number() over (order by click_time asc) -
row_number() over (partition by user_id order by click_time asc) as diff
from t2_click_log) t
group by user_id, diff) tt
where aa >= 3

查询结果

+-----------+
| user_num |
+-----------+
| 2 |
+-----------+

四、建表语句和数据插入

--建表语句
CREATE TABLE t2_click_log (
user_id BIGINT,
click_time BIGINT
);
--插入数据
insert into t2_click_log (user_id,click_time)
values
(1,1736337600),
(2,1736337670),
(1,1736337710),
(1,1736337715),
(1,1736337750),
(2,1736337760),
(3,1736337820),
(3,1736337840),
(3,1736337850),
(3,1736337910),
(4,1736337915)