Fork me on GitHub

SQL面试-窗口函数

大厂SQL面试必考点

一个数据工作者面试数据相关岗位,SQL查询语句是必不可少的笔试环节,本文记录大厂们的一些SQL常考面试题,主要还是涉及到窗口函数和case when的用法。!

SQL面试题

  1. 找出连续7天登陆,连续30天登陆的用户(小红书笔试,电信云面试),最大连续登陆天数的问题 --窗口函数
  2. 求连续点击三次的用户数,中间不能有别人的点击 ,最大连续天数的变形问题(腾讯微保面试)–窗口函数
  3. 计算除去部门最高工资,和最低工资的平均工资(字节跳动面试)–窗口函数
  4. 留存的计算,和累计求和的计算 --窗口函数,自联结(pdd面试)
  5. AB球队得分流水表,得到连续三次得分的队员名字 和每次赶超对手的球员名字,(pdd面试)

把这几类题型吃透,再也不怕手撕SQL和笔试了,其中最难的是题(5),整个面试的sql基本上都是窗口函数的玩法,搭配case when 也考得比较多。

(1) 找出连续7天登陆,连续30天登陆的用户

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
select
*
fromselect
user_id,
count(1) as num
from (select
user_id,
date_sub(log_in_date, rank) dts
from (select
user_id,
log_in_date,
row_number() over(partitioned by user_id order by log_in_date ) as rank
from user_log
)t
)a
group by dts
)b
where num = 7

(2) 求连续点击三次的用户数,而且中间不能有别人的点击

a表记录了点击的流水信息,包括用户id ,和点击时间

1
2
3
usr_id a a b a a a a

click_time t1 t2 t3 t4 t5 t6 t7

row_number() over(order by click_time) as rank_1 得到rank_1为 1 2 3 4 5 6 7

row_number() over(partition by usr_id order by click_time) 得到rank_2 为 1 2 1 3 4 5 6

rank_1- rank2 得到diff 为 0 0 2 1 1 1 1

这时我们发现只需要对diff进行分组计数大于3个,就是连续点击大于三且中间没有其他人点击的用户。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
select
distinct usr_id
from (select
*,
rank_1 - rank2 as diff
from(select
*,
row_number() over(order by click_time) as rank_1
row_number() over(partition by usr_id order by click_time) as rank_2
from a
) b
) c
group by diff,usr_id
having count(diff) >= 3

(3) 计算除去部门最高工资,和最低工资的平均工资(字节跳动面试)–窗口函数

emp 表
1
id 员工 id ,deptno 部门编号,salary 工资

核心是使用窗口函数降序和升序分别排一遍就取出了最高和最低。

1
2
3
4
5
6
7
8
9
10
11
select
a.deptno,
avg(a.salary)
from (select
*,
rank() over(partition by deptno order by salary ) as rank_1,
rank() over(partition by deptno order by salary desc) as rank_2
from emp
)a
group by a.deptno
where a.rank_1 >1 and a.rank_2 >1

(4) 留存的计算,和累计求和的计算 --窗口函数,自联结(pdd面试)

手机中的相机是深受大家喜爱的应用之一,下图是某手机厂商数据库中的用户行为信息表中部分数据的截图

现在该手机厂商想要分析手机中的**「应用(相机)的活跃情况,」**需统计如下数据:

需要获得的数据的格式如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
select d.a_t,count(distinct case when d.时间间隔=1 then d.用户id
else null
end) as 次日留存数,
count(distinct case when 时间间隔=1 then d.用户id
else null
end) /count(distinct d.用户id) as 次日留存率,
count(distinct case when d.时间间隔=3 then d.用户id
else null
end) as 3日留存数 ,
count(distinct case when 时间间隔=3 then d.用户id
else null
end) /count(distinct d.用户id) as 3日留存率,
count(distinct case when d.时间间隔=7 then d.用户id
else null
end) as 7日留存数 ,
count(distinct case when 时间间隔=7 then d.用户id
else null
end) /count(distinct d.用户id) as 7日留存率

from(select
*,
timestampdiff(day,a_t,b_t) as 时间间隔
from (select
a.`用户id`,
a.登陆时间 as a_t,
b.登陆时间 as b_t
from 登录信息 as a
left join 登录信息 as b on a.`用户id`=b.`用户id`
where a.应用名称= '相机' and b.应用名称='相机') as c
) as d
group by d.a_t;

(5) AB球队得分流水表,得到连续三次得分的队员名字 和每次赶超对手的球员名字(pdd)

在复盘时发现有类似原题,这是我在面试中遇到的最难的题

问题:两支篮球队进行了激烈的篮球比赛,比分交替上升。比赛结束后,你有一张两队得分分数的明细表,记录了:

  1. 球队team
  2. 球员号码number
  3. 球员姓名name
  4. 得分分数score
  5. 得分时间scoretime(datetime)

现在球队要对比赛中表现突出的球员做出嘉奖,所以请你用sql统计出:

  • 连续三次(及以上)为球队得分的球员名单
  • 比赛中帮助各自球队反超比分的球员姓名以及对应时间。

先建一个类似的表:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
CREATE TABLE basketball_game_score_detail(
team VARCHAR(40) NOT NULL ,
number VARCHAR(100) NOT NULL,
score_time datetime NOT NULL,
score int NOT NULL,
name varchar(100) NOT NULL
);
insert into basketball_game_score_detail values('A',1,'2020/8/28 9:01:14',1,'A1');
insert into basketball_game_score_detail values('A',5,'2020/8/28 9:02:28',1,'A5');
insert into basketball_game_score_detail values('B',4,'2020/8/28 9:03:42',3,'B4');
insert into basketball_game_score_detail values('A',4,'2020/8/28 9:04:55',3,'A4');
insert into basketball_game_score_detail values('B',1,'2020/8/28 9:06:09',3,'B1');
insert into basketball_game_score_detail values('A',3,'2020/8/28 9:07:23',3,'A3');
insert into basketball_game_score_detail values('A',4,'2020/8/28 9:08:37',3,'A4');
insert into basketball_game_score_detail values('B',1,'2020/8/28 9:09:51',2,'B1');
insert into basketball_game_score_detail values('B',2,'2020/8/28 9:11:05',2,'B2');
insert into basketball_game_score_detail values('B',4,'2020/8/28 9:12:18',1,'B4');
insert into basketball_game_score_detail values('A',1,'2020/8/28 9:13:32',2,'A1');
insert into basketball_game_score_detail values('A',1,'2020/8/28 9:14:46',1,'A1');
insert into basketball_game_score_detail values('A',4,'2020/8/28 9:16:00',1,'A4');
insert into basketball_game_score_detail values('B',3,'2020/8/28 9:17:14',3,'B3');
insert into basketball_game_score_detail values('B',2,'2020/8/28 9:18:28',3,'B2');
insert into basketball_game_score_detail values('A',2,'2020/8/28 9:19:42',3,'A2');
insert into basketball_game_score_detail values('A',1,'2020/8/28 9:20:55',1,'A1');
insert into basketball_game_score_detail values('B',3,'2020/8/28 9:22:09',2,'B3');
insert into basketball_game_score_detail values('B',3,'2020/8/28 9:23:23',3,'B3');
insert into basketball_game_score_detail values('A',5,'2020/8/28 9:24:37',2,'A5');
insert into basketball_game_score_detail values('B',1,'2020/8/28 9:25:51',3,'B1');
insert into basketball_game_score_detail values('B',2,'2020/8/28 9:27:05',1,'B2');
insert into basketball_game_score_detail values('A',3,'2020/8/28 9:28:18',1,'A3');
insert into basketball_game_score_detail values('B',4,'2020/8/28 9:29:32',1,'B4');
insert into basketball_game_score_detail values('A',1,'2020/8/28 9:30:46',3,'A1');
insert into basketball_game_score_detail values('B',1,'2020/8/28 9:32:00',1,'B1');
insert into basketball_game_score_detail values('A',4,'2020/8/28 9:33:14',2,'A4');
insert into basketball_game_score_detail values('B',1,'2020/8/28 9:34:28',1,'B1');
insert into basketball_game_score_detail values('B',5,'2020/8/28 9:35:42',2,'B5');
insert into basketball_game_score_detail values('A',1,'2020/8/28 9:36:55',1,'A1');
insert into basketball_game_score_detail values('B',1,'2020/8/28 9:38:09',3,'B1');
insert into basketball_game_score_detail values('A',1,'2020/8/28 9:39:23',3,'A1');
insert into basketball_game_score_detail values('B',2,'2020/8/28 9:40:37',3,'B2');
insert into basketball_game_score_detail values('A',3,'2020/8/28 9:41:51',3,'A3');
insert into basketball_game_score_detail values('A',1,'2020/8/28 9:43:05',2,'A1');
insert into basketball_game_score_detail values('B',3,'2020/8/28 9:44:18',3,'B3');
insert into basketball_game_score_detail values('A',5,'2020/8/28 9:45:32',2,'A5');
insert into basketball_game_score_detail values('B',5,'2020/8/28 9:46:46',3,'B5');

这里我使用了lead和lag来取每个组的前几个值,这个和最大联系天数不太一样,但也可以用类似思路去解,但是使用lead和lag做起来更容易理解。

1
2
3
4
5
6
7
8
9
10
11
12
13
select
distinct a.name,
a.team
from( select
*
lead(name,1) over(partition by team order by score_time) as ld1,
lead(name,2) over(partition by team order by score_time) as ld2,
lag(name,1) over(partition by team order by score_time) as lg1,
lag(name,2) over(partition by team order by score_time) as lg2
from table) a
where (a.name =a.ld1 and a.name =a.ld2)
or (a.name =a.ld1 and a.name =a.lg1)
or (a.name=a.lg1 and a.name=a.lg2)

第二小问面试时没完全做出来,说了下思路,现在想了想当时的思路还是有问题,而且这个题也并不难,核心还是记录每个时刻的累计得分表

1
2
3
4
5
6
7
8
9
10
select
team,
number,
name,
score_time,
score,
case when team='A' then score else 0 end as A_score,
case when team='B' then score else 0 end as B_score
from basketball_game_score_detail
order by SCORE_time

如下得到每个时刻的累计得分表

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
select
team,
number,
name,
score_time,
A_score,b_score,
sum(A_score)over(order by score_time) as a_sum_score2,
sum(b_score)over(order by score_time) as b_sum_score2
from (select
team,
number,
name,
score_time,
score,
case when team='A' then score else 0 end as A_score,
case when team='B' then score else 0 end as B_score
from basketball_game_score_detail
order by score_time) as x

计算每个时刻的累计得分差,和上个时间的累计得分差,只要两个的符号相反就是反超时刻。感觉思路还是比较简洁的。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
select
*,
score_gap * last_score_gap
from (select
*,
a_sum_score2-b_sum_score2 as score_gap,
lag(a_sum_score2-b_sum_score2,1)over(order by score_time) as last_score_gap
from(select
team,
number,
name,
score_time,
A_score,
B_score,
sum(A_score) over (order by score_time) as a_sum_score2,
sum(B_score) over (order by score_time) as b_sum_score2
from (select
team,
number,
name,
score_time,
score,
case when team='A' then score else 0 end as A_score,
case when team='B' then score else 0 end as B_score
from basketball_game_score_detail
order by SCORE_time) as x
) as y
) as z
where z.score_gap * last_score_gap<=0
and a_sum_score2 <> b_sum_score2

来源:知乎绚丽的小海螺

个人习惯

聊下个人的SQL写作习惯,以上面的最后一个SQL代码为例,在下面的图中我分成了4个部分:

个人的SQL写作几点习惯:

  1. 关键词小写:标准的写法是关键词比如:SELECT /ORDER BY/FROM等都是要大写的;我一般全部都是小写
  2. 关键词换行:每个关键词单独成行
  3. select的如果是多个字段,请换行写出每个字段;每个字段的末尾跟上逗号,方便写注释
  4. 如果是出现了多个子查询,注意换行和嵌套的层级结构

本文标题:SQL面试-窗口函数

发布时间:2022年08月25日 - 23:08

原始链接:http://www.renpeter.cn/2022/08/25/SQL%E9%9D%A2%E8%AF%95-%E7%AA%97%E5%8F%A3%E5%87%BD%E6%95%B0.html

许可协议: 署名-非商业性使用-禁止演绎 4.0 国际 转载请保留原文链接及作者。

Coffee or Tea