计算机 · 2021年8月12日 0

Hackerrank刷题笔记之SQL

  1. 查询以aeiou字母开头的城市
   SELECT DISTINCT(CITY) FROM STATION
   WHERE CITY REGEXP '^[aeiou].*'

主要是记得使用REGEXP函数,和正则表达式要用^表示开头。
或者:

   select distinct city from station 
   where left(city,1) in ('a','e','i','o','u')

left函数提取子字符串。

  1. 基本的join操作
    使用join写两个语句,分别把长度最长和最短的给找出来。注意的是mysql不支持在JOIN ON语法中使用前面用AS设置的别名(但是clickhouse就可以)。排序后使用TOP或者LIMIT语法都可以。
    顺便想起一个问题,那就是TOPLIMIT有没有区别呢,网上搜了一下,发现的唯一区别是LIMIT可以再指定一个offset,这样就可以取查询结果中间的一段值了。
  2. int与string的转换
    使用cast在数字和字符串之间来回转换,需要注意的是cast转换的目标类型是几个固定死了的类型,所谓的intvarchar不在转换目标类型之列。
    另外就是使用REPLACE函数把0给替换成空字符串。
  3. 打印三角形
    关键在于使用SET设置变量和使用REPEAT函数。
    以及一个充当占位用的information_schema.tables
  4. The PADS
    就是练习一下concatlower函数。
  5. Occupations
    这道题涉及一个pivot的概念(其实不懂这个概念也没啥关系),就是我们一般的查询是用这个表建表时使用的字段作为列,而pivot呢是说我们用其中某个字段的枚举值来作为列。
    在mssql中是有pivot函数的,而在mysql中则是没有pivot函数的,需要自己实现一个弱智版本的pivot(因为需要提前知道字段的所有值来写case语句枚举)。
    来自于讨论区的mysql答案:
   SELECT 
   MIN(CASE WHEN Occupation = 'Doctor' THEN Name ELSE NULL END) AS Doctor,
   MIN(CASE WHEN Occupation = 'Professor' THEN Name ELSE NULL END) AS Professor,
   MIN(CASE WHEN Occupation = 'Singer' THEN Name ELSE NULL END) AS Singer,
   MIN(CASE WHEN Occupation = 'Actor' THEN Name ELSE NULL END) AS Actor
   FROM (
     SELECT a.Occupation,
            a.Name,
            (SELECT COUNT(*) 
               FROM Occupations AS b
               WHERE a.Occupation = b.Occupation AND a.Name > b.Name) AS myrank
     FROM Occupations AS a
   ) AS c
   GROUP BY c.myrank
   ORDER BY c.myrank

这里需要注意用子查询作为列的这种写法,不是看到人家这么写我都不知道子查询还可以作为select的一列。
除了用子查询,也可以用rank over函数来实现计算不同职业的人的姓名排名:

   SELECT
   MIN(CASE WHEN Occupation = 'Doctor' THEN Name ELSE NULL END) AS Doctor,
   MIN(CASE WHEN Occupation = 'Professor' THEN Name ELSE NULL END) AS Professor,
   MIN(CASE WHEN Occupation = 'Singer' THEN Name ELSE NULL END) AS Singer,
   MIN(CASE WHEN Occupation = 'Actor' THEN Name ELSE NULL END) AS Actor
   FROM
   (
       SELECT Occupation, Name, rank() over( partition by Occupation order by Name) as myrank
       FROM Occupations
   )t
   group by myrank
   order by myrank

用rank over函数需要注意的点是mysql要从8.0开始才有这种window函数。如果是8.0之前的版本你要获得这个排名需要join表来计算(或者用之前的子查询方式),简直是麻烦死了。
rank over函数简介
顺便说一下mysql没有full join,坑。

  1. Manhattan Distance
    就是练习一下round函数和abs函数。
    另外也有题目联系sqrt和power函数的,就懒得列了。
  2. Median Query
    在可以使用rank over和subquery的情况下非常好解决:
   with t as
   (
   select LAT_N, rank() over (order by LAT_N ) as place
   from station
   )
   select round(AVG(LAT_N), 4)
   from t
   where abs(place - ((select count(*) from t) + 1)/2.0) <= 0.6 

如果没有rank over,那么就只能玩join的苦逼套路了:用join的方式来算rank,然后再找出中位数对应的那一行或者两行去算平均值。
这里算便记录一下,mysql的subquery(或者叫inner query, inner select)可以放在WHERE、FROM和SELECT子句里。
以及rank over函数是不能用在where子句里的。

  1. Prime Number
    注意每一个生成的中间表都需要取别名:Every derived table must have its own alias
    在mysql语法中,where子句和group by子句不能用列的别名,而order by子句可以用列的别名。
    可以概括为练习了一下group_concatmod两个函数的使用。同时加深了对生成这个1到1000范围的数列的sql的理解(那就是对表中的每一行执行tmp加1的操作而不是我原先理解的这个information_schmea.tables是没用的,是拿来充数的)。
   set @tmp:=0;
   with t1 as (
       select @tmp:=@tmp+1 as n
       from information_schema.tables, information_schema.tables c
       limit 1000
   ) 
   select group_concat(t3.n SEPARATOR '&')
   from(
       select n, (select count(1) from t1 as t2 where t2.n <= t1.n and mod(t1.n, t2.n) = 0) as divisor
       from t1 
       where (select count(1) from t1 as t2 where t2.n <= t1.n and mod(t1.n, t2.n) = 0) = 2
       order by t1.n
   )t3

10. Multiple Join
帮助你练习不停的join。

SELECT Company.company_code, Company.founder, COUNT(DISTINCT Lead_manager.lead_manager_code)
	, COUNT(DISTINCT Senior_manager.senior_manager_code), COUNT(DISTINCT Manager.manager_code)
	, COUNT(DISTINCT Employee.employee_code)
FROM Company
	LEFT JOIN Lead_manager ON Company.company_code = Lead_manager.company_code
	LEFT JOIN Senior_manager ON Lead_manager.lead_manager_code = Senior_manager.lead_manager_code
	LEFT JOIN Manager ON Senior_manager.senior_manager_code = Manager.senior_manager_code
	LEFT JOIN Employee ON Manager.manager_code = Employee.manager_code
GROUP BY Company.company_code, Company.founder

11. Project Planning
问题大意就是合并长度相同的连续区间。
解法一:使用变量来标记属于同一个任务的子任务,同时还用到了LAG函数。

SET @tmp = 0;

SELECT MIN(Start_Date) AS st, MAX(End_Date) AS et
FROM (
	SELECT Start_Date, End_Date
		, if(Start_Date <> Previous_End_Date, @tmp := @tmp + 1, @tmp) AS groupid
	FROM (
		SELECT Start_Date, End_Date, LAG(End_Date) OVER (ORDER BY Start_Date) AS Previous_End_Date
		FROM Projects
		ORDER BY Start_Date
	) t1
) t2
GROUP BY groupid
ORDER BY MAX(End_Date) - MIN(Start_Date), st

解法二:使用subquery,具体原理查看该问题讨论区。

SELECT Start_Date, MIN(End_Date)
FROM (
	SELECT Start_Date
	FROM Projects
	WHERE Start_Date NOT IN (
		SELECT End_Date
		FROM Projects
	)
) a, (
		SELECT end_date
		FROM PROJECTS
		WHERE end_date NOT IN (
			SELECT start_date
			FROM PROJECTS
		)
	) b
WHERE start_date < end_date
GROUP BY start_date
ORDER BY datediff(start_date, MIN(end_date)) DESC, start_date

解法三:由于子区间的长度相同,其实问题等价于在一个排好序的序列中,找出构成这个序列的连续子序列。只要用row_number和每个元素做差,然后对这个差group by就可以区分属于不同连续子序列的元素了。

;WITH CTE_Projects AS ( SELECT Start_Date, End_Date, ROW_NUMBER() OVER (ORDER BY Start_Date) AS RowNumber FROM Projects WITH (NOLOCK) ) SELECT MIN(Start_Date), MAX(End_Date) FROM CTE_Projects WITH (NOLOCK) GROUP BY DATEDIFF(DAY, RowNumber, Start_Date) ORDER BY DATEDIFF(DAY, MIN(Start_Date), MAX(End_Date)), MIN(Start_Date) 

这个问题更难的一个版本,给出若干个区间,求这些区间合并后的结果。在stackoverflow上搜到一个clickhouse版本的答案,自己也想过,没想出来。

12. Having
练习一下having,where是在聚合前进行筛选,而having是对聚合后的结果进行筛选。

SELECT hackers.hacker_id, hackers.name
FROM hackers
	JOIN submissions ON hackers.hacker_id = submissions.hacker_id
	JOIN challenges ON submissions.challenge_id = challenges.challenge_id
	JOIN difficulty ON challenges.difficulty_level = difficulty.difficulty_level
WHERE submissions.score = difficulty.score
GROUP BY hackers.hacker_id, hackers.name
HAVING COUNT(1) > 1
ORDER BY COUNT(1) DESC, hackers.hacker_id

13. Challenges
大意是要同时计算出某一列的最大值,和每一行的这一列的这个值的排名。hackerrank这道题给的mysql版本是一个很低版本的,没有with语句,还老是报各种语法错,被这个低版本mysql语法给搞死了。最后反正是看评论区,用子查询写的(而且还不能使用表别名那种子查询,需要自己重新把相应的表生成一遍,无语):

SELECT hacker_id, name, challenge_count
FROM (
	SELECT hackers.hacker_id AS hacker_id, name, COUNT(1) AS challenge_count
	FROM hackers
		JOIN challenges ON hackers.hacker_id = challenges.hacker_id
	GROUP BY hacker_id, name
	ORDER BY challenge_count DESC, hacker_id
) t
WHERE (
		SELECT MAX(cnt)
		FROM (
			SELECT COUNT(1) AS cnt
			FROM challenges
			GROUP BY hacker_id
		) tmp
	) = challenge_count
	OR (
		SELECT COUNT(1)
		FROM (
			SELECT COUNT(1) AS cnt
			FROM challenges
			GROUP BY hacker_id
		) tmp
		WHERE cnt = challenge_count
	) = 1

14. Wands
丑陋的subquery:

SELECT id, age, coins, wpower
FROM (
	SELECT wands.id AS id, wands_property.age AS age, wands.coins_needed AS coins, wands.power AS wpower
	FROM wands
		JOIN wands_property ON wands.code = wands_property.code
	WHERE wands_property.is_evil = 0
) t
WHERE (
	SELECT COUNT(1)
	FROM (
		SELECT wands.id AS id, wands_property.age AS age, wands.coins_needed AS coins, wands.power AS wpower
		FROM wands
			JOIN wands_property ON wands.code = wands_property.code
		WHERE wands_property.is_evil = 0
	) t1
	WHERE t1.age = t.age
		AND t1.wpower = t.wpower
		AND t1.coins < t.coins
) = 0
ORDER BY wpower DESC, age DESC

用row_number对多个列排序时居然会出错,貌似是因为评判系统的mysql版本太低了导致的。

15. Interviews
题目技术上其实比较简单,就是题目理解容易 出错:

View_Stats和Submission_Stats这两个表里面的total_views、total_unique_views、total_submissions和total_accepted_submissions都需要再求一次和;

需要left join,有些题目可能在View_Stats和Submission_Stats中没有出现;

16. Days of Learning SQL
虽然不停地写子表套子表可以套出来,可是还是受不了这种折磨看了讨论区的答案,然后发现自己还是没有真正理解mysql的语法。。。

SELECT submission_date
	, (
		SELECT COUNT(DISTINCT hacker_id)
		FROM submissions s2
		WHERE (
			SELECT COUNT(DISTINCT s3.submission_date)
			FROM submissions s3
			WHERE s3.submission_date <= s1.submission_date
				AND s3.hacker_id = s2.hacker_id
		) = datediff(s1.submission_date, '2016-03-01') + 1
	)
	, (
		SELECT hacker_id
		FROM submissions s2
		WHERE s2.submission_date = s1.submission_date
		GROUP BY hacker_id
		ORDER BY COUNT(1) DESC, hacker_id
		LIMIT 1
	) AS mvp
	, (
		SELECT name
		FROM hackers
		WHERE hacker_id = mvp
	)
FROM (
	SELECT submission_date
	FROM submissions
	GROUP BY submission_date
	ORDER BY submission_date
) s1


主要学到的就是几点吧:

用子查询替代join(不停的join真的很痛苦);

前面一个算出来的列的值可以用来计算后面的列;

from的子查询查出的表可以用到select语句里;

不同的子查询里可以创建新的相同的表;

都做到最后一题了,发现自己还是sql入门水平。。。