SELECT DISTINCT(CITY) FROM STATION
WHERE CITY REGEXP '^[aeiou].*'
主要是记得使用REGEXP函数,和正则表达式要用^表示开头。
或者:
select distinct city from station
where left(city,1) in ('a','e','i','o','u')
用left函数提取子字符串。
- 基本的join操作
使用join写两个语句,分别把长度最长和最短的给找出来。注意的是mysql不支持在JOIN ON
语法中使用前面用AS
设置的别名(但是clickhouse就可以)。排序后使用TOP
或者LIMIT
语法都可以。
顺便想起一个问题,那就是TOP
和LIMIT
有没有区别呢,网上搜了一下,发现的唯一区别是LIMIT
可以再指定一个offset,这样就可以取查询结果中间的一段值了。 - int与string的转换
使用cast在数字和字符串之间来回转换,需要注意的是cast转换的目标类型是几个固定死了的类型,所谓的int和varchar不在转换目标类型之列。
另外就是使用REPLACE函数把0给替换成空字符串。 - 打印三角形
关键在于使用SET设置变量和使用REPEAT函数。
以及一个充当占位用的information_schema.tables。 - The PADS
就是练习一下concat与lower函数。 - Occupations
这道题涉及一个pivot的概念(其实不懂这个概念也没啥关系),就是我们一般的查询是用这个表建表时使用的字段作为列,而pivot呢是说我们用其中某个字段的枚举值来作为列。
在mssql中是有pivot函数的,而在mysql中则是没有pivot函数的,需要自己实现一个弱智版本的pivot(因为需要提前知道字段的所有值来写case语句枚举)。
来自于讨论区的mysql答案:
SELECT
MIN(CASE WHEN Occupation = 'Doctor' THEN Name ELSE NULL END) AS Doctor,
MIN(CASE WHEN Occupation = 'Professor' THEN Name ELSE NULL END) AS Professor,
MIN(CASE WHEN Occupation = 'Singer' THEN Name ELSE NULL END) AS Singer,
MIN(CASE WHEN Occupation = 'Actor' THEN Name ELSE NULL END) AS Actor
FROM (
SELECT a.Occupation,
a.Name,
(SELECT COUNT(*)
FROM Occupations AS b
WHERE a.Occupation = b.Occupation AND a.Name > b.Name) AS myrank
FROM Occupations AS a
) AS c
GROUP BY c.myrank
ORDER BY c.myrank
这里需要注意用子查询作为列的这种写法,不是看到人家这么写我都不知道子查询还可以作为select的一列。
除了用子查询,也可以用rank over函数来实现计算不同职业的人的姓名排名:
SELECT
MIN(CASE WHEN Occupation = 'Doctor' THEN Name ELSE NULL END) AS Doctor,
MIN(CASE WHEN Occupation = 'Professor' THEN Name ELSE NULL END) AS Professor,
MIN(CASE WHEN Occupation = 'Singer' THEN Name ELSE NULL END) AS Singer,
MIN(CASE WHEN Occupation = 'Actor' THEN Name ELSE NULL END) AS Actor
FROM
(
SELECT Occupation, Name, rank() over( partition by Occupation order by Name) as myrank
FROM Occupations
)t
group by myrank
order by myrank
用rank over函数需要注意的点是mysql要从8.0开始才有这种window函数。如果是8.0之前的版本你要获得这个排名需要join表来计算(或者用之前的子查询方式),简直是麻烦死了。
rank over函数简介。
顺便说一下mysql没有full join,坑。
- Manhattan Distance
就是练习一下round函数和abs函数。
另外也有题目联系sqrt和power函数的,就懒得列了。 - Median Query
在可以使用rank over和subquery的情况下非常好解决:
with t as
(
select LAT_N, rank() over (order by LAT_N ) as place
from station
)
select round(AVG(LAT_N), 4)
from t
where abs(place - ((select count(*) from t) + 1)/2.0) <= 0.6
如果没有rank over,那么就只能玩join的苦逼套路了:用join的方式来算rank,然后再找出中位数对应的那一行或者两行去算平均值。
这里算便记录一下,mysql的subquery(或者叫inner query, inner select)可以放在WHERE、FROM和SELECT子句里。
以及rank over函数是不能用在where子句里的。
- Prime Number
注意每一个生成的中间表都需要取别名:Every derived table must have its own alias。
在mysql语法中,where子句和group by子句不能用列的别名,而order by子句可以用列的别名。
可以概括为练习了一下group_concat和mod两个函数的使用。同时加深了对生成这个1到1000范围的数列的sql的理解(那就是对表中的每一行执行tmp加1的操作而不是我原先理解的这个information_schmea.tables是没用的,是拿来充数的)。
set @tmp:=0;
with t1 as (
select @tmp:=@tmp+1 as n
from information_schema.tables, information_schema.tables c
limit 1000
)
select group_concat(t3.n SEPARATOR '&')
from(
select n, (select count(1) from t1 as t2 where t2.n <= t1.n and mod(t1.n, t2.n) = 0) as divisor
from t1
where (select count(1) from t1 as t2 where t2.n <= t1.n and mod(t1.n, t2.n) = 0) = 2
order by t1.n
)t3
10. Multiple Join
帮助你练习不停的join。
SELECT Company.company_code, Company.founder, COUNT(DISTINCT Lead_manager.lead_manager_code)
, COUNT(DISTINCT Senior_manager.senior_manager_code), COUNT(DISTINCT Manager.manager_code)
, COUNT(DISTINCT Employee.employee_code)
FROM Company
LEFT JOIN Lead_manager ON Company.company_code = Lead_manager.company_code
LEFT JOIN Senior_manager ON Lead_manager.lead_manager_code = Senior_manager.lead_manager_code
LEFT JOIN Manager ON Senior_manager.senior_manager_code = Manager.senior_manager_code
LEFT JOIN Employee ON Manager.manager_code = Employee.manager_code
GROUP BY Company.company_code, Company.founder
11. Project Planning
问题大意就是合并长度相同的连续区间。
解法一:使用变量来标记属于同一个任务的子任务,同时还用到了LAG函数。
SET @tmp = 0;
SELECT MIN(Start_Date) AS st, MAX(End_Date) AS et
FROM (
SELECT Start_Date, End_Date
, if(Start_Date <> Previous_End_Date, @tmp := @tmp + 1, @tmp) AS groupid
FROM (
SELECT Start_Date, End_Date, LAG(End_Date) OVER (ORDER BY Start_Date) AS Previous_End_Date
FROM Projects
ORDER BY Start_Date
) t1
) t2
GROUP BY groupid
ORDER BY MAX(End_Date) - MIN(Start_Date), st
解法二:使用subquery,具体原理查看该问题讨论区。
SELECT Start_Date, MIN(End_Date)
FROM (
SELECT Start_Date
FROM Projects
WHERE Start_Date NOT IN (
SELECT End_Date
FROM Projects
)
) a, (
SELECT end_date
FROM PROJECTS
WHERE end_date NOT IN (
SELECT start_date
FROM PROJECTS
)
) b
WHERE start_date < end_date
GROUP BY start_date
ORDER BY datediff(start_date, MIN(end_date)) DESC, start_date
解法三:由于子区间的长度相同,其实问题等价于在一个排好序的序列中,找出构成这个序列的连续子序列。只要用row_number和每个元素做差,然后对这个差group by就可以区分属于不同连续子序列的元素了。
;WITH CTE_Projects AS ( SELECT Start_Date, End_Date, ROW_NUMBER() OVER (ORDER BY Start_Date) AS RowNumber FROM Projects WITH (NOLOCK) ) SELECT MIN(Start_Date), MAX(End_Date) FROM CTE_Projects WITH (NOLOCK) GROUP BY DATEDIFF(DAY, RowNumber, Start_Date) ORDER BY DATEDIFF(DAY, MIN(Start_Date), MAX(End_Date)), MIN(Start_Date)
这个问题更难的一个版本,给出若干个区间,求这些区间合并后的结果。在stackoverflow上搜到一个clickhouse版本的答案,自己也想过,没想出来。
12. Having
练习一下having,where是在聚合前进行筛选,而having是对聚合后的结果进行筛选。
SELECT hackers.hacker_id, hackers.name
FROM hackers
JOIN submissions ON hackers.hacker_id = submissions.hacker_id
JOIN challenges ON submissions.challenge_id = challenges.challenge_id
JOIN difficulty ON challenges.difficulty_level = difficulty.difficulty_level
WHERE submissions.score = difficulty.score
GROUP BY hackers.hacker_id, hackers.name
HAVING COUNT(1) > 1
ORDER BY COUNT(1) DESC, hackers.hacker_id
13. Challenges
大意是要同时计算出某一列的最大值,和每一行的这一列的这个值的排名。hackerrank这道题给的mysql版本是一个很低版本的,没有with语句,还老是报各种语法错,被这个低版本mysql语法给搞死了。最后反正是看评论区,用子查询写的(而且还不能使用表别名那种子查询,需要自己重新把相应的表生成一遍,无语):
SELECT hacker_id, name, challenge_count
FROM (
SELECT hackers.hacker_id AS hacker_id, name, COUNT(1) AS challenge_count
FROM hackers
JOIN challenges ON hackers.hacker_id = challenges.hacker_id
GROUP BY hacker_id, name
ORDER BY challenge_count DESC, hacker_id
) t
WHERE (
SELECT MAX(cnt)
FROM (
SELECT COUNT(1) AS cnt
FROM challenges
GROUP BY hacker_id
) tmp
) = challenge_count
OR (
SELECT COUNT(1)
FROM (
SELECT COUNT(1) AS cnt
FROM challenges
GROUP BY hacker_id
) tmp
WHERE cnt = challenge_count
) = 1
14. Wands
丑陋的subquery:
SELECT id, age, coins, wpower
FROM (
SELECT wands.id AS id, wands_property.age AS age, wands.coins_needed AS coins, wands.power AS wpower
FROM wands
JOIN wands_property ON wands.code = wands_property.code
WHERE wands_property.is_evil = 0
) t
WHERE (
SELECT COUNT(1)
FROM (
SELECT wands.id AS id, wands_property.age AS age, wands.coins_needed AS coins, wands.power AS wpower
FROM wands
JOIN wands_property ON wands.code = wands_property.code
WHERE wands_property.is_evil = 0
) t1
WHERE t1.age = t.age
AND t1.wpower = t.wpower
AND t1.coins < t.coins
) = 0
ORDER BY wpower DESC, age DESC
用row_number对多个列排序时居然会出错,貌似是因为评判系统的mysql版本太低了导致的。
15. Interviews
题目技术上其实比较简单,就是题目理解容易 出错:
View_Stats和Submission_Stats这两个表里面的total_views、total_unique_views、total_submissions和total_accepted_submissions都需要再求一次和;
需要left join,有些题目可能在View_Stats和Submission_Stats中没有出现;
16. Days of Learning SQL
虽然不停地写子表套子表可以套出来,可是还是受不了这种折磨看了讨论区的答案,然后发现自己还是没有真正理解mysql的语法。。。
SELECT submission_date
, (
SELECT COUNT(DISTINCT hacker_id)
FROM submissions s2
WHERE (
SELECT COUNT(DISTINCT s3.submission_date)
FROM submissions s3
WHERE s3.submission_date <= s1.submission_date
AND s3.hacker_id = s2.hacker_id
) = datediff(s1.submission_date, '2016-03-01') + 1
)
, (
SELECT hacker_id
FROM submissions s2
WHERE s2.submission_date = s1.submission_date
GROUP BY hacker_id
ORDER BY COUNT(1) DESC, hacker_id
LIMIT 1
) AS mvp
, (
SELECT name
FROM hackers
WHERE hacker_id = mvp
)
FROM (
SELECT submission_date
FROM submissions
GROUP BY submission_date
ORDER BY submission_date
) s1
主要学到的就是几点吧:
用子查询替代join(不停的join真的很痛苦);
前面一个算出来的列的值可以用来计算后面的列;
from的子查询查出的表可以用到select语句里;
不同的子查询里可以创建新的相同的表;
都做到最后一题了,发现自己还是sql入门水平。。。
近期评论