Cache of the Internet.

主宰生活，而不是被生活主宰。

计算机 · 2021年12月29日 0

write better sql

Table of Contents

可以优化的点

相同的表达式不要写多遍
位运算写的更加明白一点

hive上的优化

with cube的使用
https://tech.youzan.com/cube/
其实就是把group by列出来的维度的可能组合都group by了一遍

ck上的优化

使用with语句避免一个表达式写多遍
用hex函数打印数字的16进制形式

FAQ

获取array的长度
- hive
  size(col)
- ck
获取string的长度
- hive
- ck
  length(col)
获取连续的区间
用行号和想要连续区间的列相减，然后根据group by的结果再取最大值、最小值就得到了连续区间
这篇csdn博客也可以提供一定的参考。
合并区间

SELECT
    intervals,

    arraySort(x -> x, intervals) sortedIntervals,

    /* try to merge each interval with precede ones */
    arrayMap((x, index) -> index != 1
        ? (arrayReduce(
            'min', 
            arrayMap(
              i -> sortedIntervals[i + 1].1, 
              /* get indexes of intervals that can be merged with the current one (index is zero-based) */              
              arrayFilter(
                i -> x.1 <= sortedIntervals[i + 1].2 AND x.2 >= sortedIntervals[i + 1].1, 
                range(index)))),
          arrayReduce(
            'max', 
            arrayMap(
              i -> sortedIntervals[i + 1].2,  
              /* get indexes of intervals that can be merged with the current one (index is zero-based) */              
              arrayFilter(
                i -> x.1 <= sortedIntervals[i + 1].2 AND x.2 >= sortedIntervals[i + 1].1, 
                range(index)))))
        : x,
      sortedIntervals, 
      arrayEnumerate(sortedIntervals)) rawResult,

    /* filter out intervals nested to other ones */
    arrayFilter(
      (x, index) -> index == length(rawResult) OR x.1 != rawResult[index + 1].1,
      rawResult, 
      arrayEnumerate(rawResult)) result
FROM
(
    SELECT [(1, 5), (2, 3), (3, 8), (10, 15)] intervals
    UNION ALL
    SELECT [(2, 4), (1, 3), (3, 6), (12, 14), (7, 7), (13, 16), (9, 9), (8, 9), (10, 15)]
    UNION ALL
    SELECT [(20, 22), (18, 18), (16, 21), (1, 8), (2, 9), (3, 5), (10, 12), (11, 13), (14, 15)]
    UNION ALL
    SELECT []
    UNION ALL 
    SELECT [(1, 11)]
)

容易忽视的地方
- null值的处理
  - 计算avg时，null值的处理
  - join表或者lateral view explode或者array join时null值的处理

标签： sql

发表回复取消回复