Clickhouse-语法优化规则
[toc]
# Clickhouse-语法优化规则
ClickHouse 的 SQL 优化规则是基于 RBO(Rule Based Optimization),下面是一些优化规则
# 准备测试用表
# 1.上传官方的数据集
将 visits_v1.tar 和 hits_v1.tar 上传到虚拟机,解压到 clickhouse 数据路径下
// 解压到 clickhouse 数据路径
sudo tar -xvf hits_v1.tar -C /var/lib/clickhouse
sudo tar -xvf visits_v1.tar -C /var/lib/clickhouse
//修改所属用户
sudo chown -R clickhouse:clickhouse /var/lib/clickhouse/data/datasets
sudo chown -R clickhouse:clickhouse /var/lib/clickhouse/metadata/datasets
2
3
4
5
6
7
# 2.重启 clickhouse-server
clickhouse restart
# 3.执行查询
clickhouse-client --query "SELECT COUNT(*) FROM datasets.hits_v1"
clickhouse-client --query "SELECT COUNT(*) FROM datasets.visits_v1"
2
[root]# clickhouse-client --query "SELECT COUNT(*) FROM datasets.hits_v1"
8873898
[root]# clickhouse-client --query "SELECT COUNT(*) FROM datasets.visits_v1"
1676861
2
3
4
注意:
官方的 tar 包,包含了建库、建表语句、数据内容,这种方式不需要手动建库、建表,最方便。
hits_v1 表有 130 多个字段,880 多万条数据 visits_v1 表有 180 多个字段,160 多万条数据
# COUNT优化
在调用 count 函数时,如果使用的是 count() 或者 count(*),且没有 where 条件,则会直接使用 system.tables 的 total_rows,例如:
EXPLAIN SELECT count()FROM datasets.hits_v1;
Union
Expression (Projection)
Expression (Before ORDER BY and SELECT)
MergingAggregated
ReadNothing (Optimized trivial count)
2
3
4
5
6
7
注意 Optimized trivial count ,这是对 count 的优化。
如果 count 具体的列字段,则不会使用此项优化:
EXPLAIN SELECT count(CounterID) FROM datasets.hits_v1;
Union
Expression (Projection)
Expression (Before ORDER BY and SELECT)
Aggregating
Expression (Before GROUP BY)
ReadFromStorage (Read from MergeTree)
2
3
4
5
6
7
8
# 消除子查询重复字段
下面语句子查询中有两个重复的 id 字段,会被去重:
EXPLAIN SYNTAX
SELECT
a.UserID,
b.VisitID,
a.URL,
b.UserID
FROM hits_v1 AS a
LEFT JOIN
(
SELECT
UserID,
UserID AS HaHa,
VisitID
FROM visits_v1
) AS b USING (UserID)
LIMIT 3;
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
执行后返回优化语句
┌─explain───────────────┐
│ SELECT │
│ UserID, │
│ VisitID, │
│ URL, │
│ b.UserID │
│ FROM hits_v1 AS a │
│ ALL LEFT JOIN │
│ ( │
│ SELECT │
│ UserID, │
│ VisitID │
│ FROM visits_v1 │
│ ) AS b USING (UserID) │
│ LIMIT 3 │
└───────────────────────┘
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
# 谓词下推
当 group by 有 having 子句,但是没有 with cube、with rollup 或者 with totals 修饰的时候,having 过滤会下推到 where 提前过滤。
例如下面的查询,HAVING name 变成了 WHERE name,在 group by 之前过滤:
EXPLAIN SYNTAX SELECT UserID FROM hits_v1 GROUP BY UserID HAVING UserID =
'8585742290196126178';
# 返回优化语句
SELECT UserID
FROM hits_v1
WHERE UserID = \'8585742290196126178\'
GROUP BY UserID
2
3
4
5
6
7
8
子查询也支持谓词下推:
EXPLAIN SYNTAX
SELECT *
FROM
(
SELECT UserID
FROM visits_v1
)
WHERE UserID = '8585742290196126178';
# 返回优化语句
┌─explain──────────────────────────────────┐
│ SELECT UserID │
│ FROM │
│ ( │
│ SELECT UserID │
│ FROM visits_v1 │
│ WHERE UserID = '8585742290196126178' │
│ ) │
│ WHERE UserID = '8585742290196126178' │
└──────────────────────────────────────────┘
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
再来一个复杂例子:
EXPLAIN SYNTAX
SELECT *
FROM
(
SELECT *
FROM
(
SELECT UserID
FROM visits_v1
)
UNION ALL
SELECT *
FROM
(
SELECT UserID
FROM visits_v1
)
)
WHERE UserID = '8585742290196126178';
# 返回优化语句
┌─explain──────────────────────────────────────┐
│ SELECT UserID │
│ FROM │
│ ( │
│ SELECT UserID │
│ FROM │
│ ( │
│ SELECT UserID │
│ FROM visits_v1 │
│ WHERE UserID = '8585742290196126178' │
│ ) │
│ WHERE UserID = '8585742290196126178' │
│ UNION ALL │
│ SELECT UserID │
│ FROM │
│ ( │
│ SELECT UserID │
│ FROM visits_v1 │
│ WHERE UserID = '8585742290196126178' │
│ ) │
│ WHERE UserID = '8585742290196126178' │
│ ) │
│ WHERE UserID = '8585742290196126178' │
└──────────────────────────────────────────────┘
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
# 聚合外推
聚合函数内的计算,会外推,例如:
EXPLAIN SYNTAX
SELECT sum(UserID * 2)
FROM visits_v1
# 返回优化后的语句
SELECT sum(UserID) * 2
FROM visits_v1
2
3
4
5
6
7
# 聚合函数消除
如果对聚合键,也就是 group by key 使用 min、max、any 聚合函数,则将函数消除,
例如:
EXPLAIN SYNTAX
SELECT
sum(UserID * 2),
max(VisitID),
max(UserID)
FROM visits_v1
GROUP BY UserID;
# 返回优化后的语句
SELECT
sum(UserID) * 2,
max(VisitID),
UserID
FROM visits_v1
GROUP BY UserID;
2
3
4
5
6
7
8
9
10
11
12
13
14
15
# 删除重复的 order by key
例如下面的语句,重复的聚合键 id 字段会被去重:
EXPLAIN SYNTAX
SELECT *
FROM visits_v1
ORDER BY
UserID ASC,
UserID ASC,
VisitID ASC,
VisitID ASC
# 返回优化后的语句
select
……
FROM visits_v1
ORDER BY
UserID ASC,
VisitID ASC
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
# 删除重复的 limit by key
例如下面的语句,重复声明的 name 字段会被去重:
EXPLAIN SYNTAX
SELECT *
FROM visits_v1
LIMIT 3 BY
VisitID,
VisitID
LIMIT 10
# 返回优化后的语句:
select
……
FROM visits_v1
LIMIT 3 BY VisitID
LIMIT 10
2
3
4
5
6
7
8
9
10
11
12
13
14
Limit By 用法
LIMIT 3 BY $column_name
每个$column_name
列只保存3条数据
使用举例:
数据
:) select * from t1;
SELECT *
FROM t1
Query id: acdb8771-7471-4897-849c-b613b0edd194
┌─id─┬─name───┬───birthday─┐
│ 1 │ First │ 2011-01-01 │
│ 2 │ Second │ 2012-02-02 │
│ 3 │ Second │ 2011-01-01 │
└────┴────────┴────────────┘
3 rows in set. Elapsed: 0.004 sec.
2
3
4
5
6
7
8
9
10
11
12
13
14
查询,每个生日的第一个人,每个birthday,只保留1个。
:) select * from t1 limit 1 by birthday;
SELECT *
FROM t1
LIMIT 1 BY birthday
Query id: 479cc5ea-abee-4ff6-a5b6-ffb208efc730
┌─id─┬─name───┬───birthday─┐
│ 1 │ First │ 2011-01-01 │
│ 2 │ Second │ 2012-02-02 │
└────┴────────┴────────────┘
2 rows in set. Elapsed: 0.002 sec.
2
3
4
5
6
7
8
9
10
11
12
13
14
# 删除重复的 USING Key
例如下面的语句,重复的关联键 id 字段会被去重:
EXPLAIN SYNTAX
SELECT
a.UserID,
a.UserID,
b.VisitID,
a.URL,
b.UserID
FROM hits_v1 AS a
LEFT JOIN visits_v1 AS b USING (UserID, UserID)
# 返回优化后的语句:
SELECT
UserID,
UserID,
VisitID,
URL,
b.UserID
FROM hits_v1 AS a
ALL LEFT JOIN visits_v1 AS b USING (UserID)
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
USING 用法
USING(UserID)
与 a.UserID = b.UserID
相同
# TODO-标量替换
如果子查询只返回一行数据,在被引用的时候用标量替换,例如下面语句中的total_disk_usage
字段:
EXPLAIN SYNTAX
WITH
(
SELECT sum(bytes)
FROM system.parts
WHERE active
) AS total_disk_usage
SELECT
(sum(bytes) / total_disk_usage) * 100 AS table_disk_usage,
table
FROM system.parts
GROUP BY table
ORDER BY table_disk_usage DESC
LIMIT 10;
#返回优化后的语句:
WITH CAST(0, \'UInt64\') AS total_disk_usage
SELECT
(sum(bytes) / total_disk_usage) * 100 AS table_disk_usage,
table
FROM system.parts
GROUP BY table
ORDER BY table_disk_usage DESC
LIMIT 10
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24