07 October 2012

在PostgreSQL 9.2.0的Release Note中的一个特性:

Recognize self-contradictory restriction clauses for non-table relations (Tom Lane)

This check is only performed when constraint_exclusion is on.


对这个特性,从上面的说明中看,不是很理解,然后找了一下代码库的commit log

The constraint exclusion feature checks for contradictions among scan
restriction clauses, as well as contradictions between those clauses and a
table's CHECK constraints.  The first aspect of this testing can be useful
for non-table relations (such as subqueries or functions-in-FROM), but the
feature was coded with only the CHECK case in mind so we were applying it
only to plain-table RTEs.  Move the relation_excluded_by_constraints call
so that it is applied to all RTEs not just plain tables.  With the default
setting of constraint_exclusion this results in no extra work, but with
constraint_exclusion = ON we will detect optimizations that we missed
before (at the cost of more planner cycles than we expended before).

Per a gripe from Gunnlaugur Þór Briem.  Experimentation with
his example also showed we were not being very bright about the case where
constraint exclusion is proven within a subquery within UNION ALL, so tweak
the code to allow set_append_rel_pathlist to recognize such cases.

大意是说:在查询优化的时候,增加了对table CHECK、where条件中的约束以及subquery中的约束的检查,用以减少对表的扫描。这个功能只在constraint_exclusion = ON的时候开启(默认情况下,constraint_exclusion =  PARTITION,所以默认不生效),开启后,优点是对此类SQL进行优化,缺点是增加了所有SQL的plan的生成时间,所以应该根据应用情况,选择性的开启。

以下是我的一些测试:

开启开关后,subquery中的过滤条件与外层的过滤条件的交集为空,优化结果是SQL不进行计算,直接返回:

test=# set constraint_exclusion to on;
SET
test=# explain select * from (select i from generate_series(1,1000000) g(i) where i = -10 ) t where t.i > 10;
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.01 rows=1 width=0)
One-Time Filter: false
(2 rows)

恢复开关为默认情况,则会进行计算和扫描:

test=# set constraint_exclusion to DEFAULT ;
SET
test=# explain select * from (select i from generate_series(1,1000000) g(i) where i = -10 ) t where t.i > 10;
QUERY PLAN
-----------------------------------------------------------------------
Function Scan on generate_series g (cost=0.00..15.00 rows=2 width=4)
Filter: ((i > 10) AND (i = (-10)))
(2 rows)

对于同级的过滤条件,如果交集为空,在开关开启的情况下,也会进行优化:

test=# set constraint_exclusion to on;
test=# explain (select * from (select i from generate_series(1,1000000) g(i) ) t where t.i > 10 and t.i < -1) ;
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.01 rows=1 width=0)
One-Time Filter: false
(2 rows)

test=# set constraint_exclusion to DEFAULT ;
SET
test=# explain (select * from (select i from generate_series(1,1000000) g(i) ) t where t.i > 10 and t.i < -1) ;
QUERY PLAN
-----------------------------------------------------------------------
Function Scan on generate_series g (cost=0.00..15.00 rows=5 width=4)
Filter: ((i > 10) AND (i < (-1)))
(2 rows)

如果过滤条件中有简单的计算表达式,也同样会优化:

test=# set constraint_exclusion to DEFAULT ;
test=# explain (select * from (select i from generate_series(1,1000000) g(i) ) t where t.i > 10+1 and t.i < -1) ;
QUERY PLAN
-----------------------------------------------------------------------
Function Scan on generate_series g (cost=0.00..15.00 rows=5 width=4)
Filter: ((i > 11) AND (i < (-1)))
(2 rows)

test=# set constraint_exclusion to on;
SET
test=# explain (select * from (select i from generate_series(1,1000000) g(i) ) t where t.i > 10+1 and t.i < -1) ;
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.01 rows=1 width=0)
One-Time Filter: false
(2 rows)

但是对于union all,则测试中发现只进行了部分优化:

test=# set constraint_exclusion to on;
test=# explain (select * from (select i from generate_series(1,1000000) g(i) where i = -10 ) t where t.i > 10) union all (select * from (select i from generate_series(1,1000000) x(i) where i = -10 ) t where t.i < 10);
QUERY PLAN
-----------------------------------------------------------------------------------
Result (cost=0.00..15.02 rows=2 width=4)
Append (cost=0.00..15.02 rows=2 width=4)
Function Scan on generate_series x (cost=0.00..15.00 rows=2 width=4)
Filter: ((i < 10) AND (i = (-10)))
(4 rows)

test=# set constraint_exclusion to DEFAULT ;
SET
test=# explain (select * from (select i from generate_series(1,1000000) g(i) where i = -10 ) t where t.i > 10) union all (select * from (select i from generate_series(1,1000000) x(i) where i = -10 ) t where t.i < 10);
QUERY PLAN
-----------------------------------------------------------------------------------
Result (cost=0.00..30.04 rows=4 width=4)
Append (cost=0.00..30.04 rows=4 width=4)
Function Scan on generate_series g (cost=0.00..15.00 rows=2 width=4)
Filter: ((i > 10) AND (i = (-10)))
Function Scan on generate_series x (cost=0.00..15.00 rows=2 width=4)
Filter: ((i > 10) AND (i = (-10)))
(6 rows)

对于表的check约束与where条件冲突的情况的优化:

test=# create table test(a int check(a < 100));
CREATE TABLE
test=# insert INTO test select i from generate_series(1,90) g(i);
INSERT 0 90
test=# set constraint_exclusion to DEFAULT ;
SET
test=# EXPLAIN select * from test where a > 150;
QUERY PLAN
----------------------------------------------------
Seq Scan on test (cost=0.00..2.12 rows=1 width=4)
Filter: (a > 150)
(2 rows)
test=# set constraint_exclusion to on;
SET
test=# EXPLAIN select * from test where a > 150;
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.01 rows=1 width=0)
One-Time Filter: false
(2 rows)

总结:
1.这个特性对于sql中经常有冲突的条件导致表的扫描结果为空有重大优化,这在OLAP中复杂SQL中出现概率比较高,而且OLAP计算的时间一般远高于查询的小号,因此,建议在OLAP或者数据仓库应用中开启这个选项。
2.对于有union的优化不充分,除此之外,怀疑还有其他情况没有考虑到。
3.对于check条件有优化。