PostgreSQL Explain

03 Feb, 2013

1 notice

2 explain basics

3 explain analyse

4 Caveats 注意事项

1 notice

简单翻译了一下 http://www.postgresql.org/docs/9.2/static/using-explain.html ，里面有些内容没翻译。

2 explain basics

EXPLAIN SELECT * FROM tenk1;

                         QUERY PLAN
-------------------------------------------------------------
 Seq Scan on tenk1  (cost=0.00..458.00 rows=10000 width=244)

第一行输出需要的消耗。

最后一行输出需要的消耗。

返回的行数，假定全部返回。并非扫描处理的行数。

每行的平均数据量，字节

消耗是使用设置的规划器的一些 cost常量¹ 来计算出来的，上一级的 node 的消耗包括了所有子 node 的消耗，消耗不包括传输结果花的时间。

SELECT relpages, reltuples FROM pg_class WHERE relname = 'tenk1';

从上面这个语句可以看到，tenk1 有 358 磁盘页和 10000 行。这个查询的消耗计算公式如下

 (disk pages read * seq_page_cost) + (rows scanned * cpu_tuple_cost)

默认情况下 seq_page_cost = 1.0, cpu_tuple_cost = 0.01, 所以最总消耗是 (358 * 1.0) + (10000 * 0.01) = 458 。

EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 7000;

                         QUERY PLAN
------------------------------------------------------------
 Seq Scan on tenk1  (cost=0.00..483.00 rows=7001 width=244)
   Filter: (unique1 < 7000)

再看这个，加了一个 where 条件，多了一个 filter 操作，返回结果变少，但是消耗变多了。因为总体需要处理的数据没有变少，并且还多了

 10000 * cpu_operator_cost

需要的消耗。

这些计算方法得到的数值可能和你系统的会稍有区别，使用 analyse 命令可能会改变计算结果，因为它会根据一些随机的样本重新收集表的状态信息。

EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 100;

                                  QUERY PLAN
------------------------------------------------------------------------------
 Bitmap Heap Scan on tenk1  (cost=5.03..229.17 rows=101 width=244)
   Recheck Cond: (unique1 < 100)
   ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..5.01 rows=101 width=0)
         Index Cond: (unique1 < 100)

再来看一个。子执行计划节点会先扫描索引，上一级的执行计划节点会根据索引返回的位置来获取数据。获取散列数据比获取全部数据消耗要大，不过因为毕竟是少量数据，可以看到消耗还是小于全表扫描。上级的节点会在获取散列数据之前做一个磁盘排序，好减少获取的消耗，而 Bitmap 字样表示索引返回的结果是按照存储位置排好序的，这就是使用索引的一个原因。

EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 100 AND stringu1 = 'xxx';

                                  QUERY PLAN
------------------------------------------------------------------------------
 Bitmap Heap Scan on tenk1  (cost=5.01..229.40 rows=1 width=244)
   Recheck Cond: (unique1 < 100)
   Filter: (stringu1 = 'xxx'::name)
   ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..5.01 rows=101 width=0)
         Index Cond: (unique1 < 100)

stringu1 = 'xxx' 减少了输出的行数，因为还需要扫描同样的行数，所以没有减少执行计划的消耗，而且还有稍微的增加，

EXPLAIN SELECT * FROM tenk1 WHERE unique1 = 42;

                                 QUERY PLAN
-----------------------------------------------------------------------------
 Index Scan using tenk1_unique1 on tenk1  (cost=0.00..8.27 rows=1 width=244)
   Index Cond: (unique1 = 42)

数据根据索引排序来获取，不需要额外的排序。

EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 100 AND unique2 > 9000;

                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Bitmap Heap Scan on tenk1  (cost=25.01..60.14 rows=10 width=244)
   Recheck Cond: ((unique1 < 100) AND (unique2 > 9000))
   ->  BitmapAnd  (cost=25.01..25.01 rows=10 width=0)
         ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..5.01 rows=101 width=0)
               Index Cond: (unique1 < 100)
         ->  Bitmap Index Scan on tenk1_unique2  (cost=0.00..19.74 rows=999 width=0)
               Index Cond: (unique2 > 9000)

EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 100 AND unique2 > 9000 LIMIT 2;

                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Limit  (cost=0.00..14.25 rows=2 width=244)
   ->  Index Scan using tenk1_unique2 on tenk1  (cost=0.00..71.23 rows=10 width=244)
         Index Cond: (unique2 > 9000)
         Filter: (unique1 < 100)

这两个查询，第一个分别查两个索引，然后做 BitmapAnd 得到结果。第二个只查了其中一个索引，然后使用 filter 方式来做过滤，可以看到第二个查询里面的消耗 71.23 表示的是返回所有数据需要的消耗，Limit 节点在得到需要的结果厚，会终止执行，所以最终没有返回那么多，最终消耗小一点，比第一个查询小。

EXPLAIN SELECT *
FROM tenk1 t1, tenk2 t2
WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2;

                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Nested Loop  (cost=4.33..118.25 rows=10 width=488)
   ->  Bitmap Heap Scan on tenk1 t1  (cost=4.33..39.44 rows=10 width=244)
         Recheck Cond: (unique1 < 10)
         ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..4.33 rows=10 width=0)
               Index Cond: (unique1 < 10)
   ->  Index Scan using tenk2_unique2 on tenk2 t2  (cost=0.00..7.87 rows=1 width=244)
         Index Cond: (unique2 = t1.unique2)

这个是两表 join，最后通过嵌套循环节点来做的 join。先查询外层 Bitmap heap scan 这个节点，这个节点还有个子节点。这个节点的查询和 select ... where unique1 < 10 的查询类似。然后执行内层 Index scan using 这个节点，外层查询获取到的数据会插入进来，这个时候 t1.unique2 的值已经有了，所以这层的查询和 select ... where t2.unique2 = 常量 的查询类似，消耗比前面的小是因为第一次查询有了 cache，这次就会稍微快了一点。这层查询的消耗是 外层行数*内层的消耗=10 * 7.87 加上一些 cpu 消耗。

EXPLAIN SELECT *
FROM tenk1 t1, tenk2 t2
WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2 AND t1.hundred < t2.hundred;

                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Nested Loop  (cost=4.33..118.28 rows=3 width=488)
   Join Filter: (t1.hundred < t2.hundred)
   ->  Bitmap Heap Scan on tenk1 t1  (cost=4.33..39.44 rows=10 width=244)
         Recheck Cond: (unique1 < 10)
         ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..4.33 rows=10 width=0)
               Index Cond: (unique1 < 10)
   ->  Index Scan using tenk2_unique2 on tenk2 t2  (cost=0.00..7.87 rows=1 width=244)
         Index Cond: (unique2 = t1.unique2)

如果有条件无法在两个子节点上面执行，那就只能在 join 节点来做 filter，类似上面这个。实际执行时间和前面那个没有 t1.hundred < t2.hundred 条件的差不多。

outer join 的时候，还可能会有 Join Filter 和 Filter 同时出现的情况，join filter 出现在外表的 on 语句里面，不匹配的数据依然会返回 null。而 filter 这个时候是用来过滤 join 结果的。

对于 inner join，Join Filter 和 Filter 两个没有区别。

EXPLAIN SELECT *
FROM tenk1 t1, tenk2 t2
WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2;

                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Hash Join  (cost=230.43..713.94 rows=101 width=488)
   Hash Cond: (t2.unique2 = t1.unique2)
   ->  Seq Scan on tenk2 t2  (cost=0.00..445.00 rows=10000 width=244)
   ->  Hash  (cost=229.17..229.17 rows=101 width=244)
         ->  Bitmap Heap Scan on tenk1 t1  (cost=5.03..229.17 rows=101 width=244)
               Recheck Cond: (unique1 < 100)
               ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..5.01 rows=101 width=0)
                     Index Cond: (unique1 < 100)

改变一些条件，会看到执行计划变了，选择了 hash join。一个表的结果会在内层里面记录一个 hash 表，查另一个表的时候会查这个 hash 表来查看匹配的数据。

EXPLAIN SELECT *
FROM tenk1 t1, onek t2
WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2;

                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Merge Join  (cost=197.83..267.93 rows=10 width=488)
   Merge Cond: (t1.unique2 = t2.unique2)
   ->  Index Scan using tenk1_unique2 on tenk1 t1  (cost=0.00..656.25 rows=101 width=244)
         Filter: (unique1 < 100)
   ->  Sort  (cost=197.83..200.33 rows=1000 width=244)
         Sort Key: t2.unique2
         ->  Seq Scan on onek t2  (cost=0.00..148.00 rows=1000 width=244)

这是一个 merge jion 操作。merge join 需要输入的表都按照 join 字段先进行排序。从 t1 直接通过索引进行了查询，得到的结果就是顺序的。因为 t2 需要返回的数据量大，从 t2 做了全表扫描，然后进行了排序。通常全表扫描+排序比走索引消耗小。

出同样的结果有很多路可以走，如果想看看其他执行计划是否更实惠，可以通过打开关闭一些标志²来做到。

SET enable_sort = off;

EXPLAIN SELECT *
FROM tenk1 t1, onek t2
WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2;

                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Merge Join  (cost=0.00..292.36 rows=10 width=488)
   Merge Cond: (t1.unique2 = t2.unique2)
   ->  Index Scan using tenk1_unique2 on tenk1 t1  (cost=0.00..656.25 rows=101 width=244)
         Filter: (unique1 < 100)
   ->  Index Scan using onek_unique2 on onek t2  (cost=0.00..224.76 rows=1000 width=244)

可以看到走索引比全表扫描消耗更大。

3 explain analyse

explain 的 analyse 选项可以看看规划器执行 sql 所用的真实时间。

EXPLAIN ANALYZE SELECT *
FROM tenk1 t1, tenk2 t2
WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2;

                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=4.33..118.25 rows=10 width=488) (actual time=0.370..1.126 rows=10 loops=1)
   ->  Bitmap Heap Scan on tenk1 t1  (cost=4.33..39.44 rows=10 width=244) (actual time=0.254..0.380 rows=10 loops=1)
         Recheck Cond: (unique1 < 10)
         ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..4.33 rows=10 width=0) (actual time=0.164..0.164 rows=10 loops=1)
               Index Cond: (unique1 < 10)
   ->  Index Scan using tenk2_unique2 on tenk2 t2  (cost=0.00..7.87 rows=1 width=244) (actual time=0.041..0.048 rows=1 loops=10)
         Index Cond: (unique2 = t1.unique2)
 Total runtime: 2.414 ms

如上所示，会显示出来真实的执行时间（毫秒），返回的行数等其他信息。为了和 explain 里面消耗的取值能匹配上，loop 大于 1 的节点显示的是单次执行的时间，需要乘以循环次数。

EXPLAIN ANALYZE SELECT *
FROM tenk1 t1, tenk2 t2
WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2 ORDER BY t1.fivethous;

                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=717.30..717.56 rows=101 width=488) (actual time=104.950..105.327 rows=100 loops=1)
   Sort Key: t1.fivethous
   Sort Method: quicksort  Memory: 68kB
   ->  Hash Join  (cost=230.43..713.94 rows=101 width=488) (actual time=3.680..102.396 rows=100 loops=1)
         Hash Cond: (t2.unique2 = t1.unique2)
         ->  Seq Scan on tenk2 t2  (cost=0.00..445.00 rows=10000 width=244) (actual time=0.046..46.219 rows=10000 loops=1)
         ->  Hash  (cost=229.17..229.17 rows=101 width=244) (actual time=3.184..3.184 rows=100 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 27kB
               ->  Bitmap Heap Scan on tenk1 t1  (cost=5.03..229.17 rows=101 width=244) (actual time=0.612..1.959 rows=100 loops=1)
                     Recheck Cond: (unique1 < 100)
                     ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..5.01 rows=101 width=0) (actual time=0.390..0.390 rows=100 loops=1)
                           Index Cond: (unique1 < 100)
 Total runtime: 107.392 ms

有时候还会显示更多的一些信息，比如 sort 和 hash 节点。sort 节点会显示排序的方法，内存磁盘的占用。hash 节点会显示桶（buckets）和批次（batches），如果 batches 大于 1 会使用磁盘空间。

EXPLAIN ANALYZE SELECT * FROM tenk1 WHERE ten < 7;

                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Seq Scan on tenk1  (cost=0.00..483.00 rows=7000 width=244) (actual time=0.111..59.249 rows=7000 loops=1)
   Filter: (ten < 7)
   Rows Removed by Filter: 3000
 Total runtime: 85.340 ms

上面这个执行计划显示了 filter 移除了多少行结果。

explain 还提供了一个 buffers 选项。能得到 query 的哪部分 io 最密集。

EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM tenk1 WHERE unique1 < 100 AND unique2 > 9000;

                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tenk1  (cost=25.07..60.23 rows=10 width=244) (actual time=3.069..3.213 rows=10 loops=1)
   Recheck Cond: ((unique1 < 100) AND (unique2 > 9000))
   Buffers: shared hit=16
   ->  BitmapAnd  (cost=25.07..25.07 rows=10 width=0) (actual time=2.967..2.967 rows=0 loops=1)
         Buffers: shared hit=7
         ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..5.02 rows=102 width=0) (actual time=0.732..0.732 rows=200 loops=1)
               Index Cond: (unique1 < 100)
               Buffers: shared hit=2
         ->  Bitmap Index Scan on tenk1_unique2  (cost=0.00..19.80 rows=1007 width=0) (actual time=2.015..2.015 rows=1009 loops=1)
               Index Cond: (unique2 > 9000)
               Buffers: shared hit=5
 Total runtime: 3.917 ms

要注意的是，explain analyse 会真的执行 sql，所以对于 update，insert，delete 等的 sql 操作，最好在执行的时候启用事务，执行完 explain 之后 rollback。

4 Caveats 注意事项

有两个情况可能会影响 explain analyse 和实际执行的差别
1 因为没有任何输出，所以 I/O 和网络网络传输消耗没有体现。
2 因为要输出执行时间，而有些系统里面 gettimeofday() 这个系统调用很慢，所以会导致消耗增加。

在衡量 explain 执行结果的时候，最好是在情况类似的 db 上面，否则可能没有任何参考价值。比如在一个小表上面的执行计划可能和大表就会很不一样。

Footnotes:

¹ : http://www.postgresql.org/docs/9.2/static/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS

² : http://www.postgresql.org/docs/9.2/static/runtime-config-query.html#RUNTIME-CONFIG-QUERY-ENABLE

Date: 2013-02-03 18:27:03 CST

Author: Wang Dong

Org version 7.8.11 with Emacs version 24

Validate XHTML 1.0

wd and cc