我们的文章会在微信公众号IT民工的龙马人生和博客网站( www.htz.pw )同步更新 ,欢迎关注收藏,也欢迎大家转载,但是请在文章开始地方标注文章出处,谢谢!
由于博客中有大量代码,通过页面浏览效果更佳。
在上一篇文章提到朋友他们小系统从Oracle迁移到PG中性能下降1000倍的案例中提到在PG环境中not in无法进行重写,改为反连接的方式,需要手动将SQL语句改为not exists,其中有网友分享了在PG17版本中引入新特性《允许将相关的 IN 子查询转换为连接》,功能是真的好,弥补了在之前老版本中只能转换简单的IN子查询的缺陷,此功能可以大大减少对我们开发人员对SQL能力要求和提升Oracle迁移到PG的兼容性,下面就简单在17版本中测试,看看这个功能如何。
测试环境还是利用的postgres_air这个测试环境。
1 NOT IN 是否转换
继续使用上篇中提到的SQL语句。
htz=# explain select count(*) from booking a where a.account_id not in (select b.account_id from account b);
QUERY PLAN
----------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=43138248424.04..43138248424.05 rows=1 width=8)
-> Gather (cost=43138248423.82..43138248424.03 rows=2 width=8)
Workers Planned: 2
-> Partial Aggregate (cost=43138247423.82..43138247423.83 rows=1 width=8)
-> Parallel Seq Scan on booking a (cost=0.00..43138244322.60 rows=1240490 width=0)
Filter: (NOT (ANY (account_id = (SubPlan 1).col1)))
SubPlan 1
-> Materialize (cost=0.00..31970.50 rows=1121833 width=4)
-> Seq Scan on account b (cost=0.00..21978.33 rows=1121833 width=4)
(9 rows)
在执行计划中注意到关键词SubPlan 1,所以这里说明在NOT IN这样的场景中还是无法做到自动转换,期待PG在未来的版本中新增NOT IN的转换功能。
2 IN的转换
通过查阅官方给出的文档(https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=9f1337639)可以得到在PG 17之前的版本有如下的限制:
- 子链接只能被拉取为子查询
- 无法处理
varlevelsup=1
的变量引用 - 缺乏lateral join的支持能力
那在17的版本中,怎么知道数据库进行了重新呢?除了执行计划外,还可以通过参数debug_print_rewritten,此参数是一个用于调试的参数,其作用是 打印查询在经过重写规则(rewrite rules)处理之后的内部表示。
2.1 拉取ANY_subquery的语句
测试SQL语句:
EXPLAIN (COSTS OFF, ANALYZE)
SELECT o.order_id, o.customer_id, o.total_amount
FROM test_orders o
WHERE o.order_id IN (
SELECT oi.order_id
FROM test_order_items oi
WHERE oi.product_id IN (
SELECT p.product_id
FROM test_products p
WHERE p.category = '手机' AND p.price > o.total_amount
)
);
在PG17中执行计划:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Nested Loop Semi Join (actual time=0.158..0.159 rows=0 loops=1)
-> Seq Scan on test_orders o (actual time=0.012..0.013 rows=10 loops=1)
-> Subquery Scan on "ANY_subquery" (actual time=0.014..0.014 rows=0 loops=10)
Filter: ((o.order_id)::text = ("ANY_subquery".order_id)::text)
Rows Removed by Filter: 1
-> Nested Loop (actual time=0.011..0.014 rows=1 loops=10)
-> Index Scan using idx_products_category on test_products p (actual time=0.005..0.006 rows=1 loops=10)
Index Cond: ((category)::text = '手机'::text)
Filter: (price > o.total_amount)
Rows Removed by Filter: 2
-> Bitmap Heap Scan on test_order_items oi (actual time=0.005..0.005 rows=1 loops=11)
Recheck Cond: ((product_id)::text = (p.product_id)::text)
Heap Blocks: exact=9
-> Bitmap Index Scan on idx_order_items_product_id (actual time=0.003..0.003 rows=1 loops=11)
Index Cond: ((product_id)::text = (p.product_id)::text)
Planning Time: 1.820 ms
Execution Time: 0.242 ms
(17 rows)
在PG17的执行计划中,已经看不到SubPlan的关键词,也就说明子查询重写了。
在PG 14中执行计划:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Seq Scan on test_orders o (actual time=0.076..0.076 rows=0 loops=1)
Filter: (SubPlan 1)
Rows Removed by Filter: 10
SubPlan 1
-> Nested Loop (actual time=0.005..0.007 rows=1 loops=10)
-> Index Scan using idx_products_category on test_products p (actual time=0.003..0.003 rows=1 loops=10)
Index Cond: ((category)::text = '手机'::text)
Filter: (price > o.total_amount)
Rows Removed by Filter: 2
-> Bitmap Heap Scan on test_order_items oi (actual time=0.002..0.002 rows=1 loops=11)
Recheck Cond: ((product_id)::text = (p.product_id)::text)
Heap Blocks: exact=9
-> Bitmap Index Scan on idx_order_items_product_id (actual time=0.002..0.002 rows=1 loops=11)
Index Cond: ((product_id)::text = (p.product_id)::text)
Planning Time: 0.471 ms
Execution Time: 0.144 ms
(16 rows)
这里可以看到SubPlan的关键词。
2.2 多层嵌套
EXPLAIN (COSTS OFF, ANALYZE)
SELECT c.customer_name, p.product_name, p.price
FROM test_customers c
INNER JOIN test_products p ON p.product_id IN (
SELECT oi.product_id
FROM test_order_items oi
WHERE oi.order_id IN (
SELECT o.order_id
FROM test_orders o
WHERE o.customer_id = c.customer_id
AND o.order_status = 'completed'
)
)
WHERE c.customer_type = 'P';
在PG 17中的执行计划:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Nested Loop (actual time=0.127..0.139 rows=3 loops=1)
-> Bitmap Heap Scan on test_customers c (actual time=0.046..0.046 rows=3 loops=1)
Recheck Cond: (customer_type = 'P'::bpchar)
Heap Blocks: exact=1
-> Bitmap Index Scan on idx_customers_type (actual time=0.027..0.027 rows=3 loops=1)
Index Cond: (customer_type = 'P'::bpchar)
-> Nested Loop (actual time=0.029..0.030 rows=1 loops=3)
-> HashAggregate (actual time=0.021..0.021 rows=1 loops=3)
Group Key: ("ANY_subquery".product_id)::text
Batches: 1 Memory Usage: 24kB
-> Subquery Scan on "ANY_subquery" (actual time=0.017..0.019 rows=1 loops=3)
-> Nested Loop (actual time=0.016..0.018 rows=1 loops=3)
-> Bitmap Heap Scan on test_orders o (actual time=0.010..0.010 rows=1 loops=3)
Recheck Cond: ((customer_id)::text = (c.customer_id)::text)
Filter: ((order_status)::text = 'completed'::text)
Heap Blocks: exact=2
-> Bitmap Index Scan on idx_orders_customer_date (actual time=0.006..0.006 rows=1 loops=3)
Index Cond: ((customer_id)::text = (c.customer_id)::text)
-> Bitmap Heap Scan on test_order_items oi (actual time=0.006..0.006 rows=1 loops=3)
Recheck Cond: ((order_id)::text = (o.order_id)::text)
Heap Blocks: exact=3
-> Bitmap Index Scan on idx_order_items_order_product (actual time=0.003..0.003 rows=1 loops=3)
Index Cond: ((order_id)::text = (o.order_id)::text)
-> Index Scan using test_products_pkey on test_products p (actual time=0.008..0.008 rows=1 loops=3)
Index Cond: ((product_id)::text = ("ANY_subquery".product_id)::text)
Planning Time: 0.594 ms
Execution Time: 0.293 ms
(27 rows)
在PG 14中的执行计划。
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Nested Loop (actual time=0.104..0.208 rows=3 loops=1)
Join Filter: (SubPlan 1)
Rows Removed by Join Filter: 27
-> Seq Scan on test_products p (actual time=0.020..0.022 rows=10 loops=1)
-> Materialize (actual time=0.003..0.003 rows=3 loops=10)
-> Bitmap Heap Scan on test_customers c (actual time=0.021..0.022 rows=3 loops=1)
Recheck Cond: (customer_type = 'P'::bpchar)
Heap Blocks: exact=1
-> Bitmap Index Scan on idx_customers_type (actual time=0.012..0.012 rows=3 loops=1)
Index Cond: (customer_type = 'P'::bpchar)
SubPlan 1
-> Nested Loop (actual time=0.004..0.005 rows=1 loops=30)
-> Index Scan using idx_orders_customer_date on test_orders o (actual time=0.001..0.002 rows=1 loops=30)
Index Cond: ((customer_id)::text = (c.customer_id)::text)
Filter: ((order_status)::text = 'completed'::text)
-> Bitmap Heap Scan on test_order_items oi (actual time=0.002..0.002 rows=1 loops=29)
Recheck Cond: ((order_id)::text = (o.order_id)::text)
Heap Blocks: exact=29
-> Bitmap Index Scan on idx_order_items_order_product (actual time=0.001..0.001 rows=1 loops=29)
Index Cond: ((order_id)::text = (o.order_id)::text)
Planning Time: 1.657 ms
Execution Time: 0.307 ms
(22 rows)
3 总结
通过上面的列子,可以知道PG17新版本中引入的IN转换的功能可以很好的对IN子查询的进行自动转换,弥补了之前版本中只能转换简单的IN子查询,期待PG未来的版本在优化器上能够更加的完善。
——————作者介绍———————–
姓名:黄廷忠
现就职:Oracle中国高级服务团队
曾就职:OceanBase、云和恩墨、东方龙马等
电话、微信、QQ:18081072613
个人博客: (http://www.htz.pw)
CSDN地址: (https://blog.csdn.net/wwwhtzpw)
博客园地址: (https://www.cnblogs.com/www-htz-pw)
PG优化系列:Oracle迁移到PG中性能下降1000倍续集:等您坐沙发呢!