当前位置: 首页 > ORACLE > 正文

我们的文章会在微信公众号IT民工的龙马人生博客网站( www.htz.pw )同步更新 ,欢迎关注收藏,也欢迎大家转载,但是请在文章开始地方标注文章出处,谢谢!
由于博客中有大量代码,通过页面浏览效果更佳。

在上一篇文章提到朋友他们小系统从Oracle迁移到PG中性能下降1000倍的案例中提到在PG环境中not in无法进行重写,改为反连接的方式,需要手动将SQL语句改为not exists,其中有网友分享了在PG17版本中引入新特性《允许将相关的 IN 子查询转换为连接》,功能是真的好,弥补了在之前老版本中只能转换简单的IN子查询的缺陷,此功能可以大大减少对我们开发人员对SQL能力要求和提升Oracle迁移到PG的兼容性,下面就简单在17版本中测试,看看这个功能如何。

测试环境还是利用的postgres_air这个测试环境。

1 NOT IN 是否转换

继续使用上篇中提到的SQL语句。

htz=# explain select count(*) from booking a where a.account_id not in (select b.account_id from account b);
                                             QUERY PLAN                                             
----------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=43138248424.04..43138248424.05 rows=1 width=8)
   ->  Gather  (cost=43138248423.82..43138248424.03 rows=2 width=8)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=43138247423.82..43138247423.83 rows=1 width=8)
               ->  Parallel Seq Scan on booking a  (cost=0.00..43138244322.60 rows=1240490 width=0)
                     Filter: (NOT (ANY (account_id = (SubPlan 1).col1)))
                     SubPlan 1
                       ->  Materialize  (cost=0.00..31970.50 rows=1121833 width=4)
                             ->  Seq Scan on account b  (cost=0.00..21978.33 rows=1121833 width=4)
(9 rows)

在执行计划中注意到关键词SubPlan 1,所以这里说明在NOT IN这样的场景中还是无法做到自动转换,期待PG在未来的版本中新增NOT IN的转换功能。

2 IN的转换

通过查阅官方给出的文档(https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=9f1337639)可以得到在PG 17之前的版本有如下的限制:

  • 子链接只能被拉取为子查询
  • 无法处理varlevelsup=1的变量引用
  • 缺乏lateral join的支持能力

那在17的版本中,怎么知道数据库进行了重新呢?除了执行计划外,还可以通过参数debug_print_rewritten,此参数是一个用于调试的参数,其作用是 打印查询在经过重写规则(rewrite rules)处理之后的内部表示。

2.1 拉取ANY_subquery的语句

测试SQL语句:
EXPLAIN (COSTS OFF, ANALYZE)
SELECT o.order_id, o.customer_id, o.total_amount
FROM test_orders o
WHERE o.order_id IN (
    SELECT oi.order_id 
    FROM test_order_items oi 
    WHERE oi.product_id IN (
        SELECT p.product_id 
        FROM test_products p 
        WHERE p.category = '手机' AND p.price > o.total_amount
    )
);

在PG17中执行计划:

                                                       QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
 Nested Loop Semi Join (actual time=0.158..0.159 rows=0 loops=1)
   ->  Seq Scan on test_orders o (actual time=0.012..0.013 rows=10 loops=1)
   ->  Subquery Scan on "ANY_subquery" (actual time=0.014..0.014 rows=0 loops=10)
         Filter: ((o.order_id)::text = ("ANY_subquery".order_id)::text)
         Rows Removed by Filter: 1
         ->  Nested Loop (actual time=0.011..0.014 rows=1 loops=10)
               ->  Index Scan using idx_products_category on test_products p (actual time=0.005..0.006 rows=1 loops=10)
                     Index Cond: ((category)::text = '手机'::text)
                     Filter: (price > o.total_amount)
                     Rows Removed by Filter: 2
               ->  Bitmap Heap Scan on test_order_items oi (actual time=0.005..0.005 rows=1 loops=11)
                     Recheck Cond: ((product_id)::text = (p.product_id)::text)
                     Heap Blocks: exact=9
                     ->  Bitmap Index Scan on idx_order_items_product_id (actual time=0.003..0.003 rows=1 loops=11)
                           Index Cond: ((product_id)::text = (p.product_id)::text)
 Planning Time: 1.820 ms
 Execution Time: 0.242 ms
(17 rows)

在PG17的执行计划中,已经看不到SubPlan的关键词,也就说明子查询重写了。

在PG 14中执行计划:

                                                     QUERY PLAN                                                     
--------------------------------------------------------------------------------------------------------------------
 Seq Scan on test_orders o (actual time=0.076..0.076 rows=0 loops=1)
   Filter: (SubPlan 1)
   Rows Removed by Filter: 10
   SubPlan 1
     ->  Nested Loop (actual time=0.005..0.007 rows=1 loops=10)
           ->  Index Scan using idx_products_category on test_products p (actual time=0.003..0.003 rows=1 loops=10)
                 Index Cond: ((category)::text = '手机'::text)
                 Filter: (price > o.total_amount)
                 Rows Removed by Filter: 2
           ->  Bitmap Heap Scan on test_order_items oi (actual time=0.002..0.002 rows=1 loops=11)
                 Recheck Cond: ((product_id)::text = (p.product_id)::text)
                 Heap Blocks: exact=9
                 ->  Bitmap Index Scan on idx_order_items_product_id (actual time=0.002..0.002 rows=1 loops=11)
                       Index Cond: ((product_id)::text = (p.product_id)::text)
 Planning Time: 0.471 ms
 Execution Time: 0.144 ms
(16 rows)

这里可以看到SubPlan的关键词。

2.2 多层嵌套

EXPLAIN (COSTS OFF, ANALYZE)
SELECT c.customer_name, p.product_name, p.price
FROM test_customers c
INNER JOIN test_products p ON p.product_id IN (
    SELECT oi.product_id 
    FROM test_order_items oi 
    WHERE oi.order_id IN (
        SELECT o.order_id 
        FROM test_orders o 
        WHERE o.customer_id = c.customer_id
          AND o.order_status = 'completed'
    )
)
WHERE c.customer_type = 'P';

在PG 17中的执行计划:

                                                            QUERY PLAN                                                            
----------------------------------------------------------------------------------------------------------------------------------
 Nested Loop (actual time=0.127..0.139 rows=3 loops=1)
   ->  Bitmap Heap Scan on test_customers c (actual time=0.046..0.046 rows=3 loops=1)
         Recheck Cond: (customer_type = 'P'::bpchar)
         Heap Blocks: exact=1
         ->  Bitmap Index Scan on idx_customers_type (actual time=0.027..0.027 rows=3 loops=1)
               Index Cond: (customer_type = 'P'::bpchar)
   ->  Nested Loop (actual time=0.029..0.030 rows=1 loops=3)
         ->  HashAggregate (actual time=0.021..0.021 rows=1 loops=3)
               Group Key: ("ANY_subquery".product_id)::text
               Batches: 1  Memory Usage: 24kB
               ->  Subquery Scan on "ANY_subquery" (actual time=0.017..0.019 rows=1 loops=3)
                     ->  Nested Loop (actual time=0.016..0.018 rows=1 loops=3)
                           ->  Bitmap Heap Scan on test_orders o (actual time=0.010..0.010 rows=1 loops=3)
                                 Recheck Cond: ((customer_id)::text = (c.customer_id)::text)
                                 Filter: ((order_status)::text = 'completed'::text)
                                 Heap Blocks: exact=2
                                 ->  Bitmap Index Scan on idx_orders_customer_date (actual time=0.006..0.006 rows=1 loops=3)
                                       Index Cond: ((customer_id)::text = (c.customer_id)::text)
                           ->  Bitmap Heap Scan on test_order_items oi (actual time=0.006..0.006 rows=1 loops=3)
                                 Recheck Cond: ((order_id)::text = (o.order_id)::text)
                                 Heap Blocks: exact=3
                                 ->  Bitmap Index Scan on idx_order_items_order_product (actual time=0.003..0.003 rows=1 loops=3)
                                       Index Cond: ((order_id)::text = (o.order_id)::text)
         ->  Index Scan using test_products_pkey on test_products p (actual time=0.008..0.008 rows=1 loops=3)
               Index Cond: ((product_id)::text = ("ANY_subquery".product_id)::text)
 Planning Time: 0.594 ms
 Execution Time: 0.293 ms
(27 rows)

在PG 14中的执行计划。

                                                     QUERY PLAN                                                      
---------------------------------------------------------------------------------------------------------------------
 Nested Loop (actual time=0.104..0.208 rows=3 loops=1)
   Join Filter: (SubPlan 1)
   Rows Removed by Join Filter: 27
   ->  Seq Scan on test_products p (actual time=0.020..0.022 rows=10 loops=1)
   ->  Materialize (actual time=0.003..0.003 rows=3 loops=10)
         ->  Bitmap Heap Scan on test_customers c (actual time=0.021..0.022 rows=3 loops=1)
               Recheck Cond: (customer_type = 'P'::bpchar)
               Heap Blocks: exact=1
               ->  Bitmap Index Scan on idx_customers_type (actual time=0.012..0.012 rows=3 loops=1)
                     Index Cond: (customer_type = 'P'::bpchar)
   SubPlan 1
     ->  Nested Loop (actual time=0.004..0.005 rows=1 loops=30)
           ->  Index Scan using idx_orders_customer_date on test_orders o (actual time=0.001..0.002 rows=1 loops=30)
                 Index Cond: ((customer_id)::text = (c.customer_id)::text)
                 Filter: ((order_status)::text = 'completed'::text)
           ->  Bitmap Heap Scan on test_order_items oi (actual time=0.002..0.002 rows=1 loops=29)
                 Recheck Cond: ((order_id)::text = (o.order_id)::text)
                 Heap Blocks: exact=29
                 ->  Bitmap Index Scan on idx_order_items_order_product (actual time=0.001..0.001 rows=1 loops=29)
                       Index Cond: ((order_id)::text = (o.order_id)::text)
 Planning Time: 1.657 ms
 Execution Time: 0.307 ms
(22 rows)

3 总结

通过上面的列子,可以知道PG17新版本中引入的IN转换的功能可以很好的对IN子查询的进行自动转换,弥补了之前版本中只能转换简单的IN子查询,期待PG未来的版本在优化器上能够更加的完善。

——————作者介绍———————–
姓名:黄廷忠
现就职:Oracle中国高级服务团队
曾就职:OceanBase、云和恩墨、东方龙马等
电话、微信、QQ:18081072613
个人博客: (http://www.htz.pw)
CSDN地址: (https://blog.csdn.net/wwwhtzpw)
博客园地址: (https://www.cnblogs.com/www-htz-pw)


PG优化系列:Oracle迁移到PG中性能下降1000倍续集:等您坐沙发呢!

发表评论

gravatar

? razz sad evil ! smile oops grin eek shock ??? cool lol mad twisted roll wink idea arrow neutral cry mrgreen

快捷键:Ctrl+Enter