当前位置: 首页 > ORA, ORACLE > 正文

我们的文章会在微信公众号Oracle恢复实录和博客网站( www.htz.pw )同步更新 ,欢迎关注收藏,也欢迎大家转载,但是请在文章开始地方标注文章出处,谢谢!
由于博客中有大量代码,通过页面浏览效果更佳。

随着Oracle ADG(Active Data Guard)技术的不断成熟,越来越多的企业客户选择将只读类业务迁移到容灾环境(即ADG只读库)中运行。这样做不仅可以减轻主库的压力,还能更好地利用备库资源,提高整体系统的可用性和容错能力。然而,虽然ADG只读库和生产主库的数据是一致的,但由于两者在架构和实现机制上存在一些差异,导致很多客户在实际使用只读库时会遇到各种异常问题,影响了业务体验。

本文将结合实际案例,通俗易懂地介绍在使用ADG只读库时常见的一些“坑”,并分析背后的原因,帮助大家更好地理解和规避这些问题。

1,现象描述

本案例来至于于朋友分享,数据库版本为19c,故障现象为备库每天早上8点40左右,备库的LGWR都会被阻塞。从而其他应用因为请求不到instance lock后被LGWR阻塞,LGWR此时出现library cache lock等待事件。客户第二天重现的时候收集了systemstate dump,下面内容根据systemstate dump进行分析。

2,分析过程

LGWR PROCESS STATE OBJECT:

PROCESS 14: LGWR
  ----------------------------------------
  SO: 0x3d21de9858, type: 2, owner: (nil), flag: INIT/-/-/0x00 if: 0x3 c: 0x3
   proc=0x3d21de9858, name=process, file=ksu.h LINE:12721, pg=0
  (process) Oracle pid:14, ser:2, calls cur/top: 0x3ec22fda30/0x3ec22fda30
            flags : (0x6) SYSTEM
            flags2: (0x0),  flags3: (0x10) 
            intr error: 0, call error: 0, sess error: 0, txn error 0
            intr queue: empty
    ksudlp FALSE at location: 0
  (post info) last post received: 0 0 26
              last post received-location: ksa2.h LINE:285 ID:ksasnd
              last process to post me: 0x3da1e98788 1 0
              last post sent: 0 0 259
              last post sent-location: kgl.h LINE:8873 ID:kgllkdl: post after freeing latch
              last process posted by me: 0x3da1eb2980 171 0
    (latch info) wait_event=0 bits=0x0
    Process Group: DEFAULT, pseudo proc: 0x3d21ec8180
    O/S info: user: oracle, term: UNKNOWN, ospid: 172824 
    OSD pid info: Unix process pid: 172824, image: oracle@FMS11DG (LGWR)
    Short stack dump: 
ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-semtimedop()+10<-skgpwwait()+178<-ksliwat()+2066<-kslwaitctx()+163<-ksfwaitctx()+14<-kglLockWait()+871<-kgllkal()+1166<-kglLock()+1118<-kglLockInstance()+259<-kksExclusiveParseLock()+47<-krdrsb_mdflush_local()+126<-krdrsb_adv_qscn()+1652<-ksbabs()+771<-ksbrdp()+1045<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+250<-ssthrdmain()+265<-main()+201<-__libc_start_main()+245
    ----------------------------------------
LGWR SESSION STATE OBJECT:

SO: 0x3d21f74f20, type: 4, owner: 0x3d21de9858, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
     proc=0x3d21de9858, name=session, file=ksu.h LINE:12729, pg=0
    (session) sid: 337 ser: 3 trans: (nil), creator: 0x3d21de9858
              flags: (0x51) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
              flags2: (0x409) -/-/INC
              DID: , short-term DID: 
              txn branch: (nil)
              edition#: 0              oct: 0, prv: 0, sql: (nil), psql: (nil), user: 0/SYS
    ksuxds FALSE at location: 0
    service name: SYS$BACKGROUND
    Current Wait Stack:
     0: waiting for 'library cache lock'
        handle address=0x3e5ff630f8, lock address=0x3e7ff41450, 100*mode+namespace=0x1004a0003
        wait_id=9628259 seq_num=60150 snap_id=1
        wait times: snap=1.306065 sec, exc=1.306065 sec, total=1.306065 sec
        wait times: max=5.500000 sec, heur=1 min 8 sec
        wait counts: calls=2 os=2
        in_wait=1 iflags=0x15a2

2.1 确认阻塞者

LGWR SID为337,等待library cache lock,namespace一看就是instance lock,请求模式为X。

这里偷了个懒,很碰巧看到了final blocking session,就省的去搜索哪个session持有了instance lock了。

    SO: 0x3d42261708, type: 4, owner: 0x3da1eae6a0, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
     proc=0x3da1eae6a0, name=session, file=ksu.h LINE:12729, pg=0
    (session) sid: 2045 ser: 1 trans: (nil), creator: 0x3da1eae6a0
              flags: (0x8000041) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
              flags2: (0x40009) -/-/INC
              DID: , short-term DID: 
              txn branch: (nil)
              edition#: 2605814              oct: 3, prv: 0, sql: 0x3e9fb65bd0, psql: 0x3e5e5b1860, user: 141/DBAAS_SYS
    ksuxds FALSE at location: 0
    service name: fmsdg
    client details:
      O/S info: user: zcloud, term: , ospid: 7395
      machine: VCOM01PXY program: zoramon_collector@VCOM01PXY (TNS V1-V3)
      application name: zoramon_collector@VCOM01PXY (TNS V1-V3), hash value=2183771371
    Current Wait Stack:
     0: waiting for 'library cache lock'
        handle address=0x3e5ff630f8, lock address=0x3e1fbb8dd0, 100*mode+namespace=0x1004a0002
        wait_id=5523383 seq_num=18444 snap_id=1
        wait times: snap=1.336710 sec, exc=1.336710 sec, total=1.336710 sec
        wait times: max=15 min 0 sec, heur=1.336710 sec
        wait counts: calls=2 os=2
        in_wait=1 iflags=0x15a2
    There is at least one session blocking this session.
      Dumping 1 direct blocker(s):
        inst: 1, sid: 337, ser: 3
      Dumping final blocker:
        inst: 1, sid: 1976, ser: 29501
final blocker为session 1976

final blocker session state object:

   SO: 0x3dc221a830, type: 4, owner: 0x3ec1e41a68, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
     proc=0x3ec1e41a68, name=session, file=ksu.h LINE:12729, pg=0
    (session) sid: 1976 ser: 29501 trans: (nil), creator: 0x3ec1e41a68
              flags: (0x41) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
              flags2: (0x40009) -/-/INC
              DID: , short-term DID: 
              txn branch: (nil)
              edition#: 2605814              oct: 3, prv: 0, sql: 0x3e5e197d50, psql: 0x3e9f487090, user: 142/FIX_DBQUERY
    ksuxds FALSE at location: 0
    service name: SYS$USERS
    client details:
      O/S info: user: afaim, term: unknown, ospid: 1234
      machine: VFIX02APP program: JDBC Thin Client
      application name: JDBC Thin Client, hash value=2546894660
    Current Wait Stack:
      Not in wait; last wait ended 1 min 16 sec ago 
    There are 2 sessions blocked by this session.
    Dumping one waiter:
      inst: 1, sid: 337, ser: 3
      wait event: 'library cache lock'
        p1: 'handle address'=0x3e5ff630f8
        p2: 'lock address'=0x3e7ff41450
        p3: '100*mode+namespace'=0x1004a0003
      row_wait_obj#: 4294967295, block#: 0, row#: 0, file# 0
      min_blocked_time: 0 secs, waiter_cache_ver: 24873

final blocker没有任何等待,正在执行的sql为:

  SO: 0x3df82faf90, type: 78, owner: 0x3dc221a830, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
       proc=0x3ec1e41a68, name=LIBRARY OBJECT LOCK, file=kgl.h LINE:8751, pg=0
 
      LibraryObjectLock:  Address=0x3df82faf90 Handle=0x3e5e197d50 Mode=N CanBeBrokenCount=1 Incarnation=1 ExecutionCount=0         
        
        User=0x3dc221a830 Session=0x3dc221a830 ReferenceCount=1 Flags=CNB/[0001] SavepointNum=6552c1ec 
      LibraryHandle:  Address=0x3e5e197d50 Hash=3584acbe LockMode=N PinMode=0 LoadLockMode=0 Status=VALD 
        ObjectName:  Name=select
        s.VC_INTER_CODE,
        s.C_MARKET_NO,
        s.L_BUY_UNIT,
        s.VC_TRADE_CURRENCY_NO,
        s.EN_YESTERDAY_CLOSE_PRICE,
        s.VC_MIXED_TYPE,
        s.C_STOCK_TYPE,
        s.C_ASSET_CLASS,
        s.C_STOP_FLAG,
        s.VC_REPORT_CODE
        from trade.TSTOCKINFO s
         WHERE s.VC_INTER_CODE in (
                
                    :1 
                ,
                    :2 
                ,
                    :3 
                ,
                    :4 
           
          FullHashValue=dbbe27bcd13925c46366f2fc3584acbe Namespace=SQL AREA(00) Type=CURSOR(00) Identifier=897887422 OwnerIdn=142 
        Statistics:  InvalidationCount=12 ExecutionCount=2804 LoadCount=14 ActiveLocks=1 TotalLockCount=58 TotalPinCount=1 
        Counters:  BrokenCount=1 RevocablePointer=1 KeepDependency=1 Version=0 BucketInUse=57 HandleInUse=57 HandleReferenceCount=0 
        Concurrency:  DependencyMutex=0x3e5e197e00(0, 135, 0, 0) Mutex=0x3e5e197e90(0, 209907, 0, 0) 
        Flags=RON/PIN/TIM/PN0/DBN/[10012841] 
        WaitersLists:  
          Lock=0x3e5e197de0[0x3e5e197de0,0x3e5e197de0] 
          Pin=0x3e5e197dc0[0x3e5e197dc0,0x3e5e197dc0] 
          LoadLock=0x3e5e197e38[0x3e5e197e38,0x3e5e197e38] 
        Timestamp:  Current=10-29-2023 08:40:12 
        HandleReference:  Address=0x3e5e198110 Handle=(nil) Flags=[00] 
        LibraryObject:  Address=0x3c7e8a0e00 HeapMask=0000-0001-0001-0000 Flags=EXS[0000] Flags2=[0000] PublicFlags=[0000] 
          ChildTable:  size='16' 
            Child:  id='0' Table=0x3c7e8a1cb0 Reference=0x3c7e8a16f8 Handle=0x3e5e2ea8f0 
            Child:  id='1' Table=0x3c7e8a1cb0 Reference=0x3c7e8972c0 Handle=0x3e5de512f0 
        NamespaceDump:  
          Parent Cursor:  sql_id=66trkzhus9b5y parent=0x3c7e8a0ea0 maxchild=2 plk=y ppn=n

2.2 确认handle

有两个子游标handle,分别是:

Handle=0x3e5e2ea8f0
Handle=0x3e5de512f0

查看子游标lc state object:

  ```
  SO: 0x3df82fa9f8, type: 78, owner: 0x3dc221a830, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
   proc=0x3ec1e41a68, name=LIBRARY OBJECT LOCK, file=kgl.h LINE:8751, pg=0

  LibraryObjectLock:  Address=0x3df82fa9f8 Handle=0x3e5e2ea8f0 Mode=N CanBeBrokenCount=23 Incarnation=23 ExecutionCount=0         
    ClusterLock=0x3dfe9ca0a8 Context=0x7f59809d8030 User=0x3dc221a830 Session=0x3dc221a830 ReferenceCount=1         
    Flags=CBK/[0020] SavepointNum=0 
  LibraryHandle:  Address=0x3e5e2ea8f0 Hash=0 LockMode=N PinMode=X LoadLockMode=0 Status=VALD 
    Name:  Namespace=SQL AREA(00) Type=CURSOR(00) 
    Statistics:  InvalidationCount=11 ExecutionCount=0 LoadCount=13 ActiveLocks=1 TotalLockCount=59 TotalPinCount=2652 
    Counters:  BrokenCount=23 RevocablePointer=23 KeepDependency=0 Version=0 BucketInUse=0 HandleInUse=0 HandleReferenceCount=0 
    Concurrency:  DependencyMutex=0x3e5e2ea9a0(0, 0, 0, 0) Mutex=0x3e5e197e90(0, 209907, 0, 0) 
    Flags=RON/PIN/PN0/EXP/CHD/[10012111] 
    WaitersLists:  
      Lock=0x3e5e2ea980[0x3e5e2ea980,0x3e5e2ea980] 
      Pin=0x3e5e2ea960[0x3e5e2ea960,0x3e5e2ea960] 
      LoadLock=0x3e5e2ea9d8[0x3e5e2ea9d8,0x3e5e2ea9d8] 
    LibraryObject:  Address=0x3c7f36d0b0 HeapMask=0000-0001-0001-0000 Flags=EXS[0000] Flags2=[0000] PublicFlags=[0000] 
      DataBlocks:  
        Block:  #='0' name=KGLH0^3584acbe pins=0 Change=NONE   
          Heap=0x3e5ce16cb0 Pointer=0x3c7f36d150 Extent=0x3c7f36d030 Flags=I/-/P/A/-/- 
          FreedLocation=0 Alloc=64.843750 Size=67.601562 LoadTime=6923111190 
        Block:  #='6' name=SQLA^3584acbe pins=0 Change=NONE   
          Heap=0x3c7e8a14c8 Pointer=0x3024187ea8 Extent=0x3024187268 Flags=I/-/P/A/-/E 
          FreedLocation=0 Alloc=69385.406250 Size=70401.898438 LoadTime=0 
    NamespaceDump:  
      Child Cursor:  Heap0=0x3c7f36d150 Heap6=0x3024187ea8 Heap0 Load Time=11-14-2023 08:40:13 Heap6 Load Time=11-14-2023 08:40:13       ----------------------------------------
  KGX Atomic Operation Log 0x3dfe9ca0a8
   Mutex 0x3c7e8a1420(1976, 0) idn 3584acbe oper LONG_EXCL(18)
   Cursor Pin uid 1976 efd 0 whr 1 slp 0
   opr=3 pso=0x3df82fa9f8 flg=0
   pcs=0x3c7e8a1388 nxt=0x3c7e897020 flg=34 cld=0 hd=0x3e5e2ea8f0 par=0x3c7e8a0ea0
   ct=c hsh=0 unp=(nil) unn=0 hvl=7e897d28 nhv=1 ses=0x3dc221a830
   hep=0x3c7e8a1420 flg=80 ld=1 ob=0x3c7f36d0b0 ptr=0x3024187ea8 fex=0x3024187268

load时间与故障时间吻合,并且长时间的X模式持有cursor pin mutex和lc pin,这里可以看到heap 6占用了大量的空间,不由得想去看看这个sql的执行计划。

Plan hash value: 3616047354


| Id | Operation | Name |

| 0 | SELECT STATEMENT | |
| 1 | CONCATENATION | |
| 2 | TABLE ACCESS BY INDEX ROWID| TSTOCKINFO |
|* 3 | INDEX UNIQUE SCAN | PK_TSTOCKINFO |
| 4 | TABLE ACCESS BY INDEX ROWID| TSTOCKINFO |
|* 5 | INDEX UNIQUE SCAN | PK_TSTOCKINFO |
| 6 | TABLE ACCESS BY INDEX ROWID| TSTOCKINFO |
|* 7 | INDEX UNIQUE SCAN | PK_TSTOCKINFO |
| 8 | TABLE ACCESS BY INDEX ROWID| TSTOCKINFO |
|* 9 | INDEX UNIQUE SCAN | PK_TSTOCKINFO |
| 10 | TABLE ACCESS BY INDEX ROWID| TSTOCKINFO |
|* 11 | INDEX UNIQUE SCAN | PK_TSTOCKINFO |
| 12 | TABLE ACCESS BY INDEX ROWID| TSTOCKINFO |
|* 13 | INDEX UNIQUE SCAN | PK_TSTOCKINFO |
| 14 | TABLE ACCESS BY INDEX ROWID| TSTOCKINFO |


|1995 | INDEX UNIQUE SCAN | PK_TSTOCKINFO |
|1996 | TABLE ACCESS BY INDEX ROWID| TSTOCKINFO |
|1997 | INDEX UNIQUE SCAN | PK_TSTOCKINFO |
|1998 | TABLE ACCESS BY INDEX ROWID| TSTOCKINFO |
|1999 | INDEX UNIQUE SCAN | PK_TSTOCKINFO |
|2000 | TABLE ACCESS BY INDEX ROWID| TSTOCKINFO |
|2001 | INDEX UNIQUE SCAN | PK_TSTOCKINFO |

看了吓一跳,首先优化器选用的RBO,很简单的sql,但是in里面变量达到1000个,正好还走了or展开。1000个变量意味着有1000个or分支,难怪heap 6那么大。

看看blocker 的stack信息:

SO: 0x3ec1e41a68, type: 2, owner: (nil), flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x3ec1e41a68, name=process, file=ksu.h LINE:12721, pg=0
(process) Oracle pid:178, ser:231, calls cur/top: 0x3d6b429b98/0x3d6b514a00
flags : (0x0) –
flags2: (0x0), flags3: (0x10)
intr error: 0, call error: 0, sess error: 0, txn error 0
intr queue: empty
ksudlp FALSE at location: 0
(post info) last post received: 0 0 160
last post received-location: kcb2.h LINE:4243 ID:kcbzww
last process to post me: 0x3ec1e57980 1 0
last post sent: 0 0 0
last post sent-location: No post
last process posted by me: none
(latch info) wait_event=0 bits=0x0
Process Group: DEFAULT, pseudo proc: 0x3d21ec8180
O/S info: user: oracle, term: UNKNOWN, ospid: 301884
OSD pid info: Unix process pid: 301884, image: oracle@FMS11DG
Short stack dump:
ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-kghalf()+246<-kghalp()+94<-qcopCreateLog()+54<-qkebCreateUnaryLog()+43<-apaclg()+205<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apasor()+281<-apaqba()+1122<-apaqbdDescendents()+496<-apaqbdList()+76<-apaqbd()+14<-apadrv()+922<-opitca()+2089<-kksFullTypeCheck()+69<-rpiswu2()+1776<-kksSetBindType()+2299<-kksfbc()+11192<-opiexe()+2330<-kpoal8()+2380<-opiodr()+917<-ttcpip()+1255<-opitsk()+1710<-opiino()+969<-opiodr()+917<-opidrv()+570<-sou2o()+103<-opimai_real()+133<-ssthrdmain()+265<-main()+201<-__libc_start_main()+245
—————————————-

大量的循环了apaclg函数。  (apa)clg – SQL Access Path Analysis ??

### 3,解决方案

分析到此基本可以给出建议了。3个建议同时调整:

减少in变量为100个
使用all_rows hint让执行计划走inlist
keep sql执行计划




------------------作者介绍-----------------------
姓名:黄廷忠
现就职:Oracle中国高级服务团队
曾就职:OceanBase、云和恩墨、东方龙马等
电话、微信、QQ:18081072613
![联系二维码](http://htzaliyun.cdhtz.com/uPic/Y44TWC.png)

------------------------------------------------

故障诊断:备库LGWR常见的library cache lock案例分析:等您坐沙发呢!

发表评论

gravatar

? razz sad evil ! smile oops grin eek shock ??? cool lol mad twisted roll wink idea arrow neutral cry mrgreen

快捷键:Ctrl+Enter