我们的文章会在微信公众号Oracle恢复实录和博客网站( www.htz.pw )同步更新 ,欢迎关注收藏,也欢迎大家转载,但是请在文章开始地方标注文章出处,谢谢!
由于博客中有大量代码,通过页面浏览效果更佳。
随着Oracle ADG(Active Data Guard)技术的不断成熟,越来越多的企业客户选择将只读类业务迁移到容灾环境(即ADG只读库)中运行。这样做不仅可以减轻主库的压力,还能更好地利用备库资源,提高整体系统的可用性和容错能力。然而,虽然ADG只读库和生产主库的数据是一致的,但由于两者在架构和实现机制上存在一些差异,导致很多客户在实际使用只读库时会遇到各种异常问题,影响了业务体验。
本文将结合实际案例,通俗易懂地介绍在使用ADG只读库时常见的一些“坑”,并分析背后的原因,帮助大家更好地理解和规避这些问题。
1,现象描述
本案例来至于于朋友分享,数据库版本为19c,故障现象为备库每天早上8点40左右,备库的LGWR都会被阻塞。从而其他应用因为请求不到instance lock后被LGWR阻塞,LGWR此时出现library cache lock等待事件。客户第二天重现的时候收集了systemstate dump,下面内容根据systemstate dump进行分析。
2,分析过程
LGWR PROCESS STATE OBJECT:
PROCESS 14: LGWR
----------------------------------------
SO: 0x3d21de9858, type: 2, owner: (nil), flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x3d21de9858, name=process, file=ksu.h LINE:12721, pg=0
(process) Oracle pid:14, ser:2, calls cur/top: 0x3ec22fda30/0x3ec22fda30
flags : (0x6) SYSTEM
flags2: (0x0), flags3: (0x10)
intr error: 0, call error: 0, sess error: 0, txn error 0
intr queue: empty
ksudlp FALSE at location: 0
(post info) last post received: 0 0 26
last post received-location: ksa2.h LINE:285 ID:ksasnd
last process to post me: 0x3da1e98788 1 0
last post sent: 0 0 259
last post sent-location: kgl.h LINE:8873 ID:kgllkdl: post after freeing latch
last process posted by me: 0x3da1eb2980 171 0
(latch info) wait_event=0 bits=0x0
Process Group: DEFAULT, pseudo proc: 0x3d21ec8180
O/S info: user: oracle, term: UNKNOWN, ospid: 172824
OSD pid info: Unix process pid: 172824, image: oracle@FMS11DG (LGWR)
Short stack dump:
ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-semtimedop()+10<-skgpwwait()+178<-ksliwat()+2066<-kslwaitctx()+163<-ksfwaitctx()+14<-kglLockWait()+871<-kgllkal()+1166<-kglLock()+1118<-kglLockInstance()+259<-kksExclusiveParseLock()+47<-krdrsb_mdflush_local()+126<-krdrsb_adv_qscn()+1652<-ksbabs()+771<-ksbrdp()+1045<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+250<-ssthrdmain()+265<-main()+201<-__libc_start_main()+245
----------------------------------------
LGWR SESSION STATE OBJECT:
SO: 0x3d21f74f20, type: 4, owner: 0x3d21de9858, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x3d21de9858, name=session, file=ksu.h LINE:12729, pg=0
(session) sid: 337 ser: 3 trans: (nil), creator: 0x3d21de9858
flags: (0x51) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
flags2: (0x409) -/-/INC
DID: , short-term DID:
txn branch: (nil)
edition#: 0 oct: 0, prv: 0, sql: (nil), psql: (nil), user: 0/SYS
ksuxds FALSE at location: 0
service name: SYS$BACKGROUND
Current Wait Stack:
0: waiting for 'library cache lock'
handle address=0x3e5ff630f8, lock address=0x3e7ff41450, 100*mode+namespace=0x1004a0003
wait_id=9628259 seq_num=60150 snap_id=1
wait times: snap=1.306065 sec, exc=1.306065 sec, total=1.306065 sec
wait times: max=5.500000 sec, heur=1 min 8 sec
wait counts: calls=2 os=2
in_wait=1 iflags=0x15a2
2.1 确认阻塞者
LGWR SID为337,等待library cache lock,namespace一看就是instance lock,请求模式为X。
这里偷了个懒,很碰巧看到了final blocking session,就省的去搜索哪个session持有了instance lock了。
SO: 0x3d42261708, type: 4, owner: 0x3da1eae6a0, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x3da1eae6a0, name=session, file=ksu.h LINE:12729, pg=0
(session) sid: 2045 ser: 1 trans: (nil), creator: 0x3da1eae6a0
flags: (0x8000041) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
flags2: (0x40009) -/-/INC
DID: , short-term DID:
txn branch: (nil)
edition#: 2605814 oct: 3, prv: 0, sql: 0x3e9fb65bd0, psql: 0x3e5e5b1860, user: 141/DBAAS_SYS
ksuxds FALSE at location: 0
service name: fmsdg
client details:
O/S info: user: zcloud, term: , ospid: 7395
machine: VCOM01PXY program: zoramon_collector@VCOM01PXY (TNS V1-V3)
application name: zoramon_collector@VCOM01PXY (TNS V1-V3), hash value=2183771371
Current Wait Stack:
0: waiting for 'library cache lock'
handle address=0x3e5ff630f8, lock address=0x3e1fbb8dd0, 100*mode+namespace=0x1004a0002
wait_id=5523383 seq_num=18444 snap_id=1
wait times: snap=1.336710 sec, exc=1.336710 sec, total=1.336710 sec
wait times: max=15 min 0 sec, heur=1.336710 sec
wait counts: calls=2 os=2
in_wait=1 iflags=0x15a2
There is at least one session blocking this session.
Dumping 1 direct blocker(s):
inst: 1, sid: 337, ser: 3
Dumping final blocker:
inst: 1, sid: 1976, ser: 29501
final blocker为session 1976
final blocker session state object:
SO: 0x3dc221a830, type: 4, owner: 0x3ec1e41a68, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x3ec1e41a68, name=session, file=ksu.h LINE:12729, pg=0
(session) sid: 1976 ser: 29501 trans: (nil), creator: 0x3ec1e41a68
flags: (0x41) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
flags2: (0x40009) -/-/INC
DID: , short-term DID:
txn branch: (nil)
edition#: 2605814 oct: 3, prv: 0, sql: 0x3e5e197d50, psql: 0x3e9f487090, user: 142/FIX_DBQUERY
ksuxds FALSE at location: 0
service name: SYS$USERS
client details:
O/S info: user: afaim, term: unknown, ospid: 1234
machine: VFIX02APP program: JDBC Thin Client
application name: JDBC Thin Client, hash value=2546894660
Current Wait Stack:
Not in wait; last wait ended 1 min 16 sec ago
There are 2 sessions blocked by this session.
Dumping one waiter:
inst: 1, sid: 337, ser: 3
wait event: 'library cache lock'
p1: 'handle address'=0x3e5ff630f8
p2: 'lock address'=0x3e7ff41450
p3: '100*mode+namespace'=0x1004a0003
row_wait_obj#: 4294967295, block#: 0, row#: 0, file# 0
min_blocked_time: 0 secs, waiter_cache_ver: 24873
final blocker没有任何等待,正在执行的sql为:
SO: 0x3df82faf90, type: 78, owner: 0x3dc221a830, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x3ec1e41a68, name=LIBRARY OBJECT LOCK, file=kgl.h LINE:8751, pg=0
LibraryObjectLock: Address=0x3df82faf90 Handle=0x3e5e197d50 Mode=N CanBeBrokenCount=1 Incarnation=1 ExecutionCount=0
User=0x3dc221a830 Session=0x3dc221a830 ReferenceCount=1 Flags=CNB/[0001] SavepointNum=6552c1ec
LibraryHandle: Address=0x3e5e197d50 Hash=3584acbe LockMode=N PinMode=0 LoadLockMode=0 Status=VALD
ObjectName: Name=select
s.VC_INTER_CODE,
s.C_MARKET_NO,
s.L_BUY_UNIT,
s.VC_TRADE_CURRENCY_NO,
s.EN_YESTERDAY_CLOSE_PRICE,
s.VC_MIXED_TYPE,
s.C_STOCK_TYPE,
s.C_ASSET_CLASS,
s.C_STOP_FLAG,
s.VC_REPORT_CODE
from trade.TSTOCKINFO s
WHERE s.VC_INTER_CODE in (
:1
,
:2
,
:3
,
:4
FullHashValue=dbbe27bcd13925c46366f2fc3584acbe Namespace=SQL AREA(00) Type=CURSOR(00) Identifier=897887422 OwnerIdn=142
Statistics: InvalidationCount=12 ExecutionCount=2804 LoadCount=14 ActiveLocks=1 TotalLockCount=58 TotalPinCount=1
Counters: BrokenCount=1 RevocablePointer=1 KeepDependency=1 Version=0 BucketInUse=57 HandleInUse=57 HandleReferenceCount=0
Concurrency: DependencyMutex=0x3e5e197e00(0, 135, 0, 0) Mutex=0x3e5e197e90(0, 209907, 0, 0)
Flags=RON/PIN/TIM/PN0/DBN/[10012841]
WaitersLists:
Lock=0x3e5e197de0[0x3e5e197de0,0x3e5e197de0]
Pin=0x3e5e197dc0[0x3e5e197dc0,0x3e5e197dc0]
LoadLock=0x3e5e197e38[0x3e5e197e38,0x3e5e197e38]
Timestamp: Current=10-29-2023 08:40:12
HandleReference: Address=0x3e5e198110 Handle=(nil) Flags=[00]
LibraryObject: Address=0x3c7e8a0e00 HeapMask=0000-0001-0001-0000 Flags=EXS[0000] Flags2=[0000] PublicFlags=[0000]
ChildTable: size='16'
Child: id='0' Table=0x3c7e8a1cb0 Reference=0x3c7e8a16f8 Handle=0x3e5e2ea8f0
Child: id='1' Table=0x3c7e8a1cb0 Reference=0x3c7e8972c0 Handle=0x3e5de512f0
NamespaceDump:
Parent Cursor: sql_id=66trkzhus9b5y parent=0x3c7e8a0ea0 maxchild=2 plk=y ppn=n
2.2 确认handle
有两个子游标handle,分别是:
Handle=0x3e5e2ea8f0
Handle=0x3e5de512f0
查看子游标lc state object:
```
SO: 0x3df82fa9f8, type: 78, owner: 0x3dc221a830, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x3ec1e41a68, name=LIBRARY OBJECT LOCK, file=kgl.h LINE:8751, pg=0
LibraryObjectLock: Address=0x3df82fa9f8 Handle=0x3e5e2ea8f0 Mode=N CanBeBrokenCount=23 Incarnation=23 ExecutionCount=0
ClusterLock=0x3dfe9ca0a8 Context=0x7f59809d8030 User=0x3dc221a830 Session=0x3dc221a830 ReferenceCount=1
Flags=CBK/[0020] SavepointNum=0
LibraryHandle: Address=0x3e5e2ea8f0 Hash=0 LockMode=N PinMode=X LoadLockMode=0 Status=VALD
Name: Namespace=SQL AREA(00) Type=CURSOR(00)
Statistics: InvalidationCount=11 ExecutionCount=0 LoadCount=13 ActiveLocks=1 TotalLockCount=59 TotalPinCount=2652
Counters: BrokenCount=23 RevocablePointer=23 KeepDependency=0 Version=0 BucketInUse=0 HandleInUse=0 HandleReferenceCount=0
Concurrency: DependencyMutex=0x3e5e2ea9a0(0, 0, 0, 0) Mutex=0x3e5e197e90(0, 209907, 0, 0)
Flags=RON/PIN/PN0/EXP/CHD/[10012111]
WaitersLists:
Lock=0x3e5e2ea980[0x3e5e2ea980,0x3e5e2ea980]
Pin=0x3e5e2ea960[0x3e5e2ea960,0x3e5e2ea960]
LoadLock=0x3e5e2ea9d8[0x3e5e2ea9d8,0x3e5e2ea9d8]
LibraryObject: Address=0x3c7f36d0b0 HeapMask=0000-0001-0001-0000 Flags=EXS[0000] Flags2=[0000] PublicFlags=[0000]
DataBlocks:
Block: #='0' name=KGLH0^3584acbe pins=0 Change=NONE
Heap=0x3e5ce16cb0 Pointer=0x3c7f36d150 Extent=0x3c7f36d030 Flags=I/-/P/A/-/-
FreedLocation=0 Alloc=64.843750 Size=67.601562 LoadTime=6923111190
Block: #='6' name=SQLA^3584acbe pins=0 Change=NONE
Heap=0x3c7e8a14c8 Pointer=0x3024187ea8 Extent=0x3024187268 Flags=I/-/P/A/-/E
FreedLocation=0 Alloc=69385.406250 Size=70401.898438 LoadTime=0
NamespaceDump:
Child Cursor: Heap0=0x3c7f36d150 Heap6=0x3024187ea8 Heap0 Load Time=11-14-2023 08:40:13 Heap6 Load Time=11-14-2023 08:40:13 ----------------------------------------
KGX Atomic Operation Log 0x3dfe9ca0a8
Mutex 0x3c7e8a1420(1976, 0) idn 3584acbe oper LONG_EXCL(18)
Cursor Pin uid 1976 efd 0 whr 1 slp 0
opr=3 pso=0x3df82fa9f8 flg=0
pcs=0x3c7e8a1388 nxt=0x3c7e897020 flg=34 cld=0 hd=0x3e5e2ea8f0 par=0x3c7e8a0ea0
ct=c hsh=0 unp=(nil) unn=0 hvl=7e897d28 nhv=1 ses=0x3dc221a830
hep=0x3c7e8a1420 flg=80 ld=1 ob=0x3c7f36d0b0 ptr=0x3024187ea8 fex=0x3024187268
load时间与故障时间吻合,并且长时间的X模式持有cursor pin mutex和lc pin,这里可以看到heap 6占用了大量的空间,不由得想去看看这个sql的执行计划。
Plan hash value: 3616047354
| Id | Operation | Name |
| 1 | CONCATENATION | |
| 2 | TABLE ACCESS BY INDEX ROWID| TSTOCKINFO |
|* 3 | INDEX UNIQUE SCAN | PK_TSTOCKINFO |
| 4 | TABLE ACCESS BY INDEX ROWID| TSTOCKINFO |
|* 5 | INDEX UNIQUE SCAN | PK_TSTOCKINFO |
| 6 | TABLE ACCESS BY INDEX ROWID| TSTOCKINFO |
|* 7 | INDEX UNIQUE SCAN | PK_TSTOCKINFO |
| 8 | TABLE ACCESS BY INDEX ROWID| TSTOCKINFO |
|* 9 | INDEX UNIQUE SCAN | PK_TSTOCKINFO |
| 10 | TABLE ACCESS BY INDEX ROWID| TSTOCKINFO |
|* 11 | INDEX UNIQUE SCAN | PK_TSTOCKINFO |
| 12 | TABLE ACCESS BY INDEX ROWID| TSTOCKINFO |
|* 13 | INDEX UNIQUE SCAN | PK_TSTOCKINFO |
| 14 | TABLE ACCESS BY INDEX ROWID| TSTOCKINFO |
…
…
|1995 | INDEX UNIQUE SCAN | PK_TSTOCKINFO |
|1996 | TABLE ACCESS BY INDEX ROWID| TSTOCKINFO |
|1997 | INDEX UNIQUE SCAN | PK_TSTOCKINFO |
|1998 | TABLE ACCESS BY INDEX ROWID| TSTOCKINFO |
|1999 | INDEX UNIQUE SCAN | PK_TSTOCKINFO |
|2000 | TABLE ACCESS BY INDEX ROWID| TSTOCKINFO |
|2001 | INDEX UNIQUE SCAN | PK_TSTOCKINFO |
| 0 | SELECT STATEMENT | |看了吓一跳,首先优化器选用的RBO,很简单的sql,但是in里面变量达到1000个,正好还走了or展开。1000个变量意味着有1000个or分支,难怪heap 6那么大。
看看blocker 的stack信息:
SO: 0x3ec1e41a68, type: 2, owner: (nil), flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x3ec1e41a68, name=process, file=ksu.h LINE:12721, pg=0
(process) Oracle pid:178, ser:231, calls cur/top: 0x3d6b429b98/0x3d6b514a00
flags : (0x0) –
flags2: (0x0), flags3: (0x10)
intr error: 0, call error: 0, sess error: 0, txn error 0
intr queue: empty
ksudlp FALSE at location: 0
(post info) last post received: 0 0 160
last post received-location: kcb2.h LINE:4243 ID:kcbzww
last process to post me: 0x3ec1e57980 1 0
last post sent: 0 0 0
last post sent-location: No post
last process posted by me: none
(latch info) wait_event=0 bits=0x0
Process Group: DEFAULT, pseudo proc: 0x3d21ec8180
O/S info: user: oracle, term: UNKNOWN, ospid: 301884
OSD pid info: Unix process pid: 301884, image: oracle@FMS11DG
Short stack dump:
ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-kghalf()+246<-kghalp()+94<-qcopCreateLog()+54<-qkebCreateUnaryLog()+43<-apaclg()+205<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apaclg()+238<-apasor()+281<-apaqba()+1122<-apaqbdDescendents()+496<-apaqbdList()+76<-apaqbd()+14<-apadrv()+922<-opitca()+2089<-kksFullTypeCheck()+69<-rpiswu2()+1776<-kksSetBindType()+2299<-kksfbc()+11192<-opiexe()+2330<-kpoal8()+2380<-opiodr()+917<-ttcpip()+1255<-opitsk()+1710<-opiino()+969<-opiodr()+917<-opidrv()+570<-sou2o()+103<-opimai_real()+133<-ssthrdmain()+265<-main()+201<-__libc_start_main()+245
—————————————-
大量的循环了apaclg函数。 (apa)clg – SQL Access Path Analysis ??
### 3,解决方案
分析到此基本可以给出建议了。3个建议同时调整:
减少in变量为100个
使用all_rows hint让执行计划走inlist
keep sql执行计划
------------------作者介绍-----------------------
姓名:黄廷忠
现就职:Oracle中国高级服务团队
曾就职:OceanBase、云和恩墨、东方龙马等
电话、微信、QQ:18081072613

------------------------------------------------
故障诊断:备库LGWR常见的library cache lock案例分析:等您坐沙发呢!