1,环境介绍
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 – 64bit Production With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP, Data Mining and Real Application Testing options ORACLE_HOME = /u01/oracle/product/11.2.0 System name: AIX 数据库的PSU为11.2.0.4.2 |
2,对ASM实例做HANGANALYZE与SYSTEMSTATE
这里对ASM实例做下面的操作不会生成很多的TRACE文件。
oradebug setmypid oradebug unlimit oradebug hanganalyze 3 oradebug dump systemstate 266 |
3,KILL MRP进程
通过recover managed standby database cancel;不能正常取消,只能通过手动KILL MRP进程,在执行RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;后正常运用日志。
4,分析HANGANALYZE与SYSTEMSTATE日志
4.1 查看hanganalyze部分
*** 2015-03-31 20:09:59.453 =============================================================================== HANG ANALYSIS: instances (db_name.oracle_sid): cdhtz_p780.cdhtz1 oradebug_node_dump_level: 3 os thread scheduling delay history: (sampling every 1.000000 secs) 0.000000 secs at [ 20:09:58 ] NOTE: scheduling delay has not been sampled for 0.818740 secs 0.000000 secs from [ 20:09:54 – 20:09:59 ], 5 sec avg 0.000000 secs from [ 20:09:00 – 20:09:59 ], 1 min avg 0.000000 secs from [ 20:05:00 – 20:09:59 ], 5 min avg vktm time drift history ===============================================================================
Chains most likely to have caused the hang: [a] Chain 1 Signature: ‘gc buffer busy release'<=’buffer busy waits’ Chain 1 Signature Hash: 0xe473997a [b] Chain 2 Signature: ‘parallel recovery slave next change'<=’buffer busy waits’ Chain 2 Signature Hash: 0x16abd035 [c] Chain 3 Signature: ‘gc buffer busy release'<=’buffer busy waits’ Chain 3 Signature Hash: 0xe473997a
=============================================================================== Non-intersecting chains:
——————————————————————————- Chain 1: ——————————————————————————- Oracle session identified by: { instance: 1 (cdhtz_p780.cdhtz1) os id: 6226628 process id: 98, oracle@cdhtzrzdb1 session id: 255 session serial #: 451 } is waiting for ‘buffer busy waits’ with wait info: { p1: ‘file#’=0x7 p2: ‘block#’=0x229d p3: ‘class#’=0x62 time in wait: 0.944766 sec (last interval) time in wait: 76 min 54 sec (total) timeout after: never wait id: 135153 blocking: 0 sessions wait history: * time between current wait and wait #1: 0.000000 sec 1. event: ‘latch: cache buffers chains’ time waited: 0.000017 sec wait id: 152253 p1: ‘address’=0x7000103deef8f58 p2: ‘number’=0xb1 p3: ‘tries’=0x0 * time between wait #1 and #2: 0.000000 sec 2. event: ‘buffer busy waits’ time waited: 4.345775 sec (last interval) time waited: 76 min 53 sec (total) wait id: 135153 p1: ‘file#’=0x7 p2: ‘block#’=0x229d p3: ‘class#’=0x62 * time between wait #2 and #3: 0.000000 sec 3. event: ‘latch: cache buffers chains’ time waited: 0.000012 sec wait id: 152252 p1: ‘address’=0x7000103deef8f58 p2: ‘number’=0xb1 p3: ‘tries’=0x0 } and is blocked by => Oracle session identified by: { instance: 1 (cdhtz_p780.cdhtz1) os id: 5767982 process id: 77, oracle@cdhtzrzdb1 (PR0E) session id: 633 session serial #: 51 } which is waiting for ‘gc buffer busy release’ with wait info: { p1: ‘file#’=0x7 p2: ‘block#’=0x229d p3: ‘class#’=0x62 time in wait: 0.923282 sec heur. time in wait: 24 min 49 sec timeout after: 0.076718 sec wait id: 258197465 blocking: 2 sessions wait history: * time between current wait and wait #1: 0.000021 sec 1. event: ‘gc buffer busy release’ time waited: 1.000014 sec wait id: 258197464 p1: ‘file#’=0x7 p2: ‘block#’=0x229d p3: ‘class#’=0x62 * time between wait #1 and #2: 0.000012 sec 2. event: ‘gc buffer busy release’ time waited: 1.000015 sec wait id: 258197463 p1: ‘file#’=0x7 p2: ‘block#’=0x229d p3: ‘class#’=0x62 * time between wait #2 and #3: 0.000021 sec 3. event: ‘gc buffer busy release’ time waited: 1.000039 sec wait id: 258197462 p1: ‘file#’=0x7 p2: ‘block#’=0x229d p3: ‘class#’=0x62 }
Chain 1 Signature: ‘gc buffer busy release'<=’buffer busy waits’ Chain 1 Signature Hash: 0xe473997a ——————————————————————————-
——————————————————————————- Chain 2: ——————————————————————————- Oracle session identified by: { instance: 1 (cdhtz_p780.cdhtz1) os id: 5963976 process id: 89, oracle@cdhtzrzdb1 session id: 2148 session serial #: 1613 } is waiting for ‘buffer busy waits’ with wait info: { p1: ‘file#’=0x3 p2: ‘block#’=0x80 p3: ‘class#’=0x11 time in wait: 67 min 44 sec timeout after: never wait id: 2843 blocking: 0 sessions wait history: * time between current wait and wait #1: 0.000077 sec 1. event: ‘db file sequential read’ time waited: 0.006262 sec wait id: 2842 p1: ‘file#’=0x2 p2: ‘block#’=0x2202 p3: ‘blocks’=0x1 * time between wait #1 and #2: 0.000098 sec 2. event: ‘db file sequential read’ time waited: 0.000507 sec wait id: 2841 p1: ‘file#’=0x2 p2: ‘block#’=0x21ea p3: ‘blocks’=0x1 * time between wait #2 and #3: 0.000066 sec 3. event: ‘db file sequential read’ time waited: 0.000380 sec wait id: 2840 p1: ‘file#’=0x2 p2: ‘block#’=0x21e2 p3: ‘blocks’=0x1 } and is blocked by => Oracle session identified by: { instance: 1 (cdhtz_p780.cdhtz1) os id: 8127092 process id: 84, oracle@cdhtzrzdb1 (PR0L) session id: 1516 session serial #: 37 } which is waiting for ‘parallel recovery slave next change’ with wait info: { time in wait: 0.008511 sec heur. time in wait: 76 min 57 sec timeout after: 0.001489 sec wait id: 258195905 blocking: 1 session wait history: * time between current wait and wait #1: 0.000008 sec 1. event: ‘parallel recovery slave next change’ time waited: 0.010020 sec wait id: 258195904 * time between wait #1 and #2: 0.000004 sec 2. event: ‘parallel recovery slave next change’ time waited: 0.010013 sec wait id: 258195903 * time between wait #2 and #3: 0.000005 sec 3. event: ‘parallel recovery slave next change’ time waited: 0.010016 sec wait id: 258195902 }
Chain 2 Signature: ‘parallel recovery slave next change'<=’buffer busy waits’ Chain 2 Signature Hash: 0x16abd035 ——————————————————————————-
=============================================================================== Intersecting chains:
——————————————————————————- Chain 3: ——————————————————————————- Oracle session identified by: { instance: 1 (cdhtz_p780.cdhtz1) os id: 3146372 process id: 92, oracle@cdhtzrzdb1 session id: 2524 session serial #: 391 } is waiting for ‘buffer busy waits’ with wait info: { p1: ‘file#’=0x7 p2: ‘block#’=0x229d p3: ‘class#’=0x62 time in wait: 7.613869 sec (last interval) time in wait: 76 min 50 sec (total) timeout after: never wait id: 109934 blocking: 0 sessions wait history: * time between current wait and wait #1: 0.000000 sec 1. event: ‘latch: cache buffers chains’ time waited: 0.000003 sec wait id: 126021 p1: ‘address’=0x7000103deef8f58 p2: ‘number’=0xb1 p3: ‘tries’=0x0 * time between wait #1 and #2: 0.000000 sec 2. event: ‘buffer busy waits’ time waited: 3.531329 sec (last interval) time waited: 76 min 42 sec (total) wait id: 109934 p1: ‘file#’=0x7 p2: ‘block#’=0x229d p3: ‘class#’=0x62 * time between wait #2 and #3: 0.000000 sec 3. event: ‘latch: cache buffers chains’ time waited: 0.000018 sec wait id: 126020 p1: ‘address’=0x7000103deef8f58 p2: ‘number’=0xb1 p3: ‘tries’=0x0 } and is blocked by ‘instance: 1, os id: 5767982, session id: 633’, which is a member of ‘Chain 1’.
Chain 3 Signature: ‘gc buffer busy release'<=’buffer busy waits’ Chain 3 Signature Hash: 0xe473997a ——————————————————————————- |
这里可以连接到数据库进程的阻塞关系。
4.2 使用ass格式化systemstate
[oracle@www.cdhtz.com ]$ awk -f ~/rs/sql/ass1045.awk cdhtz_ora_5046602.trc
Starting Systemstate 1 …………………………………………………………………… …………….. Ass.Awk Version 1.0.45 ~~~~~~~~~~~~~~~~~~~~~~ Source file : cdhtz_ora_5046602.trc
System State 1 (2015-03-31 20:10:07.375) ~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~ 1: [DEAD] 2: waiting for ‘pmon timer’ 3: waiting for ‘rdbms ipc message’ 4: waiting for ‘VKTM Logical Idle Wait’ 5: waiting for ‘rdbms ipc message’ 6: waiting for ‘DIAG idle wait’ 7: waiting for ‘rdbms ipc message’ 8: waiting for ‘PING’ 9: waiting for ‘rdbms ipc message’ 10: waiting for ‘DIAG idle wait’ 11: waiting for ‘rdbms ipc message’ 12: waiting for ‘ges remote message’ 13: waiting for ‘gcs remote message’ 14: waiting for ‘gcs remote message’ 15: waiting for ‘rdbms ipc message’ 16: waiting for ‘GCR sleep’ 17: waiting for ‘rdbms ipc message’ 18: waiting for ‘rdbms ipc message’ 19: waiting for ‘rdbms ipc message’ 20: waiting for ‘rdbms ipc message’ 21: waiting for ‘rdbms ipc message’ 22: waiting for ‘rdbms ipc message’ 23: waiting for ‘smon timer’ 24: waiting for ‘rdbms ipc message’ 25: waiting for ‘rdbms ipc message’ 26: waiting for ‘ASM background timer’ 27: waiting for ‘rdbms ipc message’ 28: waiting for ‘rdbms ipc message’ 29: waiting for ‘RFS dispatch’ 30: waiting for ‘SQL*Net message from client’ 31: waiting for ‘wait for unread message on broadcast channel’ 32: waiting for ‘SQL*Net message from client’ 33: waiting for ‘rdbms ipc message’ 34: waiting for ‘rdbms ipc message’ 35: waiting for ‘rdbms ipc message’ 36: waiting for ‘class slave wait’ 37: waiting for ‘SQL*Net message from client’ 38: 39: waiting for ‘rdbms ipc message’ 40: waiting for ‘rdbms ipc message’ 41: waiting for ‘rdbms ipc message’ 42: waiting for ‘rdbms ipc message’ 43: waiting for ‘class slave wait’ 44: waiting for ‘rdbms ipc message’ 45: waiting for ‘rdbms ipc message’ 46: waiting for ‘SQL*Net message from client’ 47: waiting for ‘class slave wait’ 48: waiting for ‘class slave wait’ 49: 50: 51: waiting for ‘rdbms ipc message’ 52: waiting for ‘SQL*Net message from client’ 53: waiting for ‘RFS dispatch’ 54: waiting for ‘SQL*Net message from client’ 55: waiting for ‘SQL*Net message from client’ 56: waiting for ‘class slave wait’ 57: waiting for ‘MRP inactivation’ 58: waiting for ‘SQL*Net message from client’ 59: 60: waiting for ‘class slave wait’ 61: waiting for ‘parallel recovery slave next change’ 62: 63: waiting for ‘parallel recovery slave next change’ 64: waiting for ‘parallel recovery slave next change’ 65: waiting for ‘parallel recovery slave next change’ 66: waiting for ‘parallel recovery slave next change’ 67: waiting for ‘parallel recovery slave next change’ 68: waiting for ‘class slave wait’ 69: waiting for ‘class slave wait’ 70: waiting for ‘parallel recovery slave next change’ 71: waiting for ‘parallel recovery slave next change’ 72: waiting for ‘parallel recovery slave next change’ 73: waiting for ‘parallel recovery slave next change’ 74: waiting for ‘parallel recovery slave next change’ 75: waiting for ‘parallel recovery slave next change’ 76: waiting for ‘parallel recovery slave next change’ 77: waiting for ‘gc buffer busy release’ 78: waiting for ‘parallel recovery slave next change’ 79: waiting for ‘parallel recovery slave next change’ 80: waiting for ‘parallel recovery slave next change’ 81: waiting for ‘parallel recovery slave next change’ 82: waiting for ‘parallel recovery slave next change’ 83: waiting for ‘parallel recovery slave next change’ 84: waiting for ‘parallel recovery slave next change’ 85: waiting for ‘parallel recovery slave next change’ 86: waiting for ‘parallel recovery slave next change’ 87: waiting for ‘parallel recovery slave next change’ 89: waiting for ‘buffer busy waits’ (0x3,0x80,0x11)[Buffer 0x00c00080] Final Blocker: inst: 1, sid: 1516, ser: 37 91: 92: waiting for ‘buffer busy waits’ (0x7,0x229d,0x62)[Buffer 0x01c0229d] Final Blocker: inst: 1, sid: 633, ser: 51 93: waiting for ‘SQL*Net message from client’ 96: waiting for ‘direct path read’ 97: waiting for ‘SQL*Net message from client’ 98: waiting for ‘buffer busy waits’ (0x7,0x229d,0x62)[Buffer 0x01c0229d] Final Blocker: inst: 1, sid: 633, ser: 51 99: waiting for ‘direct path read’
Blockers ~~~~~~~~
Above is a list of all the processes. If they are waiting for a resource then it will be given in square brackets. Below is a summary of the waited upon resources, together with the holder of that resource. Notes: ~~~~~ o A process id of ‘???’ implies that the holder was not found in the systemstate. (The holder may have released the resource before we dumped the state object tree of the blocking process). o Lines with ‘Enqueue conversion’ below can be ignored *unless* other sessions are waiting on that resource too. For more, see http://gbr30026.uk.oracle.com:81/Public/TOOLS/Ass.html#enqcnv)
Resource Holder State Buffer 0x00c00080 ??? Blocker Buffer 0x01c0229d ??? Blocker
Warning: The following processes have multiple session state objects and may not be properly represented above : 29: 53:
Object Names ~~~~~~~~~~~~ Buffer 0x00c00080 Buffer 0x01c0229d
Summary of Wait Events Seen (count>10) ~~~~~~~~~~~~~~~~~~~~~~~~~~~ No wait events seen more than 10 times |
4.3 查询gc buffer busy release会话信息
在ue中直接搜索就可以了
ROCESS 77: PR0E —————————————- SO: 0x7000103e48c42a0, type: 2, owner: 0x0, flag: INIT/-/-/0x00 if: 0x3 c: 0x3 proc=0x7000103e48c42a0, name=process, file=ksu.h LINE:12721 ID:, pg=0 (process) Oracle pid:77, ser:1, calls cur/top: 0x70001039b0808d0/0x70001039b0805f0 flags : (0x0) – flags2: (0x30), flags3: (0x10) intr error: 0, call error: 0, sess error: 0, txn error 0 intr queue: empty ksudlp FALSE at location: 0 (post info) last post received: 704768 0 2 last post received-location: ksl2.h LINE:2374 ID:kslpsr last process to post me: 0x7000103ea81a930 1 6 last post sent: 0 0 160 last post sent-location: kcb2.h LINE:4243 ID:kcbzww last process posted by me: 0x7000103ea829260 5 0 (latch info) wait_event=704768 bits=0x0 Process Group: DEFAULT, pseudo proc: 0x7000103e29af078 O/S info: user: oracle, term: UNKNOWN, ospid: 5767982 OSD pid info: Unix process pid: 5767982, image: oracle@jklcrzdb1 (PR0E) Short stack dump: ksedsts()+240<-ksdxfstk()+44<-ksdxcb()+3432<-sspuser()+116<-__sighandler()<-thread_wait()+580<-sskgpwwait()+32<-skgpwwait()+180<-ksliwat()+10772<-kslwaitctx()+180<-kslwait()+112<-kclwlr()+644<-kcbzfc()+2708<-kcbr_media_apply()+1888<-krp_slave_apply()+344<-krp_slave_main()+1092<-ksvrdp()+1804<-opirip()+724<-opidrv()+608<-sou2o()+136<-opimai_real()+188<-ssthrdmain()+276<-main()+204<-__start()+112 —————————————- SO: 0x7000103eaaf1038, type: 4, owner: 0x7000103e48c42a0, flag: INIT/-/-/0x00 if: 0x3 c: 0x3 proc=0x7000103e48c42a0, name=session, file=ksu.h LINE:12729 ID:, pg=0 (session) sid: 633 ser: 51 trans: 0x0, creator: 0x7000103e48c42a0 flags: (0x41) USR/- flags_idl: (0x1) BSY/-/-/-/-/- flags2: (0x2080409) -/-/INC DID: , short-term DID: txn branch: 0x0 edition#: 0 oct: 0, prv: 0, sql: 0x0, psql: 0x0, user: 0/SYS ksuxds FALSE at location: 0 service name: SYS$USERS client details: O/S info: user: oracle, term: UNKNOWN, ospid: 5767982 machine: jklcrzdb1 program: oracle@jklcrzdb1 (PR0E) Current Wait Stack: 0: waiting for ‘gc buffer busy release’ file#=0x7, block#=0x229d, class#=0x62 wait_id=258197477 seq_num=55118 snap_id=1 wait times: snap=0.007627 sec, exc=0.007627 sec, total=0.007627 sec wait times: max=1.000000 sec, heur=24 min 59 sec wait counts: calls=1 os=1 in_wait=1 iflags=0x5a8 There are 2 sessions blocked by this session. Dumping one waiter: inst: 1, sid: 255, ser: 451 wait event: ‘buffer busy waits’ p1: ‘file#’=0x7 p2: ‘block#’=0x229d p3: ‘class#’=0x62 row_wait_obj#: 4294967295, block#: 0, row#: 0, file# 0 min_blocked_time: 2404 secs, waiter_cache_ver: 27635 Wait State: fixed_waits=0 flags=0x22 boundary=0x0/-1 |
下面查看一下块的信息,可以得到下面的信息
SO: 0x70001039b096df0, type: 3, owner: 0x7000103ea8281b8, flag: INIT/-/-/0x00 if: 0x3 c: 0x3 proc=0x7000103ea8281b8, name=call, file=ksu.h LINE:12725 ID:, pg=0 (call) sess: cur 7000103f8ec4eb0, rec 0, usr 7000103f8ec4eb0; flg:20 fl2:1; depth:0 svpt(xcb:0x0 sptn:0x4286 uba: 0x00000000.0000.00) ksudlc FALSE at location: 0 —————————————- SO: 0x7000103d4e2c040, type: 38, owner: 0x70001039b096df0, flag: INIT/-/-/0xc0 if: 0x1 c: 0x1 proc=0x7000103ea8281b8, name=buffer handle, file=kcb2.h LINE:2761 ID:, pg=0 (buffer) (CR) PR: 0x7000103ea8281b8 FLG: 0x0 lock rls: 0x700010029dc27d8, class bit: 0x0 cr[0]: sh[0]: kcbbfbp: [BH: 0x700010029dc27d8, LINK: 0x7000103d4e2c0c0] (WAITING) type: normal pin where: ktuwh27: kturbk, why: 0 BH (0x700010029dc27d8) file#: 7 rdba: 0x01c0229d (7/8861) class: 98 ba: 0x70001002896e000 set: 62 pool: 3 bsz: 8192 bsi: 0 sflg: 1 pwc: 0,0 dbwrid: 1 obj: -1 objn: -1 tsn: 2 afn: 7 hint: f hash: [0x7000103e7d29d30,0x700010231e8cbf0] lru: [0x700010237e08048,0x700010117dcdec8] lru-flags: on_auxiliary_list ckptq: [NULL] fileq: [0x700010143e35aa8,0x700010237e07f68] objq: [0x700010143e35bb0,0x700010237e08070] objaq: [0x700010237e08080,0x700010117dcdf00] use: [NULL] wait: [0x7000103d712cff0,0x7000103d4e2c0c0] st: MEDIA_RCV md: NULL rsop: 0x700010394bdde40 tch: 1 le: 0x7000102f3ed7608 rlscn: 0x0bac.feaeaa51 flags: down_grade_lock only_sequential_access block_written_once Waiting State Objects —————————————- SO: 0x7000103d712cf70, type: 38, owner: 0x70001039b0a69c0, flag: INIT/-/-/0xc0 if: 0x1 c: 0x1 proc=0x7000103ea829260, name=buffer handle, file=kcb2.h LINE:2761 ID:, pg=0 (buffer) (CR) PR: 0x7000103ea829260 FLG: 0x0 lock rls: 0x700010029dc27d8, class bit: 0x0 cr[0]: sh[0]: kcbbfbp: [BH: 0x700010029dc27d8, LINK: 0x7000103d712cff0] (WAITING) type: normal pin where: ktuwh27: kturbk, why: 0 —————————————- SO: 0x7000103d4e2c040, type: 38, owner: 0x70001039b096df0, flag: INIT/-/-/0xc0 if: 0x1 c: 0x1 proc=0x7000103ea8281b8, name=buffer handle, file=kcb2.h LINE:2761 ID:, pg=0 (buffer) (CR) PR: 0x7000103ea8281b8 FLG: 0x0 lock rls: 0x700010029dc27d8, class bit: 0x0 cr[0]: sh[0]: kcbbfbp: [BH: 0x700010029dc27d8, LINK: 0x7000103d4e2c0c0] (WAITING) type: normal pin where: ktuwh27: kturbk, why: 0
GLOBAL CACHE ELEMENT DUMP (address: 0x7000102f3ed7608): id1: 0x229d id2: 0x7 pkey: INVALID block: (7/8861) lock: X rls: 0x7 acq: 0x0 latch: 10 flags: 0x20 fair: 255 recovery: 0 fpin: ‘kclwh2’ bscn: 0xbac.feaeaa51 bctx: 0x0 write: 0 scan: 0x0 lcp: 0x0 lnk: [NULL] lch: [0x700010029dc2910,0x700010029dc2910] seq: 1054 hist: 54 113 238 180 113 238 180 143:0 208 352 32 197 48 121 45 299 LIST OF BUFFERS LINKED TO THIS GLOBAL CACHE ELEMENT: flg: 0x00280400 lflg: 0x4 state: MEDIA_RCV tsn: 2 tsh: 1 waiters: 2 addr: 0x700010029dc27d8 obj: INVALID cls: UNDO BLOCK bscn: 0xbac.feaeaa51 buffer tsn: 2 rdba: 0x01c0229d (7/8861) scn: 0x0bac.feaeaa51 seq: 0x09 flg: 0x04 tail: 0xaa510209 frmt: 0x02 chkval: 0x3bf8 type: 0x02=KTU UNDO BLOCK |
这里大概知道原因了。
5 处理办法
5.1 搜MOS发现如下BUG
Bug 17695685 : TRACKING BUG FOR LRG 5984174
Bug 17695685 Hang in Active Dataguard Database This note gives a brief overview of bug 17695685. The content was last updated on: 17-DEC-2014 Click here for details of each of the sections below. Affects:
Fixed:
Interim patches may be available for earlier versions – click here to check.
Description This bug is only relevant when using Real Application Clusters (RAC) Rediscovery: |
目前在AIX平台只有11.2.0.4.5下面有PATCH包,现在环境是11.2.0.4.2,需要先打PSU到11.2.0.4.5
5.2 关闭一个实例
可以先关闭一个实例就不会出现这个故障了。
5.3 配置参数
通过配置参数_fairness_threshold为0,但是会降低数据库的性能。
ADG模式下MRP进程不能应用日志,子进程出现gc buffer busy release等待:等您坐沙发呢!