
Host reboot caused by a lost votedisk

The following test simulates one node losing its voting disk (VOTEDISK), which causes that host to reboot.

1. Environment

[root@cisser2 ~]# crsctl query crs activeversion

CRS active version on the cluster is [10.2.0.5.0]

[root@cisser2 ~]# lsb_release -a

LSB Version:    :core-4.0-amd64:core-4.0-ia32:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-ia32:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-ia32:printing-4.0-noarch

Distributor ID: RedHatEnterpriseServer

Description:    Red Hat Enterprise Linux Server release 5.11 (Tikanga)

Release:        5.11

Codename:       Tikanga

2. Check the disk information

[root@cisser1 tmp]# dmsetup ls

disk1_votep1    (253, 5)

disk1_vote      (253, 2)

disk1_ocr       (253, 3)

VolGroup00-LogVol01     (253, 0)

disk1_data1     (253, 4)

VolGroup00-LogVol00     (253, 1)

disk1_ocrp1     (253, 6)

 

[root@cisser1 tmp]# raw -qa

/dev/raw/raw1:  bound to major 253, minor 6

/dev/raw/raw4:  bound to major 253, minor 5

 

[root@cisser1 ~]# crsctl query css votedisk

 0.     0    /dev/raw/raw4

 

located 1 votedisk(s).
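Only one voting disk is configured here, so it is a single point of failure. Oracle's documented rule is that each node must be able to access more than half of the configured voting disks, which is why an odd number (for example three) is recommended. The Python sketch below assumes the 10.2 output layout shown above: it extracts the disk paths from the crsctl output and applies that majority rule. parse_votedisks and node_survives are hypothetical helper names, not Oracle tools.

```python
import re

# Output of `crsctl query css votedisk` as captured above (10.2 layout).
SAMPLE = """ 0.     0    /dev/raw/raw4

located 1 votedisk(s).
"""

# A disk line is "<index>.  <number>  <path>"; the "located N votedisk(s)." line does not match.
DISK_LINE = re.compile(r"^[ \t]*\d+\.[ \t]+\d+[ \t]+(\S+)[ \t]*$", re.MULTILINE)


def parse_votedisks(output: str) -> list[str]:
    """Return the voting disk paths listed in crsctl output (hypothetical helper)."""
    return DISK_LINE.findall(output)


def node_survives(total: int, unavailable: int) -> bool:
    """True while the node can still reach a strict majority of its voting disks."""
    reachable = total - unavailable
    return 2 * reachable > total


if __name__ == "__main__":
    disks = parse_votedisks(SAMPLE)
    for lost in range(len(disks) + 1):
        print(f"{len(disks)} disk(s), {lost} lost -> survives: {node_survives(len(disks), lost)}")
    # For comparison: three disks tolerate one lost disk, but not two.
    print(node_survives(3, 1), node_survives(3, 2))
```

With a single disk, losing that one disk leaves zero of one reachable, which is exactly the "1 of 1 voting disks unavailable" condition that appears later in the node 1 log.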

 

 

[root@cisser1 tmp]# multipath -ll

disk1_vote (36000c291eaeb9a8cb897fed3bb029eb7) dm-2 VMware,,VMware Virtual

[size=307M][features=0][hwhandler=0][rw]

\_ round-robin 0 [prio=1][active]

 \_ 2:0:0:0 sdb 8:16  [active][ready]

disk1_ocr (36000c293ecddddd9af5f396457322054) dm-3 VMware,,VMware Virtual

[size=307M][features=0][hwhandler=0][rw]

\_ round-robin 0 [prio=1][active]

 \_ 2:0:1:0 sdc 8:32  [active][ready]

disk1_data1 (36000c294078123daee865a29e3b1ea63) dm-4 VMware,,VMware Virtual

[size=50G][features=0][hwhandler=0][rw]

\_ round-robin 0 [prio=1][active]

 \_ 2:0:2:0 sdd 8:48  [active][ready]

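Here we can see that the votedisk /dev/raw/raw4 is bound to device (253, 5), which is disk1_votep1, a partition of the multipath device /dev/dm-2 (alias disk1_vote). The only path under disk1_vote is the disk sdb.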
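The same lookup can be done mechanically. The Python sketch below parses the raw -qa, dmsetup ls and multipath -ll output captured above and follows a raw device down to its device-mapper name, multipath map and sd path. The helper names are hypothetical, the patterns only cover the RHEL 5 output layout shown in this post, and mapping disk1_votep1 back to disk1_vote assumes kpartx-style "<map>p<N>" partition names.

```python
import re

# Command output captured above on cisser1.
RAW_QA = """/dev/raw/raw1:  bound to major 253, minor 6
/dev/raw/raw4:  bound to major 253, minor 5
"""
DMSETUP_LS = """disk1_votep1    (253, 5)
disk1_vote      (253, 2)
disk1_ocr       (253, 3)
VolGroup00-LogVol01     (253, 0)
disk1_data1     (253, 4)
VolGroup00-LogVol00     (253, 1)
disk1_ocrp1     (253, 6)
"""
MULTIPATH_LL = r"""disk1_vote (36000c291eaeb9a8cb897fed3bb029eb7) dm-2 VMware,,VMware Virtual
[size=307M][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=1][active]
 \_ 2:0:0:0 sdb 8:16  [active][ready]
disk1_ocr (36000c293ecddddd9af5f396457322054) dm-3 VMware,,VMware Virtual
[size=307M][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=1][active]
 \_ 2:0:1:0 sdc 8:32  [active][ready]
"""


def raw_bindings(text: str) -> dict[str, tuple[int, int]]:
    """'/dev/raw/raw4' -> (253, 5), from `raw -qa`."""
    pattern = r"^(\S+):\s+bound to major (\d+), minor (\d+)"
    return {dev: (int(ma), int(mi)) for dev, ma, mi in re.findall(pattern, text, re.MULTILINE)}


def dm_names(text: str) -> dict[tuple[int, int], str]:
    """(253, 5) -> 'disk1_votep1', from `dmsetup ls`."""
    pattern = r"^(\S+)\s+\((\d+),\s*(\d+)\)"
    return {(int(ma), int(mi)): name for name, ma, mi in re.findall(pattern, text, re.MULTILINE)}


def multipath_paths(text: str) -> dict[str, list[str]]:
    """'disk1_vote' -> ['sdb'], from `multipath -ll` (RHEL 5 layout)."""
    maps: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        head = re.match(r"^(\S+) \(\w+\) dm-\d+", line)
        if head:
            current = head.group(1)
            maps[current] = []
            continue
        path = re.match(r"^\s*\\_ \d+:\d+:\d+:\d+ (\S+) ", line)
        if current is not None and path:
            maps[current].append(path.group(1))
    return maps


def resolve_raw(raw_dev: str) -> dict:
    """Follow a raw device down to its dm name, multipath map and sd paths."""
    dm_name = dm_names(DMSETUP_LS)[raw_bindings(RAW_QA)[raw_dev]]
    maps = multipath_paths(MULTIPATH_LL)
    # Assumption: partition maps are named "<map>p<N>" (e.g. disk1_votep1 -> disk1_vote).
    parent = dm_name if dm_name in maps else re.sub(r"p\d+$", "", dm_name)
    return {"raw": raw_dev, "dm": dm_name, "multipath": parent, "paths": maps.get(parent, [])}


if __name__ == "__main__":
    print(resolve_raw("/dev/raw/raw4"))
    print(resolve_raw("/dev/raw/raw1"))
```

Applied to raw4 this gives disk1_votep1, then disk1_vote, then sdb, the same chain read off by hand above; raw1 (the OCR) resolves to sdc.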

3. Delete the sdb disk

 [root@cisser1 device]# pwd

/sys/block/sdb/device

[root@cisser1 device]# ls -l delete

--w------- 1 root root 4096 Mar 29 11:46 delete
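The delete attribute is write-only, and writing 1 to it asks the kernel to remove that SCSI device; that is how the disk loss is simulated in this test. The Python sketch below wraps the same write in a hypothetical helper that defaults to a dry run, and exercises it against a throwaway directory tree rather than the real /sys. It should not be pointed at a production host.

```python
import tempfile
from pathlib import Path


def delete_scsi_disk(name: str, sysfs_block: str = "/sys/block", dry_run: bool = True) -> Path:
    """Equivalent of `echo 1 > /sys/block/<name>/device/delete` (hypothetical helper).

    Writing "1" to this attribute asks the kernel to remove the SCSI device, after
    which I/O through any layer built on it (multipath map, raw binding) fails.
    """
    if not name or "/" in name or name in (".", ".."):
        raise ValueError(f"not a block device name: {name!r}")
    attr = Path(sysfs_block) / name / "device" / "delete"
    if not attr.exists():
        raise FileNotFoundError(attr)
    if not dry_run:
        attr.write_text("1\n")  # the same bytes that `echo 1` writes
    return attr


def demo_on_fake_sysfs() -> tuple[str, str]:
    """Run the helper against a temporary fake tree instead of the real /sys."""
    with tempfile.TemporaryDirectory() as root:
        attr = Path(root, "sdb", "device", "delete")
        attr.parent.mkdir(parents=True)
        attr.write_text("")
        delete_scsi_disk("sdb", sysfs_block=root)  # dry run: nothing is written
        before = attr.read_text()
        delete_scsi_disk("sdb", sysfs_block=root, dry_run=False)
        return before, attr.read_text()


if __name__ == "__main__":
    print(demo_on_fake_sysfs())
```

Defaulting to dry_run=True is a deliberate choice for a destructive operation: the caller has to opt in explicitly before anything is written.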

 

 

[root@cisser1 dm-2]# echo 1 > /sys/block/sdb/device/delete

[root@cisser1 dm-2]# multipath -ll

disk1_vote (36000c291eaeb9a8cb897fed3bb029eb7) dm-2 ,

[size=307M][features=0][hwhandler=0][rw]

\_ round-robin 0 [prio=0][enabled]

 \_ #:#:#:# -   #:#   [failed][faulty]

Here we can see that the path to the disk has been lost.
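Before the delete, the disk1_vote map had one path in state [active][ready]; afterwards its only path is [failed][faulty], so the map has no usable path left. The Python sketch below flags this condition by parsing multipath -ll output. path_states and maps_without_active_path are hypothetical helpers, and the line patterns are assumptions based only on the two RHEL 5 outputs in this post; other multipath-tools versions print a different layout.

```python
import re

# `multipath -ll` for the vote map before and after the sdb delete (copied from above).
BEFORE = r"""disk1_vote (36000c291eaeb9a8cb897fed3bb029eb7) dm-2 VMware,,VMware Virtual
[size=307M][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=1][active]
 \_ 2:0:0:0 sdb 8:16  [active][ready]
"""
AFTER = r"""disk1_vote (36000c291eaeb9a8cb897fed3bb029eb7) dm-2 ,
[size=307M][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=0][enabled]
 \_ #:#:#:# -   #:#   [failed][faulty]
"""

HEAD = re.compile(r"^(\S+) \(\w+\) dm-\d+")
# Path line: "\_ <h:c:t:l> <dev> <major:minor> [dm state][checker state]"
PATH = re.compile(r"^\s*\\_ (\S+) (\S+)\s+\S+\s+\[(\w+)\]\[(\w+)\]")


def path_states(text: str) -> dict[str, list[tuple[str, str, str]]]:
    """Map each multipath map to (device, dm_state, checker_state) for each of its paths."""
    maps: dict[str, list[tuple[str, str, str]]] = {}
    current = None
    for line in text.splitlines():
        head = HEAD.match(line)
        if head:
            current = head.group(1)
            maps[current] = []
            continue
        path = PATH.match(line)
        if current is not None and path:
            maps[current].append((path.group(2), path.group(3), path.group(4)))
    return maps


def maps_without_active_path(text: str) -> list[str]:
    """Names of maps in which no path is in the 'active' dm state."""
    return [name for name, paths in path_states(text).items()
            if not any(state == "active" for _, state, _ in paths)]


if __name__ == "__main__":
    print(maps_without_active_path(BEFORE))
    print(maps_without_active_path(AFTER))
```

The path group line (round-robin ... [prio=0][enabled]) deliberately does not match PATH, so only real path lines are counted.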

4. Check the CSSD log on node 1

 [root@cisser1 cssd]# tail -f ocssd.log

Disk errors start being reported from this point on:

[    CSSD]2015-03-29 13:16:35.485 [843671872] >ERROR:   Internal Error Information:

  Category: 1234

  Operation: scls_block_read

  Location: fread_failed

  Other: fread unable to read buffer

  Dep: 5

  …………………………

[    CSSD]2015-03-29 13:16:35.485 [843671872] >ERROR:   clssnmvReadBlocks: read failed 1 at offset 529 of /dev/raw/raw4

[    CSSD]2015-03-29 13:16:35.485 [843671872] >TRACE:   clssnmDiskStateChange: state from 4 to 3 disk (0//dev/raw/raw4)

[    CSSD]2015-03-29 13:16:35.485 [832436544] >ERROR:   Internal Error Information:

  Category: 1234

  Operation: scls_block_write

  Location: fwrite_faile

  Other: fwrite unable to write buffer

  Dep: 5

 

The following entries appear 12 times:

[    CSSD]2015-03-29 13:19:26.604 [832436544] >ERROR:   Internal Error Information:

  Category: 1234

  Operation: scls_block_read

  Location: fread_failed

  Other: fread unable to read buffer

  Dep: 5

 

[    CSSD]2015-03-29 13:19:26.604 [832436544] >ERROR:   clssnmvReadBlocks: read failed 1 at offset 4 of /dev/raw/raw4

[    CSSD]2015-03-29 13:19:27.814 [1013770560] >TRACE:   clssnmSendingThread: sending status msg to all nodes

[    CSSD]2015-03-29 13:19:27.814 [1013770560] >TRACE:   clssnmSendingThread: sent 5 status msgs to all nodes

[    CSSD]2015-03-29 13:19:31.820 [1013770560] >TRACE:   clssnmSendingThread: sending status msg to all nodes

[    CSSD]2015-03-29 13:19:31.820 [1013770560] >TRACE:   clssnmSendingThread: sent 4 status msgs to all nodes

[    CSSD]2015-03-29 13:19:35.615 [854161728] >WARNING: clssnmDiskPMT: voting device offline at 90% fatal, termination in 19900 ms, disk (0//dev/raw/raw4)

[    CSSD]2015-03-29 13:19:36.615 [854161728] >WARNING: clssnmDiskPMT: voting device offline at 90% fatal, termination in 18900 ms, disk (0//dev/raw/raw4)

[    CSSD]2015-03-29 13:19:36.825 [1013770560] >TRACE:   clssnmSendingThread: sending status msg to all nodes

[    CSSD]2015-03-29 13:19:36.825 [1013770560] >TRACE:   clssnmSendingThread: sent 5 status msgs to all nodes

[    CSSD]2015-03-29 13:19:37.617 [854161728] >WARNING: clssnmDiskPMT: voting device offline at 90% fatal, termination in 17900 ms, disk (0//dev/raw/raw4)

[    CSSD]2015-03-29 13:19:37.892 [992381248] >TRACE:   clssgmDispatchCMXMSG(): msg type(3) src(2) dest(1) size(420) tag(01f7002a) incarnation(5)

[    CSSD]2015-03-29 13:19:37.892 [992381248] >TRACE:   clssgmHandleMasterAdd(): src(2) dest(1) size(420)

[    CSSD]2015-03-29 13:19:37.892 [992381248] >TRACE:   clssgmHandleMasterAdd(): grock(SRVM.DATABASE.NODEAPPS.cisser2) memberNo(-1) node(2) client(1f7002a) type(3).

[    CSSD]2015-03-29 13:19:37.892 [992381248] >TRACE:   clssgmAddMember: granted member(0) flags(0x1) node(2) grock (0x8c66cb0/SRVM.DATABASE.NODEAPPS.cisser2)

[    CSSD]2015-03-29 13:19:37.892 [992381248] >TRACE:   clssgmCommonAddMember: Remote member(0) node(2) flags 0x1 0x1 grock (3/0x8c66cb0/SRVM.DATABASE.NODEAPPS.cisser2)

[    CSSD]2015-03-29 13:19:37.898 [992381248] >TRACE:   clssgmDispatchCMXMSG(): msg type(4) src(2) dest(1) size(352) tag(01f8002a) incarnation(5)

[    CSSD]2015-03-29 13:19:37.898 [992381248] >TRACE:   clssgmHandleMasterExit(): src(2) dest(1) size(352)

[    CSSD]2015-03-29 13:19:37.898 [992381248] >TRACE:   clssgmRemoveMember: grock(SRVM.DATABASE.NODEAPPS.cisser2) member(0/0x8c013b0) nodeNum(2) flags(0x1) type(3)

[    CSSD]2015-03-29 13:19:38.618 [854161728] >WARNING: clssnmDiskPMT: voting device offline at 90% fatal, termination in 16900 ms, disk (0//dev/raw/raw4)

[    CSSD]2015-03-29 13:19:39.619 [854161728] >WARNING: clssnmDiskPMT: voting device offline at 90% fatal, termination in 15890 ms, disk (0//dev/raw/raw4)

[    CSSD]2015-03-29 13:19:39.701 [992381248] >TRACE:   clssgmDispatchCMXMSG(): msg type(12) src(2) dest(1) size(360) tag(01f9002a) incarnation(5)

[    CSSD]2015-03-29 13:19:40.620 [854161728] >WARNING: clssnmDiskPMT: voting device offline at 90% fatal, termination in 14890 ms, disk (0//dev/raw/raw4)

[    CSSD]2015-03-29 13:19:41.621 [854161728] >WARNING: clssnmDiskPMT: voting device offline at 90% fatal, termination in 13890 ms, disk (0//dev/raw/raw4)

[    CSSD]2015-03-29 13:19:41.831 [1013770560] >TRACE:   clssnmSendingThread: sending status msg to all nodes

[    CSSD]2015-03-29 13:19:41.831 [1013770560] >TRACE:   clssnmSendingThread: sent 5 status msgs to all nodes

[    CSSD]2015-03-29 13:19:42.623 [854161728] >WARNING: clssnmDiskPMT: voting device offline at 90% fatal, termination in 12890 ms, disk (0//dev/raw/raw4)

[    CSSD]2015-03-29 13:19:43.237 [950012224] >TRACE:   clssgmAllocProc: (0x8c23c60) allocated

[    CSSD]2015-03-29 13:19:43.238 [971401536] >TRACE:   Connect request from user root

[    CSSD]2015-03-29 13:19:43.238 [950012224] >TRACE:   clssgmClientConnectMsg: Connect from con(0x8c013b0) proc(0x8c23c60) pid() proto(10:2:1:1)

[    CSSD]2015-03-29 13:19:43.239 [950012224] >TRACE:   clssgmRegisterClient: proc(17/0x8c23c60), client(1/0x8bd2110)

[    CSSD]2015-03-29 13:19:43.239 [950012224] >TRACE:   clssgmExecuteClientRequest: GRKJOIN recvd from client 1 (0x8bd2110)

[    CSSD]2015-03-29 13:19:43.239 [950012224] >TRACE:   clssgmJoinGrock: grock SRVM.DATABASE.NODEAPPS.cisser1 new client 0x8bd2110 with con 0x8bfdef0, requested num -1

[    CSSD]2015-03-29 13:19:43.239 [950012224] >TRACE:   clssgmAddGrockMember: adding member to grock SRVM.DATABASE.NODEAPPS.cisser1

[    CSSD]2015-03-29 13:19:43.239 [950012224] >TRACE:   clssgmAddMember: granted member(0) flags(0x1) node(1) grock (0x8c64fa0/SRVM.DATABASE.NODEAPPS.cisser1)

[    CSSD]2015-03-29 13:19:43.239 [950012224] >TRACE:   clssgmQueueGrockEvent: lockName(SRVM.DATABASE.NODEAPPS.cisser1) type(2) count (1/1) xwaiters(0) event(1) to memberNo(0)

[    CSSD]2015-03-29 13:19:43.239 [950012224] >TRACE:   clssgmCommonAddMember: Local member(0) node(1) flags 0x1 0x1 grock (3/0x8c64fa0/SRVM.DATABASE.NODEAPPS.cisser1)

[    CSSD]2015-03-29 13:19:43.244 [950012224] >TRACE:   clssgmExecuteClientRequest: GRKEXIT recvd from client 1 (0x8bd2110)

[    CSSD]2015-03-29 13:19:43.244 [950012224] >TRACE:   clssgmExitGrock: client 1 (0x8bd2110), grock SRVM.DATABASE.NODEAPPS.cisser1, member 0

[    CSSD]2015-03-29 13:19:43.244 [950012224] >TRACE:   clssgmUnregisterClient(): removing proc 17 client 1, flags 0x04000000

[    CSSD]2015-03-29 13:19:43.244 [950012224] >TRACE:   clssgmRemoveMember: grock(SRVM.DATABASE.NODEAPPS.cisser1) member(0/0x8c22fa0) nodeNum(1) flags(0x1) type(3)

[    CSSD]2015-03-29 13:19:43.244 [950012224] >TRACE:   clssgmUnregisterClient: client 0x8bd2110 expiring

[    CSSD]2015-03-29 13:19:43.452 [950012224] >TRACE:   clssgmDeadProc: proc 0x8c23c60

[    CSSD]2015-03-29 13:19:43.452 [950012224] >TRACE:   clssgmDeleteClientListener: deleting cmProc (0x8c23c60), with 0 clients

[    CSSD]2015-03-29 13:19:43.452 [950012224] >TRACE:   clssgmDeleteClientListener: cleanup for proc(0x8c23c60) con(0x8c013b0) pid()

[    CSSD]2015-03-29 13:19:43.625 [854161728] >WARNING: clssnmDiskPMT: voting device offline at 90% fatal, termination in 11890 ms, disk (0//dev/raw/raw4)

[    CSSD]2015-03-29 13:19:44.627 [854161728] >WARNING: clssnmDiskPMT: voting device offline at 90% fatal, termination in 10890 ms, disk (0//dev/raw/raw4)

[    CSSD]2015-03-29 13:19:45.616 [832436544] >ERROR:   Internal Error Information:

  Category: 1234

  Operation: scls_block_read

  Location: fread_failed

  Other: fread unable to read buffer

  Dep: 5

 

[    CSSD]2015-03-29 13:19:45.616 [832436544] >ERROR:   clssnmvReadBlocks: read failed 1 at offset 4 of /dev/raw/raw4

[    CSSD]2015-03-29 13:19:45.616 [854161728] >WARNING: clssnmDiskPMT: voting device offline at 90% fatal, termination in 9900 ms, disk (0//dev/raw/raw4)

[    CSSD]2015-03-29 13:19:46.617 [854161728] >WARNING: clssnmDiskPMT: voting device offline at 90% fatal, termination in 8900 ms, disk (0//dev/raw/raw4)

[    CSSD]2015-03-29 13:19:46.839 [1013770560] >TRACE:   clssnmSendingThread: sending status msg to all nodes

[    CSSD]2015-03-29 13:19:46.839 [1013770560] >TRACE:   clssnmSendingThread: sent 5 status msgs to all nodes

[    CSSD]2015-03-29 13:19:47.618 [854161728] >WARNING: clssnmDiskPMT: voting device offline at 90% fatal, termination in 7900 ms, disk (0//dev/raw/raw4)

[    CSSD]2015-03-29 13:19:48.620 [854161728] >WARNING: clssnmDiskPMT: voting device offline at 90% fatal, termination in 6900 ms, disk (0//dev/raw/raw4)

[    CSSD]2015-03-29 13:19:49.621 [854161728] >WARNING: clssnmDiskPMT: voting device offline at 90% fatal, termination in 5890 ms, disk (0//dev/raw/raw4)

[    CSSD]2015-03-29 13:19:50.623 [854161728] >WARNING: clssnmDiskPMT: voting device offline at 90% fatal, termination in 4890 ms, disk (0//dev/raw/raw4)

[    CSSD]2015-03-29 13:19:50.845 [1013770560] >TRACE:   clssnmSendingThread: sending status msg to all nodes

[    CSSD]2015-03-29 13:19:50.845 [1013770560] >TRACE:   clssnmSendingThread: sent 4 status msgs to all nodes

[    CSSD]2015-03-29 13:19:51.625 [854161728] >WARNING: clssnmDiskPMT: voting device offline at 90% fatal, termination in 3890 ms, disk (0//dev/raw/raw4)

[    CSSD]2015-03-29 13:19:52.626 [854161728] >WARNING: clssnmDiskPMT: voting device offline at 90% fatal, termination in 2890 ms, disk (0//dev/raw/raw4)

[    CSSD]2015-03-29 13:19:53.627 [854161728] >WARNING: clssnmDiskPMT: voting device offline at 90% fatal, termination in 1890 ms, disk (0//dev/raw/raw4)

[    CSSD]2015-03-29 13:19:54.628 [854161728] >WARNING: clssnmDiskPMT: voting device offline at 90% fatal, termination in 890 ms, disk (0//dev/raw/raw4)

[    CSSD]2015-03-29 13:19:55.530 [854161728] >TRACE:   clssnmDiskPMT: offline disk (200010 ms) (0//dev/raw/raw4)

[    CSSD]2015-03-29 13:19:55.530 [854161728] >ERROR:   clssnmDiskPMT: Aborting, 1 of 1 voting disks unavailable

[    CSSD]2015-03-29 13:19:55.533 [854161728] >ERROR:   ###################################

[    CSSD]2015-03-29 13:19:55.533 [854161728] >ERROR:   clssscExit: CSSD aborting from thread clssnmvDiskPingMonitorThread

[    CSSD]2015-03-29 13:19:55.533 [854161728] >ERROR:   ###################################

[    CSSD]2015-03-29 13:19:55.533 [854161728] >TRACE:   clssgmDiscOmonReady: omon was posted for member 1

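Here we can see "clssnmDiskPMT: Aborting, 1 of 1 voting disks unavailable". Because host 1 could no longer access its only voting disk, CSSD aborted from the clssnmvDiskPingMonitorThread thread about 200 seconds after the first read error (the log reports "offline disk (200010 ms)"), and the CSSD abort caused the host to reboot.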
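The timestamps in this excerpt can be cross-checked. The Python sketch below parses a few of the node 1 lines quoted above and computes the time from the first read error to the abort, plus the abort time predicted by the first 90% countdown warning. The helpers parse, first_match and disk_timeline are hypothetical, and the line regex assumes the 10.2 ocssd.log layout seen here.

```python
import re
from datetime import datetime, timedelta

# Selected ocssd.log lines from node 1 (copied from the excerpt above).
NODE1_LOG = """\
[    CSSD]2015-03-29 13:16:35.485 [843671872] >ERROR:   clssnmvReadBlocks: read failed 1 at offset 529 of /dev/raw/raw4
[    CSSD]2015-03-29 13:19:35.615 [854161728] >WARNING: clssnmDiskPMT: voting device offline at 90% fatal, termination in 19900 ms, disk (0//dev/raw/raw4)
[    CSSD]2015-03-29 13:19:55.530 [854161728] >TRACE:   clssnmDiskPMT: offline disk (200010 ms) (0//dev/raw/raw4)
[    CSSD]2015-03-29 13:19:55.530 [854161728] >ERROR:   clssnmDiskPMT: Aborting, 1 of 1 voting disks unavailable
"""

LINE = re.compile(r"^\[\s*CSSD\](\S+ \S+) \[(\d+)\] >(\w+):\s+(.*)$")


def parse(log: str) -> list[tuple[datetime, str, str]]:
    """Return (timestamp, level, message) for every CSSD line in the excerpt."""
    entries = []
    for line in log.splitlines():
        m = LINE.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
            entries.append((ts, m.group(3), m.group(4)))
    return entries


def first_match(entries, pattern: str):
    """Return (timestamp, match object) for the first message matching pattern."""
    for ts, _, msg in entries:
        m = re.search(pattern, msg)
        if m:
            return ts, m
    raise LookupError(pattern)


def disk_timeline(log: str) -> dict:
    entries = parse(log)
    first_error, _ = first_match(entries, r"clssnmvReadBlocks: read failed")
    abort, _ = first_match(entries, r"Aborting, \d+ of \d+ voting disks unavailable")
    warn_ts, warn = first_match(entries, r"termination in (\d+) ms")
    _, offline = first_match(entries, r"offline disk \((\d+) ms\)")
    predicted = warn_ts + timedelta(milliseconds=int(warn.group(1)))
    return {
        "error_to_abort": abort - first_error,      # observed: first I/O error -> abort
        "predicted_abort": predicted,                # first countdown line + its "termination in"
        "abort_minus_predicted": abort - predicted,
        "reported_offline": timedelta(milliseconds=int(offline.group(1))),
    }


if __name__ == "__main__":
    for key, value in disk_timeline(NODE1_LOG).items():
        print(key, value)
```

The abort lands 200.045 s after the first read error and 15 ms after the time predicted by the countdown, consistent with a voting-disk timeout of about 200 s. Assuming it was left unchanged on this cluster, that matches the 200-second default of the CSS disktimeout setting in 10.2; the configured value itself is not shown in the original output.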

5. OCSSD log on host 2

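Here host 2 registers a receive failure with node 1, after which it starts reporting missed network heartbeats from node 1, beginning at 50% of the heartbeat timeout.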

[    CSSD]2015-03-29 13:19:58.691 [1518352704] >TRACE:   clssgmPeerListener: discarded 0 future msgsfor 1

[    CSSD]2015-03-29 13:19:58.691 [1396734272] >WARNING: clssnmeventhndlr: Receive failure with node 1 (cisser1), state 3, con(0xf66b090), probe((nil)), rc=11

…………………………………

[    CSSD]2015-03-29 13:20:28.778 [1529252160] >WARNING: clssnmPollingThread: node cisser1 (1) at 50% heartbeat fatal, eviction in 29.200 seconds seedhbimpd 0

[    CSSD]2015-03-29 13:20:28.778 [1529252160] >TRACE:   clssnmPollingThread: node cisser1 (1) is impending reconfig, flag 1, misstime 30800

[    CSSD]2015-03-29 13:20:28.778 [1529252160] >TRACE:   clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)

[    CSSD]2015-03-29 13:20:32.905 [1539742016] >TRACE:   clssnmSendingThread: sending status msg to all nodes

[    CSSD]2015-03-29 13:20:32.905 [1539742016] >TRACE:   clssnmSendingThread: sent 5 status msgs to all nodes

[    CSSD]2015-03-29 13:20:36.910 [1539742016] >TRACE:   clssnmSendingThread: sending status msg to all nodes

[    CSSD]2015-03-29 13:20:36.910 [1539742016] >TRACE:   clssnmSendingThread: sent 4 status msgs to all nodes

[    CSSD]2015-03-29 13:20:39.826 [1407633728] >TRACE:   clssgmAllocateRPCIndex: allocated rpc 507 (0x2abe50b1cd90)

[    CSSD]2015-03-29 13:20:39.826 [1407633728] >TRACE:   clssgmpeersend: send failed type 12, node 1, unreachable, flags 0x0, quiesced 0

[    CSSD]2015-03-29 13:20:39.826 [1407633728] >TRACE:   clssgmFreeRPCIndex: freeing rpc 507

[    CSSD]2015-03-29 13:20:41.917 [1539742016] >TRACE:   clssnmSendingThread: sending status msg to all nodes

[    CSSD]2015-03-29 13:20:41.917 [1539742016] >TRACE:   clssnmSendingThread: sent 5 status msgs to all nodes

[    CSSD]2015-03-29 13:20:43.780 [1529252160] >WARNING: clssnmPollingThread: node cisser1 (1) at 75% heartbeat fatal, eviction in 14.200 seconds seedhbimpd 1

[    CSSD]2015-03-29 13:20:45.923 [1539742016] >TRACE:   clssnmSendingThread: sending status msg to all nodes

[    CSSD]2015-03-29 13:20:45.923 [1539742016] >TRACE:   clssnmSendingThread: sent 4 status msgs to all nodes

[    CSSD]2015-03-29 13:20:49.930 [1539742016] >TRACE:   clssnmSendingThread: sending status msg to all nodes

[    CSSD]2015-03-29 13:20:49.930 [1539742016] >TRACE:   clssnmSendingThread: sent 4 status msgs to all nodes

[    CSSD]2015-03-29 13:20:52.784 [1529252160] >WARNING: clssnmPollingThread: node cisser1 (1) at 90% heartbeat fatal, eviction in 5.200 seconds seedhbimpd 1

[    CSSD]2015-03-29 13:20:53.785 [1529252160] >WARNING: clssnmPollingThread: node cisser1 (1) at 90% heartbeat fatal, eviction in 4.200 seconds seedhbimpd 1

[    CSSD]2015-03-29 13:20:54.787 [1529252160] >WARNING: clssnmPollingThread: node cisser1 (1) at 90% heartbeat fatal, eviction in 3.200 seconds seedhbimpd 1

[    CSSD]2015-03-29 13:20:54.938 [1539742016] >TRACE:   clssnmSendingThread: sending status msg to all nodes

[    CSSD]2015-03-29 13:20:54.938 [1539742016] >TRACE:   clssnmSendingThread: sent 5 status msgs to all nodes

[    CSSD]2015-03-29 13:20:55.788 [1529252160] >WARNING: clssnmPollingThread: node cisser1 (1) at 90% heartbeat fatal, eviction in 2.200 seconds seedhbimpd 1

[    CSSD]2015-03-29 13:20:56.790 [1529252160] >WARNING: clssnmPollingThread: node cisser1 (1) at 90% heartbeat fatal, eviction in 1.200 seconds seedhbimpd 1

[    CSSD]2015-03-29 13:20:57.791 [1529252160] >WARNING: clssnmPollingThread: node cisser1 (1) at 90% heartbeat fatal, eviction in 0.190 seconds seedhbimpd 1

[    CSSD]2015-03-29 13:20:57.984 [1529252160] >TRACE:   clssnmPollingThread: Eviction started for node cisser1 (1), flags 0x0001, state 3, wt4c 0 seedhbimpd 1

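At 2015-03-29 13:20:57.984 node 2 starts evicting node 1. In fact host 1 had already rebooted by then: node 1's reboot is what cut off the network heartbeat, and that is why the eviction message appears.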
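The node 2 countdown can be checked the same way. The Python sketch below parses four of the node 2 lines quoted above. Reading "misstime 30800" as the time already spent without a heartbeat is an interpretation, not something the log states; under that reading, 30.8 s missed plus the 29.2 s still remaining gives a total heartbeat timeout of 60 s, consistent with a CSS misscount of 60 s on this cluster (the value itself is not shown in the original output). The helper names are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Selected ocssd.log lines from node 2 (copied from the excerpt above).
NODE2_LOG = """\
[    CSSD]2015-03-29 13:19:58.691 [1396734272] >WARNING: clssnmeventhndlr: Receive failure with node 1 (cisser1), state 3, con(0xf66b090), probe((nil)), rc=11
[    CSSD]2015-03-29 13:20:28.778 [1529252160] >WARNING: clssnmPollingThread: node cisser1 (1) at 50% heartbeat fatal, eviction in 29.200 seconds seedhbimpd 0
[    CSSD]2015-03-29 13:20:28.778 [1529252160] >TRACE:   clssnmPollingThread: node cisser1 (1) is impending reconfig, flag 1, misstime 30800
[    CSSD]2015-03-29 13:20:57.984 [1529252160] >TRACE:   clssnmPollingThread: Eviction started for node cisser1 (1), flags 0x0001, state 3, wt4c 0 seedhbimpd 1
"""

LINE = re.compile(r"^\[\s*CSSD\](\S+ \S+) \[(\d+)\] >(\w+):\s+(.*)$")


def parse(log: str) -> list[tuple[datetime, str, str]]:
    """Return (timestamp, level, message) for every CSSD line in the excerpt."""
    entries = []
    for line in log.splitlines():
        m = LINE.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
            entries.append((ts, m.group(3), m.group(4)))
    return entries


def first_match(entries, pattern: str):
    """Return (timestamp, match object) for the first message matching pattern."""
    for ts, _, msg in entries:
        m = re.search(pattern, msg)
        if m:
            return ts, m
    raise LookupError(pattern)


def eviction_timeline(log: str) -> dict:
    entries = parse(log)
    failure, _ = first_match(entries, r"Receive failure with node")
    warn_ts, warn = first_match(entries, r"at 50% heartbeat fatal, eviction in ([\d.]+) seconds")
    _, missed = first_match(entries, r"misstime (\d+)")
    evicted, _ = first_match(entries, r"Eviction started for node")
    remaining = timedelta(seconds=float(warn.group(1)))
    return {
        # Assumption: misstime is the time already elapsed without a heartbeat.
        "implied_timeout": timedelta(milliseconds=int(missed.group(1))) + remaining,
        "predicted_eviction": warn_ts + remaining,
        "failure_to_eviction": evicted - failure,
    }


if __name__ == "__main__":
    for key, value in eviction_timeline(NODE2_LOG).items():
        print(key, value)
```

The predicted eviction time (13:20:57.978) is 6 ms before the logged "Eviction started", and the eviction comes about 59.3 s after node 2 first saw the receive failure from node 1.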

Permalink: http://www.htz.pw/2015/03/29/votedisk%e4%b8%a2%e5%a4%b1%e5%af%bc%e8%87%b4%e4%b8%bb%e6%9c%ba%e9%87%8d%e5%90%af.html | 认真就输

This post was published by huangtingzhong on March 29, 2015 in the ORA, RAC category.