The following is taken from the MOS document:
Troubleshooting 10g and 11.1 Clusterware Reboots (Doc ID 265769.1)
Below is the Sun example from that document:
Edit the init.cssd file from the location in step 1, change the OPROCD startup line to a non-fatal startup:
Sun Example:
# in fatal mode we always will start OPROCD FATAL
if [ $OPROCD_EXISTS ]; then
  $OPROCD start -t $OPROCD_DEFAULT_TIMEOUT -m $OPROCD_DEFAULT_MARGIN
  $OPROCD check -t $OPROCD_CHECK_TIMEOUT 2>$NULL
fi
Change this to:
# in fatal mode we always will start OPROCD FATAL
if [ $OPROCD_EXISTS ]; then
  $OPROCD startInstall -t $OPROCD_DEFAULT_TIMEOUT -m $OPROCD_DEFAULT_MARGIN
  $OPROCD check -t $OPROCD_CHECK_TIMEOUT 2>$NULL
fi
You could also combine this method with the 'tracing system calls' method for more debugging.
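The edit described in the MOS note can be sketched as a small shell helper. This is only a sketch under assumptions: the function name `to_nonfatal` and the `.bak` backup suffix are made up for illustration, and the pattern matches the Sun-style `$OPROCD start -t` line quoted above (other platforms may use a different startup line). Try it on a copy of init.cssd first.

```shell
#!/bin/sh
# Hypothetical helper (not part of Oracle's tooling): rewrite the OPROCD
# startup line in the given file from "start" to "startInstall".
to_nonfatal() {
    file=$1
    # Keep a backup next to the file before touching it.
    cp -p "$file" "$file.bak" || return 1
    # \$ matches a literal dollar sign; only "$OPROCD start -t" is rewritten,
    # so a line that already says "startInstall" is left unchanged.
    sed 's/\$OPROCD start -t/$OPROCD startInstall -t/' "$file.bak" > "$file"
}
```

Usage would look like `to_nonfatal /path/to/copy/of/init.cssd`, followed by a diff against the `.bak` file to confirm that only the intended line changed.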
Below is a test on my own Linux machine:
[root@cisser1 oprocd]# oprocd help
usage: oprocd [start | startInstall | stop | check | enableFatal| help | -?]
run [ -t | -m | -g | -f | -e]  foreground startup
    -t <timeout>    timeout in ms
    -m <margin>     timout margin in ms
    -e <epsilon>    clock skew epsilon in ms
    -g <groupName>  group name to enable fatal
    -f              fatal startup
start [-t | -m | -e]  starts the daemon
    -t <timeout>    timeout in ms
    -m <margin>     timout margin in ms
    -e <epsilon>    clock skew epsilon in ms
startInstall [ -t | -m | -g | -e]  start process in install mode
    -t <timeout>    timeout in ms
    -m <margin>     timout margin in ms
    -e <epsilon>    clock skew epsilon in ms
    -g <groupName>  group name to enable fatal
enableFatal [ -t ]  force install mode process to fatal
    -t <timeout>    timeout for response in ms
stop [ -t ]  stops running daemon
    -t <timeout>    timeout for response in ms
check [ -t ]  checks status of daemon
    -t <timeout>    timeout for response in ms
help  this help information
-?    same as help above
Now let's look at the init.cssd script.
First, start oprocd in startInstall mode:
# Backup the oprocd last gasp files
if [ -f $OPROCDLGL ] ; then
  FILENAME=$OPROCDLGL.$UNIQUEDATE
  $MVF $OPROCDLGL "$FILENAME"
fi
# Run oprocd synchronously and look for its status code
cd $OPROCDIR
# startup the some diagnostic collection scripts if any
StartDiagCollect;
$OPROCD startInstall -t $OPROCD_DEFAULT_TIMEOUT -m $OPROCD_DEFAULT_MARGIN \
  $OPROCD_DEFAULT_HISTOGRAM $FATALARG
RC=$?
# shutdown diagnostic collection
StopDiagCollect;
Restart CRS and see what the logs show:
[root@cisser1 oprocd]# /etc/init.d/init.crs stop
Shutting down Oracle Cluster Ready Services (CRS):
Mar 27 09:33:33.060 | ERR | failed to connect to daemon, errno(111)
Stopping resources. This could take several minutes.
Error while stopping resources. Possible cause: CRSD is down.
Shutdown has begun. The daemons should exit soon.
[root@cisser1 oprocd]# /etc/init.d/init.crs start
Startup will be queued to init within 30 seconds.
No log was generated.
[root@cisser1 oprocd]# ps -ef|grep init
root 1 0 0 09:36 ? 00:00:01 init [5]
root 3242 1 0 09:37 ? 00:00:00 /bin/sh /etc/init.d/init.evmd run
root 3243 1 0 09:37 ? 00:00:00 /bin/sh /etc/init.d/init.cssd fatal
root 3252 1 0 09:37 ? 00:00:00 /bin/sh /etc/init.d/init.crsd run
root 4083 3243 0 09:37 ? 00:00:00 /bin/sh /etc/init.d/init.cssd oclsomon
root 4136 3243 0 09:37 ? 00:00:00 /bin/sh /etc/init.d/init.cssd daemon
root 22547 21136 0 09:39 pts/0 00:00:00 grep init
[root@cisser1 oprocd]# ps -ef|grep oproc
root 23775 21136 0 09:39 pts/0 00:00:00 grep oproc
With startInstall combined with $FATALARG, there is no oprocd process running, and no log was generated either.
Next, remove the -f option by commenting out $FATALARG:
# startup the some diagnostic collection scripts if any
StartDiagCollect;
$OPROCD startInstall -t $OPROCD_DEFAULT_TIMEOUT -m $OPROCD_DEFAULT_MARGIN \
  $OPROCD_DEFAULT_HISTOGRAM
  #$OPROCD_DEFAULT_HISTOGRAM $FATALARG
RC=$?
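Why commenting out $FATALARG drops the -f flag can be illustrated with a stub. This is a sketch, not the real init.cssd logic: `fake_oprocd` is a stand-in that only prints the arguments it receives, and the values `FATALARG="-f"` and an empty `OPROCD_DEFAULT_HISTOGRAM` are assumptions inferred from the `ps` listings in this post (`-t 1000 -m 500`, with or without `-f`).

```shell
#!/bin/sh
# Stub standing in for oprocd: prints the arguments it would be started with.
fake_oprocd() { printf '%s\n' "$*"; }

OPROCD_DEFAULT_TIMEOUT=1000
OPROCD_DEFAULT_MARGIN=500
OPROCD_DEFAULT_HISTOGRAM=""   # assumption: empty, no histogram flag shows up in ps
FATALARG="-f"                 # assumption: what init.cssd passes in fatal mode

# Unquoted variables are left as in init.cssd on purpose: an empty one
# expands to no argument at all.
with_fatal=$(fake_oprocd startInstall -t $OPROCD_DEFAULT_TIMEOUT -m $OPROCD_DEFAULT_MARGIN \
    $OPROCD_DEFAULT_HISTOGRAM $FATALARG)
without_fatal=$(fake_oprocd startInstall -t $OPROCD_DEFAULT_TIMEOUT -m $OPROCD_DEFAULT_MARGIN \
    $OPROCD_DEFAULT_HISTOGRAM)

echo "$with_fatal"      # startInstall -t 1000 -m 500 -f
echo "$without_fatal"   # startInstall -t 1000 -m 500
```

Note that the help output above lists -f only under `run`, not under `startInstall`, which may be why the first attempt with $FATALARG left no oprocd process behind; that is a guess, not something the logs confirm.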
This time I simply rebooted the host:
[root@cisser1 oprocd]# ls -lrt
total 84
-rwxr--r-- 1 root root  512 Mar 24 10:00 cisser1.oprocd.lgl.2015-03-25-00:15:29
-rw-r--r-- 1 root root  770 Mar 24 15:57 cisser1.oprocd.log.2015-03-25-00:15:29
-rw-r--r-- 1 root root  175 Mar 25 00:15 cisser1.oprocd.log.2015-03-26-11:35:24
-rwxr--r-- 1 root root  512 Mar 25 00:15 cisser1.oprocd.lgl.2015-03-26-11:35:24
-rw-r--r-- 1 root root  175 Mar 26 11:35 cisser1.oprocd.log.2015-03-26-19:05:17
-rwxr--r-- 1 root root  512 Mar 26 11:35 cisser1.oprocd.lgl.2015-03-26-19:05:17
-rwxr--r-- 1 root root  512 Mar 26 19:05 cisser1.oprocd.lgl.2015-03-26-19:16:22
-rw-r--r-- 1 root root  304 Mar 26 19:15 cisser1.oprocd.log.2015-03-26-19:16:22
-rw-r--r-- 1 root root   97 Mar 26 19:16 cisser1.oprocd.log.2015-03-26-19:24:47
-rwxr--r-- 1 root root  512 Mar 26 19:16 cisser1.oprocd.lgl.2015-03-26-19:24:47
-rwxr--r-- 1 root root  512 Mar 26 19:24 cisser1.oprocd.lgl.2015-03-26-19:30:17
-rw-r--r-- 1 root root  226 Mar 26 19:29 cisser1.oprocd.log.2015-03-26-19:30:17
-rwxr--r-- 1 root root  512 Mar 26 19:30 cisser1.oprocd.lgl.2015-03-26-19:31:53
-rw-r--r-- 1 root root  304 Mar 26 19:31 cisser1.oprocd.log.2015-03-26-19:31:53
-rwxr--r-- 1 root root  512 Mar 27 14:08 cisser1.oprocd.lgl.2015-03-27-14:11:11
-rw-r--r-- 1 root root  164 Mar 27 14:09 cisser1.oprocd.log.2015-03-27-14:11:11
drwxrwx--- 2 root root 4096 Mar 27 14:11 stop
drwxrwx--- 2 root root 4096 Mar 27 14:11 fatal
-rw-r--r-- 1 root root   97 Mar 27 14:11 cisser1.oprocd.log
-rwxr--r-- 1 root root  512 Mar 27 14:11 cisser1.oprocd.lgl
drwxrwx--- 2 root root 4096 Mar 27 14:11 check
[root@cisser1 oprocd]# cat cisser1.oprocd.log
Mar 27 14:11:11.602 | INF | monitoring started with timeout(1000), margin(500), skewTimeout(125)
[root@cisser1 oprocd]# ps -ef|grep oprocd
root 4451 1 0 14:11 ? 00:00:00 /oracle/app/oracle/product/10.2.0/crs_1/bin/oprocd.bin startInstall -t 1000 -m 500
root 5634 5504 0 14:14 pts/0 00:00:00 grep oprocd
[root@cisser1 oprocd]# ps -ef|grep init
root 1 0 0 14:10 ? 00:00:01 init [5]
root 3244 1 0 14:11 ? 00:00:00 /bin/sh /etc/init.d/init.evmd run
root 3245 1 0 14:11 ? 00:00:00 /bin/sh /etc/init.d/init.cssd fatal
root 3257 1 0 14:11 ? 00:00:00 /bin/sh /etc/init.d/init.crsd run
root 4027 3245 0 14:11 ? 00:00:00 /bin/sh /etc/init.d/init.cssd oclsomon
root 4119 3245 0 14:11 ? 00:00:00 /bin/sh /etc/init.d/init.cssd daemon
This time a log file was generated and it has content, but nothing in the log indicates whether oprocd is running in fatal or non-fatal mode.
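Since the log carries no marker in this case, the process arguments are the other place to look. The following is a minimal sketch that classifies an oprocd command line; the function name `classify_oprocd` and the labels it prints are my own, and the patterns are based only on the two forms seen in this post (`oprocd.bin startInstall ...` here, and `oprocd.bin run ... -f` in the default startup).

```shell
#!/bin/sh
# Hypothetical helper: classify an oprocd command line as seen in ps output.
# Labels are illustrative; only the argument patterns come from this post.
classify_oprocd() {
    case "$1" in
        *"oprocd.bin startInstall"*) echo "install (non-fatal)" ;;
        *"oprocd.bin run"*" -f"*)    echo "fatal" ;;
        *"oprocd.bin"*)              echo "unknown" ;;
        *)                           echo "not running" ;;
    esac
}
```

On a live node this could be fed from `ps`, for example `classify_oprocd "$(ps -eo args | grep '[o]procd.bin')"`; the `[o]` trick keeps the grep process itself out of the result.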
Below is the default configuration:
# startup the some diagnostic collection scripts if any
StartDiagCollect;
$OPROCD run -t $OPROCD_DEFAULT_TIMEOUT -m $OPROCD_DEFAULT_MARGIN \
  $OPROCD_DEFAULT_HISTOGRAM $FATALARG
RC=$?
Reboot the host again:
[root@cisser1 oprocd]# ps -ef|grep oprocd
root 4069 3242 0 14:44 ? 00:00:00 /bin/sh /etc/init.d/init.cssd oprocd
root 4445 4069 0 14:44 ? 00:00:00 /oracle/app/oracle/product/10.2.0/crs_1/bin/oprocd.bin run -t 1000 -m 500 -f
root 5234 5109 0 14:45 pts/0 00:00:00 grep oprocd
[root@cisser1 oprocd]# ls -lrt
total 92
-rwxr--r-- 1 root root  512 Mar 24 10:00 cisser1.oprocd.lgl.2015-03-25-00:15:29
-rw-r--r-- 1 root root  770 Mar 24 15:57 cisser1.oprocd.log.2015-03-25-00:15:29
-rw-r--r-- 1 root root  175 Mar 25 00:15 cisser1.oprocd.log.2015-03-26-11:35:24
-rwxr--r-- 1 root root  512 Mar 25 00:15 cisser1.oprocd.lgl.2015-03-26-11:35:24
-rw-r--r-- 1 root root  175 Mar 26 11:35 cisser1.oprocd.log.2015-03-26-19:05:17
-rwxr--r-- 1 root root  512 Mar 26 11:35 cisser1.oprocd.lgl.2015-03-26-19:05:17
-rwxr--r-- 1 root root  512 Mar 26 19:05 cisser1.oprocd.lgl.2015-03-26-19:16:22
-rw-r--r-- 1 root root  304 Mar 26 19:15 cisser1.oprocd.log.2015-03-26-19:16:22
-rw-r--r-- 1 root root   97 Mar 26 19:16 cisser1.oprocd.log.2015-03-26-19:24:47
-rwxr--r-- 1 root root  512 Mar 26 19:16 cisser1.oprocd.lgl.2015-03-26-19:24:47
-rwxr--r-- 1 root root  512 Mar 26 19:24 cisser1.oprocd.lgl.2015-03-26-19:30:17
-rw-r--r-- 1 root root  226 Mar 26 19:29 cisser1.oprocd.log.2015-03-26-19:30:17
-rwxr--r-- 1 root root  512 Mar 26 19:30 cisser1.oprocd.lgl.2015-03-26-19:31:53
-rw-r--r-- 1 root root  304 Mar 26 19:31 cisser1.oprocd.log.2015-03-26-19:31:53
-rwxr--r-- 1 root root  512 Mar 27 14:08 cisser1.oprocd.lgl.2015-03-27-14:11:11
-rw-r--r-- 1 root root  164 Mar 27 14:09 cisser1.oprocd.log.2015-03-27-14:11:11
-rwxr--r-- 1 root root  512 Mar 27 14:11 cisser1.oprocd.lgl.2015-03-27-14:44:46
-rw-r--r-- 1 root root  164 Mar 27 14:43 cisser1.oprocd.log.2015-03-27-14:44:46
drwxrwx--- 2 root root 4096 Mar 27 14:44 stop
drwxrwx--- 2 root root 4096 Mar 27 14:44 fatal
-rw-r--r-- 1 root root  175 Mar 27 14:44 cisser1.oprocd.log
-rwxr--r-- 1 root root  512 Mar 27 14:44 cisser1.oprocd.lgl
drwxrwx--- 2 root root 4096 Mar 27 14:44 check
[root@cisser1 oprocd]# cat cisser1.oprocd.log
Mar 27 14:44:46.836 | INF | monitoring started with timeout(1000), margin(500), skewTimeout(125)
Mar 27 14:44:46.837 | INF | fatal mode startup, setting process to fatal mode
Here the log states explicitly that oprocd is running in fatal mode.
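The log check can be scripted as well. The sketch below greps an oprocd log for the "fatal mode startup" line shown above; the function name `oprocd_log_mode` and its output labels are made up for illustration, and the absence of the line only means no fatal marker was logged, as in the startInstall test earlier.

```shell
#!/bin/sh
# Hypothetical helper: report what an oprocd log file says about fatal mode.
oprocd_log_mode() {
    if grep -q 'fatal mode startup' "$1"; then
        echo "fatal"
    else
        # The startInstall run in this post logged no mode marker at all.
        echo "no fatal marker"
    fi
}
```

For example, `oprocd_log_mode cisser1.oprocd.log` run from the oprocd log directory would print `fatal` for the default startup shown above.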