Switching the oprocd process between fatal and non-fatal mode

The following comes from this MOS document:

Troubleshooting 10g and 11.1 Clusterware Reboots (Doc ID 265769.1)

The following is the Sun example from that document:

Edit the init.cssd file from the location in step 1, change the OPROCD startup line to a non-fatal startup:

 

Sun Example:

 

# in fatal mode we always will start OPROCD FATAL

if [ $OPROCD_EXISTS ]; then

$OPROCD start -t $OPROCD_DEFAULT_TIMEOUT -m $OPROCD_DEFAULT_MARGIN

$OPROCD check -t $OPROCD_CHECK_TIMEOUT 2>$NULL

fi

 

Change this to:

 

# in fatal mode we always will start OPROCD FATAL

if [ $OPROCD_EXISTS ]; then

$OPROCD startInstall -t $OPROCD_DEFAULT_TIMEOUT -m $OPROCD_DEFAULT_MARGIN

$OPROCD check -t $OPROCD_CHECK_TIMEOUT 2>$NULL

fi
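
The edit described above can be sketched as a small helper. This is only a sketch: the function name `switch_oprocd_subcommand` is hypothetical, it assumes the `$OPROCD` invocation sits at the start of a line (after optional indentation) as in the excerpts here, and it should be tried on a copy of init.cssd first.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the oprocd subcommand inside an init.cssd file.
# Usage: switch_oprocd_subcommand FILE FROM TO
switch_oprocd_subcommand() {
    file=$1 from=$2 to=$3
    # Keep a timestamped backup before touching the script.
    cp -p "$file" "$file.bak.$(date +%Y%m%d%H%M%S)" || return 1
    # Rewrite only lines that invoke $OPROCD with exactly the FROM subcommand.
    # Requiring a blank after FROM stops "start" from matching "startInstall".
    sed 's/^\([[:space:]]*[$]OPROCD[[:space:]][[:space:]]*\)'"$from"'\([[:space:]]\)/\1'"$to"'\2/' \
        "$file" > "$file.new" || return 1
    # Write back through the original file so its permissions are kept.
    cat "$file.new" > "$file" && rm -f "$file.new"
}
```

On the Sun layout quoted above this would be called as `switch_oprocd_subcommand /path/to/init.cssd start startInstall`; the path is whatever step 1 of the MOS note identified.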

 

You could also combine this method with the ‘tracing system calls’ method for more debugging.

Below is my test on a Linux host:

[root@cisser1 oprocd]# oprocd help

 

usage:  oprocd [start | startInstall | stop | check | enableFatal| help | -?]

 

        run [ -t | -m | -g | -f  | -e]   foreground startup

              -t <timeout>          timeout in ms

              -m <margin>           timout margin in ms

              -e <epsilon>          clock skew epsilon in ms

              -g <groupName>        group name to enable fatal

              -f                    fatal startup

 

        start  [-t | -m  | -e]           starts the daemon

                -t <timeout>        timeout in ms

                -m <margin>         timout margin in ms

                 -e <epsilon>        clock skew epsilon in ms

 

        startInstall [ -t | -m | -g  | - e] start process in install mode

                       -t <timeout>   timeout in ms

                       -m <margin>    timout margin in ms

                       -e <epsilon>   clock skew epsilon in ms

                       -g <groupName> group name to enable fatal

 

        enableFatal  [ -t ]             force install mode process to fatal

                       -t <timeout>   timeout for response in ms

        stop         [ -t ]             stops running daemon

                       -t <timeout>   timeout for response in ms

        check        [ -t ]           checks status of daemon

                       -t <timeout>   timeout for response in ms

        help                          this help information

        -?                            same as help above
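
Going only by the help text above, a process started with `startInstall` can later be forced to fatal mode with `enableFatal`. The following dry-run sketch just prints that command sequence; it was not part of my test. The function name and the `RUN` variable are hypothetical, the 1000/500 values are the ones seen in the logs in this post, and the 1000 ms reply timeout is an assumption.

```shell
#!/bin/sh
# Dry-run sketch: print the oprocd commands instead of running them.
# Setting RUN="" would execute them for real; do that only on a test node,
# because oprocd in fatal mode can reboot the host.
RUN=${RUN-echo}

oprocd_nonfatal_then_fatal() {
    timeout_ms=$1 margin_ms=$2 reply_ms=$3
    # Start in install (non-fatal) mode.
    $RUN oprocd startInstall -t "$timeout_ms" -m "$margin_ms"
    # Check that the daemon answers.
    $RUN oprocd check -t "$reply_ms"
    # Force the install-mode process to fatal.
    $RUN oprocd enableFatal -t "$reply_ms"
}

oprocd_nonfatal_then_fatal 1000 500 1000
```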

Now look at init.cssd and the resulting logs.

First attempt: start with startInstall, leaving the rest of the line unchanged

 

    # Backup the oprocd last gasp files

    if [ -f $OPROCDLGL ] ; then

      FILENAME=$OPROCDLGL.$UNIQUEDATE

      $MVF $OPROCDLGL "$FILENAME"

    fi

 

    # Run oprocd synchronously and look for its status code

    cd $OPROCDIR

 

    # startup the some diagnostic collection scripts if any

    StartDiagCollect;

 

    $OPROCD startInstall -t $OPROCD_DEFAULT_TIMEOUT -m $OPROCD_DEFAULT_MARGIN \

       $OPROCD_DEFAULT_HISTOGRAM $FATALARG

    RC=$?

 

    # shutdown diagnostic collection

    StopDiagCollect;

Restart CRS and see what the logs show:

[root@cisser1 oprocd]# /etc/init.d/init.crs stop

Shutting down Oracle Cluster Ready Services (CRS):

Mar 27 09:33:33.060 | ERR | failed to connect to daemon, errno(111)

Stopping resources. This could take several minutes.

Error while stopping resources. Possible cause: CRSD is down.

Shutdown has begun. The daemons should exit soon.

[root@cisser1 oprocd]# /etc/init.d/init.crs start

Startup will be queued to init within 30 seconds.

 

No log was generated.

 

[root@cisser1 oprocd]# ps -ef|grep init

root         1     0  0 09:36 ?        00:00:01 init [5]                                             

root      3242     1  0 09:37 ?        00:00:00 /bin/sh /etc/init.d/init.evmd run

root      3243     1  0 09:37 ?        00:00:00 /bin/sh /etc/init.d/init.cssd fatal

root      3252     1  0 09:37 ?        00:00:00 /bin/sh /etc/init.d/init.crsd run

root      4083  3243  0 09:37 ?        00:00:00 /bin/sh /etc/init.d/init.cssd oclsomon

root      4136  3243  0 09:37 ?        00:00:00 /bin/sh /etc/init.d/init.cssd daemon

root     22547 21136  0 09:39 pts/0    00:00:00 grep init

[root@cisser1 oprocd]# ps -ef|grep oproc

root     23775 21136  0 09:39 pts/0    00:00:00 grep oproc

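No oprocd process is running, and no log was generated either. The startInstall line above still passes $FATALARG (-f), and the help output lists no -f option for startInstall, so the next attempt removes it.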
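
One plausible reading of this empty result, given that the next attempt drops -f, is that the edited startInstall line still carried `$FATALARG`. A check along these lines could catch that before a restart. The function name is hypothetical, and the sketch assumes init.cssd uses backslash line continuations exactly as in the excerpt above.

```shell
#!/bin/sh
# Hypothetical lint: report a startInstall invocation that still passes $FATALARG.
# Prints the offending command and returns 0 if one is found, returns 1 otherwise.
find_startinstall_with_fatalarg() {
    awk '
        /^[ \t]*#/ { next }                  # skip comment lines
        { line = buf $0 }                    # prepend any pending continuation
        /\\$/ {                              # line ends with a backslash:
            sub(/\\$/, "", line)             #   drop it and keep accumulating
            buf = line
            next
        }
        { buf = "" }                         # command is complete
        line ~ /\$OPROCD[ \t]+startInstall/ && line ~ /\$FATALARG/ {
            print line
            found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$1"
}
```

Run it as `find_startinstall_with_fatalarg /etc/init.d/init.cssd`; no output and a non-zero exit status mean no such line was found.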

 

Second attempt: remove the -f option ($FATALARG)

 

  # startup the some diagnostic collection scripts if any

    StartDiagCollect;

 

    $OPROCD startInstall -t $OPROCD_DEFAULT_TIMEOUT -m $OPROCD_DEFAULT_MARGIN \

       $OPROCD_DEFAULT_HISTOGRAM

       #$OPROCD_DEFAULT_HISTOGRAM $FATALARG

    RC=$?

This time I simply rebooted the host.

[root@cisser1 oprocd]# ls -lrt

total 84

-rwxr--r-- 1 root root  512 Mar 24 10:00 cisser1.oprocd.lgl.2015-03-25-00:15:29

-rw-r--r-- 1 root root  770 Mar 24 15:57 cisser1.oprocd.log.2015-03-25-00:15:29

-rw-r--r-- 1 root root  175 Mar 25 00:15 cisser1.oprocd.log.2015-03-26-11:35:24

-rwxr--r-- 1 root root  512 Mar 25 00:15 cisser1.oprocd.lgl.2015-03-26-11:35:24

-rw-r--r-- 1 root root  175 Mar 26 11:35 cisser1.oprocd.log.2015-03-26-19:05:17

-rwxr--r-- 1 root root  512 Mar 26 11:35 cisser1.oprocd.lgl.2015-03-26-19:05:17

-rwxr--r-- 1 root root  512 Mar 26 19:05 cisser1.oprocd.lgl.2015-03-26-19:16:22

-rw-r--r-- 1 root root  304 Mar 26 19:15 cisser1.oprocd.log.2015-03-26-19:16:22

-rw-r--r-- 1 root root   97 Mar 26 19:16 cisser1.oprocd.log.2015-03-26-19:24:47

-rwxr--r-- 1 root root  512 Mar 26 19:16 cisser1.oprocd.lgl.2015-03-26-19:24:47

-rwxr--r-- 1 root root  512 Mar 26 19:24 cisser1.oprocd.lgl.2015-03-26-19:30:17

-rw-r--r-- 1 root root  226 Mar 26 19:29 cisser1.oprocd.log.2015-03-26-19:30:17

-rwxr--r-- 1 root root  512 Mar 26 19:30 cisser1.oprocd.lgl.2015-03-26-19:31:53

-rw-r--r-- 1 root root  304 Mar 26 19:31 cisser1.oprocd.log.2015-03-26-19:31:53

-rwxr--r-- 1 root root  512 Mar 27 14:08 cisser1.oprocd.lgl.2015-03-27-14:11:11

-rw-r--r-- 1 root root  164 Mar 27 14:09 cisser1.oprocd.log.2015-03-27-14:11:11

drwxrwx--- 2 root root 4096 Mar 27 14:11 stop

drwxrwx--- 2 root root 4096 Mar 27 14:11 fatal

-rw-r--r-- 1 root root   97 Mar 27 14:11 cisser1.oprocd.log

-rwxr--r-- 1 root root  512 Mar 27 14:11 cisser1.oprocd.lgl

drwxrwx--- 2 root root 4096 Mar 27 14:11 check

[root@cisser1 oprocd]# cat cisser1.oprocd.log

Mar 27 14:11:11.602 | INF | monitoring started with timeout(1000), margin(500), skewTimeout(125)

 

[root@cisser1 oprocd]# ps -ef|grep oprocd

root      4451     1  0 14:11 ?        00:00:00 /oracle/app/oracle/product/10.2.0/crs_1/bin/oprocd.bin startInstall -t 1000 -m 500

root      5634  5504  0 14:14 pts/0    00:00:00 grep oprocd

[root@cisser1 oprocd]# ps -ef|grep init

root         1     0  0 14:10 ?        00:00:01 init [5]                                             

root      3244     1  0 14:11 ?        00:00:00 /bin/sh /etc/init.d/init.evmd run

root      3245     1  0 14:11 ?        00:00:00 /bin/sh /etc/init.d/init.cssd fatal

root      3257     1  0 14:11 ?        00:00:00 /bin/sh /etc/init.d/init.crsd run

root      4027  3245  0 14:11 ?        00:00:00 /bin/sh /etc/init.d/init.cssd oclsomon

root      4119  3245  0 14:11 ?        00:00:00 /bin/sh /etc/init.d/init.cssd daemon

This time a log file was generated and it has content, but the log contains no marker saying whether the process is in fatal or non-fatal mode.
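
Since the log carries no marker in this case, the process command line is the other place to look. The sketch below classifies an oprocd command line the way the `ps` output in this post suggests: `startInstall` is treated as install (non-fatal) mode and `-f` as fatal. The function name is hypothetical, and anything else is reported as unknown rather than guessed.

```shell
#!/bin/sh
# Hypothetical helper: guess the oprocd mode from its command line.
oprocd_mode_from_args() {
    # Pad with blanks so whole words can be matched with simple patterns.
    case " $1 " in
        *" startInstall "*) echo "non-fatal" ;;
        *" -f "*)           echo "fatal" ;;
        *)                  echo "unknown" ;;
    esac
}

# Classify every running oprocd.bin; the [o] keeps grep from matching itself.
ps -eo args | grep '[o]procd.bin' | while read -r args; do
    printf '%s: %s\n' "$(oprocd_mode_from_args "$args")" "$args"
done
```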

 

 

For comparison, here is the default case:

    # startup the some diagnostic collection scripts if any

    StartDiagCollect;

 

    $OPROCD run -t $OPROCD_DEFAULT_TIMEOUT -m $OPROCD_DEFAULT_MARGIN \

       $OPROCD_DEFAULT_HISTOGRAM $FATALARG

    RC=$?

Reboot the host:

[root@cisser1 oprocd]# ps -ef|grep oprocd

root      4069  3242  0 14:44 ?        00:00:00 /bin/sh /etc/init.d/init.cssd oprocd

root      4445  4069  0 14:44 ?        00:00:00 /oracle/app/oracle/product/10.2.0/crs_1/bin/oprocd.bin run -t 1000 -m 500 -f

root      5234  5109  0 14:45 pts/0    00:00:00 grep oprocd

[root@cisser1 oprocd]# ls -lrt

total 92

-rwxr--r-- 1 root root  512 Mar 24 10:00 cisser1.oprocd.lgl.2015-03-25-00:15:29

-rw-r--r-- 1 root root  770 Mar 24 15:57 cisser1.oprocd.log.2015-03-25-00:15:29

-rw-r--r-- 1 root root  175 Mar 25 00:15 cisser1.oprocd.log.2015-03-26-11:35:24

-rwxr--r-- 1 root root  512 Mar 25 00:15 cisser1.oprocd.lgl.2015-03-26-11:35:24

-rw-r--r-- 1 root root  175 Mar 26 11:35 cisser1.oprocd.log.2015-03-26-19:05:17

-rwxr--r-- 1 root root  512 Mar 26 11:35 cisser1.oprocd.lgl.2015-03-26-19:05:17

-rwxr--r-- 1 root root  512 Mar 26 19:05 cisser1.oprocd.lgl.2015-03-26-19:16:22

-rw-r--r-- 1 root root  304 Mar 26 19:15 cisser1.oprocd.log.2015-03-26-19:16:22

-rw-r--r-- 1 root root   97 Mar 26 19:16 cisser1.oprocd.log.2015-03-26-19:24:47

-rwxr--r-- 1 root root  512 Mar 26 19:16 cisser1.oprocd.lgl.2015-03-26-19:24:47

-rwxr--r-- 1 root root  512 Mar 26 19:24 cisser1.oprocd.lgl.2015-03-26-19:30:17

-rw-r--r-- 1 root root  226 Mar 26 19:29 cisser1.oprocd.log.2015-03-26-19:30:17

-rwxr--r-- 1 root root  512 Mar 26 19:30 cisser1.oprocd.lgl.2015-03-26-19:31:53

-rw-r--r-- 1 root root  304 Mar 26 19:31 cisser1.oprocd.log.2015-03-26-19:31:53

-rwxr--r-- 1 root root  512 Mar 27 14:08 cisser1.oprocd.lgl.2015-03-27-14:11:11

-rw-r--r-- 1 root root  164 Mar 27 14:09 cisser1.oprocd.log.2015-03-27-14:11:11

-rwxr--r-- 1 root root  512 Mar 27 14:11 cisser1.oprocd.lgl.2015-03-27-14:44:46

-rw-r--r-- 1 root root  164 Mar 27 14:43 cisser1.oprocd.log.2015-03-27-14:44:46

drwxrwx--- 2 root root 4096 Mar 27 14:44 stop

drwxrwx--- 2 root root 4096 Mar 27 14:44 fatal

-rw-r--r-- 1 root root  175 Mar 27 14:44 cisser1.oprocd.log

-rwxr--r-- 1 root root  512 Mar 27 14:44 cisser1.oprocd.lgl

drwxrwx--- 2 root root 4096 Mar 27 14:44 check

[root@cisser1 oprocd]# cat cisser1.oprocd.log

Mar 27 14:44:46.836 | INF | monitoring started with timeout(1000), margin(500), skewTimeout(125)

Mar 27 14:44:46.837 | INF | fatal mode startup, setting process to fatal mode

Here the log states explicitly that the process is running in fatal mode.
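
The three outcomes observed in this post (no log at all, a log without a fatal line, a log with the "fatal mode startup" line) can be told apart with a small check like the one below. It is only a sketch based on the two log messages shown here; the function name and the result labels are hypothetical, and other oprocd versions may word their messages differently.

```shell
#!/bin/sh
# Hypothetical helper: classify an oprocd log by the messages seen in this post.
oprocd_mode_from_log() {
    log=$1
    if [ ! -s "$log" ]; then
        # Missing or empty file: oprocd probably never started.
        echo "no-log"
    elif grep -q 'fatal mode startup' "$log"; then
        echo "fatal"
    elif grep -q 'monitoring started' "$log"; then
        # Monitoring is running, but nothing says it was switched to fatal.
        echo "no-fatal-marker"
    else
        echo "unknown"
    fi
}
```

From the oprocd log directory it would be called as `oprocd_mode_from_log cisser1.oprocd.log`.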

Permalink: http://www.htz.pw/2015/03/26/oprocd%e8%bf%9b%e7%a8%8bfatal%e4%b8%8enon-fatal%e6%a8%a1%e5%bc%8f%e5%88%87%e6%8d%a2.html | 认真就输

Posted by huangtingzhong on 2015-03-26 in the RAC category.