
Solaris RAC: forcibly removing and re-adding a node after a simulated node crash

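This test grew out of a chat with a friend about removing and re-adding a crashed node in 10g RAC. I had not done any 10g RAC work for a long time, could not find my old operation records, and, sadly, had lost all my earlier notes.

During the test I ran into a problem I had never encountered before.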

 


 

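Here node 1 is removed by hand to simulate a crash of node 1, and node 1's information is then cleaned up from node 2. The official reference is:

Steps to Remove Node from Cluster When the Node Crashes Due to OS/Hardware Failure and cannot boot up (Doc ID 466975.1)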

 

1. Current state of the cluster resources

www.htz.pw $ crs_stat -t

Name           Type           Target    State     Host       

————————————————————

ora….SM1.asm application    ONLINE    ONLINE    sol1       

ora….L1.lsnr application    ONLINE    ONLINE    sol1       

ora.sol1.gsd   application    ONLINE    ONLINE    sol1       

ora.sol1.ons   application    ONLINE    ONLINE    sol1       

ora.sol1.vip   application    ONLINE    ONLINE    sol1       

ora.sol10g.db  application    ONLINE    ONLINE    sol2       

ora….g1.inst application    ONLINE    ONLINE    sol1       

ora….g2.inst application    ONLINE    ONLINE    sol2       

ora….SM2.asm application    ONLINE    ONLINE    sol2       

ora….L2.lsnr application    ONLINE    ONLINE    sol2       

ora.sol2.gsd   application    ONLINE    ONLINE    sol2       

ora.sol2.ons   application    ONLINE    ONLINE    sol2       

ora.sol2.vip   application    ONLINE    ONLINE    sol2
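
Because crs_stat -t truncates long resource names, it can be easier to work from the long-form crs_stat output instead. The following is a minimal sketch, assuming the usual 10g stanza layout (NAME=, TYPE=, TARGET=, STATE= lines, with STATE reading either "ONLINE on <host>" or "OFFLINE"); the function name crs_stat_table is hypothetical and not part of Oracle's tooling.

```shell
#!/bin/sh
# Reads long-form `crs_stat` output on stdin and prints one line per resource:
# full resource name, target, state, host ("-" when the resource is offline).
# Assumption: STATE is "ONLINE on <host>" or a single word such as "OFFLINE".
crs_stat_table() {
  awk -F= '
    $1 == "NAME"   { name = $2 }
    $1 == "TARGET" { target = $2 }
    $1 == "STATE"  {
      n = split($2, s, " ")
      host = (n >= 3) ? s[3] : "-"
      printf "%s %s %s %s\n", name, target, s[1], host
    }'
}
```

It would be used as `crs_stat | crs_stat_table`, which keeps names such as ora.sol1.LISTENER_SOL1.lsnr intact for later crs_stop or crs_unregister calls.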

2. Remove the database and CRS information on node 1

www.htz.pw # /oracle/app/oracle/product/10.2.0/db_1/bin/crsctl stop crs

Stopping resources. This could take several minutes.

Successfully stopped CRS resources.

Stopping CSSD.

Shutting down CSS daemon.

Shutdown request successfully issued.

 

www.htz.pw $ crs_stat -t

Name           Type           Target    State     Host       

————————————————————

ora….SM1.asm application    ONLINE    OFFLINE              

ora….L1.lsnr application    ONLINE    OFFLINE              

ora.sol1.gsd   application    ONLINE    OFFLINE              

ora.sol1.ons   application    ONLINE    OFFLINE              

ora.sol1.vip   application    ONLINE    ONLINE    sol2       

ora.sol10g.db  application    ONLINE    ONLINE    sol2       

ora….g1.inst application    ONLINE    OFFLINE              

ora….g2.inst application    ONLINE    ONLINE    sol2       

ora….SM2.asm application    ONLINE    ONLINE    sol2       

ora….L2.lsnr application    ONLINE    ONLINE    sol2       

ora.sol2.gsd   application    ONLINE    ONLINE    sol2       

ora.sol2.ons   application    ONLINE    ONLINE    sol2       

ora.sol2.vip   application    ONLINE    ONLINE    sol2

 

 

Remove the related files on node 1

www.htz.pw # rm /etc/init.d/init.cssd

www.htz.pw # rm /etc/init.d/init.crs

www.htz.pw # rm /etc/init.d/init.crsd

www.htz.pw # rm /etc/init.d/init.evmd

www.htz.pw # rm /etc/rc3.d/K96init.crs

/etc/rc3.d/K96init.crs: No such file or directory

www.htz.pw # rm /etc/rc3.d/S96init.crs

www.htz.pw # rm -Rf /var/opt/oracle/scls_scr

www.htz.pw # rm -Rf /var/opt/oracle/oprocd

www.htz.pw # rm /etc/inittab.crs

www.htz.pw # cp /etc/inittab.orig /etc/inittab

 

www.htz.pw #  rm -rf /var/tmp/.oracle/*

www.htz.pw #

www.htz.pw #  rm -rf /tmp/.oracle/*

www.htz.pw # rm -rf /oracle/app
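
Since these removals cannot be undone, it may be worth listing which of the CRS files actually exist before deleting anything. The sketch below only reports; it deletes nothing. The path list is copied from the commands above, and the function names and the optional root prefix argument are hypothetical, added for illustration.

```shell
#!/bin/sh
# Prints the CRS-related paths removed in the transcript above, one per line.
crs_cleanup_paths() {
  cat <<'EOF'
/etc/init.d/init.cssd
/etc/init.d/init.crs
/etc/init.d/init.crsd
/etc/init.d/init.evmd
/etc/rc3.d/K96init.crs
/etc/rc3.d/S96init.crs
/var/opt/oracle/scls_scr
/var/opt/oracle/oprocd
/etc/inittab.crs
EOF
}

# Dry run: reports which of those paths exist under an optional root prefix
# (empty prefix means the live filesystem). Nothing is removed.
report_crs_leftovers() {
  prefix=${1:-}
  crs_cleanup_paths | while read -r p; do
    if [ -e "$prefix$p" ]; then
      echo "present: $p"
    else
      echo "absent:  $p"
    fi
  done
}
```

Running `report_crs_leftovers` with no argument checks the local node; an "absent" line explains errors such as the "No such file or directory" seen for K96init.crs.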

3. Remove node 1's information from CRS on node 2

www.htz.pw # pwd

/oracle/app/oracle/product/10.2.0/crs_1/bin

www.htz.pw # ./oifcfg getif

e1000g0  192.168.111.0  global  public

e1000g1  192.168.112.0  global  cluster_interconnect

 

Clear node 1's information from oifcfg

www.htz.pw # ./oifcfg delif -node sol1

PROC-4: The cluster registry key to be operated on does not exist.

PRIF-11: cluster registry error

 

Clear node 1's information from ONS

www.htz.pw # cat  $CRS_HOME/opmn/conf/ons.config

localport=6100

remoteport=6200

loglevel=3

useocr=on

 

 

www.htz.pw # $CRS_HOME/bin/racgons remove_config sol1:6200

racgons: Existing key value on sol1 = 6200.

racgons: sol1:6200 removed from OCR.

 

 

www.htz.pw # $CRS_HOME/bin/crs_stat -t

Name           Type           Target    State     Host       

————————————————————

ora….SM1.asm application    ONLINE    OFFLINE              

ora….L1.lsnr application    ONLINE    OFFLINE              

ora.sol1.gsd   application    ONLINE    OFFLINE              

ora.sol1.ons   application    ONLINE    OFFLINE              

ora.sol1.vip   application    ONLINE    ONLINE    sol2       

ora.sol10g.db  application    ONLINE    ONLINE    sol2       

ora….g1.inst application    ONLINE    OFFLINE              

ora….g2.inst application    ONLINE    ONLINE    sol2       

ora….SM2.asm application    ONLINE    ONLINE    sol2       

ora….L2.lsnr application    ONLINE    ONLINE    sol2       

ora.sol2.gsd   application    ONLINE    ONLINE    sol2       

ora.sol2.ons   application    ONLINE    ONLINE    sol2       

ora.sol2.vip   application    ONLINE    ONLINE    sol2

 

 

Clear node 1's information from CRS

www.htz.pw # $CRS_HOME/bin/srvctl remove instance -d sol10g -i sol10g1

Remove instance sol10g1 from the database sol10g? (y/[n]) y

 

www.htz.pw # $CRS_HOME/bin/srvctl remove asm  -n sol1

 

 

www.htz.pw # $CRS_HOME/bin/srvctl remove  nodeapps -n sol1

Please confirm that you intend to remove the node-level applications on node sol1 (y/[n]) y

PRKO-2108 : Node applications are still running on node: sol1

 

www.htz.pw # $CRS_HOME/bin/srvctl remove  nodeapps -n sol1

Please confirm that you intend to remove the node-level applications on node sol1 (y/[n]) y

PRKO-2108 : Node applications are still running on node: sol1

# $CRS_HOME/bin/crs_stat -t

Name           Type           Target    State     Host       

————————————————————

ora….L1.lsnr application    ONLINE    OFFLINE              

ora.sol1.gsd   application    ONLINE    OFFLINE              

ora.sol1.ons   application    ONLINE    OFFLINE              

ora.sol1.vip   application    ONLINE    ONLINE    sol2       

ora.sol10g.db  application    ONLINE    ONLINE    sol2       

ora….g2.inst application    ONLINE    ONLINE    sol2       

ora….SM2.asm application    ONLINE    ONLINE    sol2       

ora….L2.lsnr application    ONLINE    ONLINE    sol2       

ora.sol2.gsd   application    ONLINE    ONLINE    sol2       

ora.sol2.ons   application    ONLINE    ONLINE    sol2       

ora.sol2.vip   application    ONLINE    ONLINE    sol2       

www.htz.pw # $CRS_HOME/bin/crs_stop -f ora.sol1.vip

Target set to OFFLINE for `ora.sol1.LISTENER_SOL1.lsnr`

Attempting to stop `ora.sol1.vip` on member `sol2`

Stop of `ora.sol1.vip` on member `sol2` succeeded.

www.htz.pw # $CRS_HOME/bin/srvctl remove  nodeapps -n sol1

Please confirm that you intend to remove the node-level applications on node sol1 (y/[n]) y

PRKO-2112 : Some or all node applications are not removed successfully on node: sol1

# $CRS_HOME/bin/crs_stat -t

Name           Type           Target    State     Host       

————————————————————

ora….L1.lsnr application    OFFLINE   OFFLINE              

ora.sol1.vip   application    OFFLINE   OFFLINE              

ora.sol10g.db  application    ONLINE    ONLINE    sol2       

ora….g2.inst application    ONLINE    ONLINE    sol2       

ora….SM2.asm application    ONLINE    ONLINE    sol2       

ora….L2.lsnr application    ONLINE    ONLINE    sol2       

ora.sol2.gsd   application    ONLINE    ONLINE    sol2       

ora.sol2.ons   application    ONLINE    ONLINE    sol2       

ora.sol2.vip   application    ONLINE    ONLINE    sol2

 

 

www.htz.pw # $CRS_HOME/bin/crs_stat|grep lsn

NAME=ora.sol1.LISTENER_SOL1.lsnr

NAME=ora.sol2.LISTENER_SOL2.lsnr

www.htz.pw # $CRS_HOME/bin/crs_unregister  ora.sol1.LISTENER_SOL1.lsnr

 

www.htz.pw # $CRS_HOME/bin/olsnodes -n

sol1    1

sol2    2

 

 

Delete node 1 from the cluster

www.htz.pw # ./rootdeletenode.sh sol1,1

CRS-0210: Could not find resource ‘ora.sol1.LISTENER_SOL1.lsnr’.

CRS-0210: Could not find resource ‘ora.sol1.ons’.

CRS-0210: Could not find resource ‘ora.sol1.vip’.

CRS-0210: Could not find resource ‘ora.sol1.gsd’.

CRS-0210: Could not find resource ora.sol1.vip.

CRS nodeapps are deleted successfully

clscfg: EXISTING configuration version 3 detected.

clscfg: version 3 is 10G Release 2.

Successfully deleted 14 values from OCR.

Key SYSTEM.css.interfaces.nodesol1 marked for deletion is not there. Ignoring.

Successfully deleted 5 keys from OCR.

Node deletion operation successful.

‘sol1,1’ deleted successfully

www.htz.pw # $CRS_HOME/bin/olsnodes -n

sol2    2
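
The sol1,1 argument passed to rootdeletenode.sh above is the node name and node number as reported by olsnodes -n before the deletion. A small sketch of building that pair from the olsnodes output, assuming whitespace-separated name and number columns; node_pair is a hypothetical helper name.

```shell
#!/bin/sh
# Reads `olsnodes -n` output (node name, node number) on stdin and prints
# "name,number" for the requested node. Exits non-zero if the node is absent.
node_pair() {
  awk -v node="$1" '
    $1 == node { print $1 "," $2; found = 1 }
    END { exit (found ? 0 : 1) }'
}
```

For example, `olsnodes -n | node_pair sol1` would print sol1,1 on the cluster shown here, provided it is run while sol1 is still registered.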

 

Update the inventory

www.htz.pw $ $ORA_CRS_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$ORA_CRS_HOME "CLUSTER_NODES=sol2" CRS=TRUE

Starting Oracle Universal Installer…

 

No pre-requisite checks found in oraparam.ini, no system pre-requisite checks will be executed.

The inventory pointer is located at /var/opt/oracle/oraInst.loc

The inventory is located at /oracle/app/oracle/oraInventory

‘UpdateNodeList’ was successful.

 

 

www.htz.pw $ $ORACLE_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=sol2"

Starting Oracle Universal Installer…

 

No pre-requisite checks found in oraparam.ini, no system pre-requisite checks will be executed.

The inventory pointer is located at /var/opt/oracle/oraInst.loc

The inventory is located at /oracle/app/oracle/oraInventory

‘UpdateNodeList’ was successful.

 

The inventory after the update:

 

www.htz.pw $ cat inventory.xml

<?xml version="1.0" standalone="yes" ?>

<!-- Copyright (c) 2008 Oracle Corporation. All rights Reserved -->

<!-- Do not modify the contents of this file by hand. -->

<INVENTORY>

<VERSION_INFO>

   <SAVED_WITH>10.2.0.4.0</SAVED_WITH>

   <MINIMUM_VER>2.1.0.6.0</MINIMUM_VER>

</VERSION_INFO>

<HOME_LIST>

<HOME NAME="OraCrs10g_home" LOC="/oracle/app/oracle/product/10.2.0/crs_1" TYPE="O" IDX="1" CRS="true">

   <NODE_LIST>

      <NODE NAME="sol2"/>

   </NODE_LIST>

</HOME>

<HOME NAME="OraDb10g_home1" LOC="/oracle/app/oracle/product/10.2.0/db_1" TYPE="O" IDX="2">

   <NODE_LIST>

      <NODE NAME="sol2"/>

   </NODE_LIST>

</HOME>

</HOME_LIST>

</INVENTORY>
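
To confirm at a glance that only sol2 remains in every home, the node list can be pulled out of inventory.xml with a short filter. This is a sketch that assumes one element per line and straight double quotes, as in the file above; inventory_nodes is a hypothetical name, and the file is only read, never modified.

```shell
#!/bin/sh
# Reads an OUI inventory.xml on stdin and prints "<home name>: <node> ..."
# for each HOME element. Assumes one XML element per line.
inventory_nodes() {
  awk '
    /<HOME NAME=/ { split($0, a, "\""); home = a[2]; nodes = "" }
    /<NODE NAME=/ { split($0, a, "\""); nodes = nodes " " a[2] }
    /<\/HOME>/    { print home ":" nodes }'
}
```

It would be used as `inventory_nodes < /oracle/app/oracle/oraInventory/ContentsXML/inventory.xml`.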

 

4. Add the node back

www.htz.pw $ pwd

/oracle/app/oracle/product/10.2.0/crs_1/oui/bin

www.htz.pw $ ls

addLangs.sh      addNode.sh       attachHome.sh    detachHome.sh    lsnodes          ouica.bat        ouica.sh         resource         runConfig.sh     runInstaller     runInstaller.sh

www.htz.pw $ ./addNode.sh

Starting Oracle Universal Installer…

 

No pre-requisite checks found in oraparam.ini, no system pre-requisite checks will be executed.

Oracle Universal Installer, Version 10.2.0.4.0 Production

Copyright (C) 1999, 2008, Oracle. All rights reserved

 

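Clicking Next here raises an error. The log contains the following; the default OUI log path appears in the log file locations listed in the error.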

INFO: Username:oracle

 

INFO: Install area Control created with access level  1

 

INFO: Oracle Universal Installer version is 10.2.0.4.0

 

INFO: Setting variable ‘ORACLE_HOME’ to ‘/oracle/app/oracle/product/10.2.0/crs_1’. Received the value from the command line.

INFO: Setting variable ‘PREREQ_CONFIG_LOCATION’ to ”. Received the value from variable association.

INFO: Setting variable ‘FROM_LOCATION’ to ‘/oracle/app/oracle/product/10.2.0/crs_1/inventory/ContentsXML/comps.xml’. Received the value from a code block.

INFO: Setting variable ‘ROOTSH_LOCATION’ to ‘/oracle/app/oracle/product/10.2.0/crs_1/root.sh’. Received the value from a code block.

INFO: Setting variable ‘ROOTSH_STATUS’ to ‘3’. Received the value from a code block.

INFO: Setting variable ‘ORACLE_HOME’ to ‘/oracle/app/oracle/product/10.2.0/crs_1’. Received the value from the command line.

INFO: Setting variable ‘PREREQ_CONFIG_LOCATION’ to ”. Received the value from variable association.

INFO: Setting variable ‘FROM_LOCATION’ to ‘/oracle/app/oracle/product/10.2.0/crs_1/inventory/ContentsXML/comps.xml’. Received the value from a code block.

INFO: Setting variable ‘ROOTSH_LOCATION’ to ‘/oracle/app/oracle/product/10.2.0/crs_1/root.sh’. Received the value from a code block.

INFO: Setting variable ‘ROOTSH_STATUS’ to ‘3’. Received the value from a code block.

INFO:

*** Welcome Page***

INFO: Unable to read inventory information for the home: /oracle/app/oracle/product/10.2.0/crs_1.

INFO: Unable to read inventory information for the home: /oracle/app/oracle/product/10.2.0/crs_1.

 

The following part was produced after clicking Next:

/************************************************************************************

INFO: Setting variable ‘ORACLE_HOME_NAME’ to ‘OraCrs10g_home’. Received the value from a code block.

INFO: Unable to read inventory information for the home: /oracle/app/oracle/product/10.2.0/crs_1.

SEVERE: Abnormal program termination. An internal error has occured. Please provide the following files to Oracle Support :

 

“/oracle/app/oracle/oraInventory/logs/addNodeActions2014-05-08_06-05-16PM.log”

“/oracle/app/oracle/oraInventory/logs/oraInstall2014-05-08_06-05-16PM.err”

“/oracle/app/oracle/oraInventory/logs/oraInstall2014-05-08_06-05-16PM.out

***************************************************************************************/

 

In /oracle/app/oracle/oraInventory/logs/oraInstall2014-05-08_06-05-16PM.err we find the following error:

 

org.xml.sax.SAXParseException: <Line 889, Column 9>: XML-20210: (Fatal Error) Unexpected EOF.

        at oracle.xml.parser.v2.XMLError.flushErrorHandler(XMLError.java:415)

        at oracle.xml.parser.v2.XMLError.flushErrors1(XMLError.java:284)

        at oracle.xml.parser.v2.XMLReader.popXMLReader(XMLReader.java:540)

        at oracle.xml.parser.v2.NonValidatingParser.parseElement(NonValidatingParser.java:1339)

        at oracle.xml.parser.v2.NonValidatingParser.parseRootElement(NonValidatingParser.java:326)

        at oracle.xml.parser.v2.NonValidatingParser.parseDocument(NonValidatingParser.java:293)

        at oracle.xml.parser.v2.XMLParser.parse(XMLParser.java:209)

        at oracle.sysman.oii.oiii.OiiiInstallXMLReader.readComps(OiiiInstallXMLReader.java:271)

        at oracle.sysman.oii.oiii.OiiiInstallInventory.getCompOHListElement(OiiiInstallInventory.java:1663)

        at oracle.sysman.oii.oiii.OiiiAreaInventory.getAllCompsVect(OiiiAreaInventory.java:1052)

        at oracle.sysman.oii.oiii.OiiiAreaInventory.getTopLevelComps(OiiiAreaInventory.java:1872)

        at oracle.sysman.oii.oiii.OiiiInstallInventory.setOHProperties(OiiiInstallInventory.java:6064)

        at oracle.sysman.oii.oiif.oiifp.OiifpContentsTabPanel.addHomes(OiifpContentsTabPanel.java:777)

        at oracle.sysman.oii.oiif.oiifp.OiifpContentsTabPanel.fillInventoryTree(OiifpContentsTabPanel.java:691)

        at oracle.sysman.oii.oiif.oiifp.OiifpContentsTabPanel.refreshTree(OiifpContentsTabPanel.java:1508)

        at oracle.sysman.oii.oiif.oiifp.OiifpContentsTabPanel.prepareInvTree(OiifpContentsTabPanel.java:2253)

        at oracle.sysman.oii.oiif.oiifd.OiifdInventoryDialog.doModal(OiifdInventoryDialog.java:457)

        at oracle.sysman.oii.oiif.oiifw.OiifwWizDialog.onViewPrivate(OiifwWizDialog.java:863)

        at oracle.sysman.oii.oiif.oiifw.OiifwWizDialog.access$000(OiifwWizDialog.java:330)

        at oracle.sysman.oii.oiif.oiifw.OiifwWizDialog$PrepareInventoryTree.run(OiifwWizDialog.java:1778)

        at java.lang.Thread.run(Thread.java:534)

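The error is an XML parse failure: an unexpected end of file. At first I thought the inventory configuration was wrong, but opatch lsinventory reported nothing unusual, so I suspected that one of the XML files was corrupted.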

 

Use truss to see which XML files runInstaller accesses

 

www.htz.pw $ truss -aefo /tmp/123.log ./addNode.sh

Starting Oracle Universal Installer…

 

No pre-requisite checks found in oraparam.ini, no system pre-requisite checks will be executed.

Oracle Universal Installer, Version 10.2.0.4.0 Production

Copyright (C) 1999, 2008, Oracle. All rights reserved.

 

 

The following XML files were opened:

 115  1728 23646/1:        open(“/oracle/app/oracle/product/10.2.0/crs_1/oui/jlib/xmlparserv2.jar”, O_RDONLY|O_LARGEFILE) = 6

 119  1776 23646/1:        open(“/oracle/app/oracle/product/10.2.0/crs_1/oui/jlib/xml.jar”, O_RDONLY|O_LARGEFILE) = 6

1382 15600 23646/1:        open(“/oracle/app/oracle/oraInventory/ContentsXML/inventory.xml”, O_RDONLY|O_LARGEFILE) = 15

1385 15915 23646/1:        open(“/oracle/app/oracle/product/10.2.0/crs_1/inventory/ContentsXML/oraclehomeproperties.xml”, O_RDONLY|O_LARGEFILE) = 15

1387 15974 23646/1:        open(“/oracle/app/oracle/product/10.2.0/db_1/inventory/ContentsXML/oraclehomeproperties.xml”, O_RDONLY|O_LARGEFILE) = 15

1401 18121 23646/15:       open(“/oracle/app/oracle/oraInventory/ContentsXML/comps.xml”, O_RDONLY|O_LARGEFILE) = 19

1403 18217 23646/15:       open(“/oracle/app/oracle/product/10.2.0/crs_1/inventory/ContentsXML/comps.xml”, O_RDONLY|O_LARGEFILE) = 19

1421 19159 23646/15:       open(“/oracle/app/oracle/product/10.2.0/crs_1/inventory/ContentsXML/comps.xml”, O_RDONLY|O_LARGEFILE) = 20

1444 22099 23646/1:        open(“/oracle/app/oracle/product/10.2.0/crs_1/inventory/ContentsXML/comps.xml”, O_RDONLY|O_LARGEFILE) = 21

 

In the end, only comps.xml turned out to have 889 lines, matching the line number in the parser error.
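
The search through the truss log can be narrowed with a small filter that keeps only the distinct .xml paths passed to open(). A minimal sketch, assuming truss prints each call as open("<path>", ...) with straight double quotes; truss_xml_paths is a hypothetical name.

```shell
#!/bin/sh
# Reads a truss log on stdin and prints each distinct *.xml path that appears
# as the first argument of an open() call. Jar files and other paths are skipped.
truss_xml_paths() {
  sed -n 's/.*open("\([^"]*\.xml\)".*/\1/p' | sort -u
}
```

Piping the result into wc, for example `truss_xml_paths < /tmp/123.log | xargs wc -l`, shows the line count of each candidate so it can be compared with the line number in the parser error.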

 

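I downloaded an XML editor, and it could not open comps.xml for editing, which confirmed that the file was damaged.

After copying a comps.xml from another environment and comparing the two, I found that the broken file was missing a lot of content.

 

The structure of a normal comps.xml file is as follows: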

www.htz.pw $ cat comps.xml

<?xml version="1.0" standalone="yes" ?>

<!-- Copyright (c) 2008 Oracle Corporation. All rights Reserved -->

<!-- Do not modify the contents of this file by hand. -->

<PRD_LIST>

<TL_LIST>

</TL_LIST>

<COMP_LIST>

</COMP_LIST>

<ONEOFF_LIST>

</ONEOFF_LIST>

</PRD_LIST>

www.htz.pw $ pwd

/oracle/app/oracle/oraInventory/ContentsXML
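
An "Unexpected EOF" usually means the file stops before its root element is closed, so a cheap first check is whether the last non-blank line is the closing root tag. This is only a heuristic sketch, not a real XML validation; xml_ends_with is a hypothetical helper, and PRD_LIST is the root element shown in the listing above.

```shell
#!/bin/sh
# Succeeds if the last non-blank line of the file is exactly </TAG>.
# Usage: xml_ends_with FILE TAG
xml_ends_with() {
  file=$1
  tag=$2
  # "$1 = $1" makes awk rebuild the line, trimming surrounding whitespace.
  last=$(awk 'NF { $1 = $1; line = $0 } END { print line }' "$file")
  [ "$last" = "</$tag>" ]
}
```

For example, `xml_ends_with comps.xml PRD_LIST || echo "comps.xml looks truncated"`.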

 

In fact, opatch can also be used to verify whether an XML file is well formed:

 

www.htz.pw $ $ORA_CRS_HOME/OPatch/opatch  util LoadXML -xmlInput /oracle/app/oracle/product/10.2.0/crs_1/inventory/ContentsXML/comps.xml.back

Invoking OPatch 10.2.0.4.2

 

Oracle Interim Patch Installer version 10.2.0.4.2

Copyright (c) 2007, Oracle Corporation.  All rights reserved.

 

UTIL session

 

Oracle Home       : /oracle/app/oracle/product/10.2.0/db_1

Central Inventory : /oracle/app/oracle/oraInventory

   from           : /var/opt/oracle/oraInst.loc

OPatch version    : 10.2.0.4.2

OUI version       : 10.2.0.4.0

OUI location      : /oracle/app/oracle/product/10.2.0/db_1/oui

Log file location : /oracle/app/oracle/product/10.2.0/db_1/cfgtoollogs/opatch/opatch2014-05-08_20-39-18PM.log

 

Invoking utility “loadxml”

UtilSession failed: Unable to parse the xml file.

 

OPatch failed with error code 73

 

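Since this is a test environment, I simply moved the file out of the way with mv. In production I would recommend copying a good file over from another, equivalent environment.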

 

www.htz.pw # mv  /oracle/app/oracle/product/10.2.0/crs_1/inventory/ContentsXML/comps.xml  /oracle/app/oracle/product/10.2.0/crs_1/inventory/ContentsXML/comps.xml.back

 

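Running addNode.sh again finally brought up the following screen:

[Screenshot: clip_image001]

Clicking Next all the way through, everything went normally until the dialog below appeared:

[Screenshot: clip_image002]

Choose Yes here. The OCR operations are run by root, so the log file is owned by root; this does not affect addNode.sh.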

 

Running rootaddnode.sh on the host where addNode.sh was executed fails with the following error:

 

www.htz.pw # /oracle/app/oracle/product/10.2.0/crs_1/rootaddnode.sh

clscfg: EXISTING configuration version 3 detected.

clscfg: version 3 is 10G Release 2.

Attempting to add 1 new nodes to the configuration

Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.

node <nodenumber>: <nodename> <private interconnect name> <hostname>

node 3: sol1 sol1-priv sol1

Creating OCR keys for user ‘root’, privgrp ‘root’..

Operation successful.

/oracle/app/oracle/product/10.2.0/crs_1/bin/srvctl add nodeapps -n sol1 -A %s_nodevips%/255.255.255.0/e1000g0 -o /oracle/app/oracle/product/10.2.0/crs_1

PRKO-2109 : Invalid address string: %s_nodevips%/255.255.255.0/e1000g0

 

www.htz.pw # /oracle/app/oracle/product/10.2.0/crs_1/bin/srvctl config nodeapps -n  sol2 -a

VIP exists.: /sol2-vip/192.168.111.49/255.255.255.0/e1000g0

 

 

www.htz.pw # grep "s_nodevips|CRS_NEW_NODEVIPS" /oracle/app/oracle/product/10.2.0/crs_1/rootaddnode.sh

CRS_NEW_NODEVIPS=%s_nodevips%

 

 

www.htz.pw # grep   “CRS_NEW_NODEVIPS” /oracle/app/oracle/product/10.2.0/crs_1/rootaddnode.sh 

CRS_NEW_NODEVIPS=%s_nodevips%

  NODE_VIP=`$ECHO $CRS_NEW_NODEVIPS | $CUT -d',' -f$Ni`
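
The %s_nodevips% value is an installer placeholder that was never substituted, which is why srvctl rejected the address. A quick way to spot such leftovers before running a generated script is to search for %name% tokens. This is a rough sketch: find_placeholders is a hypothetical helper, and the pattern can also match adjacent printf format specifiers such as %s%s, so any hits need a manual look.

```shell
#!/bin/sh
# Prints "line-number:text" for every line of FILE that still contains a
# %name% style token. Exits non-zero when no such token is found.
find_placeholders() {
  grep -n '%[A-Za-z_][A-Za-z0-9_]*%' "$1"
}
```

For example, `find_placeholders /oracle/app/oracle/product/10.2.0/crs_1/rootaddnode.sh`.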

 

Manually edit the rootaddnode.sh script:

/oracle/app/oracle/product/10.2.0/crs_1/rootaddnode.sh

 

Ni=1

for i in `$ECHO $NODES_LIST`

do

  NODE_NAME=$i

  NODE_VIP=`$ECHO $CRS_NEW_NODEVIPS | $CUT -d',' -f$Ni`

  NODEVIP=$NODE_VIP/$NETMASK/$NETIFs

 

  $ECHO $CH/bin/srvctl add nodeapps -n $NODE_NAME -A $NODEVIP -o $CH

  $CH/bin/srvctl add nodeapps -n $NODE_NAME -A $NODEVIP -o $CH

  Ni=`expr $Ni + 1`

done

 

 

After the change:

Ni=1

for i in `$ECHO $NODES_LIST`

do

  NODE_NAME=$i

  #NODE_VIP=`$ECHO $CRS_NEW_NODEVIPS | $CUT -d',' -f$Ni`

  NODE_VIP=192.168.111.48

  NODEVIP=$NODE_VIP/$NETMASK/$NETIFs

 

  $ECHO $CH/bin/srvctl add nodeapps -n $NODE_NAME -A $NODEVIP -o $CH

  $CH/bin/srvctl add nodeapps -n $NODE_NAME -A $NODEVIP -o $CH

  Ni=`expr $Ni + 1`

done
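
Hardcoding NODE_VIP works here because only one node is being added. The original loop picks the Nth entry of a comma-separated list for the Nth node, so an alternative would be to set CRS_NEW_NODEVIPS to a real list and leave the loop untouched. The sketch below reproduces that positional lookup outside the script; pair_nodes_with_vips is a hypothetical name, and it only prints the pairs rather than calling srvctl.

```shell
#!/bin/sh
# Pairs the Nth node name with the Nth entry of a comma-separated VIP list,
# using the same cut-based positional lookup as rootaddnode.sh.
# Usage: pair_nodes_with_vips "node1 node2" "vip1,vip2"
pair_nodes_with_vips() {
  nodes=$1   # space-separated node names
  vips=$2    # comma-separated VIP addresses or names
  n=1
  for node in $nodes; do
    vip=$(echo "$vips" | cut -d',' -f"$n")
    echo "$node $vip"
    n=$(expr "$n" + 1)
  done
}
```

With the values from this test, `pair_nodes_with_vips "sol1" "192.168.111.48"` prints the pair that ends up in the srvctl add nodeapps call.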

 

 

www.htz.pw # /oracle/app/oracle/product/10.2.0/crs_1/rootaddnode.sh

clscfg: EXISTING configuration version 3 detected.

clscfg: version 3 is 10G Release 2.

Node sol1 is already assigned nodenum 3.

Aborting: No configuration data has been changed.

clscfg -add -nn nameA,numA,nameB,numB,… -pn privA,numA,privB,numB,…

       [-hn hostA,numA,hostB,numB,…] [-t p1,p2,p3,p4]

 -nn specifies nodenames in the same fashion as -nn in -install mode

 -pn specifies private interconnect names as -pn in -install mode

 -hn specifies hostnames in the same fashion as -hn in -install mode

 -t  specifies port numbers to be used by CRS daemons on the new node(s)

     default ports: 49895,49896,49897,49898

WARNING: Using this tool may corrupt your cluster configuration. Do not

         use unless you positively know what you are doing.

 

/oracle/app/oracle/product/10.2.0/crs_1/bin/srvctl add nodeapps -n sol1 -A 192.168.111.48/255.255.255.0/e1000g0 -o /oracle/app/oracle/product/10.2.0/crs_1

 

 

The following is run on node 1:

www.htz.pw # /oracle/app/oracle/product/10.2.0/crs_1/root.sh

WARNING: directory ‘/oracle/app/oracle/product/10.2.0’ is not owned by root

WARNING: directory ‘/oracle/app/oracle/product’ is not owned by root

WARNING: directory ‘/oracle/app/oracle’ is not owned by root

WARNING: directory ‘/oracle/app’ is not owned by root

WARNING: directory ‘/oracle’ is not owned by root

Checking to see if Oracle CRS stack is already configured

OCR LOCATIONS =  /dev/rdsk/c2t0d0s0

Setting the permissions on OCR backup directory

Setting up NS directories

Oracle Cluster Registry configuration upgraded successfully

WARNING: directory ‘/oracle/app/oracle/product/10.2.0’ is not owned by root

WARNING: directory ‘/oracle/app/oracle/product’ is not owned by root

WARNING: directory ‘/oracle/app/oracle’ is not owned by root

WARNING: directory ‘/oracle/app’ is not owned by root

WARNING: directory ‘/oracle’ is not owned by root

clscfg: EXISTING configuration version 3 detected.

clscfg: version 3 is 10G Release 2.

Successfully accumulated necessary OCR keys.

Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.

node <nodenumber>: <nodename> <private interconnect name> <hostname>

node 1: sol1 sol1-priv sol1

node 2: sol2 sol2-priv sol2

clscfg: Arguments check out successfully.

 

NO KEYS WERE WRITTEN. Supply -force parameter to override.

-force is destructive and will destroy any previous cluster

configuration.

Oracle Cluster Registry for cluster has already been initialized

Startup will be queued to init within 30 seconds.

Adding daemons to inittab

Expecting the CRS daemons to be up within 600 seconds.

CSS is active on these nodes.

        sol2

        sol1

CSS is active on all nodes.

Waiting for the Oracle CRSD and EVMD to start

Oracle CRS stack installed and running under init(1M)

Running vipca(silent) for configuring nodeapps

 

Creating VIP application resource on (0) nodes.

Creating GSD application resource on (0) nodes.

Creating ONS application resource on (0) nodes.

Starting VIP application resource on (2) nodes1:CRS-1002: Resource ‘ora.sol1.vip’ is already running on member ‘sol1’

CRS-0223: Resource ‘ora.sol1.vip’ has placement error.

Check the log file “/oracle/app/oracle/product/10.2.0/crs_1/log/sol1/racg/ora.sol1.vip.log” for more details

Starting GSD application resource on (2) nodes1:CRS-0233: Resource or relatives are currently involved with another operation.

Check the log file “/oracle/app/oracle/product/10.2.0/crs_1/log/sol1/racg/ora.sol1.gsd.log” for more details

Starting ONS application resource on (2) nodes1:CRS-0233: Resource or relatives are currently involved with another operation.

Check the log file “/oracle/app/oracle/product/10.2.0/crs_1/log/sol1/racg/ora.sol1.ons.log” for more details

 

 

Done.

 

Note that quite a few errors are reported here, but they do no harm.

 

All nodes look normal at this point:

www.htz.pw $ crs_stat -t

Name           Type           Target    State     Host       

————————————————————

ora.sol1.gsd   application    ONLINE    ONLINE    sol1       

ora.sol1.ons   application    ONLINE    ONLINE    sol1       

ora.sol1.vip   application    ONLINE    ONLINE    sol1       

ora.sol10g.db  application    ONLINE    ONLINE    sol2       

ora….g2.inst application    ONLINE    ONLINE    sol2       

ora….SM2.asm application    ONLINE    ONLINE    sol2       

ora….L2.lsnr application    ONLINE    ONLINE    sol2       

ora.sol2.gsd   application    ONLINE    ONLINE    sol2       

ora.sol2.ons   application    ONLINE    ONLINE    sol2       

ora.sol2.vip   application    ONLINE    ONLINE    sol2

 

www.htz.pw # ifconfig -a

lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1

        inet 127.0.0.1 netmask ff000000

e1000g0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2

        inet 192.168.111.46 netmask ffffff00 broadcast 192.168.111.255

        ether 0:c:29:5a:e5:7a

e1000g0:1: flags=1040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4> mtu 1500 index 2

        inet 192.168.111.48 netmask ffffff00 broadcast 192.168.111.255

e1000g1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3

        inet 192.168.112.46 netmask ffffff00 broadcast 192.168.112.255

        ether 0:c:29:5a:e5:84
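
The ifconfig output shows the VIP 192.168.111.48 plumbed on the logical interface e1000g0:1. A sketch of checking this from a script, assuming the Solaris layout above where each interface header ends with a colon and addresses follow on an "inet" line; iface_for_ip is a hypothetical helper name.

```shell
#!/bin/sh
# Reads `ifconfig -a` output on stdin and prints the interface that carries
# the given IPv4 address. Exits non-zero if the address is not configured.
iface_for_ip() {
  awk -v ip="$1" '
    # Interface header lines start with a field such as "e1000g0:1:".
    $1 ~ /:$/ { iface = substr($1, 1, length($1) - 1) }
    $1 == "inet" && $2 == ip { print iface; found = 1 }
    END { exit (found ? 0 : 1) }'
}
```

On node 1, `ifconfig -a | iface_for_ip 192.168.111.48` would print e1000g0:1 while the VIP is running locally.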

 

 

Next, run addNode.sh as the oracle user:

www.htz.pw $ ./addNode.sh

This went smoothly, with no errors.

 

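Next, configure the listener. This can be done by hand, but since this is a test environment I ran netca on the healthy node. The listener has to be stopped partway through the configuration.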

www.htz.pw $ crs_stat -t

Name           Type           Target    State     Host       

————————————————————

ora….L1.lsnr application    ONLINE    ONLINE    sol1       

ora.sol1.gsd   application    ONLINE    ONLINE    sol1       

ora.sol1.ons   application    ONLINE    ONLINE    sol1       

ora.sol1.vip   application    ONLINE    ONLINE    sol1       

ora.sol10g.db  application    ONLINE    ONLINE    sol2       

ora….g2.inst application    ONLINE    ONLINE    sol2       

ora….SM2.asm application    ONLINE    ONLINE    sol2       

ora….L2.lsnr application    ONLINE    ONLINE    sol2       

ora.sol2.gsd   application    ONLINE    ONLINE    sol2       

ora.sol2.ons   application    ONLINE    ONLINE    sol2       

ora.sol2.vip   application    ONLINE    ONLINE    sol2

 

Run dbca on the healthy node to add the instance.

dbca ran normally and automatically added the ASM instance information as well.

 

www.htz.pw $ crs_stat -t

Name           Type           Target    State     Host       

————————————————————

ora….SM3.asm application    ONLINE    ONLINE    sol1       

ora….L1.lsnr application    ONLINE    ONLINE    sol1       

ora.sol1.gsd   application    ONLINE    ONLINE    sol1       

ora.sol1.ons   application    ONLINE    ONLINE    sol1       

ora.sol1.vip   application    ONLINE    ONLINE    sol1       

ora.sol10g.db  application    ONLINE    ONLINE    sol2       

ora….g1.inst application    ONLINE    ONLINE    sol1       

ora….g2.inst application    ONLINE    ONLINE    sol2       

ora….SM2.asm application    ONLINE    ONLINE    sol2       

ora….L2.lsnr application    ONLINE    ONLINE    sol2       

ora.sol2.gsd   application    ONLINE    ONLINE    sol2       

ora.sol2.ons   application    ONLINE    ONLINE    sol2       

ora.sol2.vip   application    ONLINE    ONLINE    sol2

 

www.htz.pw $ crs_stat|grep asm

NAME=ora.sol1.ASM3.asm

NAME=ora.sol2.ASM2.asm
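
The ASM resource names follow the pattern ora.<node>.<ASMn>.asm, so the node-to-instance mapping can be read straight from them. A minimal sketch, assuming that naming pattern holds for every ASM resource; asm_instances is a hypothetical helper name.

```shell
#!/bin/sh
# Reads `crs_stat` output on stdin and prints "<node> +<ASM instance>" for
# each ASM resource name of the form NAME=ora.<node>.<ASMn>.asm.
asm_instances() {
  sed -n 's/^NAME=ora\.\([^.]*\)\.\(ASM[0-9]*\)\.asm$/\1 +\2/p'
}
```

Used as `crs_stat | asm_instances`, it would print "sol1 +ASM3" here, making the unexpected instance number on sol1 easy to spot.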

 

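Note that the ASM instance on sol1 has become ASM3, because it was added automatically. This can be fixed by removing it and adding the ASM instance back manually.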

 

www.htz.pw $ srvctl stop instance -d sol10g -i sol10g1

www.htz.pw $ srvctl stop asm -n sol1

 

www.htz.pw $ crs_unregister ora.sol10g.sol10g1.inst

www.htz.pw $ crs_unregister ora.sol1.ASM3.asm

 

 

www.htz.pw $ cat /oracle/app/oracle/admin/+ASM/pfile/init.ora

##############################################################################

# Copyright (c) 1991, 2001, 2002 by Oracle Corporation

##############################################################################

 

###########################################

# Cluster Database

###########################################

asm_diskgroups='DATA'

background_dump_dest=/oracle/app/oracle/admin/+ASM/bdump

cluster_database=TRUE

core_dump_dest=/oracle/app/oracle/admin/+ASM/cdump

instance_type=asm

large_pool_size=12582912

remote_login_passwordfile=EXCLUSIVE

user_dump_dest=/oracle/app/oracle/admin/+ASM/udump

 

+ASM2.instance_number=2

+ASM1.instance_number=1

 

 

www.htz.pw $ export ORACLE_SID=+ASM1

www.htz.pw $ sqlplus / as sysdba

 

SQL*Plus: Release 10.2.0.4.0 – Production on Thu May 8 23:14:50 2014

 

Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.

 

Connected to an idle instance.

 

SQL> create spfile from pfile;

 

File created.

www.htz.pw $ srvctl remove asm -n sol1

www.htz.pw $ srvctl add asm -n sol1 -i +ASM1 -o $ORACLE_HOME -p $ORACLE_HOME/dbs/spfile+ASM1.ora

www.htz.pw $ srvctl start asm -n sol1

 

www.htz.pw $ srvctl start instance -d sol10g -i sol10g1

 

www.htz.pw $ srvctl start instance -d sol10g -i sol10g1

Adjust the dependency so that the instance depends on +ASM1:

www.htz.pw $ srvctl modify instance -d sol10g -i sol10g1 -s +ASM1

www.htz.pw $ srvctl stop asm -n sol1

www.htz.pw $ srvctl start instance -d sol10g -i sol10g1

 

Everything is back to normal:

www.htz.pw $ crs_stat -t

Name           Type           Target    State     Host       

————————————————————

ora….SM1.asm application    ONLINE    ONLINE    sol1       

ora….L1.lsnr application    ONLINE    ONLINE    sol1       

ora.sol1.gsd   application    ONLINE    ONLINE    sol1       

ora.sol1.ons   application    ONLINE    ONLINE    sol1       

ora.sol1.vip   application    ONLINE    ONLINE    sol1       

ora.sol10g.db  application    ONLINE    ONLINE    sol2       

ora….g1.inst application    ONLINE    ONLINE    sol1       

ora….g2.inst application    ONLINE    ONLINE    sol2       

ora….SM2.asm application    ONLINE    ONLINE    sol2       

ora….L2.lsnr application    ONLINE    ONLINE    sol2       

ora.sol2.gsd   application    ONLINE    ONLINE    sol2       

ora.sol2.ons   application    ONLINE    ONLINE    sol2       

ora.sol2.vip   application    ONLINE    ONLINE    sol2

That completes the node addition. A few small problems came up along the way.

Permalink: http://www.htz.pw/2014/05/11/solaris-rac%e5%b9%b3%e5%8f%b0%e6%a8%a1%e6%8b%9f%e8%8a%82%e7%82%b9crash%e8%8a%82%e7%82%b9%e7%9a%84%e5%bc%ba%e5%88%b6%e5%88%a0%e9%99%a4%e4%b8%8e%e5%a2%9e%e5%8a%a0.html | 认真就输

Posted by huangtingzhong on May 11, 2014, in the RAC category.