下面的测试来至于一位网友,他们生产数据库异常,在drop表空间,重建控制文件后,报下面的错误:
Sat Jul 19 00:45:47 2014 SMON: enabling cache recovery Sat Jul 19 00:45:47 2014 Errors in file /oracle/app/oracle/admin/orcl1024/udump/orcl1024_ora_12464.trc: ORA-01173: data dictionary indicates missing data file from system tablespace Sat Jul 19 00:45:47 2014 Error 1173 happened during db open, shutting down database USER: terminating instance due to error 1173 Instance terminated by USER, pid = 12464 ORA-1092 signalled during: alter database open resetlogs… |
下面是简单的测试一下,提供2种方法来解决此故障。
欢迎大家加入ORACLE超级群:17115662 免费解决各种ORACLE问题,以后BLOG将迁移到http://www.htz.pw
1,数据库版本
www.htz.pw > select * from v$version where rownum<3;
BANNER —————————————————————- Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 – 64bi PL/SQL Release 10.2.0.4.0 – Production www.htz.pw > !lsb_release -a LSB Version: :core-3.0-amd64:core-3.0-ia32:core-3.0-noarch:graphics-3.0-amd64:graphics-3.0-ia32:graphics-3.0-noarch Distributor ID: RedHatEnterpriseAS Description: Red Hat Enterprise Linux AS release 4 (Nahant Update 8) Release: 4 Codename: NahantUpdate8 |
2,查询undo段的名字
因为在实验过程中,我们需要使用到undo段的名字,所以这里提前查询出来,如果在生产环境,我们可以直接使用bbed去查询undo$表,或者是使用odu,dul等工具去直接抽取undo$表,另外了可以使用strings system数据文件来过滤UNDO段。
www.htz.pw > @undo_segment.sql
SEGMENT_HEADER TABLESPACE SEGMENT_NAME FILE#.BLOCK STATUS SEGMENT_SIZE(M) ——————– —————————— ————– ———- ————— SYSTEM.OLD PRI.SYSTEM 1.9 ONLINE 0 UNDOTBS1.CURRENT PUB._SYSSMU1$ 2.9 ONLINE 1 UNDOTBS1.CURRENT PUB._SYSSMU10$ 2.153 ONLINE 1 UNDOTBS1.CURRENT PUB._SYSSMU9$ 2.137 ONLINE 13 UNDOTBS1.CURRENT PUB._SYSSMU8$ 2.121 ONLINE 18 UNDOTBS1.CURRENT PUB._SYSSMU7$ 2.105 ONLINE 0 UNDOTBS1.CURRENT PUB._SYSSMU6$ 2.89 ONLINE 6 UNDOTBS1.CURRENT PUB._SYSSMU5$ 2.73 ONLINE 2 UNDOTBS1.CURRENT PUB._SYSSMU4$ 2.57 ONLINE 0 UNDOTBS1.CURRENT PUB._SYSSMU3$ 2.41 ONLINE 1 UNDOTBS1.CURRENT PUB._SYSSMU2$ 2.25 ONLINE 0 |
3,生成创建控制文件脚本
[oracle@www.htz.pw sql]$./create_controlfile_sql.sh please input direcotry default /tmp: please input file name default control.ctl: Database altered. |
这里生成的默认文件位置在/tmp/control.ctl
4,重建控制文件
www.htz.pw > shutdown abort; ORACLE instance shut down.
STARTUP NOMOUNT CREATE CONTROLFILE REUSE DATABASE "ORCL1024" NORESETLOGS NOARCHIVELOG MAXLOGFILES 16 MAXLOGMEMBERS 3 MAXDATAFILES 100 MAXINSTANCES 8 MAXLOGHISTORY 292 LOGFILE GROUP 1 ‘/oracle/app/oracle/oradata/orcl1024/redo01.log’ SIZE 50M, GROUP 2 ‘/oracle/app/oracle/oradata/orcl1024/redo02.log’ SIZE 50M, GROUP 3 ‘/oracle/app/oracle/oradata/orcl1024/redo03.log’ SIZE 50M DATAFILE ‘/oracle/app/oracle/oradata/orcl1024/system01.dbf’, ‘/oracle/app/oracle/oradata/orcl1024/undotbs01.dbf’,(需要删除这行) ‘/oracle/app/oracle/oradata/orcl1024/sysaux01.dbf’, ‘/oracle/app/oracle/oradata/orcl1024/users01.dbf’ CHARACTER SET ZHS16GBK ; RECOVER DATABASE ALTER DATABASE OPEN; ALTER TABLESPACE TEMP ADD TEMPFILE ‘/oracle/app/oracle/oradata/orcl1024/temp01.dbf’ SIZE 1482M REUSE AUTOEXTEND ON NEXT 655360 MAXSIZE 32767M; |
5,故障现象出现
www.htz.pw > recover database using backup controlfile until cancel; ORA-00279: change 2170641 generated at 07/19/2014 00:35:54 needed for thread 1 ORA-00289: suggestion : /oracle/app/oracle/flash_recovery_area/ORCL1024/archivelog/2014_07_19/o1_mf_1_72 _%u_.arc ORA-00280: change 2170641 for thread 1 is in sequence #72
Specify log: {<RET>=suggested | filename | AUTO | CANCEL} /oracle/app/oracle/oradata/orcl1024/redo02.log ORA-00310: archived log contains sequence 71; sequence 72 required ORA-00334: archived log: ‘/oracle/app/oracle/oradata/orcl1024/redo02.log’
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below ORA-01194: file 1 needs more recovery to be consistent ORA-01110: data file 1: ‘/oracle/app/oracle/oradata/orcl1024/system01.dbf’
www.htz.pw > recover database using backup controlfile until cancel; ORA-00279: change 2170641 generated at 07/19/2014 00:35:54 needed for thread 1 ORA-00289: suggestion : /oracle/app/oracle/flash_recovery_area/ORCL1024/archivelog/2014_07_19/o1_mf_1_72 _%u_.arc ORA-00280: change 2170641 for thread 1 is in sequence #72
Specify log: {<RET>=suggested | filename | AUTO | CANCEL} /oracle/app/oracle/oradata/orcl1024/redo03.log Log applied. Media recovery complete.
www.htz.pw > alter database open resetlogs; alter database open resetlogs * ERROR at line 1: ORA-01092: ORACLE instance terminated. Disconnection forced |
alert中出现下面的报错
Sat Jul 19 00:45:47 2014 SMON: enabling cache recovery Sat Jul 19 00:45:47 2014 Errors in file /oracle/app/oracle/admin/orcl1024/udump/orcl1024_ora_12464.trc: ORA-01173: data dictionary indicates missing data file from system tablespace Sat Jul 19 00:45:47 2014 Error 1173 happened during db open, shutting down database USER: terminating instance due to error 1173 Instance terminated by USER, pid = 12464 ORA-1092 signalled during: alter database open resetlogs… |
6,故障处理方法1
在运气比较好的情况下使用此方案是可行的,朋友的数据库使用此方法,数据库能正常的OPEN。
6.1 修改参数文件
这里手动创建pfile文件,直接修改pfile文件比较简单,并且不影响原spfile文件,增加下面红色部分参数
www.htz.pw > !vi /tmp/123.ora orcl1024.__db_cache_size=54525952 orcl1024.__java_pool_size=4194304 orcl1024.__large_pool_size=8388608 orcl1024.__shared_pool_size=88080384 orcl1024.__streams_pool_size=0 *._backup_ksfq_bufsz=1048576 *._log_parallelism=2 *._log_parallelism_max=4 *._pga_max_size=5368709120 *._smm_max_size=3145728 *.audit_file_dest=’/oracle/app/oracle/admin/orcl1024/adump’ *.background_dump_dest=’/oracle/app/oracle/admin/orcl1024/bdump’ *.compatible=’10.2.0.3.0′ *.control_files=’/oracle/app/oracle/oradata/orcl1024/control01.ctl’,’/oracle/app/oracle/oradata/orcl1024/control02.ctl’,’/oracle/app/oracle/oradata/orcl1024/control03.ctl’ *.core_dump_dest=’/oracle/app/oracle/admin/orcl1024/cdump’ *.cpu_count=3 *.db_block_size=8192 *.db_domain=” *.db_file_multiblock_read_count=16 *.db_name=’orcl1024′ *.db_recovery_file_dest=’/oracle/app/oracle/flash_recovery_area’ *.db_recovery_file_dest_size=4294967296 *.dbwr_io_slaves=4 *.disk_asynch_io=FALSE *.dispatchers='(PROTOCOL=TCP) (SERVICE=orcl1024XDB)’ *.event=” *.job_queue_processes=10 *.open_cursors=300 *.pga_aggregate_target=1073741824 *.processes=150 *.recyclebin=’OFF’ *.remote_login_passwordfile=’EXCLUSIVE’ *.sga_target=167772160 #*.undo_management=’AUTO’ *.undo_management=’manual’ *.undo_tablespace=’UNDOTBS1′ *.user_dump_dest=’/oracle/app/oracle/admin/orcl1024/udump’ _corrupted_rollback_segments=(_SYSSMU1$,_SYSSMU2$,_SYSSMU3$,_SYSSMU4$,_SYSSMU5$,_SYSSMU6$,_SYSSMU7$,_SYSSMU8$,_SYSSMU9$,_SYSSMU10$) 这里通常还需要增加下面的2个参数 _allow_resetlogs_corruption=true _allow_error_simulation=true 另外还可以会增加一个event,如果smon一些功能的event。 |
6.2 启动数据库
www.htz.pw > startup mount pfile=’/tmp/123.ora’; ORACLE instance started.
Total System Global Area 167772160 bytes Fixed Size 2082432 bytes Variable Size 100665728 bytes Database Buffers 54525952 bytes Redo Buffers 10498048 bytes Database mounted. www.htz.pw > recover database using backup controlfile until cancel; ORA-00279: change 2171386 generated at 07/19/2014 00:45:47 needed for thread 1 ORA-00289: suggestion : /oracle/app/oracle/flash_recovery_area/ORCL1024/archivelog/2014_07_19/o1_mf_1_1_ %u_.arc ORA-00280: change 2171386 for thread 1 is in sequence #1
Specify log: {<RET>=suggested | filename | AUTO | CANCEL} /oracle/app/oracle/oradata/orcl1024/redo03.log ORA-00339: archived log does not contain any redo ORA-00334: archived log: ‘/oracle/app/oracle/oradata/orcl1024/redo03.log’
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below ORA-01194: file 1 needs more recovery to be consistent ORA-01110: data file 1: ‘/oracle/app/oracle/oradata/orcl1024/system01.dbf’
www.htz.pw > recover database using backup controlfile until cancel; ORA-00279: change 2171386 generated at 07/19/2014 00:45:47 needed for thread 1 ORA-00289: suggestion : /oracle/app/oracle/flash_recovery_area/ORCL1024/archivelog/2014_07_19/o1_mf_1_1_ %u_.arc ORA-00280: change 2171386 for thread 1 is in sequence #1
Specify log: {<RET>=suggested | filename | AUTO | CANCEL} /oracle/app/oracle/oradata/orcl1024/redo01.log Log applied. Media recovery complete. www.htz.pw > alter database open resetlogs;
Database altered. 这里看到数据库已经正常打开,这里还需要注意观察,alert日志文件是否有异常报错。 |
6.3 重建undo表空间
www.htz.pw > !rm /oracle/app/oracle/oradata/orcl1024/undotbs01.dbf
www.htz.pw > create undo tablespace undotbs1 datafile ‘/oracle/app/oracle/oradata/orcl1024/undotbs01.dbf’ size 10m;
Tablespace created. |
6.4 使用源参数文件启动数据库
www.htz.pw > shutdown immediate; Database closed. Database dismounted. ORACLE instance shut down. www.htz.pw > startup ORACLE instance started.
Total System Global Area 167772160 bytes Fixed Size 2082432 bytes Variable Size 100665728 bytes Database Buffers 54525952 bytes Redo Buffers 10498048 bytes Database mounted. Database opened. |
数据库启动正常,注意观察alert日志中是否有报错。
7 故障处理方法2
使用此方法,要求原来的UNDO数据文件存在,此方法就是将原来的undo数据文件再次增加到控制文件中去,此方法比较复制,因为在开启数据库的都会遇到其它很多的一些问题。
7.1 故障现象重现
www.htz.pw > select open_mode from v$database;
OPEN_MODE ———- READ WRITE
数据库的状态是正常的 www.htz.pw > select name from v$dbfile;
NAME ——————————————————————————– /oracle/app/oracle/oradata/orcl1024/users01.dbf /oracle/app/oracle/oradata/orcl1024/sysaux01.dbf /oracle/app/oracle/oradata/orcl1024/system01.dbf /oracle/app/oracle/oradata/orcl1024/undotbs01.dbf 存在的数据文件 www.htz.pw > shutdown abort; ORACLE instance shut down. 重建控制文件,控制文件中不包括undo表空间 www.htz.pw > @/tmp/control.ctl ORACLE instance started.
Total System Global Area 167772160 bytes Fixed Size 2082432 bytes Variable Size 100665728 bytes Database Buffers 54525952 bytes Redo Buffers 10498048 bytes
Control file created.
ORA-00283: recovery session canceled due to errors ORA-01610: recovery using the BACKUP CONTROLFILE option must be done
ALTER DATABASE OPEN * ERROR at line 1: ORA-01589: must use RESETLOGS or NORESETLOGS option for database open
ALTER TABLESPACE TEMP ADD TEMPFILE ‘/oracle/app/oracle/oradata/orcl1024/temp01.dbf’ * ERROR at line 1: ORA-01109: database not open
www.htz.pw > recover database using backup controlfile until cancel; ORA-00279: change 2171941 generated at 07/19/2014 00:51:38 needed for thread 1 ORA-00289: suggestion : /oracle/app/oracle/flash_recovery_area/ORCL1024/archivelog/2014_07_19/o1_mf_1_1_ %u_.arc ORA-00280: change 2171941 for thread 1 is in sequence #1
Specify log: {<RET>=suggested | filename | AUTO | CANCEL} cancel ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below ORA-01194: file 1 needs more recovery to be consistent ORA-01110: data file 1: ‘/oracle/app/oracle/oradata/orcl1024/system01.dbf’
ORA-01112: media recovery not started
*.remote_login_passwordfile=’EXCLUSIVE’ *.sga_target=167772160 #*.undo_management=’AUTO’ *.undo_management=’manual’ *.undo_tablespace=’UNDOTBS1′ *.user_dump_dest=’/oracle/app/oracle/admin/orcl1024/udump’ _allow_resetlogs_corruption=true _allow_error_simulation=true 增加上面的参数文件
www.htz.pw > startup force mount pfile=’/tmp/123.ora’; ORACLE instance started.
Total System Global Area 167772160 bytes Fixed Size 2082432 bytes Variable Size 100665728 bytes Database Buffers 54525952 bytes Redo Buffers 10498048 bytes Database mounted. www.htz.pw > recover database using backup controlfile until cancel; ORA-00279: change 2171941 generated at 07/19/2014 00:51:38 needed for thread 1 ORA-00289: suggestion : /oracle/app/oracle/flash_recovery_area/ORCL1024/archivelog/2014_07_19/o1_mf_1_1_ %u_.arc ORA-00280: change 2171941 for thread 1 is in sequence #1
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/oracle/app/oracle/oradata/orcl1024/redo01.log Log applied. Media recovery complete.
www.htz.pw > alter database open resetlogs; alter database open resetlogs * ERROR at line 1: ORA-01092: ORACLE instance terminated. Disconnection forced
重现故障现在 Errors in file /oracle/app/oracle/admin/orcl1024/udump/orcl1024_ora_14960.trc: ORA-00704: bootstrap process failure ORA-00604: error occurred at recursive SQL level 2 ORA-01173: data dictionary indicates missing data file from system tablespace Sat Jul 19 01:01:40 2014 Error 704 happened during db open, shutting down database USER: terminating instance due to error 704 Instance terminated by USER, pid = 14960 ORA-1092 signalled during: alter database open resetlogs… |
7.2 重建控制文件
重建控制文件,控制文件中包括undo表空间的数据文件
[oracle@www.htz.pw sql]$sqlplus / as sysdba
SQL*Plus: Release 10.2.0.4.0 – Production on Sat Jul 19 01:09:05 2014
Copyright (c) 1982, 2007, Oracle. All Rights Reserved.
Connected to an idle instance.
www.htz.pw > startup nomount pfile=’/tmp/123.ora’; ORACLE instance started.
Total System Global Area 167772160 bytes Fixed Size 2082432 bytes Variable Size 100665728 bytes Database Buffers 54525952 bytes Redo Buffers 10498048 bytes www.htz.pw > @/tmp/control.ctl ORA-01081: cannot start already-running ORACLE – shut it down first CREATE CONTROLFILE REUSE DATABASE "ORCL1024" RESETLOGS NOARCHIVELOG * ERROR at line 1: ORA-01503: CREATE CONTROLFILE failed ORA-01189: file is from a different RESETLOGS than previous files ORA-01110: data file 2: ‘/oracle/app/oracle/oradata/orcl1024/undotbs01.dbf’ |
这里提示ORA-01189的错误。
1189的错误很简单,因为数据文件头的resetlogs信息不一致导致的。
7.3 bbed修改数据文件头中RESETLOG与SCN信息
www.htz.pw > !cat /tmp/bbed.par listfile=/tmp/bbed.datafile
www.htz.pw > !cat /tmp/bbed.datafile 1 /oracle/app/oracle/oradata/orcl1024/system01.dbf 2 /oracle/app/oracle/oradata/orcl1024/undotbs01.dbf 3 /oracle/app/oracle/oradata/orcl1024/sysaux01.dbf 4 /oracle/app/oracle/oradata/orcl1024/users01.dbf [oracle@www.htz.pw ~]$bbed parfile=/tmp/bbed.par Password:
BBED: Release 2.0.0.0.0 – Limited Production on Sat Jul 19 01:12:13 2014
Copyright (c) 1982, 2007, Oracle. All rights reserved.
************* !!! For Oracle Internal Use only !!! ***************
BBED> info File# Name Size(blks) —– —- ———- 1 /oracle/app/oracle/oradata/orcl1024/system01.dbf 0 2 /oracle/app/oracle/oradata/orcl1024/undotbs01.dbf 0 3 /oracle/app/oracle/oradata/orcl1024/sysaux01.dbf 0 4 /oracle/app/oracle/oradata/orcl1024/users01.dbf 0 这里只需要修改上次resetlogs与SCN的值就可以了 BBED> assign file 2 block 1 offset 112 = file 1 block 1 offset 112; Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y ub4 kcvfhrlc @112 0x32dc2c73
BBED> assign file 2 block 1 offset 116 = file 1 block 1 offset 116; ub4 kscnbas @116 0x002125e0
BBED> assign file 2 block 1 offset 484 = file 1 block 1 offset 484; ub1 pad @484 0xe1
BBED> assign file 2 block 1 offset 492 = file 1 block 1 offset 492; ub1 pad @492 0x74 BBED> sum apply dba 2,1 Check value for File 2, Block 1: current = 0x093b, required = 0x093b
www.htz.pw > startup force nomount pfile=’/tmp/123.ora’; ORACLE instance started.
Total System Global Area 167772160 bytes Fixed Size 2082432 bytes Variable Size 100665728 bytes Database Buffers 54525952 bytes Redo Buffers 10498048 bytes |
7.4 重建控制文件
www.htz.pw > @/tmp/control.ctl ORA-01081: cannot start already-running ORACLE – shut it down first
Control file created.
ORA-00283: recovery session canceled due to errors ORA-01610: recovery using the BACKUP CONTROLFILE option must be done |
控制文件重建成功
7.5 遇到600错误
www.htz.pw > recover database using backup controlfile until cancel; ORA-00279: change 2172129 generated at 07/19/2014 00:53:08 needed for thread 1 ORA-00289: suggestion : /oracle/app/oracle/flash_recovery_area/ORCL1024/archivelog/2014_07_19/o1_mf_1_1_ %u_.arc ORA-00280: change 2172129 for thread 1 is in sequence #1
Specify log: {<RET>=suggested | filename | AUTO | CANCEL} cancel ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below ORA-01194: file 1 needs more recovery to be consistent ORA-01110: data file 1: ‘/oracle/app/oracle/oradata/orcl1024/system01.dbf’
ORA-01112: media recovery not started
www.htz.pw > alter database open resetlogs; alter database open resetlogs * ERROR at line 1: ORA-01092: ORACLE instance terminated. Disconnection forced ORA-00704: bootstrap process failure ORA-00704: bootstrap process failure ORA-00600: internal error code, arguments: [kcbgtcr_13], [], [], [], [], [],[], [] |
这里触发了ORA-00600 kcbgtcr_13错误,只需要手动提交事务就可以了。
7.6 手动提交事务信息
[09:50:33]www.htz.pw > startup mount pfile=’/tmp/123.ora’; [09:50:34]ORACLE instance started. [09:50:34] [09:50:34]Total System Global Area 167772160 bytes [09:50:34]Fixed Size 2082432 bytes [09:50:34]Variable Size 100665728 bytes [09:50:34]Database Buffers 54525952 bytes [09:50:34]Redo Buffers 10498048 bytes [09:50:38]Database mounted. [09:50:51]www.htz.pw > recover database using backup controlfile until cancel; [09:50:51]ORA-00279: change 2172135 generated at 07/19/2014 01:41:17 needed for thread 1 [09:50:51]ORA-00289: suggestion : [09:50:51]/oracle/app/oracle/flash_recovery_area/ORCL1024/archivelog/2014_07_19/o1_mf_1_1_ [09:50:51]%u_.arc [09:50:51]ORA-00280: change 2172135 for thread 1 is in sequence #1 [09:50:51] [09:50:51] [09:50:51]Specify log: {<RET>=suggested | filename | AUTO | CANCEL} [09:50:53]cancel [09:50:54]ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below [09:50:54]ORA-01194: file 1 needs more recovery to be consistent [09:50:54]ORA-01110: data file 1: ‘/oracle/app/oracle/oradata/orcl1024/system01.dbf’ [09:50:54] [09:50:54] [09:50:54]ORA-01112: media recovery not started [09:50:54] [09:50:54] [09:51:06]www.htz.pw > alter database open resetlogs; [09:51:09]alter database open resetlogs [09:51:09]* [09:51:09]ERROR at line 1: [09:51:09]ORA-01092: ORACLE instance terminated. Disconnection forced |
后面alert报下面的错误
Errors in file /oracle/app/oracle/admin/orcl1024/udump/orcl1024_ora_32544.trc: ORA-00600: internal error code, arguments: [kcbgtcr_13], [], [], [], [], [], [], [] Sat Jul 19 01:43:01 2014 Errors in file /oracle/app/oracle/admin/orcl1024/udump/orcl1024_ora_32544.trc: ORA-00704: bootstrap process failure ORA-00704: bootstrap process failure ORA-00600: internal error code, arguments: [kcbgtcr_13], [], [], [], [], [], [], [] Sat Jul 19 01:43:01 2014 Error 704 happened during db open, shutting down database |
在trace文件中查看有那些块没有提交。
[oracle@www.htz.pw ~]$grep -E ‘^Block header dump|^0x0’ /oracle/app/oracle/admin/orcl1024/udump/orcl1024_ora_32544.trc 0x01 0x1f64 0x02 0x1ef8 0x01 0x1f64 0x02 0x1ef8 0x01 0x1f64 0x02 0x1ef8 0x01 0x1f64 0x02 0x1ef8 0x01 0x1f64 0x02 0x1ef8 0x01 0x1f64 0x02 0x1ef8 Block header dump: 0x0040007a 0x01 0x0003.001.00000191 0x0080002b.014c.03 —- 1 fsc 0x0000.00000000 Block header dump: 0x0040017c 0x01 0x0000.022.00000002 0x00400196.0004.37 –U- 12 fsc 0x0000.00000147 Block header dump: 0x004000da 0x01 0x0004.00c.0000011d 0x0080559d.00d3.02 C— 0 scn 0x0000.0008ab18 Block header dump: 0x004000db 0x01 0x0008.017.00000002 0x00800080.0000.01 CBU- 0 scn 0x0000.00002404 0x02 0x0004.01a.0000017a 0x0080003c.016b.32 –U- 1 fsc 0x000e.001e8de0 Block header dump: 0x0040007a 0x01 0x0003.001.00000191 0x0080002b.014c.03 —- 1 fsc 0x0000.00000000 Block header dump: 0x0040006a 0x01 0x0000.008.00000034 0x0040019e.003b.07 C— 0 scn 0x0000.0021245c |
通过10046跟踪报错的SQL语句
ksedmp: internal or fatal error ORA-00600: internal error code, arguments: [kcbgtcr_13], [], [], [], [], [], [], [] Current SQL statement for this session: select ctime, mtime, stime from obj$ where obj# = :1 —– Call Stack Trace —– 这里看到了报错的SQL语句,以SQL语句来搜索,直到搜索到如下的 Cursor#5(0x2a97ca18b0) state=FETCH curiob=0x2a97cba468 curflg=f fl2=0 par=0x2a97ca1710 ses=0x69f82a30 sqltxt(0x69a944b0)=select ctime, mtime, stime from obj$ where obj# = :1 hash=fa0bd3f60d6ee4f2495f9af8199b75b9 parent=0x6677c4b8 maxchild=01 plk=0x66f56af0 ppn=n cursor instantiation=0x2a97cba468 used=1405705379 child#0(0x69a94288) pcs=0x6677c0c8 clk=0x66f56dd0 ci=0x6677b7b0 pn=0x69ad37f0 ctx=0x6616fe90 kgsccflg=0 llk[0x2a97cba470,0x2a97cba470] idx=0 xscflg=e0141476 fl2=45000401 fl3=4022210c fl4=100 Bind bytecodes Opcode = 1 Unoptimized Offsi = 48, Offsi = 0 kkscoacd Bind#0 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00 oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0 kxsbbbfp=2a97cba020 bln=22 avl=02 flg=05 value=20 |
这里中可以看到绑定变量的值是20.在相同的版本其它的数据库中执行下面的操作
SQL> select rowid from obj$ where obj# =20;
ROWID —————— AAAAASAABAAAAB6AAA
SQL> @rowid_to_info.sql Enter value for rowid: AAAAASAABAAAAB6AAA ROWID_TYPE: 1 OBJECT_NUMBER: 18 RELATIVE_FNO: 1 BLOCK_NUMBER: 122 ROW_NUMBER: 0
PL/SQL procedure successfully completed.
正在好trace文件中的 Block header dump: 0x0040007a 0x01 0x0003.001.00000191 0x0080002b.014c.03 —- 1 fsc 0x0000.00000000 |
其实我们还可以从10046trace文件中找到此信息如下:
===================== PARSING IN CURSOR #5 len=52 dep=1 uid=0 oct=3 lid=0 tim=1372757926978214 hv=429618617 ad=’69a944b0′ select ctime, mtime, stime from obj$ where obj# = :1 END OF STMT PARSE #5:c=0,e=234,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1372757926978212 BINDS #5: kkscoacd Bind#0 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00 oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0 kxsbbbfp=2a97cba020 bln=22 avl=02 flg=05 value=20 EXEC #5:c=0,e=330,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1372757926978586 WAIT #5: nam=’db file sequential read’ ela= 23 file#=1 block#=218 blocks=1 obj#=-1 tim=1372757926978753 WAIT #5: nam=’db file sequential read’ ela= 9 file#=1 block#=219 blocks=1 obj#=-1 tim=1372757926978804 WAIT #5: nam=’db file sequential read’ ela= 7 file#=1 block#=122 blocks=1 obj#=-1 tim=1372757926978841 |
这里需要注意的是绑定变量的值。
在trace文件中可以发现下面的内容
tab 0, row 26, @0x18f1 tl: 70 fb: –H-FL– lb: 0x1 cc: 17 col 0: [ 2] c1 02 col 1: [ 4] c3 06 17 08 col 2: [ 1] 80 col 3: [12] 5f 4e 45 58 54 5f 4f 42 4a 45 43 54 col 4: [ 2] c1 02 col 5: *NULL* col 6: [ 1] 80 col 7: [ 7] 78 6c 03 0c 01 28 31 col 8: [ 7] 78 72 07 13 01 3b 01 col 9: [ 7] 78 6c 03 0c 01 28 31 col 10: [ 1] 80 col 11: *NULL* col 12: *NULL* col 13: [ 1] 80 col 14: *NULL* col 15: [ 1] 80 col 16: [ 4] c3 07 38 24 |
bbed手动提交事务,需要更改itl与行中的lck值
BBED> p ktbbh struct ktbbh, 48 bytes @20 ub1 ktbbhtyp @20 0x01 (KDDBTDATA) union ktbbhsid, 4 bytes @24 ub4 ktbbhsg1 @24 0x00000012 ub4 ktbbhod1 @24 0x00000012 struct ktbbhcsc, 8 bytes @28 ub4 kscnbas @28 0x0021251b ub2 kscnwrp @32 0x0000 b2 ktbbhict @36 1 ub1 ktbbhflg @38 0x02 (NONE) ub1 ktbbhfsl @39 0x00 ub4 ktbbhfnx @40 0x00000000 struct ktbbhitl[0], 24 bytes @44 struct ktbitxid, 8 bytes @44 ub2 kxidusn @44 0x0003 ub2 kxidslt @46 0x0001 ub4 kxidsqn @48 0x00000191 struct ktbituba, 8 bytes @52 ub4 kubadba @52 0x0080002b ub2 kubaseq @56 0x014c ub1 kubarec @58 0x03 ub2 ktbitflg @60 0x0001 (NONE) union _ktbitun, 2 bytes @62 b2 _ktbitfsc @62 0 ub2 _ktbitwrp @62 0x0000 ub4 ktbitbas @64 0x00000000 BBED> modify /x 80 offset 61 Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y File: /oracle/app/oracle/oradata/orcl1024/system01.dbf (1) Block: 122 Offsets: 61 to 70 Dba:0x0040007a ———————————————————————— 80000000 00000000 016c BBED> x /rn *kdbr[26] rowdata[5278] @6453 ————- flag@6453: 0x2c (KDRHFL, KDRHFF, KDRHFH) lock@6454: 0x01 cols@6455: 17
col 0[2] @6456: 1 col 1[4] @6459: 52207 col 2[1] @6464: 0 col 3[12] @6466: -0 col 4[2] @6479: 1 col 5[0] @6482: *NULL* col 6[1] @6483: 0 col 7[7] @6485: -0 col 8[7] @6493: -0 col 9[7] @6501: -0 col 10[1] @6509: 0 col 11[0] @6511: *NULL* col 12[0] @6512: *NULL* col 13[1] @6513: 0 col 14[0] @6515: *NULL* col 15[1] @6516: 0 col 16[4] @6518: 65535
BBED> modify /x 00 offset 6454 File: /oracle/app/oracle/oradata/orcl1024/system01.dbf (1) Block: 122 Offsets: 6454 to 6463 Dba:0x0040007a ———————————————————————— 001102c1 0204c306 1708 BBED> sum apply Check value for File 1, Block 122: current = 0x3d20, required = 0x3d20
BBED> verify DBVERIFY – Verification starting FILE = /oracle/app/oracle/oradata/orcl1024/system01.dbf BLOCK = 122
DBVERIFY – Verification complete
Total Blocks Examined : 1 Total Blocks Processed (Data) : 1 Total Blocks Failing (Data) : 0 Total Blocks Processed (Index): 0 Total Blocks Failing (Index): 0 Total Blocks Empty : 0 Total Blocks Marked Corrupt : 0 Total Blocks Influx : 0 |
7.7 报00600坏块的错误
www.htz.pw > alter database open resetlogs; alter database open resetlogs * ERROR at line 1: ORA-01092: ORACLE instance terminated. Disconnection forced
Sat Jul 19 01:56:37 2014 Errors in file /oracle/app/oracle/admin/orcl1024/udump/orcl1024_ora_1894.trc: ORA-00604: error occurred at recursive SQL level 1 ORA-00607: Internal error occurred while making a change to a data block ORA-00600: internal error code, arguments: [kddummy_blkchk], [1], [106], [6101], [], [], [], [] Error 604 happened during db open, shutting down database USER: terminating instance due to error 604 Instance terminated by USER, pid = 1894 ORA-1092 signalled during: alter database open resetlogs… 这里可以看到数据文件1,块106,出现了6101的错误。此错误由于是ITL中的值与LOCK不一致导致的。 |
bbed修改行的lock值
BBED> set dba 1,106 DBA 0x0040006a (4194410 1,106)
BBED> verify DBVERIFY – Verification starting FILE = /oracle/app/oracle/oradata/orcl1024/system01.dbf BLOCK = 106
Block Checking: DBA = 4194410, Block Type = KTB-managed data block data header at 0x2a97696244 kdbchk: row locked by non-existent transaction table=0 slot=10 lockid=1 ktbbhitc=1 Block 106 failed with check code 6101
DBVERIFY – Verification complete
Total Blocks Examined : 1 Total Blocks Processed (Data) : 1 Total Blocks Failing (Data) : 1 Total Blocks Processed (Index): 0 Total Blocks Failing (Index): 0 Total Blocks Empty : 0 Total Blocks Marked Corrupt : 0 Total Blocks Influx : 0 |
此报错的修改见6101(row locked by non-existent transaction)
7.8 启动数据库
通过上面几步操作,再次启动数据库
www.htz.pw > startup mount pfile=’/tmp/123.ora’; ORACLE instance started.
Total System Global Area 167772160 bytes Fixed Size 2082432 bytes Variable Size 100665728 bytes Database Buffers 54525952 bytes Redo Buffers 10498048 bytes Database mounted. www.htz.pw > recover database using backup controlfile unitl cancel; ORA-00905: missing keyword
www.htz.pw > recover database using backup controlfile until cancel; ORA-00279: change 2172139 generated at 07/19/2014 01:56:34 needed for thread 1 ORA-00289: suggestion : /oracle/app/oracle/flash_recovery_area/ORCL1024/archivelog/2014_07_19/o1_mf_1_1_ %u_.arc ORA-00280: change 2172139 for thread 1 is in sequence #1
Specify log: {<RET>=suggested | filename | AUTO | CANCEL} cancel ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below ORA-01194: file 1 needs more recovery to be consistent ORA-01110: data file 1: ‘/oracle/app/oracle/oradata/orcl1024/system01.dbf’
ORA-01112: media recovery not started
www.htz.pw > alter database open resetlogs;
Database altered.
使用原参数能正常启动数据库。 www.htz.pw > startup force; ORACLE instance started.
Total System Global Area 167772160 bytes Fixed Size 2082432 bytes Variable Size 100665728 bytes Database Buffers 54525952 bytes Redo Buffers 10498048 bytes Database mounted. Database opened. |
整个实验测试结束
ORA-01173的模拟与故障处理:等您坐沙发呢!