Symptom:
The customer's database (RAC environment, 11.1.0.6) suffered an abnormal instance crash accompanied by an ORA-07445 error. The alert log shows:

Sun Jun 23 01:00:06 2013
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0xF] [PC:0x755773D, kcbw_get_bh()+67]
Errors in file /oracle/app/11gR1/diag/rdbms/xij/xij1/trace/xij1_mman_2015.trc (incident=298938):
ORA-07445: exception encountered: core dump [kcbw_get_bh()+67] [SIGSEGV] [ADDR:0xF] [PC:0x755773D] [Address not mapped to object] []
Incident details in: /oracle/app/11gR1/diag/rdbms/xij/xij1/incident/incdir_298938/xij1_mman_2015_i298938.trc
Sun Jun 23 01:00:07 2013
Trace dumping is performing id=[cdmp_20130623010007]
Sun Jun 23 01:00:09 2013
Sweep Incident[298938]: completed
Sun Jun 23 01:00:09 2013
Errors in file /oracle/app/11gR1/diag/rdbms/xij/xij1/trace/xij1_pmon_1981.trc:
ORA-00822: MMAN process terminated with error
PMON (ospid: 1981): terminating the instance due to error 822
Sun Jun 23 01:00:09 2013
Errors in file /oracle/app/11gR1/diag/rdbms/xij/xij1/trace/xij1_j000_22268.trc:
ORA-00822: MMAN process terminated with error
Sun Jun 23 01:00:09 2013
Errors in file /oracle/app/11gR1/diag/rdbms/xij/xij1/trace/xij1_m000_22430.trc:
ORA-00822: MMAN process terminated with error
System state dump is made for local instance
System State dumped to trace file /oracle/app/11gR1/diag/rdbms/xij/xij1/trace/xij1_diag_1987.trc
Sun Jun 23 01:00:09 2013
ORA-1092 : opiodr aborting process unknown ospid (11096_47524616916112)
Sun Jun 23 01:00:09 2013
ORA-1092 : opitsk aborting process
Sun Jun 23 01:00:09 2013
ORA-1092 : opiodr aborting process unknown ospid (6317_47353365785744)
Sun Jun 23 01:00:09 2013
ORA-1092 : opitsk aborting process
Sun Jun 23 01:00:09 2013
ORA-1092 : opiodr aborting process unknown ospid (28698_47056912551056)
Sun Jun 23 01:00:09 2013
ORA-1092 : opitsk aborting process
Sun Jun 23 01:00:09 2013
ORA-1092 : opiodr aborting process unknown ospid (18927_47567504653456)
Sun Jun 23 01:00:10 2013
ORA-1092 : opitsk aborting process
Sun Jun 23 01:00:10 2013
Errors in file /oracle/app/11gR1/diag/rdbms/xij/xij1/trace/xij1_q001_3487.trc:
ORA-00822: MMAN process terminated with error
ORA-1092 : opidrv aborting process Q001 ospid (3487_47252506410128)
Sun Jun 23 01:00:11 2013
ORA-1092 : opitsk aborting process
Sun Jun 23 01:00:11 2013
License high water mark = 510
Errors in file /oracle/app/11gR1/diag/rdbms/xij/xij1/trace/xij1_m000_22430.trc:
ORA-00822: MMAN process terminated with error
ORA-00822: MMAN process terminated with error
Errors in file /oracle/app/11gR1/diag/rdbms/xij/xij1/trace/xij1_j000_22268.trc:
ORA-00449: background process "LGWR" unexpectedly terminated with error 822
ORA-00822: MMAN process terminated with error
Errors in file /oracle/app/11gR1/diag/rdbms/xij/xij1/trace/xij1_j000_22268.trc:
ORA-00449: background process "LGWR" unexpectedly terminated with error 822
ORA-00822: MMAN process terminated with error
Errors in file /oracle/app/11gR1/diag/rdbms/xij/xij1/trace/xij1_j000_22268.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00822: MMAN process terminated with error
ORA-06512: at "WKSYS.WK_JOB", line 442
ORA-00449: background process "MMON" unexpectedly terminated with error 822
ORA-00822: MMAN process terminated with error
ORA-06512: at line 1
ORA-1092 : opidrv aborting process J000 ospid (22268_47357930925200)
Sun Jun 23 01:00:20 2013
Instance terminated by PMON, pid = 1981
Sun Jun 23 01:00:21 2013
USER (ospid: 22527): terminating the instance
Instance terminated by USER, pid = 22527
Sun Jun 23 01:00:26 2013
Starting ORACLE instance (normal)

Analysis:
ORA-07445 is usually caused by a bug in Oracle itself. I first used IPS to collect the error information from the alert log (for how to use IPS, see my other article, 《IPS简单使用方法》, "A Simple Guide to Using IPS").

A search on Metalink showed that the customer's problem resembles the bugs described in the following three Notes:

ORA-7445 (kcbw_get_bh) [ID 1341402.1]
Bug 9728912 - PMON terminates instance due to ORA-7445 [kcbw_numperchunk] / ORA-7445 [kcbw_get_bh] [ID 9728912.8] (https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=9728912)
Instance Crashed On ORA-7445 kcbw_numperchunk [ID 1364264.1]

However, according to the Notes, the related bugs have already been fixed in 11.1.0.6, the version the customer is running.
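When reading an alert log like the one in the Symptom section, the first ORA- error is usually the one worth chasing, and the later ones (here ORA-00822 and ORA-1092) are fallout from it. The following is a minimal Python sketch of that triage step, not part of the original diagnosis. The function name summarize_ora_errors and the shortened sample text are hypothetical and only for illustration; in practice the text would be read from the alert log file.

```python
import re
from collections import Counter

# Matches both the "ORA-07445:" and the "ORA-1092 :" spellings seen in the alert log.
ORA_PATTERN = re.compile(r"ORA-(\d+)")


def summarize_ora_errors(alert_text):
    """Return (first error code, Counter of all error codes) found in alert log text."""
    codes = [int(m.group(1)) for m in ORA_PATTERN.finditer(alert_text)]
    first = codes[0] if codes else None
    return first, Counter(codes)


# Shortened excerpt of the alert log shown above (illustrative only).
sample = """\
Sun Jun 23 01:00:06 2013
ORA-07445: exception encountered: core dump [kcbw_get_bh()+67] [SIGSEGV]
Sun Jun 23 01:00:09 2013
ORA-00822: MMAN process terminated with error
PMON (ospid: 1981): terminating the instance due to error 822
ORA-1092 : opiodr aborting process unknown ospid (11096_47524616916112)
ORA-1092 : opitsk aborting process
"""

first, counts = summarize_ora_errors(sample)
print("first error: ORA-%05d" % first)
for code, n in counts.most_common():
    print("ORA-%05d x %d" % (code, n))
```

On this excerpt the first error is the ORA-07445 raised in the MMAN process, which matches the order of events in the full log: MMAN dies, then PMON terminates the instance.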
Now look at the other critical errors recorded in the customer's database:

Node1:
adrci> show problem
ADR Home = /oracle/app/11gR1/diag/rdbms/xij/xij1:
*************************************************************************
PROBLEM_ID  PROBLEM_KEY                                                LAST_INCIDENT  LASTINC_TIME
----------  ---------------------------------------------------------  -------------  ---------------------------------
5           ORA 7445 [kcbw_get_bh()+67]                                298938         2013-06-23 01:00:06.373716 +08:00
11          ORA 600                                                    276161         2013-06-04 18:12:12.709933 +08:00
10          ORA 600 [729]                                              276160         2013-06-04 18:09:27.857128 +08:00
7           ORA 7445 [kgghash()+367]                                   253234         2013-06-03 15:27:04.349337 +08:00
9           ORA 7445 [kksMapCursor()+323]                              256538         2013-05-27 09:54:58.684956 +08:00
8           ORA 7445 [qkabxo()+22]                                     251194         2013-05-01 22:03:37.715416 +08:00
2           ORA 600 [kghfrh:ds]                                        238818         2013-01-28 11:35:23.755034 +08:00
6           ORA 7445 [eoa_pm_push()+31]                                239218         2013-01-28 11:24:42.835685 +08:00
3           ORA 7445 [ioei_get_method_counts()+39]                     71129          2012-10-17 11:17:39.735719 +08:00
4           ORA 7445 [jol_calculate_transitive_interface_set()+1165]   74233          2012-10-17 11:05:51.570021 +08:00
1           ORA 600 [kghfru:ds]                                        6369           2012-09-07 17:35:55.001585 +08:00
11 rows fetched

Node2:
[oracle@XIJ02 ~]$ adrci
ADRCI: Release 11.1.0.6.0 - Beta on Mon Jun 24 14:59:37 2013
Copyright (c) 1982, 2007, Oracle. All rights reserved.
ADR base = "/oracle/app/11gR1"
adrci> set homepath diag/rdbms/xij/xij2
adrci> show problem
ADR Home = /oracle/app/11gR1/diag/rdbms/xij/xij2:
*************************************************************************
PROBLEM_ID  PROBLEM_KEY                                                LAST_INCIDENT  LASTINC_TIME
----------  ---------------------------------------------------------  -------------  ---------------------------------
1           ORA 7445 [kgghash()+367]                                   209965         2013-06-16 23:34:39.333982 +08:00
2           ORA 7445 [kksMapCursor()+323]                              190129         2013-05-27 09:54:56.121652 +08:00
2 rows fetched
adrci>

Solution:
Across the customer's two nodes, ADR recorded 13 problems in total that look like they were caused by bugs. Overall, Oracle 11.1.0.6 is not a particularly stable release and contains a variety of bugs. Oracle fixed most of the bugs found in 11.1.0.6 in 11.1.0.7, which is considerably more stable as a result. I therefore recommend that the customer upgrade the database to 11.1.0.7 or 11.2.0.3.

Appendix:
(Triage Tool 3.01, routed by file analysis):
Failing Function: kcbw_get_bh
Route To: BUFFER CACHE:MANAGEABILITY
Error Argument: [kcbw_get_bh]
Type of Error: ORA-07445
File Name: xij1_mman_2015_i298938.trc
Comment: Routed by Error Argument, Conventional routing
DB Version: 11.1.0.6.0
Platform: Linux
CPU: x86_64
OS Version: 2.6.18-194.el5
Stack Trace:
kcbw_get_bh
kcbw_get_first_buffer
kcbw_next_free
kmgs_extract_mem_from_granule
kmgs_process_request_immediate
kmgs_process_request
kmgsdrv
ksbabs
ksbrdp
opirip
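The figure of 13 problems across the two nodes comes from adding up the rows of the two "show problem" listings. As a rough cross-check, the Python sketch below tallies those rows by error class. It is an illustration only: the function name count_problems and the row pattern are assumptions based on the output shown in this article, and the input is the listing text pasted in as plain strings with the LASTINC_TIME values shortened to the date.

```python
import re
from collections import Counter

# One data row: PROBLEM_ID, PROBLEM_KEY (with optional [argument]), LAST_INCIDENT, date.
ROW = re.compile(r"^\s*\d+\s+(ORA \d+)(?:\s+\[[^\]]*\])?\s+\d+\s+\d{4}-\d{2}-\d{2}\b")


def count_problems(show_problem_output):
    """Count problem rows per error class (for example 'ORA 7445') in adrci output."""
    counts = Counter()
    for line in show_problem_output.splitlines():
        match = ROW.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


node1 = """\
5 ORA 7445 [kcbw_get_bh()+67] 298938 2013-06-23
11 ORA 600 276161 2013-06-04
10 ORA 600 [729] 276160 2013-06-04
7 ORA 7445 [kgghash()+367] 253234 2013-06-03
9 ORA 7445 [kksMapCursor()+323] 256538 2013-05-27
8 ORA 7445 [qkabxo()+22] 251194 2013-05-01
2 ORA 600 [kghfrh:ds] 238818 2013-01-28
6 ORA 7445 [eoa_pm_push()+31] 239218 2013-01-28
3 ORA 7445 [ioei_get_method_counts()+39] 71129 2012-10-17
4 ORA 7445 [jol_calculate_transitive_interface_set()+1165] 74233 2012-10-17
1 ORA 600 [kghfru:ds] 6369 2012-09-07
11 rows fetched
"""

node2 = """\
1 ORA 7445 [kgghash()+367] 209965 2013-06-16
2 ORA 7445 [kksMapCursor()+323] 190129 2013-05-27
2 rows fetched
"""

total = count_problems(node1) + count_problems(node2)
print(sum(total.values()), dict(total))
```

The same two problem keys, kgghash()+367 and kksMapCursor()+323, appear on both nodes, so the number of distinct failing functions is smaller than the number of problem rows.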