上周五去客户处巡检时,发现原来配置的DG备库未与主库同步,特此记录一下在5月14日13:58出现故障,3398日志无法应用,以下是alert日志Thu May 14 13:58:01 2015 Errors in file /u01/app/Oracle/diag/rdbms/tcdg/tc/trace/tc_pr00_3473760.trc: ORA-01119: error in creating database file "+data" ORA-17502: ksfdcre:4 Failed to create file +data ORA-15041: diskgroup "DATA" space exhausted File #108 added to control file as "UNNAMED00108". Originally created as: "+DATA/tc/datafile/oa2015.387.879688649" Recovery was unable to create the file as: "+data" MRP0: Background Media Recovery terminated with error 1274 Errors in file /u01/app/oracle/diag/rdbms/tcdg/tc/trace/tc_pr00_3473760.trc: ORA-01274: cannot add datafile "+DATA/tc/datafile/oa2015.387.879688649" - file could not be created Managed Standby Recovery not using Real Time Apply Recovery interrupted! Recovered data files to a consistent state at change 10293976250522 Thu May 14 13:58:06 2015 MRP0: Background Media Recovery process shutdown (tc)由于空间不足,出现了ORA-15041错误,然后MRP0进程被终止再来看trace文件内容:*** 2015-05-14 13:58:01.562 ----- START Event Driven Actions Dump ---- ---- END Event Driven Actions Dump ---- ----- START DDE Actions Dump ----- Executing SYNC actions Executing ASYNC actions ----- END DDE Actions Dump (total 0 csec) ----- ORA-01119: error in creating database file "+data" ORA-17502: ksfdcre:4 Failed to create file +data ORA-15041: diskgroup "DATA" space exhausted File #108 added to control file as "UNNAMED00108". Originally created as: "+DATA/tc/datafile/oa2015.387.879688649" Recovery was unable to create the file as: "+data" *** 2015-05-14 13:58:01.598 4329 krsh.c MRP0: Background Media Recovery terminated with error 1274 ORA-01274: cannot add datafile "+DATA/tc/datafile/oa2015.387.879688649" - file could not be created *** 2015-05-14 13:58:01.613 4329 krsh.c Managed Standby Recovery not using Real Time Apply*** 2015-05-14 13:58:01.616 MRP: Prodding archiver at standby for thread 1 seq 3398 ----- Redo read statistics for thread 1 ----- Read rate (ASYNC): 1631875714Kb in 8455733.63s => 0.19 Mb/sec Total redo bytes: 1631903362Kb Longest record: 53Kb, moves: 546631/3909060988 moved: 2869Mb (0%) Longest LWN: 61559Kb, reads: 53311640 Last redo scn: 0x095c.c0cc389a (10293976250522) Change vector header moves = 505014709/3028421495 (1%) ----------------------------------------------*** 2015-05-14 13:58:01.626 Media Recovery drop redo thread 1 KCBR: Redo cache copies/changes = 7576594/7576586 Wait to push change maps to slaves = 136500s*** 2015-05-14 13:58:02.545 Completed Media Recovery Checking to start in-flux buffer recovery from SCN 2396.3234383719 to SCN (non-inclusive) 2396.3234609306 Influx recovery found in-flux buffers*** 2015-05-14 13:58:02.622 Influx Media Recovery add redo thread 1 Managed Standby Recovery: Standby online log for thr 1 seq 3397 not found. Looking whether archived.. Looking for archived log thr 1 seq 3397 recovery branch id 869789191 that contains SCN 10293976024935 Checking whether scn 10293976024935 in los 10293976024935 and nxs 10293976125751 Managed Standby Recovery: Opening archived log /oraarch/ARC_CRM5_3397_869789191_1.log during invocation of recoverable recovery*** 2015-05-14 13:58:03.937 Resized overflow buffer to 2435K (for 2435K LWN) Resized overflow buffer to 3923K (for 3923K LWN) Resized overflow buffer to 14242K (for 14242K LWN)*** 2015-05-14 13:58:06.105 MRP: Prodding archiver at standby for thread 1 seq 3398 Managed Recovery: Not Active posted.由于redo apply的进程被终止,因此应用到3397归档后,之后的日志都未被应用,但是归档日志都已经顺利传到了备库SQL> select thread#,sequence#,applied from v$archived_log where sequence#>3391; THREAD# SEQUENCE# APPLIED ---------- ---------- --------- 1 3392 YES 1 3393 YES 1 3394 YES 1 3395 YES 1 3396 YES 1 3397 YES 1 3398 NO 1 3399 NO 1 3400 NO 1 3401 NO 1 3402 NO 1 3403 NO 1 3404 NO 1 3405 NO 1 3406 NO 1 3407 NO 1 3408 NO 1 3409 NO 1 3410 NO 1 3411 NO 1 3412 NO 1 3413 NO还可以通过下面的SQL语句去分析备库落后了多少归档日志未完成应用。SQL> SELECT ARCH.THREAD# "Thread", ARCH.SEQUENCE# "Last Sequence Received", APPL.SEQUENCE# "Last Sequence Applied", 2 (ARCH.SEQUENCE# - APPL.SEQUENCE#) "Difference" FROM (SELECT THREAD# ,SEQUENCE# FROM V$ARCHIVED_LOG WHERE (THREAD#,FIRST_TIME ) IN (SELECT THREAD#,MAX(FIRST_TIME) FROM V$ARCHIVED_LOG GROUP BY THREAD#)) ARCH, (SELECT THREAD# ,SEQUENCE# FROM V$LOG_HISTORY WHERE (THREAD#,FIRST_TIME ) IN (SELECT THREAD#,MAX(FIRST_TIME) FROM V$LOG_HISTORY GROUP BY THREAD#)) APPL WHERE ARCH.THREAD# = APPL.THREAD# ORDER BY 1; 3 4 5 6 7 Thread Last Sequence Received Last Sequence Applied Difference ---------- ---------------------- --------------------- ---------- 1 3531 3397 134此时归档应用停在了3398上面,加上之后的,共有134个归档没应用这个时间点在主库执行的操作是添加数据文件:Thu May 14 13:57:28 2015 ALTER TABLESPACE oa2015 ADD DATAFILE "+DATA" SIZE 16G AUTOEXTEND OFF Thu May 14 13:58:05 2015 Completed: ALTER TABLESPACE oa2015 ADD DATAFILE "+DATA" SIZE 16G AUTOEXTEND OFF备库磁盘空间确实不足,主库要求创建的数据文件为2个16G的文件,但是备库磁盘组可用空间只剩3G左右了ASMCMD> lsdg State Type Rebal Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Voting_files Name MOUNTED EXTERN N 512 4096 1048576 1638400 3265 0 3265 0 N DATA/和客户沟通了下,他们打算把DG停掉,暂时不用了,因此也没有做进一步处理,如果不想让归档日志传递到备库,可以设置主库参数log_archived_dest_state_2=defer。如果要解决故障,就要先添加磁盘到磁盘组,然后重新启用MRP进程,应用归档日志。更多Oracle相关信息见Oracle 专题页面 http://www.linuxidc.com/topicnews.aspx?tid=12本文永久更新链接地址