Recovering ASM based OCR & Voting disk in 11gR2

In Oracle 11gR2, ASM can be used as the storage for the OCR and voting disks.

There are a few factors we need to understand:

1. A manual or automatic OCR backup cannot be restored directly if the OCR is stored in an ASM disk group.
2. For ASM to start, the CRS stack must be up.
3. At the time of the OCR restore, the OCR must not be in use, i.e. the CRS daemon (crsd) must not be running.

This creates a cyclic dependency between ASM and CRS. It can be worked around, as shown below, but to keep the setup simple you can always follow the traditional approach of storing the OCR and voting disks on a CFS (cluster file system).

This post describes how to recover OCR and voting disks stored on ASM after they are lost due to an underlying disk failure.

Setup Details

GRID user:                        oragrid
GRID home:                        /oragrid/11.2
ASM disk group name for OCR:      +DATA
ASM disk name:                    DATA_0000
OS device name for ASM disk:      /dev/rdsk/c0t60060E800564F700000064F700000561d0s6
Cluster name:                     sol-cluster
Nodes:                            node1, node2

1. Check the existing ASM disk group and disk details

SQL>  select name, state from v$asm_diskgroup
/
NAME                           STATE
------------------------------ -----------
DATA                           MOUNTED

SQL> select name, path, header_status from v$asm_disk
/

NAME      PATH                                              HEADER_STATUS
--------- ------------------------------------------------- -------------
DATA_0000 /dev/rdsk/c0t60060E800564F700000064F700000561d0s6 MEMBER

2. Check the voting disk details

# crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   870c7767d70f4f68bf47f0eeef62c467 (/dev/rdsk/c0t60060E800564F700000064F700000561d0s6) [DATA]
Located 1 voting disk(s).

3. Check OCR details

# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       2332
         Available space (kbytes) :     259788
         ID                       :  775547019
         Device/File Name         :      +DATA
                                    Device/File integrity check succeeded
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured

         Cluster registry integrity check succeeded

         Logical corruption check succeeded
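When these checks are scripted (here and again after the recovery), the ocrcheck output can be tested for the three success messages shown above. A minimal sketch, assuming a POSIX shell; ocr_is_healthy is a hypothetical helper and only matches the message texts printed in this post.

```shell
# ocr_is_healthy (hypothetical helper): reads ocrcheck output on stdin and
# succeeds only if all three success messages seen above are present.
ocr_is_healthy() {
    out=$(cat)
    for msg in 'Device/File integrity check succeeded' \
               'Cluster registry integrity check succeeded' \
               'Logical corruption check succeeded'; do
        if ! printf '%s\n' "$out" | grep -F "$msg" >/dev/null; then
            echo "ocrcheck did not report: $msg" >&2
            return 1
        fi
    done
    return 0
}
```

Usage, as root like the author: ocrcheck | ocr_is_healthy && echo "OCR OK". When ocrcheck is run as a non-root user it may skip the logical corruption check, in which case this helper reports a failure.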

4. Check ASM spfile details

$ export ORACLE_SID=+ASM1
$ sqlplus / as sysasm

SQL> show parameter spfile

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
spfile                               string      +DATA/sol-cluster/asmparameterfile/registry.253.725814219

5. Check for the most recent OCR backup (on all cluster nodes)

The automatic backups are kept under /oragrid/11.2/cdata/sol-cluster:

drwxrwxr-x   2 oragrid  oinstall     512 Sep  2 15:45 .
drwxrwxr-x   5 oragrid  oinstall     512 Jul 31 15:02 ..
-rw-------   1 root     root     6635520 Sep  2 15:45 backup00.ocr
-rw-------   1 root     root     6635520 Aug 30 11:25 backup01.ocr
-rw-------   1 root     root     6635520 Aug 30 07:25 backup02.ocr
-rw-------   1 root     root     6635520 Sep  2 15:45 day.ocr
-rw-------   1 root     root     6635520 Aug 21 03:24 week.ocr
-rw-------   1 root     root     6635520 Aug 28 03:24 week_.ocr
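Here backup00.ocr (together with day.ocr) carries the newest timestamp. On each node the newest backup can be picked by modification time; ocrconfig -showbackup also lists the automatic backups along with the node that holds them. A minimal sketch; latest_ocr_backup is a hypothetical helper, and its default directory is the one from this setup.

```shell
# latest_ocr_backup (hypothetical helper): prints the most recently modified
# *.ocr file in the given directory, or fails if there is none.
latest_ocr_backup() {
    dir=${1:-/oragrid/11.2/cdata/sol-cluster}
    # ls -t lists newest first; stderr is silenced in case no *.ocr file exists
    newest=$(ls -t "$dir"/*.ocr 2>/dev/null | head -n 1)
    [ -n "$newest" ] || return 1
    printf '%s\n' "$newest"
}
```

Run it on every node and compare the timestamps of the files it reports; the node holding the newest one is where the recovery should be driven from.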

6. Simulate the failure

# dd if=/dev/zero of=/dev/rdsk/c0t60060E800564F700000064F700000561d0s6 bs=8192 count=100
100+0 records in
100+0 records out

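Here dd overwrites the first 819,200 bytes of the disk (100 blocks of 8 KB), including the ASM disk header, with zeros. Everything stored in the disk group is lost (OCR, voting disk and ASM spfile).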
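Before continuing, it can be worth confirming that the overwrite really hit the device. A minimal sketch, assuming a POSIX shell with dd and od; header_is_zeroed is a hypothetical helper that reads the first 8 KB block (the block size used above) and checks whether every byte is zero. Reading through dd in a whole 8 KB block, rather than pointing od at the raw device, is meant to keep the read block-aligned.

```shell
# header_is_zeroed (hypothetical helper): returns 0 if the first 8 KB of the
# given device/file are all zero bytes, 1 if not, 2 if nothing could be read.
header_is_zeroed() {
    dev=$1
    # dump the block as hex bytes and strip the spaces and newlines od adds
    hex=$(dd if="$dev" bs=8192 count=1 2>/dev/null | od -A n -t x1 -v | tr -d ' \n')
    [ -n "$hex" ] || return 2
    case $hex in
        *[!0]*) return 1 ;;   # at least one non-zero hex digit
        *)      return 0 ;;
    esac
}
```

For example: header_is_zeroed /dev/rdsk/c0t60060E800564F700000064F700000561d0s6 && echo "header wiped".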

7. Now, from the node where the latest OCR backup exists, start CRS in exclusive mode.

This mode allows ASM to start and stay up without a voting disk and without depending on the CRS daemon (crsd).
# crsctl start crs -excl

CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.gipcd' on 'node1'
CRS-2672: Attempting to start 'ora.mdnsd' on 'node1'
CRS-2676: Start of 'ora.gipcd' on 'node1' succeeded
CRS-2676: Start of 'ora.mdnsd' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'node1'
CRS-2676: Start of 'ora.gpnpd' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'node1'
CRS-2676: Start of 'ora.cssdmonitor' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'node1'
CRS-2679: Attempting to clean 'ora.diskmon' on 'node1'
CRS-2681: Clean of 'ora.diskmon' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.diskmon' on 'node1'
CRS-2676: Start of 'ora.diskmon' on 'node1' succeeded
CRS-2676: Start of 'ora.cssd' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.ctssd' on 'node1'
CRS-2676: Start of 'ora.ctssd' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'node1'
CRS-2676: Start of 'ora.asm' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'node1'
CRS-2676: Start of 'ora.crsd' on 'node1' succeeded
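In the output above, ora.crsd is started as well and later dies because the OCR is gone, which matches the 11.2.0.1 behaviour. From 11.2.0.2 onward, crsctl accepts -excl -nocrs, which starts the stack in exclusive mode without crsd. The sketch below only prints the matching command for a given release string; excl_start_cmd is a hypothetical helper and assumes a purely numeric dotted version such as 11.2.0.1.

```shell
# excl_start_cmd (hypothetical helper): prints the exclusive-mode start
# command appropriate for a Grid Infrastructure release string.
excl_start_cmd() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # e.g. "11.2.0.1.0" -> 11 2 0 1 0
    IFS=$old_ifs
    # pack the first four fields into one comparable number: 11.2.0.2 -> 11020002
    key=$(( ${1:-0} * 1000000 + ${2:-0} * 10000 + ${3:-0} * 100 + ${4:-0} ))
    if [ "$key" -ge 11020002 ]; then
        echo "crsctl start crs -excl -nocrs"
    else
        echo "crsctl start crs -excl"
    fi
}
```

For example, excl_start_cmd 11.2.0.1 prints crsctl start crs -excl, the command used in this post; the printed command is then run as root.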

8. Check the OS processes

# ps -ef | grep grid
 oragrid 10301     1   0 19:15:19 ?           0:00 asm_gmon_+ASM1
    root 10108     1   0 19:14:42 ?           0:02 /oragrid/11.2/bin/ohasd.bin exclusive
 oragrid 10287     1   0 19:15:18 ?           0:00 asm_lmhb_+ASM1
    root 10248     1   0 19:15:13 ?           0:00 /oragrid/11.2/bin/octssd.bin
 oragrid 10295     1   0 19:15:19 ?           0:00 asm_ckpt_+ASM1
    root 10193     1   0 19:14:54 ?           0:00 /oragrid/11.2/bin/cssdagent
    root 10191     1   0 19:14:53 ?           0:00 /oragrid/11.2/bin/cssdmonitor
 oragrid 10181     1   0 19:14:52 ?           0:00 /oragrid/11.2/bin/gpnpd.bin
 oragrid 10319     1   0 19:15:19 ?           0:00 asm_lck0_+ASM1
 oragrid 10350     1   0 19:15:30 ?           0:00 oracle+ASM1_o000_+asm1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
 oragrid 10262     1   0 19:15:15 ?           0:00 asm_pmon_+ASM1
 oragrid 10149     1   0 19:14:49 ?           0:00 /oragrid/11.2/bin/oraagent.bin
 oragrid 10346     1   0 19:15:30 ?           0:00 oracle+ASM1_asmb_+asm1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
 oragrid 10283     1   0 19:15:18 ?           0:00 asm_lms0_+ASM1
 oragrid 10297     1   0 19:15:19 ?           0:00 asm_smon_+ASM1
    root 10195     1   0 19:14:54 ?           0:00 /oragrid/11.2/bin/orarootagent.bin
 oragrid 10299     1   0 19:15:19 ?           0:00 asm_rbal_+ASM1
 oragrid 10289     1   0 19:15:18 ?           0:00 asm_mman_+ASM1
 oragrid 10268     1   0 19:15:15 ?           0:00 asm_gen0_+ASM1
 oragrid 10214     1   0 19:14:54 ?           0:01 /oragrid/11.2/bin/ocssd.bin -X
 oragrid 10264     1   0 19:15:15 ?           0:00 asm_vktm_+ASM1
 oragrid 10224     1   0 19:15:00 ?           0:00 /oragrid/11.2/bin/diskmon.bin -d -f
 oragrid 10291     1   0 19:15:18 ?           0:00 asm_dbw0_+ASM1
 oragrid 10280     1   0 19:15:16 ?           0:00 asm_lmd0_+ASM1
 oragrid 10272     1   0 19:15:15 ?           0:00 asm_ping_+ASM1
 oragrid 10293     1   0 19:15:19 ?           0:00 asm_lgwr_+ASM1
 oragrid 10305     1   0 19:15:19 ?           0:00 asm_mmnl_+ASM1
 oragrid 10274     1   0 19:15:16 ?           0:00 asm_psp0_+ASM1
 oragrid 10270     1   0 19:15:15 ?           0:00 asm_diag_+ASM1
 oragrid 10303     1   0 19:15:19 ?           0:00 asm_mmon_+ASM1
    root 10332     1   0 19:15:29 ?           0:00 /oragrid/11.2/bin/crsd.bin reboot
 oragrid 10278     1   0 19:15:16 ?           0:00 asm_lmon_+ASM1
 oragrid 10276     1   0 19:15:16 ?           0:00 asm_dia0_+ASM1
    root 10371 27156   0 19:15:42 pts/2       0:00 grep grid
 oragrid 10348     1   0 19:15:30 ?           0:00 asm_o000_+ASM1
 oragrid 10309     1   0 19:15:19 ?           0:00 /oragrid/11.2/bin/oclskd.bin
 oragrid 10352     1   0 19:15:30 ?           0:00 oracle+ASM1_ocr (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
 oragrid 10179     1   0 19:14:49 ?           0:00 /oragrid/11.2/bin/mdnsd.bin
 oragrid 10159     1   0 19:14:49 ?           0:00 /oragrid/11.2/bin/gipcd.bin
 oragrid 10344     1   0 19:15:30 ?           0:00 asm_asmb_+ASM1

The ASM instance is up and running. CRS also appears to have started (crsd.bin), but crsd dies after a while because the OCR is not available.
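The two things that matter at this point are that ohasd runs in exclusive mode (ohasd.bin exclusive) and that the ASM instance is up (asm_pmon_+ASM1). A minimal sketch that checks both in ps -ef output; check_excl_stack is a hypothetical helper, and its default SID +ASM1 is the one used in this setup.

```shell
# check_excl_stack (hypothetical helper): reads `ps -ef` output on stdin and
# verifies exclusive-mode ohasd plus the PMON process of the given ASM SID.
check_excl_stack() {
    sid=${1:-+ASM1}
    out=$(cat)
    # -F: match literally, since "+" and "." appear in the process names
    if ! printf '%s\n' "$out" | grep -F 'ohasd.bin exclusive' >/dev/null; then
        echo "ohasd is not running in exclusive mode" >&2
        return 1
    fi
    if ! printf '%s\n' "$out" | grep -F "asm_pmon_$sid" >/dev/null; then
        echo "no PMON process found for ASM instance $sid" >&2
        return 1
    fi
    echo "exclusive mode confirmed, $sid is up"
}
```

Usage: ps -ef | check_excl_stack +ASM1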

9. Check whether any disk groups are mounted

SQL> select name, state from v$asm_diskgroup
/
no rows selected
SQL> select name, path, header_status from v$asm_disk
/

NAME      PATH                                              HEADER_STATUS
--------- ------------------------------------------------- -------------
          /dev/rdsk/c0t60060E800564F700000064F700000561d0s6 CANDIDATE

10. Create a disk group

Connect to the ASM instance as SYSASM (as in step 4). Storing the OCR and voting files in a disk group requires COMPATIBLE.ASM to be at least 11.2, hence the attribute clause.

If you are not connected as SYSASM, the statement fails:

SQL> create diskgroup DATA external redundancy disk
'/dev/rdsk/c0t60060E800564F700000064F700000561d0s6'
attribute 'COMPATIBLE.ASM' = '11.2'
/
create diskgroup DATA external redundancy disk
'/dev/rdsk/c0t60060E800564F700000064F700000561d0s6'
attribute 'COMPATIBLE.ASM' = '11.2'
*
ERROR at line 1:
ORA-15260: permission denied on ASM disk group

Connected as SYSASM, the same statement succeeds:

SQL> create diskgroup DATA external redundancy disk '/dev/rdsk/c0t60060E800564F700000064F700000561d0s6' attribute 'COMPATIBLE.ASM' = '11.2'
/
Diskgroup created.
SQL>  select name, state from v$asm_diskgroup
/

NAME                           STATE
------------------------------ -----------
DATA                           MOUNTED

SQL> select name, path, header_status from v$asm_disk
/

NAME      PATH                                              HEADER_STATUS
--------- ------------------------------------------------- -------------
DATA_0000 /dev/rdsk/c0t60060E800564F700000064F700000561d0s6 MEMBER
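To run the same statement non-interactively, the SQL can be generated with plain ASCII quotes and piped into SQL*Plus connected as SYSASM. WHENEVER SQLERROR EXIT FAILURE makes SQL*Plus exit with a non-zero status on errors such as ORA-15260, so a wrapping script can stop there. A minimal sketch; ocr_dg_sql is a hypothetical helper.

```shell
# ocr_dg_sql (hypothetical helper): prints the CREATE DISKGROUP statement used
# in this step for a given disk group name and device path.
ocr_dg_sql() {
    dg=$1
    disk=$2
    # unquoted EOF so that $dg and $disk are expanded into the SQL text
    cat <<EOF
whenever sqlerror exit failure
create diskgroup $dg external redundancy
  disk '$disk'
  attribute 'COMPATIBLE.ASM' = '11.2';
exit
EOF
}
```

Usage, as oragrid with ORACLE_SID=+ASM1: ocr_dg_sql DATA /dev/rdsk/c0t60060E800564F700000064F700000561d0s6 | sqlplus -S / as sysasm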
  

11. Now restore the latest OCR backup
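Factor 3 from the beginning applies here: the OCR must not be in use, so the crsd.bin started in exclusive mode (step 8) should have died before ocrconfig -restore is run. A minimal sketch that checks ps -ef output; crsd_running is a hypothetical helper. The [c] in the pattern keeps grep from matching its own command line when ps output is piped straight in.

```shell
# crsd_running (hypothetical helper): reads `ps -ef` output on stdin and
# succeeds if a crsd.bin process is listed.
crsd_running() {
    # "/[c]rsd\.bin" matches ".../crsd.bin" but not the text "/[c]rsd\.bin"
    # that would appear in this grep's own entry in the ps listing
    grep '/[c]rsd\.bin' >/dev/null
}
```

For example: if ps -ef | crsd_running; then echo "crsd still running, do not restore yet"; fi. If it does not go away on its own, crsctl stop res ora.crsd -init, the counterpart of the start command in step 12, should stop it.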

# cd  /oragrid/11.2/cdata/sol-cluster

# ocrconfig -restore backup00.ocr
#

12. Start the CRS daemon (crsd)

# crsctl start res ora.crsd -init
CRS-2672: Attempting to start 'ora.crsd' on 'node1'
CRS-2676: Start of 'ora.crsd' on 'node1' succeeded

13. Replace the voting disk

# crsctl replace votedisk +DATA
Successful addition of voting disk 28fb8cd12ad44fd4bfb15df0c4572550.
Successfully replaced voting disk group with +DATA.
CRS-4266: Voting file(s) successfully replaced

14. Create an ASM pfile

Create /oragrid/11.2/dbs/tmp.ora with the following contents:

*.asm_power_limit=1
*.diagnostic_dest='/oragrid/11.2/log'
*.instance_type='asm'
*.large_pool_size=12M
*.remote_login_passwordfile='EXCLUSIVE'
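The pfile can also be written from the shell with a quoted here-document, so that nothing in its body is expanded by the shell. A minimal sketch; write_asm_pfile is a hypothetical helper whose default path is the one used above.

```shell
# write_asm_pfile (hypothetical helper): writes the minimal ASM pfile from
# this step to the given path (default: the path used in this post).
write_asm_pfile() {
    pfile=${1:-/oragrid/11.2/dbs/tmp.ora}
    # quoted 'EOF': the body is written literally, with no shell expansion
    cat > "$pfile" <<'EOF'
*.asm_power_limit=1
*.diagnostic_dest='/oragrid/11.2/log'
*.instance_type='asm'
*.large_pool_size=12M
*.remote_login_passwordfile='EXCLUSIVE'
EOF
}
```

Run it as oragrid so the file is owned by the Grid user.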

15. Create an spfile in +DATA from the pfile

$ sqlplus / as sysasm

SQL> create spfile='+DATA' from pfile='/oragrid/11.2/dbs/tmp.ora'
/
File created.

16. Stop the CRS forcefully

# crsctl stop crs -f

CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'node1'
CRS-2673: Attempting to stop 'ora.crsd' on 'node1'
CRS-2677: Stop of 'ora.crsd' on 'node1' succeeded
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'node1'
CRS-2673: Attempting to stop 'ora.ctssd' on 'node1'
CRS-2673: Attempting to stop 'ora.asm' on 'node1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'node1'
CRS-2677: Stop of 'ora.cssdmonitor' on 'node1' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'node1' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'node1' succeeded
CRS-2677: Stop of 'ora.asm' on 'node1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'node1'
CRS-2677: Stop of 'ora.cssd' on 'node1' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'node1'
CRS-2673: Attempting to stop 'ora.diskmon' on 'node1'
CRS-2677: Stop of 'ora.gpnpd' on 'node1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'node1'
CRS-2677: Stop of 'ora.gipcd' on 'node1' succeeded
CRS-2677: Stop of 'ora.diskmon' on 'node1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'node1' has completed
CRS-4133: Oracle High Availability Services has been stopped.

17. Start CRS on all nodes (as root on each node)

# crsctl start crs

18. Check CRS status

# crsctl check cluster -all
**************************************************************
node1:
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
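**************************************************************

Right after startup, crsd is not yet available (CRS-4535). Checking again a little later shows it online:

# crsctl check cluster -all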
**************************************************************
node1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************

# crsctl stat res -t
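--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------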
ora.DATA.dg
               ONLINE  ONLINE       node1
               ONLINE  ONLINE       node2
ora.LISTENER.lsnr
               ONLINE  ONLINE       node1
               ONLINE  ONLINE       node2
ora.asm
               ONLINE  ONLINE       node1            Started
               ONLINE  ONLINE       node2            Started
ora.eons
               ONLINE  ONLINE       node1
               ONLINE  OFFLINE      node2
ora.gsd
               OFFLINE OFFLINE      node1
               OFFLINE OFFLINE      node2
ora.net1.network
               ONLINE  ONLINE       node1
               ONLINE  ONLINE       node2
ora.ons
               ONLINE  ONLINE       node1
               ONLINE  ONLINE       node2
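--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------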
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  OFFLINE
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       node1
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       node1
ora.node1.vip
      1        ONLINE  ONLINE       node1
ora.node2.vip
      1        ONLINE  ONLINE       node2
ora.oc4j
      1        OFFLINE OFFLINE
ora.scan1.vip
      1        ONLINE  ONLINE       node2
ora.scan2.vip
      1        ONLINE  ONLINE       node1
ora.scan3.vip
      1        ONLINE  ONLINE       node1
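The first crsctl check cluster -all above still reported CRS-4535 because crsd needs some time to come up. Instead of re-running the check by hand, a small polling loop can wait until CRS-4537 is reported for the expected number of nodes. A minimal sketch; wait_for_crs and its CRS_CHECK_CMD override are hypothetical, and the retry count and delay are arbitrary defaults.

```shell
# wait_for_crs (hypothetical helper): polls "crsctl check cluster -all" until
# at least $1 nodes report CRS-4537 (Cluster Ready Services is online).
# Arguments: expected node count, number of attempts, seconds between attempts.
# CRS_CHECK_CMD (hypothetical) can override the command, e.g. for testing.
wait_for_crs() {
    nodes=${1:-2}
    tries=${2:-30}
    delay=${3:-10}
    check=${CRS_CHECK_CMD:-crsctl check cluster -all}
    i=0
    while [ "$i" -lt "$tries" ]; do
        # awk always prints a number (0 when nothing matches) and exits 0
        online=$($check 2>/dev/null | awk '/CRS-4537/ { n++ } END { print n + 0 }')
        if [ "$online" -ge "$nodes" ]; then
            echo "CRS online on $online node(s)"
            return 0
        fi
        i=$((i + 1))
        # no need to sleep after the last attempt
        if [ "$i" -lt "$tries" ]; then sleep "$delay"; fi
    done
    echo "CRS not online on $nodes node(s) after $tries attempt(s)" >&2
    return 1
}
```

Usage, as root on any node: wait_for_crs 2 30 10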

19. Check the voting disk

# crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   99cf2e1fc54a4f8bbf9e1eca34598bd4 (/dev/rdsk/c0t60060E800564F700000064F700000561d0s6) [DATA]
Located 1 voting disk(s).
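With an external redundancy disk group there is a single voting file, so exactly one ONLINE entry is expected. Its File Universal Id differs from the one listed in step 2 because the voting file was recreated by crsctl replace votedisk. A minimal sketch that counts the ONLINE entries; votedisk_online_count is a hypothetical helper that assumes the column layout shown above.

```shell
# votedisk_online_count (hypothetical helper): reads `crsctl query css votedisk`
# output on stdin and prints how many numbered entries are in state ONLINE.
votedisk_online_count() {
    # entries look like " 1. ONLINE   <FUID> (<path>) [<diskgroup>]":
    # field 1 is the entry number ("1."), field 2 the state
    awk '$1 ~ /^[0-9]+\.$/ && $2 == "ONLINE" { n++ } END { print n + 0 }'
}
```

For example: [ "$(crsctl query css votedisk | votedisk_online_count)" -eq 1 ] && echo "voting disk OK"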

20. Check OCR

# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       2332
         Available space (kbytes) :     259788
         ID                       :  775547019
         Device/File Name         :      +DATA
                                    Device/File integrity check succeeded
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured

         Cluster registry integrity check succeeded

         Logical corruption check succeeded

We are back up and running again.

This entry was posted in Oracle Automatic Storage Management, Oracle Cluster Ready Services, Oracle Real Application Cluster.

6 Responses to Recovering ASM based OCR & Voting disk in 11gR2

  1. Mubeen says:

    Good job !! very informative.

    thanks

  2. kim says:

    Hi,
    trying to recover the OCR/Vote on ASM, I’m getting PROT-22 …odd part is that I recreate OCR diskgroup using the same disks that i corrupted them … Also the diskgroup has plenty free space
    …wonder if you have run into the same dead-end and/or have any insights?
    Thanks,
    Kim
    ocrconfig -restore backup00.ocr
    Errors in file :
    ORA-27091: unable to queue I/O
    ORA-17510: Attempt to do i/o beyond file size
    ORA-06512: at line 4
    PROT-22: Storage too small

  3. Vjay says:

    Thanks!! Precisely presented.
