Tuesday, March 18, 2014

How many Disks can be lost on Exadata?

I recently came across a script from Oracle Support that checks ASM storage to see whether the loss of a disk or a cell can be tolerated. The script reports a PASS or FAIL status depending on whether a rebalance can complete after the loss of a disk or a cell (12 disks) in the Exadata storage. A cell server failure is unlikely, but it can happen. I faced this almost a year ago in a production environment with a Half Rack (7 cell nodes), when we lost a cell node for 2-3 days. We had enough free space for the rebalance to occur, so we could tolerate the lost cell node and there was no downtime for any of the databases.

The Oracle Support note is listed below and the script is also attached to it.
 Understanding ASM Capacity and Reservation of Free Space in Exadata (Doc ID 1551288.1)

Some key points
  • Ensure that the FREE_MB column in the ASM lsdg output stays above the Cell Required Mirror Free MB or Disk Required Mirror Free MB at all times, so that the difference never goes negative.
  • Disk Required Mirror Free MB is the amount of space that should be reserved for disk failure coverage.
  • One Cell Required Mirror Free MB is the amount of space to reserve for single cell failure coverage, regardless of redundancy type.
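The rule in the key points above can be sketched in a few lines of Python. This is only an illustration and not the logic of the Oracle Support script itself: the function name `rebalance_check` and the dictionary keys are hypothetical, and treating "free equals required" as a PASS is an assumption. The numbers are the DATA_EXAD values from the BEFORE run of the script shown later in this post.

```python
# Hypothetical helper, not part of the script attached to MOS note 1551288.1.
def rebalance_check(free_mb: int, required_mirror_free_mb: int) -> str:
    """Return PASS when free space covers the reserve needed to rebalance."""
    # Assumption: free space exactly equal to the reserve still counts as PASS.
    return "PASS" if free_mb >= required_mirror_free_mb else "FAIL"


# DATA_EXAD values copied from the BEFORE run of check_asm.sql.
data_exad = {
    "free_mb": 19_394_532,
    "disk_required_mb": 1_636_279,
    "cell_required_mb": 27_021_312,
}

disk_status = rebalance_check(data_exad["free_mb"], data_exad["disk_required_mb"])
cell_status = rebalance_check(data_exad["free_mb"], data_exad["cell_required_mb"])

print("Loss of ONE disk:", disk_status)
print("Loss of ONE cell:", cell_status)
```

With these inputs the sketch agrees with the PASS/FAIL lines the script printed for DATA_EXAD in the BEFORE run.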


The lsdg and script output below shows BEFORE/AFTER results, including what the script reports when a check fails.

BEFORE
State    Type    Rebal  Sector  Block       AU  Total_MB   Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512   4096  4194304  54042624  20132832         18014208         1059312              0             N  DATA_EXAD/
MOUNTED  NORMAL  N         512   4096  4194304    894240    636448           298080          169184              0             Y  DBFS_DG/
MOUNTED  NORMAL  N         512   4096  4194304  13512384   7173544          4504128         1334708              0             N  RECO_EXAD/
  
AFTER
State    Type    Rebal  Sector  Block       AU  Total_MB   Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512   4096  4194304  54042624  28095768         18014208         5040780              0             N  DATA_EXAD/
MOUNTED  NORMAL  N         512   4096  4194304    894240    636448           298080          169184              0             Y  DBFS_DG/
MOUNTED  NORMAL  N         512   4096  4194304  13512384   7173208          4504128         1


BEFORE

SQL> @check_asm.sql
------ DISK and CELL Failure Diskgroup Space Reserve Requirements  ------
This procedure determines how much space you need to survive a DISK or CELL
failure. It also shows the usable space
available when reserving space for disk or cell failure.
Please see MOS note 1551288.1 for more information.
.  .  .
Description of Derived Values:
Cell Required Mirror Free MB     : Free MB needed to permit successful rebalance
after losing largest CELL in a DG
2 Cell Required Mirror Free MB   : Free MB needed to permit successful rebalance
after losing 2 largest CELLs in high redundancy DG
Disk Required Mirror Free MB     : Free MB needed to rebalance after loss of
single disk (normal redundancy DG) or double disk (high redundancy DG)
Disk Failure Usable File MB      : Usable space available after reserving space
for disk failure (1 disk in normal or 2 disks in high redundancy DG) and
accounting for mirroring
Cell Failure Usable File MB      : Usable space available after reserving space
for 1 cell failure and accounting for mirroring
2 Cell Failure Usable File MB    : Usable space available after reserving space
for 2 cell failures and accounting for mirroring in a HIGH redundancy DG
.  .  .
ASM Version: 11.2.0.2  - WARNING DISK FAILURE COVERAGE ESTIMATES HAVE NOT BEEN
VERIFIED ON 11.2.0.2 !
.  .  .
-------------------------------------------------------------------------
DG Name:                                 DATA_EXAD
DG Type:                                    NORMAL
Num Disks:                                      36
Disk Size MB:                            1,501,184
.  .  .
DG Total MB:                            54,042,624
DG Used MB:                             34,648,092
DG Free MB:                             19,394,532
.  .  .
Cell Required Mirror Free MB:           27,021,312
.  .  .
Disk Required Mirror Free MB:            1,636,279
.  .  .
Disk Failure Usable File MB:             8,879,126
Cell Failure Usable File MB:            -3,813,390
.  .  .
Enough Free Space to Rebalance after loss of ONE disk: PASS
Enough Free Space to Rebalance after loss of ONE cell: FAIL
-------------------------------------------------------------------------
DG Name:                                   DBFS_DG
DG Type:                                    NORMAL
Num Disks:                                      30
Disk Size MB:                               29,808
.  .  .
DG Total MB:                               894,240
DG Used MB:                                257,792
DG Free MB:                                636,448
.  .  .
Cell Required Mirror Free MB:              447,120
.  .  .
Disk Required Mirror Free MB:               53,600
.  .  .
Disk Failure Usable File MB:               291,424
Cell Failure Usable File MB:                94,664
.  .  .
Enough Free Space to Rebalance after loss of ONE disk: PASS
Enough Free Space to Rebalance after loss of ONE cell: PASS
-------------------------------------------------------------------------
DG Name:                                 RECO_EXAD
DG Type:                                    NORMAL
Num Disks:                                      36
Disk Size MB:                              375,344
.  .  .
DG Total MB:                            13,512,384
DG Used MB:                              7,484,712
DG Free MB:                              6,027,672
.  .  .
Cell Required Mirror Free MB:            6,756,192
.  .  .
Disk Required Mirror Free MB:              423,896
.  .  .
Disk Failure Usable File MB:             2,801,888
Cell Failure Usable File MB:              -364,260
.  .  .
Enough Free Space to Rebalance after loss of ONE disk: PASS
Enough Free Space to Rebalance after loss of ONE cell: FAIL
.  .  .
Script completed.
  
PL/SQL procedure successfully completed.
  
SQL> exit


AFTER

SQL> @check_asm.sql
------ DISK and CELL Failure Diskgroup Space Reserve Requirements  ------
This procedure determines how much space you need to survive a DISK or CELL
failure. It also shows the usable space
available when reserving space for disk or cell failure.
Please see MOS note 1551288.1 for more information.
.  .  .
Description of Derived Values:
Cell Required Mirror Free MB     : Free MB needed to permit successful rebalance
after losing largest CELL in a DG
2 Cell Required Mirror Free MB   : Free MB needed to permit successful rebalance
after losing 2 largest CELLs in high redundancy DG
Disk Required Mirror Free MB     : Free MB needed to rebalance after loss of
single disk (normal redundancy DG) or double disk (high redundancy DG)
Disk Failure Usable File MB      : Usable space available after reserving space
for disk failure (1 disk in normal or 2 disks in high redundancy DG) and
accounting for mirroring
Cell Failure Usable File MB      : Usable space available after reserving space
for 1 cell failure and accounting for mirroring
2 Cell Failure Usable File MB    : Usable space available after reserving space
for 2 cell failures and accounting for mirroring in a HIGH redundancy DG
.  .  .
ASM Version: 11.2.0.2  - WARNING DISK FAILURE COVERAGE ESTIMATES HAVE NOT BEEN
VERIFIED ON 11.2.0.2 !
.  .  .
-------------------------------------------------------------------------
DG Name:                                 DATA_EXAD
DG Type:                                    NORMAL
Num Disks:                                      36
Disk Size MB:                            1,501,184
.  .  .
DG Total MB:                            54,042,624
DG Used MB:                             25,946,856
DG Free MB:                             28,095,768
.  .  .
Cell Required Mirror Free MB:           27,021,312
.  .  .
Disk Required Mirror Free MB:            1,636,279
.  .  .
Disk Failure Usable File MB:            13,229,744
Cell Failure Usable File MB:               537,228
.  .  .
Enough Free Space to Rebalance after loss of ONE disk: PASS
Enough Free Space to Rebalance after loss of ONE cell: PASS
-------------------------------------------------------------------------
DG Name:                                   DBFS_DG
DG Type:                                    NORMAL
Num Disks:                                      30
Disk Size MB:                               29,808
.  .  .
DG Total MB:                               894,240
DG Used MB:                                257,792
DG Free MB:                                636,448
.  .  .
Cell Required Mirror Free MB:              447,120
.  .  .
Disk Required Mirror Free MB:               53,600
.  .  .
Disk Failure Usable File MB:               291,424
Cell Failure Usable File MB:                94,664
.  .  .
Enough Free Space to Rebalance after loss of ONE disk: PASS
Enough Free Space to Rebalance after loss of ONE cell: PASS
-------------------------------------------------------------------------
DG Name:                                 RECO_EXAD
DG Type:                                    NORMAL
Num Disks:                                      36
Disk Size MB:                              375,344
.  .  .
DG Total MB:                            13,512,384
DG Used MB:                              6,339,176
DG Free MB:                              7,173,208
.  .  .
Cell Required Mirror Free MB:            6,756,192
.  .  .
Disk Required Mirror Free MB:              423,896
.  .  .
Disk Failure Usable File MB:             3,374,656
Cell Failure Usable File MB:               208,508
.  .  .
Enough Free Space to Rebalance after loss of ONE disk: PASS
Enough Free Space to Rebalance after loss of ONE cell: PASS
.  .  .
Script completed.
  
PL/SQL procedure successfully completed.
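
The usable-space figures in both runs can be reproduced with simple arithmetic. For these NORMAL redundancy diskgroups, the reported values are consistent with taking DG Free MB, subtracting the required mirror free space, and halving the result to account for mirroring. The sketch below is an observation from the numbers in this post, not the script's documented formula: the function name `usable_file_mb` and the `mirror_copies` parameter are hypothetical, and the rounding (floor division) is an assumption that happens to match the values shown.

```python
# Hypothetical helper; inferred from the output above, not taken from the script.
def usable_file_mb(free_mb: int, required_mirror_free_mb: int,
                   mirror_copies: int = 2) -> int:
    """Usable space after reserving space for a failure and accounting for mirroring.

    mirror_copies=2 corresponds to NORMAL redundancy (assumption for this sketch).
    """
    return (free_mb - required_mirror_free_mb) // mirror_copies


# (DG Free MB, Disk Required Mirror Free MB, Cell Required Mirror Free MB)
runs = {
    "DATA_EXAD before": (19_394_532, 1_636_279, 27_021_312),
    "DATA_EXAD after": (28_095_768, 1_636_279, 27_021_312),
    "RECO_EXAD before": (6_027_672, 423_896, 6_756_192),
    "RECO_EXAD after": (7_173_208, 423_896, 6_756_192),
}

results = {}
for name, (free_mb, disk_req_mb, cell_req_mb) in runs.items():
    disk_usable = usable_file_mb(free_mb, disk_req_mb)
    cell_usable = usable_file_mb(free_mb, cell_req_mb)
    results[name] = (disk_usable, cell_usable)
    print(f"{name}: disk failure usable {disk_usable:,} MB, "
          f"cell failure usable {cell_usable:,} MB")
```

A negative cell failure usable value, as in the two BEFORE rows, means free space is below the cell reserve, which is where the script reported FAIL.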
