Wednesday, December 16, 2015

ORA-00304: requested INSTANCE_NUMBER is busy

I was working on verifying that all 12c Data Guard instances were running on an Exadata X5 full rack system, and I came across one instance that was not running. I noticed this because there was an entry for the instance in /etc/oratab, but I did not see the corresponding instance running on the database compute server.
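
A quick way to spot this kind of mismatch is to compare the oratab entries against the running background processes. For illustration (the oratab line shown is reconstructed; the instance name DG3 and Oracle home come from later in this post):

node3-DR> grep -v '^#' /etc/oratab
DG3:/u01/app/oracle/product/12.1.0.2/dbhome_1:N

node3-DR> ps -ef | grep ora_pmon_DG3 | grep -v grep
node3-DR>

Every running instance has a pmon background process, so an oratab entry with no matching ora_pmon process is a red flag.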

I manually tried to bring up the instance from node 3 and encountered the error below.

node3-DR> sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Wed Dec 16 18:30:15 2015

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup nomount;
ORA-00304: requested INSTANCE_NUMBER is busy
SQL> exit
Disconnected

I then looked at the instance alert log and saw the following:

...
Wed Dec 16 18:30:22 2015
ASMB started with pid=39, OS id=59731
Starting background process MMON
Starting background process MMNL
Wed Dec 16 18:30:22 2015
MMON started with pid=40, OS id=59733
Wed Dec 16 18:30:22 2015
MMNL started with pid=41, OS id=59735
Wed Dec 16 18:30:22 2015
NOTE: ASMB registering with ASM instance as Standard client 0xffffffffffffffff (reg:1438891928) (new connection)
NOTE: ASMB connected to ASM instance +ASM3 osid: 59737 (Flex mode; client id 0xffffffffffffffff)
NOTE: initiating MARK startup
Starting background process MARK
Wed Dec 16 18:30:22 2015
MARK started with pid=42, OS id=59741
Wed Dec 16 18:30:22 2015
NOTE: MARK has subscribed
High Throughput Write functionality enabled
Wed Dec 16 18:30:34 2015
USER (ospid: 59191): terminating the instance due to error 304
Wed Dec 16 18:30:35 2015
Instance terminated by USER, pid = 59191
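
The ORA-00304 error itself points at the root cause: the instance tried to start with an INSTANCE_NUMBER that was already in use. The oerr utility summarizes this (cause and action paraphrased here):

node3-DR> oerr ora 304
00304, 00000, "requested INSTANCE_NUMBER is busy"
// *Cause:  An instance attempted to start with an INSTANCE_NUMBER value
//          that is already in use by another running instance.
// *Action: Specify a different INSTANCE_NUMBER, shut down the instance
//          that is using this number, or wait for instance recovery to
//          complete on that instance.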

The error number 304 appeared in both the sqlplus output and the alert log. Naturally, the next thing I did was check Google and Oracle Support to see if I could find something that matched my scenario, but I could not find an exact match.

The next thing that came to mind was to check gv$instance to see which nodes the primary RAC database and the standby RAC database were running on.
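
The query itself is simple (SQL*Plus column formatting commands omitted); I ran it on both the primary and the standby:

SQL> select inst_id, instance_number, instance_name, host_name, status
  2  from gv$instance;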

# Primary
   INST_ID INSTANCE_NUMBER INSTANCE_NAME    HOST_NAME    STATUS
---------- --------------- ---------------- ------------ ------------
         1               1 PROD1            node01       OPEN
         3               3 PROD3            node08       OPEN
         2               2 PROD2            node07       OPEN

# Standby
   INST_ID INSTANCE_NUMBER INSTANCE_NAME    HOST_NAME    STATUS
---------- --------------- ---------------- ------------ ------------
         1               1 DG1              node01       MOUNTED
         3               3 DG3              node08       MOUNTED
         2               2 DG2              node07       MOUNTED

From node 3, I then realized that the oratab entry for instance 3 was not correct; instance 3 was already running on node08!

I also verified the RAC configuration via srvctl.

node3-DR>srvctl config database -d DG
Database unique name: DG
Database name: PROD
Oracle home: /u01/app/oracle/product/12.1.0.2/dbhome_1
Oracle user: oracle
Spfile: +RECOC1/DG/PARAMETERFILE/spfile.4297.894386377
Password file:
Domain:
Start options: mount
Stop options: immediate
Database role: PHYSICAL_STANDBY
Management policy: AUTOMATIC
Server pools:
Disk Groups: DATAC1,RECOC1
Mount point paths:
Services:
Type: RAC
Start concurrency:
Stop concurrency:
OSDBA group: dba
OSOPER group: dba
Database instances: DG1,DG2,DG3
Configured nodes: node01,node07,node08
Database is administrator managed

The Data Guard standby database is configured to run on node01, node07, and node08; it is not configured to run on node03.
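
To double-check where each instance was actually running, srvctl status shows the same picture (output illustrative):

node3-DR>srvctl status database -d DG
Instance DG1 is running on node node01
Instance DG2 is running on node node07
Instance DG3 is running on node node08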

Once I confirmed this, I simply removed the stale oratab entry to prevent any further confusion. Moral of the story: only keep oratab entries for instances that are actually meant to run from that node.