I’ve been dealing with a problem where a rebooted RAC node is sometimes unable to rejoin the cluster. The issue seems to be with the “Master Node”, which refuses to accept the node.
So I need to know which node is the “Master Node” (the current known solution is to reboot it, after which all nodes join the cluster).
There is the Oracle note: How to Find OCR Master Node (Doc ID 1281982.1)
And there is this blog entry: 11G R2 RAC: How to identify the master node in RAC
In my case I’m using Oracle 12.1.0.2, where the location of the files is a bit different. On this version, the OCR Master Node can be found in one of the following ways:
- Check the crsd logs for “OCR MASTER”
grep "OCR MASTER" ${ORACLE_BASE}/diag/crs/`hostname`/crs/trace/crsd*
and, if the logs have not rotated too much yet, you should see one of the two lines below:
/u00/app/oracle/diag/crs/anjovm1/crs/trace/crsd_73.trc:2018-01-13 14:05:30.535186 : OCRMAS:3085: th_master:13: I AM THE NEW OCR MASTER at incar 2. Node Number 1
/u00/app/oracle/diag/crs/anjovm2/crs/trace/crsd_71.trc:2018-01-13 14:05:32.823231 : OCRMAS:3085: th_master: NEW OCR MASTER IS 1
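If the traces contain several such messages, a small filter can reduce them to the node number from the most recent one. This is only a sketch: the function name `latest_ocr_master` is made up for illustration, it assumes the two message formats shown above, and it expects lines that start with the timestamp (hence `grep -h` in the usage comment, which drops the file-name prefix).

```shell
#!/bin/sh
# Hypothetical helper: read crsd trace lines on stdin (no file-name prefix)
# and print the node number from the most recent "OCR MASTER" message.
latest_ocr_master() {
    grep 'OCR MASTER' |
    LC_ALL=C sort |      # lines start with "YYYY-MM-DD HH:MM:SS", so this orders them by time
    tail -n 1 |          # keep only the newest message
    sed -n \
        -e 's/.*NEW OCR MASTER IS \([0-9][0-9]*\).*/\1/p' \
        -e 's/.*I AM THE NEW OCR MASTER.*Node Number \([0-9][0-9]*\).*/\1/p'
}

# Usage on a cluster node (same path as the grep command above):
# grep -h "OCR MASTER" ${ORACLE_BASE}/diag/crs/`hostname`/crs/trace/crsd* | latest_ocr_master
```

The result is a node number, not a host name; `olsnodes -n` lists the node names together with their numbers. Keep in mind that the trace directory is local to each node, so the answer only reflects what that node has logged.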
- Check the location of the OCR automatic backups
The cluster node currently keeping the backups is the OCR master node. If you see older backups on other nodes, they date from when those nodes were, in turn, the OCR master.
ls -l /u00/app/12.1.0.2/grid/cdata/<cluster_name>
-rw-r--r-- 1 root system 943266 Jan 14 00:01 backup00.ocr
-rw-r--r-- 1 root system 943266 Jan 13 20:01 backup01.ocr
-rw-r--r-- 1 root system 943266 Jan 13 16:01 backup02.ocr
-rw-r--r-- 1 root system 943266 Jan 13 00:00 day.ocr
-rw-r--r-- 1 root system 943266 Jan 14 00:01 day_.ocr
-rw-r--r-- 1 root system 943266 Dec 31 23:55 week.ocr
-rw-r--r-- 1 root system 943266 Jan 07 23:59 week_.ocr
Note: do not confuse the OCR master node with the Cluster Health Monitor repository master node, which you get using the command:
/u00/app/12.1.0.2/grid/bin/oclumon manage -get MASTER