Oracle


CRS-6706: Oracle Clusterware Release patch level (‘2565072065’) does not match Software patch level (‘277169099’)

Today, during a quick-and-dirty patching session, I ended up with the following error when restarting the Oracle Restart Grid Infrastructure.

The complete error was:

oracle@anjovm1: [+ASM] crsctl start has
CRS-6706: Oracle Clusterware Release patch level ('2565072065') does not match Software patch level ('277169099'). Oracle Clusterware cannot be started.
CRS-4000: Command Start failed, or completed with errors.

This was because I had installed the OJVM RU into the Grid home. The solution was simply:

oracle@anjovm1: ~oracle/ [+ASM] /u00/app/grid/18.0.0/OPatch/opatch rollback -id 28790647
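In case you are not sure which patch ID to roll back, OPatch can list what is installed in the Grid home (a quick check using the same OPatch as above):

oracle@anjovm1: ~oracle/ [+ASM] /u00/app/grid/18.0.0/OPatch/opatch lspatches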

And then I could restart:

oracle@anjovm1:~oracle/ [+ASM] crsctl start has
CRS-4123: Oracle High Availability Services has been started.

Index columns query

Query to show all indexes of a table, their characteristics and the respective columns.
Note: it does not filter on owner.

select c.index_name,
       c.table_name||'('||listagg(column_name,',') within group (order by column_position)||')' as columns,
       decode(status,'VALID',null,status||' ')
       ||decode(visibility,'VISIBLE',null,visibility||' ')
       ||decode(segment_created,'YES',null,'EMPTY ')
       ||decode(uniqueness,'NONUNIQUE',null,uniqueness||' ')
       ||funcidx_status as STATUS
 from dba_ind_columns c, dba_indexes i
where i.table_name=c.table_name and i.index_name=c.index_name
  and c.table_name='&TABLE_NAME'
group by c.table_name,c.index_name,i.status,visibility, segment_created,uniqueness,funcidx_status
order by c.table_name,c.index_name;

Example of output:

INDEX_NAME                     COLUMNS                                 STATUS  
------------------------------ --------------------------------------- ----------
IDX_LIB_VERSION_PROFILE_ID     LIB_VERSION(PROFILE_ID)                       
INX_LIB_VERSION_SRC            LIB_VERSION(DATA_RESOURCE,SOURCE)             
LIB_VERSION_NAME_IDX           LIB_VERSION(VERSION_NAME)                     
LIB_VERSION_PK                 LIB_VERSION(ID)                         UNIQUE
LIB_VERSION_UK1                LIB_VERSION(PROFILE_ID,VERSION_NAME)    UNIQUE
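If you need to restrict the query above to a single owner, add the owner to both the join and the filter (a sketch of the modified WHERE clause):

where i.table_name=c.table_name and i.index_name=c.index_name
  and i.owner=c.index_owner
  and i.owner='&OWNER'
  and c.table_name='&TABLE_NAME'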

Find NOSEGMENT indexes that you forgot to delete

To err is human, and I, being human, sometimes forget to clean up after my tests.

Recently this happened with NOSEGMENT indexes. Yesterday I received an email from a colleague reporting some errors he got with Data Pump:

ORA-08114: can not alter a fake index
Failing sql is:
ALTER INDEX "SUPERAPP"."IDX_POSITION_TEST" NOPARALLEL
ORA-39083: Object type INDEX failed to create with error:
ORA-08114: can not alter a fake index

Indexes created with NOSEGMENT do not appear in DBA_INDEXES (don't be fooled by the SEGMENT_CREATED column).
They do appear in DBA_OBJECTS, but there they look like any other index.

The solution is to compare the two views. So, to find the NOSEGMENT indexes that you forgot to delete, you can use the following query:

SELECT 'DROP INDEX '||o.owner||'.'||o.object_name||';' cmd, o.created FROM dba_objects o
WHERE o.object_type='INDEX'
AND NOT EXISTS (SELECT 1 FROM dba_indexes i WHERE i.owner=o.owner AND i.index_name=o.object_name);

If you wonder what NOSEGMENT indexes are, a good source is Oracle-Base.
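For reference, a fake index is created and tested roughly like this (a sketch with hypothetical table and column names; "_use_nosegment_indexes" is an undocumented parameter):

create index superapp.idx_position_test on superapp.positions(position_id) nosegment;
alter session set "_use_nosegment_indexes"=true;
explain plan for select * from superapp.positions where position_id=42;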


Query DGMGRL silently from observer site

When the observer is watching several Data Guard configurations, we can use that server to easily run queries against all of them, for instance with:

for db in `ps -ef | grep observer | awk -F '/' '{print $6}' | sort`; do
  # the first line of the observer config file holds the connection string
  conn=`sed -n 1p /u00/app/oracle/admin/${db}/observer/fsfo_${db}.conf`
  # TNS alias part of the connection string (not used below, handy for other commands)
  conndb=`echo $conn | cut -d "@" -f2`
  dgmgrl -silent ${conn} "show configuration" | grep -E 'Primary'
done

For every observed configuration it:

  1. gets the DB name
  2. finds the connection string in the config file
  3. connects to DGMGRL in silent mode and runs a command
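If you also want the fast-start failover status of each configuration, the same loop can run a different broker command (a sketch along the same lines; adjust the grep pattern to the output you care about):

for db in `ps -ef | grep observer | awk -F '/' '{print $6}' | sort`; do
  conn=`sed -n 1p /u00/app/oracle/admin/${db}/observer/fsfo_${db}.conf`
  # same connection logic, different DGMGRL command
  dgmgrl -silent ${conn} "show fast_start failover" | grep -E 'Fast-Start Failover|Observer'
done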

Index Logical Corruption!

Arriving at my client's on Monday, I found the local DBA with two people behind him, clearly concerned about something. He was doing a new failover of a critical database that had had problems during the weekend due to a firewall change. The timeline was roughly like this:

  • Friday afternoon – firewall change
  • Saturday morning – many syslog errors and the DB almost impossible to use (due to auditing at OS level)
  • Saturday afternoon – failover of the database to the secondary site + set syslogd to restart every 5 minutes
  • Sunday morning – reinstate old primary as standby (firewall change was fixed/rolled back)
  • Monday morning – failover of the database back to the primary site + reinstate DB on the secondary site

Everything looked fine again by the end of the morning. We were happy that the reinstate worked, even though it had to be done with sqlplus, as the broker was giving errors.

Around midday on Monday the users complained they were receiving ORA-01555 for some queries.

Mon Aug 20 11:21:35 2018
ORA-01555 caused by SQL statement below (SQL ID: cg61q9avtk3ta, Query Duration=1 sec, SCN: 0x01a9.3ae6da9e):
delete from child_table where parent_id = :parent and child_id in (:child1)

We checked the alert log and saw the ORA-01555 on two queries filtering on the same key-value pair. My first thought was that a bad plan had been generated and undo usage had exploded. But the undo tablespace was not even half used, and the query duration reported in the alert log message was 1 second.
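One quick way to confirm how full the undo tablespace really is (a sketch; the tablespace name pattern is an assumption):

select tablespace_name, round(used_percent,1) as used_pct
  from dba_tablespace_usage_metrics
 where tablespace_name like 'UNDO%';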

This was strange. Even stranger, a short time later we also got an ORA-00600 in the alert log:

Mon Aug 20 16:16:54 2018
Errors in file /u00/app/oracle/diag/rdbms/db1_site1/DB1/trace/DB1_ora_13217.trc  (incident=1543593):
ORA-00600: internal error code, arguments: [ktbdchk1: bad dscn], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u00/app/oracle/diag/rdbms/db1_site1/DB1/incident/incdir_1543593/DB1_ora_13217_i1543593.trc

This was a case for Metalink, and there we found the reason:

ALERT Bug 22241601 ORA-600 [kdsgrp1] / ORA-1555 / ORA-600 [ktbdchk1: bad dscn] / ORA-600 [2663] due to Invalid Commit SCN in INDEX (Doc ID 1608167.1)

Index corruption, nice. It could be worse: the workaround is to recreate the problematic indexes.

OK, so we should run DB Verify to validate all indexes. There is a new option to run DBV on specific segments, which is great, and DBV was even enhanced to detect this specific corruption:

Bug 7517208 – DBV enhanced to identify Logical SCN Block corruptions (Doc ID 7517208.8)

But running it against datafiles stored in ASM is not easy and did not seem to work… until we discovered:

Bug 18353157 – DBVERIFY (DBV) does not show SCN mismatch for datafiles in ASM (Doc ID 18353157.8)

The next step was to try a simple ANALYZE INDEX … VALIDATE STRUCTURE. After a few minutes the alert log started showing a lot of:

Tue Aug 21 10:36:43 2018
ORA-01555 caused by SQL statement below (SQL ID: 6c3gh913k55pr, SCN: 0x01a9.3b4ea9e1):
ANALYZE INDEX USER1.IX_ABC VALIDATE STRUCTURE ONLINE
ORA-01555 caused by SQL statement below (SQL ID: 6c3gh913k55pr, SCN: 0x01a9.3b4ea9ee):
ANALYZE INDEX USER1.IX_ABC VALIDATE STRUCTURE ONLINE
…

Unfortunately the errors do not show up in sqlplus, so we had to keep a tail -f on the alert log, keep our eyes open and then cancel the ANALYZE INDEX.

As several indexes of USER1 seemed to be corrupted, we decided it was easier to simply rebuild all of them. The rebuild needs to be ONLINE: not because we could not afford locks on the application tables, but because only an ONLINE rebuild does a full table scan to recreate the index. A normal rebuild reads the index itself, which is exactly what you do not want when it is corrupted.
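One way to generate the rebuild commands from the data dictionary (a sketch; review the generated list before running it):

select 'ALTER INDEX '||owner||'.'||index_name||' REBUILD ONLINE;' as cmd
  from dba_indexes
 where owner='USER1'
   and index_type='NORMAL';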

One mistake we made at some point was to ANALYZE the primary key of a table while an INDEX REBUILD ONLINE of another index on the same table was running. The ANALYZE and the REBUILD were locking each other (or advancing very slowly), so it is better not to run both at the same time.

When we tried to rebuild online the primary key, we got the error:

ORA-08108: may not build or rebuild this type of index online

At first we thought it was because it is a primary key, but we soon stumbled upon Jonathan Lewis's blog entry about this: https://jonathanlewis.wordpress.com/2012/09/04/online-rebuild-2/
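A quick way to check whether a constraint is deferrable (a sketch, using the schema from the example above):

select constraint_name, constraint_type, deferrable, deferred
  from dba_constraints
 where owner='USER1'
   and constraint_type in ('P','U');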

Our primary key is DEFERRABLE! Oops! The only option now is to change or deactivate the constraint, rebuild the index and move on. But this is not possible with the application running, so it implies an emergency change and quite a few actions. Here is the technical summary for this Oracle Restart environment:

-- Stop service:
$ srvctl stop service -d db1_site1 -s db1_service_rw
$ srvctl status service -d db1_site1

-- Kill sessions using service and/or from users:
set pages 0
spool /tmp/to_kill.sql
select 'alter system disconnect session '''||sid||','||serial#||''' immediate;' from v$session where service_name='DB1_SERVICE_RW';
spool off

$ vi /tmp/to_kill.sql

SQL> @/tmp/to_kill.sql

-- Confirm no other sessions:
SQL> select username, program,machine from v$session where service_name='DB1_SERVICE_RW';

-- Recreate index:
SET TIME ON
SET TIMING ON
ALTER SYSTEM SET job_queue_processes=0;
ALTER SESSION ENABLE PARALLEL DDL;
ALTER TABLE user1.child_table DISABLE CONSTRAINT pk_child_table KEEP INDEX;
ALTER INDEX user1.pk_child_table REBUILD ONLINE PARALLEL 12;
ALTER TABLE user1.child_table ENABLE CONSTRAINT pk_child_table;
ALTER INDEX user1.pk_child_table NOPARALLEL;
ALTER SYSTEM SET job_queue_processes=10;

-- Restart service:
$ srvctl start service -d db1_site1 -s db1_service_rw
$ srvctl status service -d db1_site1

After all this we finally hoped the database was back to normal. The application reconnected, we crossed our fingers, and since then there have been no more errors. We are happy that this was only a light corruption.


Oracle 12.1 – Find OCR Master Node

I have been dealing with a problem where, sometimes, a rebooted RAC node is unable to rejoin the cluster. The issue seems to lie with the “Master Node”, which refuses to accept the rebooted node.

So I need to know which node is the “Master Node” (the currently known workaround is to reboot it, after which all nodes join the cluster).

There is the Oracle note: How to Find OCR Master Node (Doc ID 1281982.1)

And there is this blog entry: 11G R2 RAC: How to identify the master node in RAC

In my case, on Oracle 12.1.0.2, the location of the files is a bit different. On this version the OCR master node can be found in one of the following ways:

  • Check the crsd logs for “OCR MASTER”
grep "OCR MASTER" ${ORACLE_BASE}/diag/crs/`hostname`/crs/trace/crsd*

and, if the logs have not rotated too much yet, you should see one of the two lines below:

/u00/app/oracle/diag/crs/anjovm1/crs/trace/crsd_73.trc:2018-01-13 14:05:30.535186 :  OCRMAS:3085: th_master:13: I AM THE NEW OCR MASTER at incar 2. Node Number 1

/u00/app/oracle/diag/crs/anjovm2/crs/trace/crsd_71.trc:2018-01-13 14:05:32.823231 :  OCRMAS:3085: th_master: NEW OCR MASTER IS 1
  • Check the location of the OCR automatic backups

The cluster node currently keeping the backups is the OCR master node. If you see older backups on other nodes, they date from when those nodes were the OCR master in their turn.

ls -l /u00/app/12.1.0.2/grid/cdata/<cluster_name>
-rw-r--r--    1 root     system      943266 Jan  14 00:01 backup00.ocr
-rw-r--r--    1 root     system      943266 Jan 13 20:01 backup01.ocr
-rw-r--r--    1 root     system      943266 Jan 13 16:01 backup02.ocr
-rw-r--r--    1 root     system      943266 Jan 13 00:00 day.ocr
-rw-r--r--    1 root     system      943266 Jan  14 00:01 day_.ocr
-rw-r--r--    1 root     system      943266  Dec 31 23:55 week.ocr
-rw-r--r--    1 root     system      943266 Jan 07 23:59 week_.ocr
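Alternatively, ocrconfig can list the automatic backups together with the node that took them:

/u00/app/12.1.0.2/grid/bin/ocrconfig -showbackup auto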

Note: do not confuse the OCR master node with the Cluster Health Monitor repository master node, which you get using the command:

/u00/app/12.1.0.2/grid/bin/oclumon manage -get MASTER



Check Advanced Compression option usage on Oracle database

Checking whether the Advanced Compression database option was used in your environment is not so easy, as several different activities can trigger this option.

For example, for a mixed Oracle 11.2.0.4 and 12.1.0.2 environment we can use the following query:

select * from DBA_FEATURE_USAGE_STATISTICS
where ((name like '%Compress%' and name not in ('Backup BASIC Compression','Backup BZIP2 Compression','SecureFile Compression (system)')) 
  and detected_usages>0)
or (name='Data Guard' and FEATURE_INFO like '%Compression used: TRUE%')
or (name in ('Oracle Utility Metadata API','Oracle Utility Datapump (Export)','Oracle Utility Datapump (Import)') 
  and version like '11.2.%' and feature_info not like '%compression used: 0 times%')
or (name in ('Oracle Utility Metadata API','Oracle Utility Datapump (Import)') 
  and version like '12.%' and feature_info like '%compression%')
or (name in ('Oracle Utility Datapump (Export)') 
  and version like '12.%' and feature_info not like '%compression used: 0 times%');

Why do we need so many filters? Because the text that shows whether compression was used changes depending on the utility and the Oracle version!
Below are 3 different examples where compression was not used:

Version – Utility: FEATURE_INFO text

  • 12.1 – Oracle Utility Datapump (Export): invoked: 1 times, compression used: 0 times (BASIC algorithm used: 0 times, LOW algorithm used: 0 times, MEDIUM algorithm used: 0 times, HIGH algorithm used: 0 times), encryption used: 0 times (AES128 algorithm used: 0 times, AES192 algorithm used: 0 times, AES256 algorithm used: 0 times, PASSWORD mode used: 0 times, DUAL mode used: 0 times, TRANSPARENT mode used: 0 times), parallel used: 0 times, full transportable used: 0 times
  • 12.1 – Oracle Utility Datapump (Import) / Oracle Utility Metadata API: invoked: 1 times, parallel used: 0 times, full transportable used: 0 times
  • 11.2 – Oracle Utility Metadata API: invoked: 1 times, compression used: 0 times, encryption used: 0 times

Also, it seems that for the “Data Guard” feature the text uses capital letters: “%Compression used: TRUE%”.
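If you only want a compact overview of the compression-related features, without parsing FEATURE_INFO, a simpler query helps as a first look (a sketch):

select name, version, detected_usages, currently_used, last_usage_date
  from dba_feature_usage_statistics
 where name like '%Compress%'
 order by name, version;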

More information about which features enable usage of the Advanced Compression option can be found here.

And how to avoid using the Advanced Compression option by mistake is explained on Mathias Zarick's blog.


Back to the basics: correct parameter from SPFILE in ASM

When you change the memory size and forget to update SGA_MAX_SIZE, this might happen:

oracle@anjo01:/home/oracle [] srvctl start database -d p05ltm02
PRCR-1079 : Failed to start resource ora.p05ltm02.db
CRS-5017: The resource action "ora.p05ltm02.db start" encountered the following error:
ORA-01078: failure in processing system parameters
ORA-00823: Specified value of sga_target greater than sga_max_size

As this is a clustered database with its SPFILE in ASM, here is the easiest way to solve it:

1. Find the full path of the SPFILE on ASM

oracle@anjo01:/home/oracle [p05ltm02_1] srvctl config database -d p05ltm02 -a
Database unique name: p05ltm02
Database name: ltm02
Oracle home: /oracle/app/product/11.2.0.4
Oracle user: oracle
Spfile: +DGDATA/p05ltm02/spfileltm02.ora
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: sp_ha01
Database instances:
Disk Groups: DGDATA,DGFRA
Mount point paths:
Services: p05ltm02_app.anjo01
Type: RACOneNode
Online relocation timeout: 30
Instance name prefix: p05ltm02
Candidate servers: anjo01,anjo02
Database is enabled
Database is policy managed

2. Create a PFILE from SPFILE on the local filesystem

oracle@anjo01:/home/oracle [p05ltm02_1] sqh
SQL*Plus: Release 11.2.0.4.0 Production on Tue Feb 2 09:33:47 2016
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to an idle instance.

SQL> create pfile='/tmp/pfile.txt' from spfile='+DGDATA/p05ltm02/spfileltm02.ora';
File created.

SQL> exit
Disconnected

3. Correct the damn parameter

oracle@anjo01:/home/oracle [p05ltm02_1] vi /tmp/pfile.txt

4. Recreate the SPFILE on ASM:

oracle@anjo01:/home/oracle [p05ltm02_1] sqh
SQL*Plus: Release 11.2.0.4.0 Production on Tue Feb 2 09:34:44 2016
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to an idle instance.

SQL> create spfile='+DGDATA/p05ltm02/spfileltm02.ora' from pfile='/tmp/pfile.txt';
File created.

SQL> exit
Disconnected

5. And finally restart the database:

oracle@anjo01:/home/oracle [p05ltm02_1] srvctl start database -d p05ltm02
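Once the instance is up, a quick check in sqlplus confirms the corrected values:

SQL> show parameter sga_max_size
SQL> show parameter sga_target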

CloneDB: terminating the instance due to error 17514

While getting adventurous and trying to enable CloneDB on a RAC database running on ASM (set the clonedb parameter to true and restart the database), I got this error in the alert log:

Mon Nov 16 11:11:40 2015
CloneDB: created bitmap file RAC_bitmap.dbf
USER (ospid: 15767): terminating the instance due to error 17514

RAC is the name of the database. The <SID>_bitmap.dbf file was correctly created in $ORACLE_HOME/dbs.

Then it simply crashed the instance. Google and Metalink had not yet indexed this error, but here is its description:


$ oerr ora 17514
17514, 00000, "Access to clonedb bitmap file failed"
// *Cause: Accessing the bitmap block file in clonedb environment has
// encountered an error
// *Action: check additional error messages in the trace file and call
// Oracle Support Services

The trace files did not help much. I did not investigate the problem further.


Thin PDB clones on ext4 filesystem with Oracle 12.1.0.2

Oracle 12.1.0.2 brings a great new functionality: you can create thin clones – that is, without copying the source files, just pointers to them – on an ext4 filesystem, the current default on Red Hat Linux 6.

Requirements for the example below:
– Oracle 12.1.0.2
– Data files on an ext4 filesystem (or another filesystem supporting sparse files)
– Source PDB in read-only mode

It is very simple to set up:

SQL> alter system set clonedb=TRUE scope=spfile;
SQL> shutdown immediate
SQL> startup

SQL> show parameter clonedb
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
clonedb                              boolean     TRUE

I will create a clone of the PDB2 database.

SQL> select name,con_id,open_mode,SNAPSHOT_PARENT_CON_ID from v$pdbs;

NAME                               CON_ID OPEN_MODE  SNAPSHOT_PARENT_CON_ID
------------------------------ ---------- ---------- ----------------------
PDB$SEED                                2 READ ONLY                       0
PDB1                                    3 MOUNTED                         0
PDB2                                    4 READ ONLY                       0

For this I first need to create the target directory for the datafiles.

SQL> ! mkdir -p /u01/app/oracle/oradata/NONRAC/PDB2_CLONE1

Then I can do the clone using the ‘snapshot copy’ clause. It took 9 seconds.

SQL> create pluggable database PDB2_C1 from PDB2 snapshot copy file_name_convert=('/u01/app/oracle/oradata/NONRAC/PDB2','/u01/app/oracle/oradata/NONRAC/PDB2_CLONE1');
Pluggable database created.

Elapsed: 00:00:09.35

Note that the new PDB remains in MOUNTED state and does not yet show a connection to its parent. I find the lack of that information unfortunate.

SQL> select name,con_id,open_mode,SNAPSHOT_PARENT_CON_ID from v$pdbs;

NAME                               CON_ID OPEN_MODE  SNAPSHOT_PARENT_CON_ID
------------------------------ ---------- ---------- ----------------------
PDB$SEED                                2 READ ONLY                       0
PDB1                                    3 MOUNTED                         0
PDB2                                    4 READ ONLY                       0
PDB2_C1                                 5 MOUNTED                         0

After opening the clone in READ WRITE mode, we can see the relationship with the parent.

SQL> alter pluggable database PDB2_C1 open read write;
Pluggable database altered.
Elapsed: 00:00:02.12

SQL> select name,con_id,open_mode,SNAPSHOT_PARENT_CON_ID from v$pdbs;

NAME                               CON_ID OPEN_MODE  SNAPSHOT_PARENT_CON_ID
------------------------------ ---------- ---------- ----------------------
PDB$SEED                                2 READ ONLY                       0
PDB1                                    3 MOUNTED                         0
PDB2                                    4 READ ONLY                       0
PDB2_C1                                 5 READ WRITE                      4

And below we verify that the clone uses about 4MB, instead of the roughly 850MB of the source PDB.

SQL> ! du -m /u01/app/oracle/oradata/NONRAC/PDB2/*
6       /u01/app/oracle/oradata/NONRAC/PDB2/PDB2_users01.dbf
586     /u01/app/oracle/oradata/NONRAC/PDB2/sysaux01.dbf
261     /u01/app/oracle/oradata/NONRAC/PDB2/system01.dbf
1       /u01/app/oracle/oradata/NONRAC/PDB2/temp01.dbf

SQL> ! du -m /u01/app/oracle/oradata/NONRAC/PDB2_CLONE1/*
1       /u01/app/oracle/oradata/NONRAC/PDB2_CLONE1/PDB2_users01.dbf
1       /u01/app/oracle/oradata/NONRAC/PDB2_CLONE1/sysaux01.dbf
1       /u01/app/oracle/oradata/NONRAC/PDB2_CLONE1/system01.dbf
1       /u01/app/oracle/oradata/NONRAC/PDB2_CLONE1/temp01.dbf
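To remove the test clone afterwards, the usual PDB drop works (make sure no sessions are connected to it):

SQL> alter pluggable database PDB2_C1 close immediate;
SQL> drop pluggable database PDB2_C1 including datafiles;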

We can also clone non-CDB databases with some more steps, as explained at:
https://dbamarco.wordpress.com/2015/03/09/small-change-big-difference/

The clonedb.pl script (see documentation here) can also be used, even though you might find it easier to write your own script for your own environment.