OMS


Recover OEM repository from old schema backup

Things not always go right, and recently I had to recover the OEM repository using a one month old datapump backup. Only SYSMAN schema was corrupted, and I did not touch the other schemas.

I probably still miss doing some things and this post is mostly a note to make it faster for other people going through the process and not having a DB backup…

A full restore of the database would be certainly a much better and efficient solution than the one below. 🙂

To recover the corrupted SYSMAN schema where the OEM repository resides, first I drop it and then I did:

# Restore user
impdp schemas=SYSMAN directory=BACKUP_DIR dumpfile=expdp_sysman_april2021.dmp include=USER

# Restore system grants:
SQL> grant execute on dbms_crypto to sysman;
SQL> grant execute on dbms_lock to sysman;
SQL> grant execute on dbms_rls to sysman;
SQL> grant execute on dbms_aq to sysman;
SQL> grant execute on dbms_aqadm to sysman;
SQL> grant execute on dbms_alert to sysman;

# Restore other object grants 
impdp directory=BACKUP_DIR dumpfile=expdp_sysman_april2021.dmp include=GRANT

# Restore data
impdp schemas=SYSMAN directory=BACKUP_DIR dumpfile=expdp_sysman_april2021.dmp 

# Restart queues
/* select 'exec DBMS_AQADM.START_QUEUE(queue_name => ''SYSMAN.'||name||''');' from dba_queues where enqueue_enabled like '%NO%' and owner='SYSMAN' order by 1; */
SQL> exec dbms_aqadm.start_queue('SYSMAN.MGMT_ADMINMSG_BUS',true,true);
SQL> exec dbms_aqadm.start_queue('SYSMAN.EM_CNTR_QUEUE',true,true);
SQL> exec dbms_aqadm.start_queue('SYSMAN.EM_LOADERJOB_QUEUE',true,true);
SQL> exec dbms_aqadm.start_queue('SYSMAN.EM_EVENT_BUS',true,true);
SQL> exec dbms_aqadm.start_queue('SYSMAN.EM_NOTIFY_Q',true,true);
SQL> exec dbms_aqadm.start_queue('SYSMAN.EM_GROUP_EVENT_Q',true,true);
SQL> exec DBMS_AQADM.START_QUEUE('SYSMAN.EM_JOB_STATUS_UPD_Q');
SQL> exec DBMS_AQADM.START_QUEUE('SYSMAN.MGMT_HOST_PING_Q');
SQL> exec DBMS_AQADM.START_QUEUE('SYSMAN.MGMT_LOADER_Q');
SQL> exec DBMS_AQADM.START_QUEUE('SYSMAN.MGMT_NOTIFY_INPUT_Q');
SQL> exec DBMS_AQADM.START_QUEUE('SYSMAN.MGMT_TASK_Q');
SQL> commit;

# Perform partitionion maintenance (as SYSMAN):
SQL> exec gc_interval_partition_mgr.partition_maintenance;
SQL> exec mgmt_audit_admin.add_audit_partition;

Start OEM again:

emctl start oms

Unblock the agents (Setup – Manage Cloud Control – Agents – select blocked agents + “unblock”) and on each monitored host, clear the agent state, restart it and test:

emctl clearstate agent
emctl stop agent
emctl start agent
emctl control agent runCollection $(hostname -f):host Response
emctl status agent

After this, OEM was running again without problems.


OEM Agent error with [handshake has no peer]

At a client, there is OEM 13.4 running without problems, being patched every three months with the latest PSU.

Some old VMs have old Agents, like OEM Agent 13.2.

Recently we noticed one corrective action failed with:

Corrective action output=Step aborted after 30 failed attempts to run. Last captured error message was: Caught agent exception[unable to connect to http server at https://srv01234.domain.com:3872/emd/main/. [handshake has no peer](oracle.sysman.emSDK.agent.comm.exception.VerifyConnectionException)]
Error communicating with agent. Please also check incident console for possible job-system related problems.

The problem is that OEM Agent 13.2 uses by default the following Ciphers to communicate:

(more…)

OMS, select from dual and library cache latches (and how to diagnose)

Yesterday I went through a problem related to library cache latch contention which I was able to diagnose with the help of Arup Nanda blog
It was OMS-Enterprise Manager Grid Control who was blocking dozens of other user sessions… The problem is classified in Metalink under Bugs but described as “not a bug”: Bug 10061837: CHILD LIB CACHE LATCH HELD BY SESSION TRAVERSING V$SQL CAUSES HANG

This led to investigate if we could decrease the library cache latch, and investigation on how to avoid soft parses and how to tune the SESSION_CACHED_CURSOR parameter. Help this time come from a compilation work by Juan Carlos Reyes Pacheco on oracle-l mailing list

Another question was brought by another DBA, what is the influence of “select … from dual” into library cache latches. This time a good answer from Tom Kyte showed the huge gain of having direct assignment of variables in PLSQL instead of using or abusing of the dual table. I copy Tom Kyte’s conclusion: “Why do so many people “select …. from dual” in plsql when a simple assignment would work??”