In my opinion, one of the most under-utilized features of Oracle Enterprise Manager is the Corrective Actions that can be triggered when a metric alert threshold is crossed. I think one reason it’s under-utilized is it’s very hard to think about where to start and what can be automated. My advice from previously implementing, is to look at the alerts that are generated, or tickets, and determine which ones are most frequent and mundane. The one that always comes to mind is Archive Destination.
Archive was the first CA we implemented at my previous company, because we got at least 20 per day. Since our backups were controlled by a different team, all we could do was cut a ticket to them, and possibly kick off an archive backup hoping it would complete this time. So we put this in a Corrective Action. The script checked for hung backups sessions, checked that a backup wasn’t already running, looked through a config file to get the right information, and then kicked off an archive backup job. Then it sent an email to a ticketing queue for the backup team to generate the ticket so they could investigate why it was failing. We set the CA to run on Warning, with a fairly low threshold so we had plenty of time to react if it got to Critical. This was such a success that we went on to write more, including automating tablespace adds and sending notifications to the application teams when process/sessions was exceeded.
A friend of mine, Tyler Sharp, recently started implementing Corrective Actions and has found tremendous time savings. He recently had the idea to automate oradebug steps so they would always have the required debug when working with Oracle Support, instead of having to go through the process manually the next time around. The CA is triggered when > 4 active sessions waiting on concurrency, or over 900 seconds db blocking time. He was kind enough to share the script they’ve implemented below:
conn / as sysdba
set serverout on
trace_name VARCHAR2 (1000) := NULL;
alter_session VARCHAR2 (1000) := NULL;
SELECT to_char (SYSDATE, ‘MMDDYY_HHMISS’) INTO trace_name FROM DUAL;
‘alter session set tracefile_identifier=”’||trace_name||’_AUTO_HANGANALYZE”’;
EXECUTE IMMEDIATE alter_session;
oradebug dump ashdumpseconds 5
oradebug hanganalyze 3
oradebug hanganalyze 3
As you can see, they set a trace file name, take an ashdump, then do the required hanganalyze twice with a sleep in between. Now the DBA can skip these steps when working an issue, and collect the files that were created at the right time, not 10 minutes later. You’ll need to be sure to have a credential that has sysdba access to run this properly.
The great thing about Corrective Actions, is you can use them in a template so you can push them to all servers to keep your resolutions standard. The Corrective Action is triggered for either Warning or Critical, or both. Then you have the choice to get notified of that alert right away, or bypass notifications unless the Corrective Action failed. This allows you a fall back in case the script or job has a problem fixing the issue.
To learn more about Corrective Actions, check the Oracle Enterprise Manager 12c Cloud Control Administrator’s Guide and check out the following blog posts for more ideas!
- Step by step post from Javier Ruiz – Use Corrective Actions to Automate Metric Alert Resolutions
- Philip Brown’s blog TX Row Lock Contention and Corrective Actions
What are the Corrective Actions you’ve implemented, or would like to implement in your environment?