SMS Notifications From Enterprise Manager

Starting with Enterprise Manager (EM) 12c, you have the choice to send SMS notifications to a cell phone or a pager (does anybody still use pagers?).  There have been a couple of questions on the forums about this, so I thought I’d write it up, since it appears to be a bit confusing as to how this works.

First, be sure that your Mail Server is set up in Setup / Notifications / Notification Methods and that you can receive the test e-mail.


Next, create an EM administrator user, then log in as that user to update your e-mail address and SMS/Pager information by clicking on the Username drop-down menu and selecting Enterprise Manager Password & Email.


For the SMS/Pager, you need the text-based e-mail address for your number; if your provider were Verizon, for example, it would be yournumber@vtext.com.  Select the Email Type Pager for SMS messages, as they are shorter than the Email format.  It’s important to note that you will not see the multiple lines in the Setup / Security / Administrators view.  You can enter multiple e-mail addresses separated by commas, but the Email Type option will not be available there.
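As a quick sketch, the text-based address is just the 10-digit number at the carrier’s SMS gateway domain. The Verizon domain below is a well-known example; the phone number is made up, and you should check your own carrier’s documentation for its gateway:

```shell
#!/bin/sh
# Compose an email-to-SMS address from a phone number and a carrier gateway.
# vtext.com is Verizon's SMS gateway; other carriers use their own domains.
phone="5551234567"          # 10-digit number, no dashes (hypothetical)
gateway="vtext.com"         # carrier SMS gateway domain (assumption: Verizon)
sms_address="${phone}@${gateway}"
echo "$sms_address"         # prints: 5551234567@vtext.com
```

This is the value to enter in the SMS/Pager field.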


By default, both email and pager will be enabled in your Notification Schedule; you can adjust this as necessary by going to Setup / Notifications / My Notification Schedule.   You can receive notifications by both e-mail and page, just e-mail, or just page, depending on what you configure here.

Next, you need to create an Incident Rule Set, or edit an existing one.   From Setup / Incidents / Incident Rules, select a rule set and rule to edit.  Once you get to the Action for the Rule, in the Basic Notifications section, select the EM Administrator in the Page box.  Save all changes.


In 12cR4, you can test the Incident Rule Set by selecting your Rule Set and clicking the Simulate Rules button (Setup / Incidents /  Incident Rules).  You will need to select a Target, Event Type and find an alert to simulate.  Then you will get a list of Actions that the Incident Rule Set will perform for this alert.

To test my notifications, I dropped the warning threshold on the tablespace metric to 15, something I knew would trigger immediately.    Here are the messages I received on my phone.


I also received the long format in my e-mail.  When you’re done testing, don’t forget to set your thresholds back!


Introducing Oracle Management Cloud…

Oracle Management Cloud

If you were at Oracle OpenWorld this year you might have had the chance to preview Oracle’s latest cloud service – Oracle Management Cloud (OMC).   The three OMC services launched this month are Application Performance Monitoring (APM), Log Analytics (LA) and IT Analytics (ITA).    The goal for OMC is to bring together different types of operational data for use by both businesses and IT.  All data is stored in a unified data platform that allows you to navigate from one service to another while working on specific use cases.   Here’s a quick overview of each service to get you up to speed.

Application Performance Monitoring

This service gives you insight into the end user performance and experience, from  browser statistics to  AJAX calls.  With statistics on page load times and errors, you can drill down to find errors, see the request calls, and review memory usage and garbage collection.


The integration with Log Analytics allows the user to drill down to server and application logs related to the poor performance time periods.  By using saved searches and creating custom dashboards you can see the information that’s important to your application in one view.


Log Analytics

Upload all your logs to the cloud – database, application, middleware, server, infrastructure – then search and explore to identify problems or resolve issues.   Troubleshoot problems by exploring logs in the context of the application using topology-aware log exploration. Then utilize the cluster feature to identify outliers or frequently occurring patterns.  There’s always a lot of noise in log files, so this allows you to filter out the noise and get to the real errors that you need to see.


Another key feature is the ability to correlate other logs within a time period.  Let’s say you found an error at 1:00; you can then see the log entries 1 minute before and after on different systems to identify any correlated log entries.    LA also includes the ability to save queries and use them in a dashboard so you can replicate your searches with ease.  Not to mention storing logs in the Oracle cloud instead of on your servers for long periods of time.
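As a sketch of the idea only (nothing to do with LA’s actual implementation), correlating by time window just means keeping entries whose timestamps fall within ±60 seconds of the error. With timestamps simplified to epoch seconds:

```shell
#!/bin/sh
# Check whether a log entry's timestamp falls within +/-60 seconds of an error.
# Timestamps are epoch seconds here for simplicity; real logs need parsing first.
in_window() {
  entry=$1; error=$2
  diff=$((entry - error))
  [ "$diff" -lt 0 ] && diff=$((-diff))   # absolute value
  [ "$diff" -le 60 ]                     # function result: inside the window?
}

error_time=1000000
if in_window 1000030 "$error_time"; then
  echo "correlated"       # 30s after the error: keep it
fi
```

Running this same check across log entries from different systems is what turns a single error into a correlated timeline.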

IT Analytics

Finally, a way to look at IT resources holistically and see how targets in your environment compare to one another.  Resource Analytics allows you to view current utilization as well as forecast things like storage and CPU across targets, or groups of targets. Answer questions like: How much storage will we need in 6 months?  Which databases are consuming the most CPU, and how much has that increased recently?

With Performance Analytics you can identify bottlenecks across Database or Middleware targets.  Have you ever wondered what the worst SQL in your environment is?  I had a manager once who really liked to point out the worst-performing SQL across applications; how easy this will be now!

Probably the most exciting feature is the Data Explorer, which allows you to create custom queries and save them to a dashboard built for your unique requirements.   Below I’ve searched for database instances based on their DB wait time; hovering over the chart, you get a popup with the target, type of wait, and value.  I can now save this widget to a dashboard if I like.

Learn More

More information on OMC can be found in this ebook from Oracle and on the Oracle Cloud website.   You can also watch the launch video here.   I will be posting more about each of the services over the next few weeks, as well as sharing my experiences with our first customer implementations, so stay tuned!

Automating the Mundane with Corrective Actions and Oracle Enterprise Manager

In my opinion, one of the most under-utilized features of Oracle Enterprise Manager is the Corrective Action that can be triggered when a metric alert threshold is crossed.  I think one reason it’s under-utilized is that it’s hard to know where to start and what can be automated.  My advice, from implementing these previously, is to look at the alerts or tickets that are generated and determine which ones are most frequent and mundane.  The one that always comes to mind is Archive Destination.

Archive was the first CA we implemented at my previous company, because we got at least 20 of these alerts per day.  Since our backups were controlled by a different team, all we could do was cut a ticket to them, and possibly kick off an archive backup hoping it would complete this time.  So we put this in a Corrective Action.  The script checked for hung backup sessions, checked that a backup wasn’t already running, looked through a config file to get the right information, and then kicked off an archive backup job.   Then it sent an email to a ticketing queue for the backup team to generate the ticket so they could investigate why it was failing.    We set the CA to run on Warning, with a fairly low threshold, so we had plenty of time to react if it got to Critical.  This was such a success that we went on to write more, including automating tablespace adds and sending notifications to the application teams when processes/sessions were exceeded.
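As a sketch of the decision logic only (all names here are hypothetical, not our actual script), the checks above might be structured like this:

```shell
#!/bin/sh
# Hypothetical sketch of an archive-area corrective action's decision logic.
# In a real CA, EM sets environment variables such as ORACLE_SID for the target,
# and the counts below would come from queries against v$session / RMAN views.

decide_action() {
  hung=$1      # number of hung backup sessions found
  running=$2   # number of archive backups already running
  if [ "$hung" -gt 0 ]; then
    echo "ticket-only"        # don't pile on; let the backup team investigate
  elif [ "$running" -gt 0 ]; then
    echo "wait"               # a backup is already in flight, let it finish
  else
    echo "start-backup"       # safe to kick off an archive log backup via RMAN
  fi
}

decide_action 0 0   # prints: start-backup
# In the real CA, "start-backup" would invoke RMAN and the script would
# always finish by mailing the ticketing queue, whatever branch was taken.
```

Keeping the decision in one small function makes the CA easy to test outside of EM before you trust it on Warning alerts.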

A friend of mine, Tyler Sharp, recently started implementing Corrective Actions and has found tremendous time savings.    He recently had the idea to automate the oradebug steps so they would always have the required debug output when working with Oracle Support, instead of having to go through the process manually the next time around.   The CA is triggered when there are more than 4 active sessions waiting on concurrency, or over 900 seconds of database blocking time.  He was kind enough to share the script they’ve implemented below:

conn / as sysdba
set serveroutput on

DECLARE
   trace_name      VARCHAR2 (1000) := NULL;
   alter_session   VARCHAR2 (1000) := NULL;
BEGIN
   -- Build a distinct trace file identifier for this dump.  (The original
   -- source of trace_name was not shown; the instance name is one reasonable
   -- choice.)
   SELECT instance_name INTO trace_name FROM v$instance;

   alter_session :=
         'alter session set tracefile_identifier=''' || trace_name || '_AUTO_HANGANALYZE''';
   DBMS_OUTPUT.PUT_LINE (alter_session);
   EXECUTE IMMEDIATE alter_session;
END;
/

oradebug setmypid
oradebug unlimit
oradebug dump ashdumpseconds 5
oradebug hanganalyze 3
execute sys.dbms_lock.sleep(180);
oradebug hanganalyze 3
As you can see, they set a trace file name, take an ASH dump, then do the required hanganalyze twice with a sleep in between.   Now the DBA can skip these steps when working an issue and collect the files that were created at the right time, not 10 minutes later.   You’ll need to be sure to have a credential with SYSDBA access to run this properly.

The great thing about Corrective Actions is that you can use them in a template, so you can push them to all servers to keep your resolutions standard.    The Corrective Action can be triggered on Warning, Critical, or both.  You then have the choice to be notified of the alert right away, or to bypass notifications unless the Corrective Action failed.   This gives you a fallback in case the script or job has a problem fixing the issue.

To learn more about Corrective Actions, check the Oracle Enterprise Manager 12c Cloud Control Administrator’s Guide, and check out the following blog posts for more ideas!

What are the Corrective Actions you’ve implemented, or would like to implement in your environment?

Enterprise Manager at Oracle OpenWorld 2015

You might have heard by now, there’s this little meeting coming up in 9 days called Oracle OpenWorld.  If you happen to be heading to San Francisco, drop by and see a few friends, or 50,000.

Be sure to register for my session!  This year will be a little different as I’ll be running a panel with 4 Enterprise Manager Champions!

Using Oracle Enterprise Manager Effectively to Become an Oracle Enterprise Manager Champion [CON9711]

Joseph Kopilash, Director, Database Administration, Epsilon Data Management LLC
Steve Meredith, Boeing Oracle Enterprise Manager Service Manager, Boeing
Tyler Sharp, Technology Architect, Cerner
Eric Siglin, Senior Database Administrator, Electric Reliability Council of Texas

In this panel discussion, customers share their experiences with Oracle Enterprise Manager Cloud Control and discuss how they’ve implemented features leading to significant cost savings and operational efficiency. Get started with deployment and management of agents and establishing initial thresholds for alerting. Simplify administration of users with active directory integration. Maintain security standards and increased productivity by patching fleets of databases and stay current with critical patch updates. Learn from customers how they’ve made use of out-of-the-box and custom reports to manage engineered systems, database, and middleware targets by exception. Benefit from these top customer dos and don’ts.

Wednesday, Oct 28, 11:00 a.m. | Moscone South—104

When not presenting or meeting with customers, you’ll find me at the Engineered Systems Showcase or Hybrid Cloud Management: Single Pane of Glass for Complete Management—On-Premises and Public Cloud (SLD-026) booths.  Be sure to stop by and say hi!

For the full list of EM-related sessions, you can refer to the Focus On docs.    Here are some highlights of other sessions you’ll find interesting!

For those of you who arrive on Saturday, or early Sunday… and aren’t drawn away by the beautiful city of San Francisco, check out the IOUG SIG Sunday sessions that are going on all day Sunday!   Watch out, they’re no longer restricted to Moscone West rooms!  Here are a few I’m going to try to get to!

8am – Alfredo Krieg delivers Monitor Engineered Systems from a Single Pane of Glass: Oracle Enterprise Manager 12c [UGF10288] in Moscone South 270

10am – Erik Benner/Wassim Kayrala deliver Database Cloud in a Box—DBaaS on Oracle Database Appliance [UGF10279] in Moscone South 270

12pm – Rene Antunez delivers Private Cloud Provisioning Using Oracle Enterprise Manager 12c [UGF9959] in Moscone South 305

1:30pm – Ray Smith delivers You’ve Got It—Flaunt It: Oracle Enterprise Manager Extensibility [UGF9930] in Moscone South 305


12:15pm – Oracle Enterprise Manager: The Complete Solution and Oracle’s Best-Kept Secrets [CON9715] in Moscone South 300

Hear from Amit Ganesh, VP of Development for EM.

1:30pm – Managing at Hyper-Scale: Oracle Enterprise Manager as the Nerve Center of Oracle Cloud [CON9710] in Moscone South 300

Hear about how the Oracle Cloud relies on Oracle Enterprise Manager, and learn best practices for any implementation!

5-6pm – Join your fellow EM enthusiasts in the OTN Lounge (Moscone South) for the IOUG EM SIG.   Don’t worry if you’re not a member yet, stop by and get to know some of the folks you see on Twitter or hear speaking!


11am – General Session: Oracle Management Cloud—Real-Time Monitoring, Log, and IT Operations Analytics [GEN9778] in Moscone South 102

Definitely a must-attend event!  Learn about Oracle’s newest offering, Oracle Management Cloud.


After my session at 11am, there’s a lot going on at 12:15pm but if you’re interested in monitoring, I’d recommend  Way Beyond the Basics: Oracle Enterprise Manager Monitoring Best Practices [CON9721] in Moscone South 300.


Standardize Target Monitoring with Templates

Enterprise Manager is a critical tool for monitoring database and middleware targets, as well as Engineered Systems and hosts.  Each target has its own set of metrics. If you read my previous posts on viewing metrics and setting thresholds, you’ve got a good understanding of how to set thresholds on a single target.  What if you have 100 targets?  Or 1,000?   Your production targets may even have different thresholds than non-production.   Do you really want to set these metrics manually on all targets?   Not likely.   If you have more than 3 databases or targets, you should probably consider standardizing your monitoring by using Monitoring Templates.   Templates allow you to reuse the metrics you’ve defined for like targets.

From the Enterprise menu, select Monitoring / Monitoring Templates.


In the search box, you can choose to display Oracle Certified templates.

If you check this, you’ll find a long list of templates for various middleware and application situations.


Create Template from Target

The first method to create a template is based on an existing target.  This allows you to configure your monitoring on one sample target, and copy this to a template.

Click Create.    Notice that Copy Monitoring Settings from Target is selected.


Click the search icon to find the sample target you want to copy metrics from, and click Select.

First, we need to give our template a name.  If you’re going to have multiple templates, it’s best to give them detailed names so they’re distinct and easily identified.     Notice the Default Template checkbox: if you check this, the template will be automatically applied to all new (not existing) Cluster Database targets as they are discovered in Enterprise Manager.  Only one default template per target type can be defined.


Click on Metric Thresholds and you will see a familiar screen with the target metrics and Warning and Critical thresholds.


If there are additional metrics you want to add to this template, or perhaps some to remove, click the Add or Remove metrics buttons.


When adding metrics, you’ll be able to search for another target, template or metric extension that you wish to add to this template.


When you’ve made your adjustments, click the OK button to save your template.  You’ll get a confirmation when your template is created.

Create from Target Type

From Monitoring Templates, click Create, and this time select the option for Target Type.  This option pulls the default registered metrics for that particular target type.


Next, you’ll select a category and the target type.  For Database, we will select Database Instance.    From here, the process is the same: the template will have all the default recommended metrics, and you can make your adjustments.


Apply Templates

Now that you have a new template, you can select it and click Apply to apply it to any existing targets.

The Apply Options are important to consider.  By default, templates override only metrics common to both the template and the target. This means that if a metric exists on the target but is not included in the template, it is not removed or replaced.  If a common metric has different thresholds, or no thresholds, it is updated to match the template.  The top option, to completely replace settings on the target, will make the target identical to the template.   Which means that for metrics not in the template, the apply will remove their thresholds and no longer alert on them.


The Key Values section tells the template apply how you want to handle metrics, such as Tablespace Space Used, that can have multiple key values, say different thresholds for the SYSTEM and SYSAUX tablespaces.


Click Add to select the targets or group you would like to apply the template to, and click Select.  Then click OK to submit the Apply job.
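If you have many targets, the same apply can also be scripted with EM CLI. This is only a sketch: the template and database names are made up, and the exact flags should be verified with `emcli help apply_template` on your version. The runnable part below just builds the semicolon-separated -targets argument:

```shell
#!/bin/sh
# Build the -targets argument for emcli apply_template.
# The template and database names here are hypothetical.
targets=""
for db in PRODDB1 PRODDB2 PRODDB3; do
  targets="${targets}${targets:+;}${db}:oracle_database"
done
echo "$targets"
# Then apply the template (commented out; requires an emcli session):
#   emcli login -username=sysman
#   emcli apply_template -name="Prod DB Template" -targets="$targets"
```

Scripting the apply is handy when a new batch of databases is promoted and you want the same monitoring standard on all of them in one shot.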



You can view the apply status from the Past Apply Operations button and get information on succeeded and failed operations.

So now you can take some time up front, standardize your metrics, and enforce them with templates.


Hands on Monitoring Exercises with Enterprise Manager

Dive deeper into the areas that interest you!   All steps can be done on your lab box or on your own Enterprise Manager system.

View Data with All Metrics

Modifying Metrics and Collections

Create a Template

Create a Metric Extension to notify on expiring DBSNMP accounts

Create a Metric Extension for Fast Recovery Area

Create a Repository-Side Metric Extension

Filter out a specific alert from incident rules

Managing Metric Thresholds in Enterprise Manager

One of the most critical steps in monitoring your targets with Enterprise Manager is setting your metrics and thresholds properly for your environment.   All targets have predefined metrics that are enabled, with thresholds set based on recommendations from Oracle product teams.    These may or may not be good for your environment.    Customers all have different requirements for what they want to be e-mailed, paged, or ticketed about.

The most common metrics for databases are going to be the ones that cause service outages:  availability, space issues, archiver issues, Data Guard gaps, critical ORA- errors.   Some things you just don’t need to know about at 2am, though, things like global cache blocks lost.

From the target menu, select Monitoring / Metric and Collection Settings.  This will show you the current settings of your target.  Notice the default view is Metrics with Thresholds.  Other items are collected and can be seen in the All Metrics view.


Let’s take a closer look at what we see here.  First we have the metric grouping, or category.  Then, for each metric in the group, you’ll have the operator and the warning and critical thresholds.  These are the most important: if you don’t provide a value, alerts will not be triggered, as there will be no threshold to violate.  The next column displays whether a corrective action job has been registered on this metric, followed by the collection schedule and the Edit icon.



Clicking on the link in Collection Schedule will bring you to the collection settings.  You can enable or disable a metric collection, change the frequency, and determine whether alert only or historical trending data will be saved.   If you select alert only, it will only store occurrences where thresholds are violated.  Pay careful attention to the Affected Metrics section, as some metrics are collected in a group, and modifying these settings will affect all metrics in that group.


Returning to the main screen, click on the pencil icon to edit the metric.


This first section is where you can add a Corrective Action job if you want to automatically fix your alerts.  An example would be kicking off an RMAN archive log backup job when the Archive Area Used % event is triggered.


In the Advanced Threshold section, you can determine how many times in a row a threshold must be exceeded to trigger an alert.  So if you want to alert when CPU is over 95% for more than 3 collections (15 minutes at a 5-minute collection interval), you would set Number of Occurrences to 3.
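The arithmetic is simple; assuming a 5-minute collection interval (check your metric’s actual collection schedule), the minimum delay before an alert fires is:

```shell
#!/bin/sh
# Delay before an alert fires = collection interval * number of occurrences.
# The 5-minute interval is an assumption; verify it on the metric's schedule.
interval_minutes=5
occurrences=3
delay=$((interval_minutes * occurrences))
echo "${delay} minutes"   # prints: 15 minutes
```

Raising Number of Occurrences is a cheap way to suppress alerts on short spikes, at the cost of a slower reaction to real problems.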


Template override allows an administrator to prevent a particular metric from being changed when templates are applied.  You’ll want to avoid this as a common practice and reserve it for special exceptions.




The Threshold Suggestion section allows you to evaluate what warning and critical severity alerts would be generated if you changed the thresholds.  You can look at the last month of collected metrics to make the best threshold estimates.

If your metric has multiple keys, you will have an additional screen where you can add more keys.  A key would be a filesystem, or a tablespace, that you want to monitor with different thresholds than the rest.


When you’re finished making changes, click Continue and then OK to save the metric changes to the repository and push them out to the Agent.   Once you get a target set up for monitoring the way you want, you can create a template to push the same settings to all like targets.   I’ll cover this in another post soon!

Getting to Know Your Target with All Metrics View

Every target in Enterprise Manager has a set of target-related metrics.   These metrics control what is collected, how frequently, and whether alerts and notifications are sent.   They are defined by target metadata and are specific to a particular target type.  The metrics are collected by the Agent at regular intervals and then batch-uploaded to the EM repository.   Exploring these collected metrics can provide you with a wealth of information about your target.

From the target, click the target menu / Monitoring / All Metrics.


In this view you will see all possible metrics for this target.   You’ll also see a list of the Open Metric Events (metrics that have crossed a threshold), and the top 5 events over the last 7 days.


If you click on a metric category on the left, you’ll get the real-time values of those metrics.   The Last Upload tells you when these metrics were last collected and uploaded to the repository.


To see those values, expand the category by clicking on the expand icon and selecting a specific metric, in this example Tablespace Space Used %.


This view now shows you the last collection, by tablespace, with average, low, high, and last known values.   You will see the severity is clear for all tablespaces at this time; if you have an open event, you may see a warning or critical icon here.    When you select an individual tablespace, a chart will appear in the lower half of the screen.


In this lower section, you can perform a variety of actions.  At the top you’ll see a summary of the metric data, as well as the option to Modify Thresholds.  Saved thresholds will be sent out to the agent.


If you want to see the metrics in table view to see the exact values and timestamps over the last several days, click the Table View link.


Under Options, you can also export this metric data to a CSV file.   Or maybe you want to see related metrics, or problem analysis, to identify what might have caused an issue with this metric.


When viewing Related Metrics, the predefined related metrics will be displayed, but you can add your own from any targets.


Additionally, you can compare to other keys, which would be other tablespaces in this example.  Or you can compare to other targets, say if you wanted to compare CPU utilization on 2 hosts.


By default, the data is shown for a 24-hour period.  Options to view 7 days, 31 days, and custom time periods are also available.


There’s a wealth of information collected and stored, and the best place to start looking at it is the All Metrics view.  This can help you identify the collection category, additional metrics you might be interested in, and patterns and trends in alerts.


Getting to Know EMDIAG: repvfy execute optimize

In my group, we work with a lot of customers with very large EM environments, in the range of 2,000+ agents.  So as you can imagine, there’s a bit of optimizing that needs to be done to account for these numbers.

A few of these standard tweaks have been put into the repvfy execute optimize command.    You can make all these changes individually, but if you want them all done at once, optimize is your tool.

There are three categories of optimization handled at this point:  internal tasks, repository settings, and the target system.    The script will first evaluate the size of your repository based on the number of agents, and from there determine which optimizations need to be done, or recommended for future implementation.

Internal Task Tuning

Enterprise Manager uses short and long workers, depending on the task activity.  We typically recommend 2 of each for most larger systems, so that is what repvfy execute optimize sets. Smaller systems are usually sufficient with the default setting of 1 each.    You can view the configuration in EM on the Manage Cloud Control -> Repository page.   Here you can also configure the short workers, but not the long ones.  If you see a high collection backlog, this is an indication that you’re in need of additional task workers.


The next step is to evaluate the current settings of the job system and ensure that there are enough connections available for it.  This change is not implemented automatically, but the command is printed out for you to run with emctl, as it requires a restart to take effect.   Recommendations for Large Job System Load can be found in the Sizing chapter of the Advanced Installation Guide.  Increasing the number of connections may require an increase in the database processes parameter.

Repository Settings Tuning

EM tracks system errors in one of its tables.   In larger systems, the MGMT_SYSTEM_ERROR_LOG table can become quite large over the 31-day default retention.   The optimize script reduces log retention to 7 days for normal operation.

There are also various levels of tracing enabled by default; this can generate a lot of extra activity during normal operations if you’re not utilizing the traces.    Tracing is turned off by the optimize command.  It can be re-enabled at any time by using the repvfy send start_trace -name <name>  and repvfy send start_repotrace commands.

Finally, this step looks for any invalid SYSMAN objects and validates them, then checks for stale optimizer statistics and makes a recommendation as needed.

System Tuning

After an EM outage or downtime, all the agents will attempt to upload and update their status (or heartbeat) with the OMS.  There’s a grace period during which no alerts are sent.  In larger systems, this grace period may not be long enough to get all agents updated before alerts start going out, so optimize adjusts this by increasing the grace period.

In later releases, you can also increase the number of threads that perform the ping heartbeat tasks.  This should be done if you have more than 2,000 agents per OMS.  The optimize command will make this calculation for you and recommend the appropriate emctl command to set the heartbeatPingRecorderThreads property.  Recommendations for Large Number of Agents can be found in the Sizing chapter of the Advanced Installation Guide.
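As a rough sketch of the sizing logic (the one-thread-per-2,000-agents ratio below is my reading of the guideline above, not necessarily the exact formula optimize uses):

```shell
#!/bin/sh
# Recommend a heartbeat ping recorder thread count from the agent count.
# Assumption: roughly one thread per 2,000 agents per OMS, minimum of 1.
agents=5500
threads=$(( (agents + 1999) / 2000 ))   # ceiling division in integer math
echo "heartbeatPingRecorderThreads=${threads}"   # prints: heartbeatPingRecorderThreads=3
```

Always prefer the value optimize itself prints for your site over any hand calculation like this one.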

The optimize command will only output the items that require attention, so not every item will appear in the output on every site.
The recommended values reported in the output are specific to THAT environment and should not be copied to another environment as-is.  To tune another EM environment, run the optimize script on that environment.

Sample output from a small EM system:

bash-4.1$ ./repvfy execute optimize

Please enter the SYSMAN password:
SQL*Plus: Release – Production on Thu Jul 9 07:59:35 2015

Copyright (c) 1982, 2008, Oracle. All rights reserved.

SQL> Connected.

Session altered.
Session altered.

========== ========== ========== ========== ========== ========== ==========
== Internal task system tuning ==
========== ========== ========== ========== ========== ========== ==========

– Setting the number of short workers to 2 (1->2)
– Setting the number of long workers to 2 (1->2)
========== ========== ========== ========== ========== ========== ==========
========== ========== ========== ========== ========== ========== ==========
== Job system tuning ==
========== ========== ========== ========== ========== ========== ==========

– On each OMS, run this command:
  $ emctl set property -name oracle.sysman.core.conn.maxConnForJobWorkers -value 72 -module emoms
  This change will require a bounce of the OMS

========== ========== ========== ========== ========== ========== ==========
========== ========== ========== ========== ========== ========== ==========
== Repository tuning ==
========== ========== ========== ========== ========== ========== ==========
– Setting retention for MGMT_SYSTEM_ERROR_LOG table to 7 days (31->7)

– Disabling PL/SQL tracing for module (EM.GDS)
– Disabling PL/SQL tracing for module (EM_DBM)

– Disabling repository metric tracing for ID (1234)

– Recompiling invalid object (foo,TRIGGER)
– Recompiling invalid object (bar,CONSTRAINT)

– Stale CBO statistics in the repository. Gather statistics for the SYSMAN schema
  Command to use:
  $ repvfy send gather_stats
  SQL> exec emd_maintenance.gather_sysman_stats_job(p_gather_all=>’YES’);

========== ========== ========== ========== ========== ========== ==========
========== ========== ========== ========== ========== ========== ==========
== Target system tuning ==
========== ========== ========== ========== ========== ========== ==========

– Setting the PING grace period to (90) (60->90)

– Set the parameter to 3
  $ emctl set property -module emoms -name -value 3

========== ========== ========== ========== ========== ========== ==========
not spooling currently