Standardize Target Monitoring with Templates

Enterprise Manager is a critical tool for monitoring database and middleware targets, as well as Engineered Systems and hosts. Each target has its own set of metrics. If you read my previous posts on viewing metrics and setting thresholds, you’ve got a good understanding of how to set thresholds on a single target. What if you have 100 targets? Or 1,000? Your production targets may even have different thresholds than non-production. Do you really want to set these metrics up manually on every target? Not likely. If you have more than 3 databases or targets, you should probably consider standardizing your monitoring with Monitoring Templates. Templates allow you to reuse the metrics you’ve defined across like targets.

From the Enterprise menu, select Monitoring / Monitoring Templates.

In the search box, you can choose to display Oracle Certified templates.

If you check this option, you’ll find a long list of templates for various middleware and application situations.

Create Template from Target

The first method to create a template is based on an existing target.  This allows you to configure your monitoring on one sample target, and copy this to a template.

Click Create. Notice that the option to copy monitoring settings from Target is selected.

Click the search icon to find the sample target you want to copy metrics from and click Select.

First we need to give our template a name. If you’re going to have multiple templates, it’s best to give them detailed names so they are distinct and easily identified. Notice the Default Template checkbox: if you check it, this template will be automatically applied to all new (not existing) Cluster Database targets as they are discovered in Enterprise Manager. Only one default template can be designated per target type.

Click on Metric Thresholds and you will see a familiar screen with the target metrics and Warning and Critical thresholds.

If there are metrics you want to add to this template, or remove from it, click the Remove or Add metrics button.

When adding metrics, you’ll be able to search for another target, template or metric extension that you wish to add to this template.

When you’ve made your adjustments, click OK to save your template. You’ll get a confirmation when your template is created.

Create from Target Type

From Monitoring Templates, click Create, and this time select the Target Type option. This option pulls the default registered metrics for that particular target type.

Next you’ll select a category and the target type.  For Database, we will select Database Instance.    From here, the process is the same.  This template will have all default recommended metrics and you can make your adjustments from here.

Apply Templates

Now that you have a new template, you can select it and click Apply to apply it to any existing targets.

The Apply Options are important to consider. By default, a template overrides only the metrics common to the template and the target. This means that if a metric on the target is not included in the template, it is not removed or replaced. If a common metric has different thresholds or no thresholds on the target, it is updated to match the template. The top option, to completely replace settings on the target, makes the target identical to the template. That means if the target has metrics that are not in the template, the apply removes the thresholds for those metrics and they no longer alert.
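
To make the difference between the two apply options concrete, here is a minimal Python sketch. It is only a toy model of the behavior described above, not how Enterprise Manager implements it: the function name `apply_template`, the dictionary layout, and the sample threshold values are all assumptions made up for illustration.

```python
from typing import Dict, Optional, Tuple

# (warning, critical) thresholds; None means "no threshold set".
Thresholds = Tuple[Optional[float], Optional[float]]


def apply_template(
    target: Dict[str, Thresholds],
    template: Dict[str, Thresholds],
    replace_completely: bool = False,
) -> Dict[str, Thresholds]:
    """Return the target's settings after a simulated template apply."""
    if replace_completely:
        # Metrics that are not in the template lose their thresholds,
        # so they no longer alert; everything else matches the template.
        result: Dict[str, Thresholds] = {m: (None, None) for m in target}
        result.update(template)
        return result
    # Default option: only metrics common to template and target change.
    result = dict(target)
    for metric, thresholds in template.items():
        if metric in target:
            result[metric] = thresholds
    return result


# Hypothetical sample data.
target = {
    "Tablespace Space Used (%)": (85, 97),
    "Global Cache Blocks Lost": (1, 3),
}
template = {"Tablespace Space Used (%)": (80, 95)}

merged = apply_template(target, template)
replaced = apply_template(target, template, replace_completely=True)
```

In this model, `merged` keeps the Global Cache Blocks Lost thresholds untouched, while `replaced` clears them, which is why the replace option deserves a second look before you submit it against production targets.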

The Key Values section tells the template apply how to handle metrics, such as Tablespace metrics, that can have multiple key values, for example different thresholds for the SYSTEM and SYSAUX tablespaces.

Click Add to select the targets or group you would like to apply the template to, and click Select. Then click OK to submit the Apply job.

You can view the apply status from the Past Apply Operations button and get information on succeeded and failed operations.

So now you can take some time up front, standardize your metrics, and enforce them with templates.

Managing Metric Thresholds in Enterprise Manager

One of the most critical steps in monitoring your targets with Enterprise Manager is setting your metrics and thresholds properly for your environment. All targets have predefined metrics that are enabled, with thresholds set based on recommendations from Oracle product teams. These may or may not be good for your environment. Customers all have different requirements for what they want to be e-mailed, paged or notified by ticket about.

The most common metrics for databases are going to be the ones that cause service outages: availability, space issues, archiver issues, Data Guard gaps, critical ORA- errors. Some things you just don’t need to know about at 2am, though, such as global cache blocks lost.

From the target menu, select Monitoring / Metric and Collection Settings. This will show you the current settings of your target. Notice the default view is Metrics with Thresholds. Other items are collected and can be seen in the All Metrics view.

Let’s take a closer look at what we see here. First we have the metric grouping or category. Then for each metric in the group, you’ll have the comparison operator and the warning and critical thresholds. These are the most important. If you don’t provide a threshold value, no alerts will be triggered, as there can be no threshold violations. The next column shows whether a corrective action job has been registered on this metric, followed by the collection schedule and the Edit icon.
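
If it helps to think of the operator and the two thresholds as a small rule, here is a rough Python sketch of how a collected value could map to a severity. This is my own simplified model, not Enterprise Manager code: the `severity` function, the operator table, and the severity strings are assumptions for illustration.

```python
import operator
from typing import Optional

# Comparison operators this sketch understands (an illustrative subset).
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
}


def severity(
    value: float,
    op: str,
    warning: Optional[float] = None,
    critical: Optional[float] = None,
) -> str:
    """Classify one collected value against optional thresholds."""
    compare = OPERATORS[op]
    # Check critical first so the more severe level wins.
    if critical is not None and compare(value, critical):
        return "CRITICAL"
    if warning is not None and compare(value, warning):
        return "WARNING"
    # With no thresholds set, nothing can ever be violated.
    return "CLEAR"
```

For example, `severity(90, ">", warning=85, critical=95)` falls in the warning band, while `severity(99, ">")` stays clear no matter how high the value is, because no thresholds were supplied.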

Clicking on the link in Collection Schedule will bring you to the collection settings.  You can enable or disable a metric collection, change the frequency, and determine whether alert only or historical trending data will be saved.   If you select alert only, it will only store occurrences where thresholds are violated.  Pay careful attention to the Affected Metrics section, as some metrics are collected in a group, and modifying these settings will affect all metrics in that group.

Returning to the main screen, click on the pencil icon to edit the metric.

This first section is where you can add a Corrective Action job if you want to automatically fix your alerts. An example would be kicking off an RMAN archive log backup job when the Archive Area Used % event is triggered.

In the Advanced Threshold section, you can determine how many times in a row a threshold must be exceeded to trigger an alert. So if you want an alert only when CPU stays at 95% for 3 consecutive collections (15 minutes at a 5-minute collection interval), you would set Number of Occurrences to 3.
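
Here is a small Python sketch of the consecutive-occurrences idea, in case it is easier to see in code. It is a simplified model under my own assumptions (a `>=` comparison, a made-up `alert_points` function, and invented CPU samples), not the Agent's actual logic.

```python
from typing import Iterable, List


def alert_points(values: Iterable[float], threshold: float, occurrences: int) -> List[int]:
    """Return the collection indexes at which an alert would fire."""
    fired = []
    streak = 0  # consecutive collections at or above the threshold
    for index, value in enumerate(values):
        if value >= threshold:
            streak += 1
            # Fire once, at the moment the streak reaches the setting.
            if streak == occurrences:
                fired.append(index)
        else:
            streak = 0  # any clean collection resets the count
    return fired


# Hypothetical CPU % samples, one per 5-minute collection.
cpu = [96, 97, 80, 95, 96, 99, 98]

with_three = alert_points(cpu, threshold=95, occurrences=3)
with_one = alert_points(cpu, threshold=95, occurrences=1)
```

With Number of Occurrences at 3, the first two-collection spike is ignored and only the later sustained run alerts; with a setting of 1, both spikes alert on their first collection.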

Template override allows an administrator to prevent a particular metric from being changed when templates are applied. Avoid this as a common practice and reserve it for special exceptions.

The Threshold Suggestion section allows you to evaluate what warning and critical severity alerts would be generated if you changed thresholds. You can look at the last month of collected metrics to make the best threshold estimates.
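
The same kind of what-if check can be sketched in a few lines of Python: replay a series of collected values against candidate thresholds and count how many warning and critical alerts would have been raised. This is only an illustration of the idea, not what the Threshold Suggestion screen does internally; the `simulate_alerts` function, the `>=` comparison, and the sample history are assumptions.

```python
from typing import Dict, Iterable


def simulate_alerts(history: Iterable[float], warning: float, critical: float) -> Dict[str, int]:
    """Count alerts raised each time the severity changes to WARNING or CRITICAL."""
    counts = {"WARNING": 0, "CRITICAL": 0}
    previous = "CLEAR"
    for value in history:
        if value >= critical:
            current = "CRITICAL"
        elif value >= warning:
            current = "WARNING"
        else:
            current = "CLEAR"
        # Only a change of severity counts as a new alert.
        if current != previous and current != "CLEAR":
            counts[current] += 1
        previous = current
    return counts


# Hypothetical collected values for one metric.
history = [70, 82, 88, 91, 96, 97, 84, 70, 86]

current_settings = simulate_alerts(history, warning=85, critical=95)
tighter_settings = simulate_alerts(history, warning=80, critical=90)
```

Comparing the two results shows how much noisier a tighter pair of thresholds would have been over the same period, which is the question you want answered before changing anything.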

If your metric has multiple keys, you will have an additional screen where you can add more keys. A key would be a filesystem or a tablespace that you want to monitor with different thresholds than the rest.
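
One way to picture key-specific thresholds is as a lookup with a fallback: a key that has its own entry uses it, and every other key uses the general setting. The Python sketch below is just that mental model; the `thresholds_for` function, the tablespace names used as keys, and the numbers are all made up for illustration.

```python
from typing import Dict, Tuple

Thresholds = Tuple[float, float]  # (warning, critical)


def thresholds_for(key: str, overrides: Dict[str, Thresholds], default: Thresholds) -> Thresholds:
    """Use the key's own thresholds if defined, otherwise the general ones."""
    return overrides.get(key, default)


# Hypothetical settings for a tablespace usage metric.
DEFAULT = (85.0, 97.0)
OVERRIDES = {"SYSTEM": (90.0, 98.0)}

system_thresholds = thresholds_for("SYSTEM", OVERRIDES, DEFAULT)
users_thresholds = thresholds_for("USERS", OVERRIDES, DEFAULT)
```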

When you’re finished making changes, click Continue and then OK to save the metric changes to the repository and push them out to the Agent. Once you get a target set up for monitoring the way you want, you can create a template to push the same settings to all like targets. I’ll cover this in another post soon!

Getting to Know Your Target with All Metrics View

Every target in Enterprise Manager has a set of target-related metrics. These metrics control what is collected, how frequently, and whether alerts and notifications are sent. They are defined by target metadata and are specific to a particular target type. Metrics are collected by the Agent at regular intervals, and then batch uploaded to the EM repository. Exploring these collected metrics can provide you with a wealth of information about your target.

From the target, open the target menu and select Monitoring / All Metrics.

In this view you will get all possible metrics for this target.   You’ll also see a list of the Open Metric Events (a metric that has crossed a threshold), and the top 5 events over the last 7 days.

If you click on a metric category on the left, you’ll get the real-time values of those metrics. Last Upload tells you when these metrics were last collected and uploaded to the repository.

To see those values, expand the category by clicking the expand icon and select a specific metric, in this example Tablespace Space Used %.

This view now shows the last collection by tablespace, with average, low, high and last known values. You will see that the severity is clear for all tablespaces at this time. If you have an open event, you may see a warning or critical icon here. When you select an individual tablespace, a chart will appear in the lower half of the screen.
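
If you export the data and want the same per-key summary outside the console, a few lines of Python will do it. This is only a sketch of the arithmetic behind the average, low, high and last known columns; the `summarize` function and the sample values are assumptions, and samples are assumed to be listed oldest first.

```python
from statistics import mean
from typing import Dict, List


def summarize(samples: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Summarize collected values per key (for example, per tablespace)."""
    return {
        key: {
            "average": mean(values),
            "low": min(values),
            "high": max(values),
            "last": values[-1],  # most recent collection
        }
        for key, values in samples.items()
        if values  # skip keys with no collected data
    }


# Hypothetical Tablespace Space Used % samples, oldest first.
samples = {"SYSTEM": [61.0, 62.0, 63.0], "USERS": [40.0, 45.0]}
summary = summarize(samples)
```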

In this lower section, you can perform a variety of actions. At the top you’ll see a summary of the metric data, as well as the option to Modify Thresholds. Saved thresholds are sent out to the Agent.

If you want to see the metrics in table view to see the exact values and timestamps over the last several days, click the Table View link.

Under Options, you can also export this metric data to a CSV file.   Or maybe you want to see related metrics or problem analysis to identify what might have caused an issue with this metric.

When viewing Related Metrics, the predefined related metrics will be displayed, but you can add your own from any targets.

Additionally, you can compare to other keys, which would be other tablespaces in this example. Or you can compare to other targets, say if you wanted to compare CPU utilization on two hosts.

By default, the data is shown for a 24-hour period. Options to view 7 days, 31 days, and custom time periods are also available.

There’s a wealth of information collected and stored, and the best place to start looking at it is the All Metrics view. It can help you identify collection categories, additional metrics you might be interested in, and patterns and trends in alerts.