EM 13c – Exciting Updates to Target Properties

It’s here, it’s finally here! I know most of you have already downloaded the binaries and started installing or upgrading your test environment. It’s just too tempting not to, right? Ever since Oracle Enterprise Manager 12c came out, one question I’ve heard over and over is: can I use User Defined Target Properties in my Dynamic Groups and Administration Groups? Sadly, the answer has always been no. Until now. Now the answer is proudly YES!

User Defined Target Properties

One of the small but powerful new features in EM 13c is the ability to use your custom target properties to define Dynamic and Administration Groups! This works with global target properties, the ones you create with target_type="*". Target-type-specific properties won’t show up in the selection list. A small compromise, I think!

First, create your custom target property with the emcli add_target_property command.

$ ./emcli add_target_property -target_type="*" -property="Owner"
Property "Owner" added successfully
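If you need several global properties, a small wrapper keeps every one of them global. This is a minimal sketch that only reuses the add_target_property flags shown above; the EMCLI variable (defaulting to ./emcli) is an assumption added so the loop can be dry-run with echo.

```shell
#!/bin/sh
# add_global_properties NAME...
# Creates each property with target_type="*" so it is global, the kind that
# EM 13c lists in Dynamic and Administration Group membership criteria.
# EMCLI defaults to ./emcli; set EMCLI=echo to print the commands instead.
add_global_properties() {
  local prop rc=0
  for prop in "$@"; do
    ${EMCLI:-./emcli} add_target_property -target_type="*" -property="$prop" || rc=1
  done
  return "$rc"   # non-zero if any property could not be added
}
```

For example, `EMCLI=echo add_global_properties Owner "Support Team"` previews the two commands ("Support Team" is a hypothetical property name); drop `EMCLI=echo` to run them for real.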

Next, create a Dynamic Group and select the Define Membership Criteria button.


You’ll see a list of the default target properties. Click the Add/Remove Target Properties button.


In this list, you will now see the Owner target property that I created earlier. Select its box and click Select.

Now you need to choose which values of this property should place targets in this group. Click the magnifying glass next to Owner.


Since this is Jill’s group, we’re going to select Jill, click Move and then Select.


Now we can see that this group is going to contain any targets owned by Jill.


The final step is to review the membership and click OK.


Now that the group has been created, if Jill owns any targets, we’ll see them listed in her group.
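The group only lists targets whose Owner property is actually set. One way to set it in bulk is the emcli set_target_property_value verb, whose -property_records value takes target_name:target_type:property_name:property_value. That verb is not covered in this post, so treat this as a sketch and confirm the syntax with `./emcli help set_target_property_value` on your release. EMCLI is the same hypothetical dry-run variable as above.

```shell
#!/bin/sh
# set_owner OWNER NAME:TYPE...
# Sets the global Owner property to OWNER on each target, given as NAME:TYPE.
# Target names that contain ":" would need a different record separator.
set_owner() {
  local owner="$1" target rc=0
  shift
  for target in "$@"; do
    ${EMCLI:-./emcli} set_target_property_value \
      -property_records="${target}:Owner:${owner}" || rc=1
  done
  return "$rc"
}
```

For example, `set_owner Jill db01:oracle_database host01.example.com:host` would tag two (hypothetical) targets so they show up in Jill’s group.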


You will also see the global target property in the selection for the Administration Groups as shown here:

Administration Group with User Defined Target Property

 

Target Property List of Values

Another big enhancement is the ability to create a master list of values so your target properties are stored more accurately. Say your Line of Business property should only be DBA, MW, or App, but admins keep entering the wrong values. Targets with those unexpected values won’t be picked up by Dynamic or Administration Groups, because the group criteria only match the values they expect.

To enable a Target Property to use a Master List of Values:

$ ./emcli use_target_properties_master_list -property_name=orcl_gtp_location -enable

Targets exists with values set for this property. Run the same command with -copy_from_targets flag to copy all values to the master list.

If your targets are already using this property, you’ll get the error message above.  Update your emcli command to include the -copy_from_targets flag.
$ ./emcli use_target_properties_master_list -property_name=orcl_gtp_location -enable -copy_from_targets
Successfully migrated property values
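If you script this step across many properties, you can fold the retry into one helper. A sketch, assuming emcli prints the message above (mentioning -copy_from_targets) when targets already hold values; it matches on that text rather than on the exit code, since the exit code for that case isn’t shown here.

```shell
#!/bin/sh
# enable_master_list PROPERTY
# Enables the master list; if emcli reports that targets already use the
# property, re-runs the command with -copy_from_targets to migrate the values.
enable_master_list() {
  local prop="$1" out
  out=$(${EMCLI:-./emcli} use_target_properties_master_list \
          -property_name="$prop" -enable 2>&1)
  printf '%s\n' "$out"
  case "$out" in
    *-copy_from_targets*)
      ${EMCLI:-./emcli} use_target_properties_master_list \
        -property_name="$prop" -enable -copy_from_targets
      ;;
  esac
}
```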

To see the target properties, go to any target’s target menu and select Target Setup / Properties. Click Edit to update properties.

At this point, no values are listed for the Location target property.

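Add the valid values to the master list with the add_to_target_properties_master_list command, repeating -property_value for each value:

$ ./emcli add_to_target_properties_master_list -property_name="orcl_gtp_location" -property_value="Houston" -property_value="Austin"
Successfully set 2 value(s) for property: orcl_gtp_location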
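When the list of values is long, you can build the repeated -property_value arguments from a list instead of typing them. A minimal sketch using only the flags shown above; each value becomes its own argument, so values containing spaces stay intact.

```shell
#!/bin/sh
# add_master_values PROPERTY VALUE...
add_master_values() {
  local prop="$1" n value
  shift
  n=$#
  # Append one -property_value=... argument per value (the loop list is
  # expanded once, so growing "$@" inside the loop is safe)...
  for value in "$@"; do
    set -- "$@" "-property_value=$value"
  done
  # ...then drop the original bare values, keeping only the flags
  shift "$n"
  ${EMCLI:-./emcli} add_to_target_properties_master_list \
    -property_name="$prop" "$@"
}
```

For example, `add_master_values orcl_gtp_location Houston Austin "San Antonio"` adds three values in one call ("San Antonio" is just an example value).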

Now, when you edit Target Properties, you’ll find the correct values listed for Location.

If you added the wrong value, or you need to remove a value, use the delete_from_target_properties_master_list command:

$ ./emcli delete_from_target_properties_master_list -property_name="orcl_gtp_location" -property_value="Houston, Austin"
Successfully deleted property-value

To see the valid values, you can use the list_target_properties_master_list_values command.

$ ./emcli list_target_properties_master_list_values -property_name=orcl_gtp_location
Target Properties Master list of values for property : orcl_gtp_location
Austin
Houston
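That output is easy to check from a script, for example before setting a property value. A sketch, assuming the output format shown above: one header line followed by one value per line.

```shell
#!/bin/sh
# is_master_value PROPERTY VALUE
# Succeeds if VALUE is in the master list of values for PROPERTY.
is_master_value() {
  ${EMCLI:-./emcli} list_target_properties_master_list_values \
      -property_name="$1" |
    sed -e '1d' -e 's/[[:space:]]*$//' |   # drop the header, trim trailing blanks
    grep -x -F -q -- "$2"                  # exact, literal, whole-line match
}
```

For example, `is_master_value orcl_gtp_location Austin && echo valid`.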

 

For more on what you can do with Target Properties, you can see my previous post here.   I think with these two enhancements to target properties, EM administrators everywhere will smile a little brighter tonight.  Enjoy!

 

SMS Notifications From Enterprise Manager

Starting with Enterprise Manager (EM) 12c, you can send SMS notifications to a cell phone or a pager (does anybody still use pagers?). There have been a couple of questions on the forums about this, and how it works seems to be a bit confusing, so I thought I’d write it up.

First, make sure your mail server is set up in Setup / Notifications / Notification Methods and that you can receive the test e-mail.

Next, create an EM administrator user, then log in as that user and update your e-mail address and SMS/Pager information by clicking the Username drop-down menu and selecting Enterprise Manager Password & Email.

For SMS/Pager, you need your carrier’s e-mail-to-text address. For example, if your provider is Verizon, it would be 8885555555@vtext.com. Select the Email Type Pager for SMS addresses, because Pager messages are shorter than the Email format. Note that you will not see these multiple lines in the Setup / Security / Administrators view. There you can enter multiple e-mail addresses separated by commas, but the E-mail Type option is not available.
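The SMS address is just the phone number’s digits followed by the carrier’s gateway domain. A small helper, as a sketch; vtext.com is the Verizon example from above, and you would look up your own carrier’s gateway domain.

```shell
#!/bin/sh
# sms_address PHONE DOMAIN
# Strips everything except digits from PHONE and appends @DOMAIN.
sms_address() {
  local digits
  digits=$(printf '%s' "$1" | tr -cd '0-9')
  if [ -z "$digits" ]; then
    echo "no digits found in: $1" >&2
    return 1
  fi
  printf '%s@%s\n' "$digits" "$2"
}
```

For example, `sms_address "(888) 555-5555" vtext.com` prints 8885555555@vtext.com. A leading country code such as +1 is kept, so leave it off if your carrier expects 10 digits.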


By default, both e-mail and pager are enabled in your Notification Schedule. You can adjust this as necessary under Setup / Notifications / My Notification Schedule, and receive notifications by both e-mail and page, by e-mail only, or by page only, depending on what you configure there.

Next, you need to create an Incident Rule Set, or edit an existing one.   From Setup / Incidents / Incident Rules, select a rule set and rule to edit.  Once you get to the Action for the Rule, in the Basic Notifications section, select the EM Administrator in the Page box.  Save all changes.


In 12cR4, you can test the Incident Rule Set by selecting your Rule Set and clicking the Simulate Rules button (Setup / Incidents /  Incident Rules).  You will need to select a Target, Event Type and find an alert to simulate.  Then you will get a list of Actions that the Incident Rule Set will perform for this alert.

To test my notifications, I dropped the warning threshold on the tablespace metric to 15, something I knew would trigger immediately, and the short-format messages arrived on my phone.

I also received the long format in my e-mail.  When you’re done testing, don’t forget to set your thresholds back!

 

Automating the Mundane with Corrective Actions and Oracle Enterprise Manager

In my opinion, one of the most under-utilized features of Oracle Enterprise Manager is Corrective Actions (CAs), which can be triggered when a metric alert threshold is crossed. I think one reason they’re under-utilized is that it’s hard to know where to start and what can be automated. My advice, from having implemented them before, is to look at the alerts or tickets being generated and determine which ones are the most frequent and mundane. The one that always comes to mind is Archive Destination.

Archive Destination was the first CA we implemented at my previous company, because we got at least 20 of those alerts per day. Since our backups were controlled by a different team, all we could do was cut a ticket to them, and possibly kick off an archive backup hoping it would complete this time. So we put this in a Corrective Action. The script checked for hung backup sessions, checked that a backup wasn’t already running, looked through a config file to get the right information, and then kicked off an archive backup job. Then it sent an email to the backup team’s ticketing queue so a ticket was generated and they could investigate why the backups were failing. We set the CA to run on Warning, with a fairly low threshold, so we had plenty of time to react if it got to Critical. This was such a success that we went on to write more, including automating tablespace adds and notifying the application teams when processes/sessions limits were exceeded.
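Here is a stripped-down sketch of that kind of script, to show the shape rather than our actual implementation. The config file format (one SID:command line per database), the lock directory, and the function names are all hypothetical, and the hung-session check and ticket e-mail are left out.

```shell
#!/bin/sh
# Hypothetical skeleton of an archive-destination Corrective Action.
# Config file format (assumed): one "SID:command" line per database.

# lookup_backup_cmd SID CONFIG_FILE -> prints the configured command
lookup_backup_cmd() {
  awk -F: -v sid="$1" '$1 == sid { sub(/^[^:]*:/, ""); print; exit }' "$2"
}

# run_archive_ca SID CONFIG_FILE [LOCK_DIR]
run_archive_ca() {
  local sid="$1" config="$2" lock_dir="${3:-/tmp}" cmd lock rc
  cmd=$(lookup_backup_cmd "$sid" "$config")
  if [ -z "$cmd" ]; then
    echo "No archive backup command configured for $sid" >&2
    return 1
  fi
  lock="$lock_dir/archbkp_${sid}.lock"
  # mkdir is atomic, so a second CA run for the same SID backs off
  if ! mkdir "$lock" 2>/dev/null; then
    echo "Archive backup already running for $sid; skipping"
    return 0
  fi
  sh -c "$cmd"
  rc=$?
  rmdir "$lock"       # release the lock whether or not the backup succeeded
  return "$rc"
}
```

Wired into an OS Command Corrective Action, the script would be called with the database’s SID and the path to the config file.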

A friend of mine, Tyler Sharp, recently started implementing Corrective Actions and has found tremendous time savings. He had the idea to automate the oradebug steps, so his team would always have the required diagnostics when working with Oracle Support instead of having to go through the process manually the next time around. The CA is triggered when more than 4 active sessions are waiting on concurrency, or when database blocking time exceeds 900 seconds. He was kind enough to share the script they’ve implemented:

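conn / as sysdba
set serveroutput on

DECLARE
   trace_name      VARCHAR2 (1000) := NULL;
   alter_session   VARCHAR2 (1000) := NULL;

BEGIN
   -- Timestamp for this run (HH24 so morning and afternoon runs get different names)
   SELECT TO_CHAR (SYSDATE, 'MMDDYY_HH24MISS') INTO trace_name FROM DUAL;
   -- Tag the trace files of this session with the timestamp
   alter_session :=
         'alter session set tracefile_identifier='''
      || trace_name
      || '_AUTO_HANGANALYZE''';
   DBMS_OUTPUT.PUT_LINE (alter_session);
   EXECUTE IMMEDIATE alter_session;
END;
/

-- Attach oradebug to this session and remove the trace file size limit
oradebug setmypid;
oradebug unlimit;
-- Dump ASH data for the last 5 seconds
oradebug dump ashdumpseconds 5
-- Hang analysis twice, 3 minutes apart, so the two dumps can be compared
oradebug hanganalyze 3
execute sys.dbms_lock.sleep(180);
oradebug hanganalyze 3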

As you can see, the script sets a trace file identifier, takes an ASH dump, and then runs the required hanganalyze twice with a sleep in between. Now the DBA can skip these steps when working an issue, and collect files that were created at the right time, not 10 minutes later. To run this properly, make sure the Corrective Action uses a credential with SYSDBA access.

The great thing about Corrective Actions is that you can include them in a monitoring template and push them to all servers to keep your resolutions standard. A Corrective Action can be triggered on Warning, Critical, or both. You can then choose to be notified of the alert right away, or to bypass notifications unless the Corrective Action fails. This gives you a fallback in case the script or job has a problem fixing the issue.

To learn more about Corrective Actions, check the Oracle Enterprise Manager 12c Cloud Control Administrator’s Guide and check out the following blog posts for more ideas!

What are the Corrective Actions you’ve implemented, or would like to implement in your environment?

Standardize Target Monitoring with Templates

Enterprise Manager is a critical tool for monitoring database and middleware targets, as well as Engineered Systems and hosts. Each target has its own set of metrics. If you read my previous posts on viewing metrics and setting thresholds, you’ve got a good understanding of how to set thresholds on a single target. What if you have 100 targets? Or 1000? Your production targets may even need different thresholds than non-production. Do you really want to set up these metrics manually on every target? Not likely. If you have more than 3 databases or targets, you should probably consider standardizing your monitoring with Monitoring Templates. Templates allow you to reuse the metric settings you’ve defined across similar targets.

From the Enterprise menu, select Monitoring / Monitoring Templates.

In the search box, you can choose to display Oracle Certified templates.

If you check this, you’ll find a long list of templates for various middleware and application situations.

Create Template from Target

The first method to create a template is based on an existing target.  This allows you to configure your monitoring on one sample target, and copy this to a template.

Click Create. Notice that the option to copy monitoring settings from a Target is selected.

Click the search icon to find the sample target you want to copy metrics from, and click Select.

First, give the template a name. If you’re going to have multiple templates, it’s best to give each one a detailed name so they are distinct and easily identified. Notice the Default Template checkbox: if you check this, the template will be automatically applied to all new (not existing) targets of this type, Cluster Database in this example, as they are discovered in Enterprise Manager. Only one default template can be designated per target type.

Click on Metric Thresholds and you will see a familiar screen with the target metrics and Warning and Critical thresholds.


If there are additional metrics you want to add to this template, or metrics you want to remove, click the Remove or Add metrics button.

When adding metrics, you’ll be able to search for another target, template or metric extension that you wish to add to this template.


When you’ve made your adjustments, click OK to save your template. You’ll get a confirmation when your template is created.

Create from Target Type

From Monitoring Templates, click Create, and this time select the Target Type option. This option pulls in the default registered metrics for that particular target type.

Next, select a category and the target type. For Database, we will select Database Instance. From here, the process is the same: the template will have all of the default recommended metrics, and you can make your adjustments.

Apply Templates

Now that you have a new template, you can select it and click Apply to apply it to any existing targets.

The Apply Options are important to consider. By default, a template overrides only the metrics common to both the template and the target. This means that if a metric on the target is not included in the template, it is not removed or replaced. If a common metric has different thresholds or no thresholds, it is updated to match the template. The top option, to completely replace all metric settings on the target, makes the target’s settings identical to the template. This means that if there are metrics on the target that are not in the template, the apply will remove their thresholds and they will no longer alert.
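To make the difference between the two options concrete, here is a toy model of them. It is an illustration only, not how EM stores settings: each setting is a hypothetical metric=warning/critical string, and the metric names are made up.

```shell
#!/bin/sh
# apply_template MODE "TEMPLATE_SETTINGS" "TARGET_SETTINGS"
#   MODE=common  : default option; the template wins for shared metrics and
#                  target-only metrics are kept as they are
#   MODE=replace : the target ends up identical to the template
# Settings are space-separated metric=warning/critical pairs.
apply_template() {
  local mode="$1" template="$2" target="$3" pair
  {
    # Template settings first, so they take precedence below
    for pair in $template; do echo "$pair"; done
    if [ "$mode" = "common" ]; then
      for pair in $target; do echo "$pair"; done
    fi
  } | awk -F= '!seen[$1]++' | sort   # keep the first setting seen per metric
}
```

With `apply_template common "tbsp_pct=85/97" "tbsp_pct=90/99 cpu_util=80/95"`, cpu_util keeps its 80/95 thresholds; with `replace`, it drops out of the result entirely, which is the "no longer alert" case.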


The Key Values section tells the apply operation how to handle metrics, such as Tablespace, that can have multiple key values; for example, different thresholds for the SYSTEM and SYSAUX tablespaces.

Click Add to select the targets or group you would like to apply the template to, and click Select.  Then click OK to submit the Apply job.

 


You can view the apply status from the Past Apply Operations button and get information on succeeded and failed operations.

So now you can take some time up front, standardize your metrics, and enforce them with templates.

 

Hands on Monitoring Exercises with Enterprise Manager

Dive deeper into the areas that interest you!   All steps can be done on your lab box or on your own Enterprise Manager system.

View Data with All Metrics

Modifying Metrics and Collections

Create a Template

Create a Metric Extension to notify on expiring DBSNMP accounts

Create a Metric Extension for Fast Recovery Area

Create a Repository-Side Metric Extension

Filter out a specific alert from incident rules

Managing Metric Thresholds in Enterprise Manager

One of the most critical steps in monitoring your targets with Enterprise Manager is setting your metrics and thresholds properly for your environment. All targets have predefined metrics that are enabled, with thresholds set based on recommendations from Oracle product teams. These may or may not be right for your environment. Every customer has different requirements for what they want to be e-mailed, paged, or ticketed about.

The most common metrics for databases are the ones that cause service outages: availability, space issues, archiver issues, Data Guard gaps, and critical ORA- errors. Some things, like global cache blocks lost, you just don’t need to know about at 2am.

From the target menu, select Monitoring / Metric and Collection Settings. This shows the current settings for your target. Notice that the default view is Metrics with Thresholds. Other items are also collected and can be seen in the All Metrics view.

Let’s take a closer look at what we see here. First we have the metric grouping, or category. Then, for each metric in the group, you’ll see the operator and the warning and critical thresholds. These are the most important columns: if you don’t provide a threshold value, no alerts will be triggered, because there can be no threshold violations. The next column shows whether a corrective action job has been registered on this metric, followed by the collection schedule and the Edit icon.

 

Clicking on the link in Collection Schedule will bring you to the collection settings.  You can enable or disable a metric collection, change the frequency, and determine whether alert only or historical trending data will be saved.   If you select alert only, it will only store occurrences where thresholds are violated.  Pay careful attention to the Affected Metrics section, as some metrics are collected in a group, and modifying these settings will affect all metrics in that group.


Returning to the main screen, click on the pencil icon to edit the metric.


This first section is where you can add a Corrective Action job if you want to fix your alerts automatically. An example would be kicking off an RMAN archive log backup job when the Archive Area Used % event is triggered.

In the Advanced Threshold section, you can set how many consecutive times a threshold must be exceeded before an alert is triggered. So if you want to alert only when CPU stays above 95% for 3 collections in a row (15 minutes with a 5-minute collection interval), set Number of Occurrences to 3.
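The consecutive-occurrence rule can be sketched as a small loop. This assumes the > operator, whole-number samples, and a 5-minute collection interval; the sample values are made up.

```shell
#!/bin/sh
# first_alert_index THRESHOLD OCCURRENCES SAMPLE...
# Prints the 1-based number of the collection that raises the alert, i.e. the
# first point where OCCURRENCES consecutive samples are above THRESHOLD.
first_alert_index() {
  local threshold="$1" needed="$2" run=0 i=0 value
  shift 2
  for value in "$@"; do
    i=$((i + 1))
    if [ "$value" -gt "$threshold" ]; then
      run=$((run + 1))
    else
      run=0            # any sample at or below the threshold resets the count
    fi
    if [ "$run" -ge "$needed" ]; then
      echo "$i"
      return 0
    fi
  done
  return 1             # never reached the required number of occurrences
}
```

`first_alert_index 95 3 90 97 98 93 96 97 99` prints 7: the two-sample spike at collections 2 and 3 is ignored, and the alert is raised at collection 7, about 35 minutes in at 5-minute intervals.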


Template override allows an administrator to prevent a particular metric from being changed when templates are applied. Avoid using this as a common practice, and reserve it for special exceptions.


 

 

The Threshold Suggestion section allows you to evaluate which warning and critical alerts would have been generated if you changed the thresholds. You can look at the last month of collected metric data to make the best threshold estimates.

If your metric has multiple keys, you will have an additional screen where you can add keys. A key would be a filesystem or a tablespace that you want to monitor with different thresholds than the rest.
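Per-key thresholds behave like a lookup with a fallback: a key listed on that screen gets its own thresholds, and every other key uses the metric’s general ones. A toy sketch for tablespace space used; the 90/95 and 85/97 values are examples only, and it assumes the > operator.

```shell
#!/bin/sh
# threshold_for_key KEY -> "WARNING CRITICAL"
threshold_for_key() {
  case "$1" in
    SYSTEM|SYSAUX) echo "90 95" ;;   # keys with their own thresholds
    *)             echo "85 97" ;;   # all other tablespaces
  esac
}

# severity_for KEY VALUE -> Clear, Warning or Critical
severity_for() {
  local value="$2" thresholds warn crit
  thresholds=$(threshold_for_key "$1")
  warn=${thresholds% *}
  crit=${thresholds#* }
  if [ "$value" -gt "$crit" ]; then
    echo "Critical"
  elif [ "$value" -gt "$warn" ]; then
    echo "Warning"
  else
    echo "Clear"
  fi
}
```

At 96% used, SYSTEM is Critical while a tablespace without its own key, such as USERS, is only at Warning.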


When you’re finished making changes, click Continue and then OK to save the metric changes to the repository and push them out to the Agent. Once you have a target set up for monitoring the way you want, you can create a template to push the same settings to all similar targets. I’ll cover this in another post soon!

Getting to Know Your Target with All Metrics View

Every target in Enterprise Manager has a set of target-related metrics. These metrics control what is collected, how frequently, and whether alerts and notifications are sent. They are defined by target metadata and are specific to a particular target type. Metrics are collected by the Agent at regular intervals and then uploaded in batches to the EM repository. Exploring these collected metrics can provide you with a wealth of information about your target.

From the target, click the target menu / Monitoring / All Metrics.


In this view you will get all possible metrics for this target.   You’ll also see a list of the Open Metric Events (a metric that has crossed a threshold), and the top 5 events over the last 7 days.


If you click a metric category on the left, you’ll see the real-time values of those metrics. The Last Upload field tells you when these metrics were last collected and uploaded to the repository.

To see those values, expand the category by clicking its expand icon and select a specific metric, in this example Tablespace Space Used %.

This view is now showing you the last collection, by tablespace with average, low, high and last known values.   You will see the severity is clear for all tablespaces at this time.  If you have an open event, you may see a warning or critical icon here.    When you select an individual tablespace, a chart will appear in the lower half of the screen.


In this lower section, you can perform a variety of actions. At the top you’ll see a summary of the metric data, as well as the option to Modify Thresholds. Any threshold changes you save are sent out to the Agent.

If you want to see the metrics in table view to see the exact values and timestamps over the last several days, click the Table View link.


Under Options, you can also export this metric data to a CSV file. You can also view related metrics, or use problem analysis to help identify what might have caused an issue with this metric.
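Once exported, the CSV is easy to summarize from the command line. A sketch only: it assumes a header row with the numeric value in the second column, so check your file and set the hypothetical VALUE_COL variable if the layout differs. Quoted fields containing commas are not handled.

```shell
#!/bin/sh
# summarize_metric_csv FILE
# ASSUMED layout: a header row, then one sample per row with the numeric
# value in column VALUE_COL (default 2).
summarize_metric_csv() {
  awk -F, -v col="${VALUE_COL:-2}" '
    NR == 1 { next }                 # skip the header row
    $col == "" { next }              # skip rows without a value
    {
      v = $col + 0
      if (n == 0 || v < min) min = v
      if (n == 0 || v > max) max = v
      sum += v
      n++
    }
    END {
      if (n == 0) exit 1             # no samples found
      printf "samples=%d min=%.2f avg=%.2f max=%.2f\n", n, min, sum / n, max
    }' "$1"
}
```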


When viewing Related Metrics, the predefined related metrics will be displayed, but you can add your own from any targets.


Additionally, you can compare to other keys, which would be other tablespaces in this example.  Or you can compare to other targets, say if you wanted to compare CPU utilization on 2 hosts.


By default, the data is shown for a 24-hour period. Options to view 7 days, 31 days, and custom time periods are also available.

There’s a wealth of information collected and stored, and the best place to start exploring it is the All Metrics view. It can help you identify collection categories, additional metrics you might be interested in, and patterns and trends in alerts.

 

Preventing Alerts on OS Audit File Size when Upgrading DB Plug-in

In January, DB Plug-in 12.1.0.7.0 was released for EM 12c. Not long after, my friend Brian found that it added a new metric with default thresholds. The new metric group is Operating System Audit Files, and the metric alerts on Size of Audit Files. Depending on the size and age of your environment, you may immediately start getting notifications or pages, as the default thresholds are 10MB (warning) and 20MB (critical), which can be quite small.

An example of the notification you might receive:

 Host=xxxxxx.us.oracle.com 
 Target type=Database Instance 
 Target name=emrep 
 Categories=Capacity 
 Message=35.39 MB of Audit Trail files collected (.aud: 35.39, .xml: 0, .bin: 0) 
 Severity=Critical 
 Event reported time=Feb 6, 2015 6:53:56 AM PST 
 Target Lifecycle Status=Production 
 Operating System=Linux
 Platform=x86_64
 Department=DBA
 Associated Incident Id=2103 
 Associated Incident Status=New 
 Associated Incident Owner= 
 Associated Incident Acknowledged By Owner=No 
 Associated Incident Priority=None 
 Associated Incident Escalation Level=0 
 Event Type=Metric Alert 
 Event name=sizeOfOSAuditFiles:FILE_SIZE 
 Metric Group=Operating System Audit Records
 Metric=Size of Audit Files
 Metric value=35.39
 Key Value= 
 Rule Name=DBA_Incident_Rule,Create incident for critical metric alerts 
 Rule Owner=SYSMAN 
 Update Details:
 3.39 MB of Audit Trail files collected (.aud: 3.39, .xml: 0, .bin: 0)
 Incident created by rule (Name = DBA_Incident_Rule, Create incident for critical metric alerts; Owner = SYSMAN).
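To triage a batch of these notifications, for example to see how far over the thresholds each database is, you can pull the size out of the Message= line. A sketch, assuming the message format shown above and the > operator with the default 10MB warning and 20MB critical thresholds.

```shell
#!/bin/sh
# audit_size_severity FILE [WARN_MB] [CRIT_MB]
# Reads a saved notification, extracts the size from the Message= line and
# classifies it against the thresholds (defaults: 10 MB warning, 20 MB critical).
audit_size_severity() {
  awk -v warn="${2:-10}" -v crit="${3:-20}" '
    /Message=/ && /MB of Audit Trail files collected/ {
      sub(/.*Message=/, "")              # keep only the text after Message=
      size = $1 + 0                      # first field is the size in MB
      if (size > crit)      sev = "Critical"
      else if (size > warn) sev = "Warning"
      else                  sev = "Clear"
      printf "%.2f MB -> %s\n", size, sev
      found = 1
      exit
    }
    END { if (!found) exit 1 }' "$1"
}
```

For the notification above, this prints `35.39 MB -> Critical`.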


So if you’re planning to upgrade 1000 agents with the new Database Plugin, you might start getting a little nervous about receiving all of these pages. Since the metric didn’t exist before, it isn’t in your templates, so they can’t disable it. Even if it were in the templates, it would likely alert before you could reapply them.

Luckily, Incident Rules provide a method to exclude a particular event when evaluating an Incident Rule.

From Setup / Incidents / Incident Rules, edit your defined Incident Rule. If you haven’t customized an Incident Rule, you can select the default Incident Rule Set and use Create Like to clone it into a copy you can edit.

Select the Metric Alert rule, and click Edit.


On this first screen, you’ll see the Advanced Selection Options. If you expand this section, you’ll see an option for Event name. This is where you can exclude a specific metric event by selecting Not Equals and entering the event name. For this metric, the event name is sizeOfOSAuditFiles:FILE_SIZE.
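The Not Equals condition simply takes matching events out of the rule’s scope. As a rough illustration, here is that filter applied to a list of event names; the helper is hypothetical and only models the exact-match exclusion, not the rest of the rule logic.

```shell
#!/bin/sh
# rule_scope EVENTS_FILE EXCLUDED_NAME...
# Prints the event names (one per line in EVENTS_FILE) that a rule with an
# "Event name Not Equals ..." condition would still act on.
rule_scope() {
  local events="$1" patterns
  shift
  patterns=$(mktemp) || return 1
  printf '%s\n' "$@" > "$patterns"
  grep -v -x -F -f "$patterns" "$events"   # drop exact, literal matches
  rm -f "$patterns"
}
```

For example, `rule_scope events.txt sizeOfOSAuditFiles:FILE_SIZE` lists every event in the (hypothetical) events.txt except the audit-size one.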


 

Click Next and Continue until you finally get to Save. To validate, you can use Simulate Rules, or trigger an alert to see whether it still sends the email.

This concept can be applied to help filter out other events, or categories of metrics as needed.

 

Notifications for Expiring DBSNMP Passwords

Most user accounts these days have a password profile that automatically expires the password after a set number of days. Depending on your company’s security requirements, this may be as little as 30 days or as long as 365 days, although it typically falls between 60 and 90 days. For a normal user, this causes a small interruption in your day while you get your password reset by an admin. When it happens to privileged accounts, such as the DBSNMP account that is responsible for monitoring database availability, it can cause bigger problems.

In Oracle Enterprise Manager 12c, you may notice the error message “ORA-28002: the password will expire within 5 days” when you connect to a target, or worse, “ORA-28001: the password has expired”. If you wait too long, your monitoring will fail because DBSNMP can no longer log in. Wouldn’t it be nice to get an alert 10 days before the DBSNMP password expires? Thanks to Oracle Enterprise Manager 12c Metric Extensions (ME), you can! See the Oracle Enterprise Manager Cloud Control Administrator’s Guide for more information on Metric Extensions.
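The alert logic for such a Metric Extension boils down to counting the days left before the expiry date (for example DBA_USERS.EXPIRY_DATE for DBSNMP) and warning inside a 10-day window. A shell sketch of just that calculation, assuming GNU date for parsing YYYY-MM-DD dates; WARN_DAYS is a hypothetical knob defaulting to the 10 days mentioned above.

```shell
#!/bin/sh
# days_until DATE [TODAY] -> whole days from TODAY (default: today, UTC) to DATE.
# Pass dates as YYYY-MM-DD. Requires GNU date (-d).
days_until() {
  local expiry today
  expiry=$(date -u -d "$1" +%s) || return 1
  today=$(date -u -d "${2:-$(date -u +%Y-%m-%d)}" +%s) || return 1
  echo $(( (expiry - today) / 86400 ))
}

# dbsnmp_expiry_status EXPIRY_DATE [TODAY]
dbsnmp_expiry_status() {
  local days
  days=$(days_until "$1" "$2") || return 1
  if [ "$days" -lt 0 ]; then
    echo "EXPIRED"
  elif [ "$days" -le "${WARN_DAYS:-10}" ]; then
    echo "WARNING: expires in $days days"
  else
    echo "OK: $days days left"
  fi
}
```

For example, `dbsnmp_expiry_status 2025-01-08 2025-01-01` prints `WARNING: expires in 7 days`.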


Organizing Your Oracle Enterprise Manager Targets

If you’re monitoring more than a handful of servers or databases in your Enterprise Manager 12c (EM), you have probably started creating groups to manage many targets together.   If you haven’t, this is one of the most critical aspects of setting up your EM to properly monitor and manage targets.  There are several use cases where you will want to perform a single action on multiple targets.

  • Setting monitoring thresholds
  • Granting privileges
  • Sending notifications
  • Applying compliance rules
  • Viewing dashboards
  • Running jobs, upgrades, backups
  • Creating reports
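For a static list of targets, groups can also be created from the command line. A sketch, assuming emcli create_group with -name and an -add_targets list of name:type pairs separated by semicolons; check `./emcli help create_group` on your release before relying on it. The group and target names are hypothetical, and EMCLI is the same dry-run variable used earlier.

```shell
#!/bin/sh
# join_targets NAME:TYPE... -> "NAME:TYPE;NAME:TYPE"
join_targets() {
  local IFS=';'
  echo "$*"
}

# create_target_group GROUP NAME:TYPE...
create_target_group() {
  local group="$1"
  shift
  ${EMCLI:-./emcli} create_group -name="$group" -add_targets="$(join_targets "$@")"
}
```

For example, `create_target_group prod_dbs db01:oracle_database db02:oracle_database` creates one static group; Dynamic Groups, covered earlier, are defined by membership criteria instead.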
