Red Hat Bugzilla – Bug 813914
RFE: Notify on a subsystem falling behind
Last modified: 2015-02-01 18:28:52 EST
This is an RFE to provide better issue escalation when the alert subsystem is falling behind, i.e. when it can't keep up with alert definition evaluation at the rate at which items (measurement reports, events, etc.) are coming in. This could be because there are either too many items or too many alert definitions.
It arises from case 00470214 (https://c.na7.visual.force.com/apex/Case_View?id=500A0000007AhjH&srKp=500&sfdc.override=1&srPos=0)
The idea is that the user is notified (message center? mail to rhqadmin?) when the alert subsystem is falling behind. The notification should include diagnostic data (e.g. the row count of the alert condition history table, or the row counts of the raw, hourly, and daily metric tables) that the customer can pass on to support to help diagnose the underlying issue.
Made the title more generic. The sort of notification we're looking for here applies to all subsystems, e.g. metric collection, events, and alerts.
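
To make the request more concrete, here is a rough sketch of what a generic "falling behind" check with attached diagnostics might look like. It is only an illustration, not RHQ code: the class `SubsystemSample`, the functions `falling_behind` and `build_notification`, the `max_backlog` threshold, and the diagnostic key names are all hypothetical, and the rule used (arrivals exceed processed items and the backlog is over a limit) is just one possible definition of "falling behind".

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SubsystemSample:
    """One observation window for a subsystem (hypothetical structure)."""
    name: str             # e.g. "alerts", "metric collection", "events"
    items_received: int   # items that arrived during the window
    items_processed: int  # items the subsystem finished during the window
    backlog: int          # items still waiting at the end of the window
    # Diagnostic data to pass on to support, e.g. table row counts.
    diagnostics: Dict[str, int] = field(default_factory=dict)


def falling_behind(sample: SubsystemSample, max_backlog: int) -> bool:
    """Assumed rule: arrivals outpace processing and the backlog exceeds the limit."""
    return (sample.items_received > sample.items_processed
            and sample.backlog > max_backlog)


def build_notification(sample: SubsystemSample, max_backlog: int) -> Optional[str]:
    """Return the notification text, or None when the subsystem is keeping up."""
    if not falling_behind(sample, max_backlog):
        return None
    lines = [
        f"Subsystem '{sample.name}' is falling behind",
        f"received={sample.items_received} "
        f"processed={sample.items_processed} backlog={sample.backlog}",
    ]
    # Append diagnostics in a stable order so reports are easy to compare.
    for key in sorted(sample.diagnostics):
        lines.append(f"{key}={sample.diagnostics[key]}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up numbers and key names, purely for illustration.
    sample = SubsystemSample(
        name="alerts",
        items_received=1000,
        items_processed=400,
        backlog=5000,
        diagnostics={"alert_condition_history_rows": 250000},
    )
    print(build_notification(sample, max_backlog=1000))
```

The same structure would work for any subsystem, since only the name and the diagnostics dictionary change. How the text is delivered (message center, mail to rhqadmin) is left open here, as in the description.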