Bug 813914

Summary: RFE: Notify on a subsystem falling behind
Product: [Other] RHQ Project
Reporter: Charles Crouch <ccrouch>
Component: No Component
Assignee: RHQ Project Maintainer <rhq-maint>
Status: NEW ---
QA Contact: Mike Foley <mfoley>
Severity: unspecified
Docs Contact:
Priority: medium
Version: 3.0.0
CC: hbrock, hrupp
Target Milestone: ---
Keywords: FutureFeature
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Description Charles Crouch 2012-04-18 14:28:51 EDT
This is an RFE to provide better issue escalation when the alert subsystem is falling behind, e.g. when alert definition evaluation can't keep up with the rate at which items (measurement reports, events, etc.) are coming in. This could be because there are either too many items or too many alert definitions.
It arises from case 00470214 (https://c.na7.visual.force.com/apex/Case_View?id=500A0000007AhjH&srKp=500&sfdc.override=1&srPos=0)

The idea is that the user would be notified (message center? mail to rhqadmin?) when the alert subsystem is falling behind. The notification should include diagnostic data (e.g. the row count of the alert condition history table, or the row counts of the raw, hourly, and daily metric tables) that the customer could pass on to support to help diagnose the underlying issue.
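
To make the request concrete, here is a minimal sketch of such a check. It is not based on RHQ code: the `SubsystemSample` type, the `build_notification` function, the `max_backlog` threshold, and the table names in the usage example are all hypothetical placeholders. It only illustrates how a "falling behind" condition and the diagnostic row counts could be combined into one message.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SubsystemSample:
    """One observation of a subsystem's workload (all names are hypothetical)."""
    name: str             # e.g. "alerts"
    items_received: int   # items that arrived during the sampling window
    items_processed: int  # items the subsystem finished in the same window
    backlog: int          # items still waiting at the end of the window
    # Diagnostic table sizes, keyed by a placeholder table name.
    row_counts: Dict[str, int] = field(default_factory=dict)


def build_notification(sample: SubsystemSample, max_backlog: int) -> Optional[str]:
    """Return a notification message if the subsystem is falling behind, else None.

    Assumption: "falling behind" means the backlog exceeds max_backlog while
    the subsystem is processing fewer items than it receives.
    """
    falling_behind = (
        sample.backlog > max_backlog
        and sample.items_processed < sample.items_received
    )
    if not falling_behind:
        return None

    lines = [
        f"Subsystem '{sample.name}' is falling behind.",
        f"received={sample.items_received} "
        f"processed={sample.items_processed} backlog={sample.backlog}",
        "Diagnostic row counts:",
    ]
    # Sort so the message is stable and easy to compare between reports.
    for table, count in sorted(sample.row_counts.items()):
        lines.append(f"  {table}: {count}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = SubsystemSample(
        name="alerts",
        items_received=1000,
        items_processed=600,
        backlog=5000,
        # Placeholder table names, not the real schema.
        row_counts={"alert_condition_history": 120000, "raw_metrics": 9000000},
    )
    print(build_notification(sample, max_backlog=1000))
```

The message text is what would be sent to the message center or mailed to rhqadmin, and the customer could forward it to support unchanged.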
Comment 1 Charles Crouch 2012-04-18 14:54:39 EDT
Made the title more generic. The sort of notification we're looking for here applies to all subsystems, e.g. metric collection, events, and alerts.