534562 – (RHQ-1347) Investigate high volume of alerts on JON Server reliability

Summary: Investigate high volume of alerts on JON Server reliability

Keywords:
Status:	CLOSED DUPLICATE of bug 536155
Alias:	RHQ-1347
Product:	RHQ Project
Classification:	Other
Component:	Alerts
Sub Component:
Version:	unspecified
Hardware:	All
OS:	All
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	RHQ Project Maintainer
QA Contact:
Docs Contact:
URL:	http://jira.rhq-project.org/browse/RH...
Whiteboard:
Depends On:
Blocks:

Reported:	2009-01-12 21:38 UTC by Charles Crouch
Modified:	2015-02-01 23:24 UTC
CC List:	2 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2010-08-18 15:40:22 UTC
Embargoed:

Description Charles Crouch 2009-01-12 21:38:00 UTC

<snip>
there could be problems with alerts if you have a lot of
MATCHING conditions.  The alerts cache is fast, so as long as most
pieces of data do NOT trigger conditions to become true, we'll stay
standing, no problem.  Issues arise when users create conditions that
always fire each time the metric is collected.  Realistically speaking,
conditions like that have no meaning.  Alerts SHOULD be used for
exceptional cases, not as a new form of logging.  ; )

That said, I have tested high-volume alerting on Postgres 8.3 when the db
and server were colocated, with NO other appreciable load being
applied to the server machine at that time.  I saw that the system was
handling ~1500 matches/second.  This will certainly decrease
when the server is remote from the db, and will certainly go down further
when the system is loaded, but what do we think is an acceptable /
reasonable figure to shoot for?
<snip>
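To make the point above concrete, here is a toy Python model. It is a hypothetical sketch, not RHQ's actual alerts cache; `ToyAlertCache` and its methods are invented for illustration. It assumes conditions are held in memory keyed by metric id, so evaluating a collected value is cheap, and that only a match incurs the expensive step, simulated here by appending a row where a real server would insert into the database.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ToyAlertCache:
    """Hypothetical model: cheap in-memory condition checks, costly writes only on a match."""
    conditions: Dict[int, List[Callable[[float], bool]]] = field(default_factory=dict)
    inserted_alerts: List[Tuple[int, float]] = field(default_factory=list)

    def add_condition(self, metric_id: int, predicate: Callable[[float], bool]) -> None:
        self.conditions.setdefault(metric_id, []).append(predicate)

    def process(self, metric_id: int, value: float) -> int:
        """Evaluate every condition on one collected value; return the number of matches."""
        matches = 0
        for predicate in self.conditions.get(metric_id, []):  # cheap in-memory check
            if predicate(value):
                # The expensive part: a real server would write an alert row to the DB here.
                self.inserted_alerts.append((metric_id, value))
                matches += 1
        return matches


cache = ToyAlertCache()
cache.add_condition(1, lambda v: v > 90.0)  # exceptional case: value above 90
cache.add_condition(2, lambda v: v >= 0.0)  # always true: fires on every collection

for v in [10.0, 20.0, 95.0, 30.0]:
    cache.process(1, v)
    cache.process(2, v)

print(len(cache.inserted_alerts))  # 5: one from metric 1, four from metric 2
```

The always-true condition on metric 2 produces one write per collection, which is exactly the "new form of logging" pattern the quote warns against.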

From the above, 1500 matches/second sounds pretty fast. With 30-second metric collection intervals, a single condition per alert definition and a single alert definition per resource, this is equivalent to firing an alert every 30 seconds on every resource in a 45k-resource inventory (1500 × 30 = 45,000). The other side of this equation is how well it's possible to select from and purge the alert tables once this volume of data has been inserted.
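The back-of-the-envelope numbers can be reproduced with a small Python sketch. The function names are hypothetical. It assumes a sustained match rate and one alert row per match (a single condition per definition). The second function shows the daily row count the select/purge side would have to cope with.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def equivalent_inventory(matches_per_sec: float, interval_sec: float,
                         conditions_per_def: int = 1, defs_per_resource: int = 1) -> float:
    """Resources that could each fire on every collection at the given match rate."""
    matches_per_resource = conditions_per_def * defs_per_resource
    return matches_per_sec * interval_sec / matches_per_resource


def alert_rows_per_day(matches_per_sec: float) -> float:
    """Alert rows written per day, assuming the rate is sustained and each match is one row."""
    return matches_per_sec * SECONDS_PER_DAY


print(equivalent_inventory(1500, 30))  # 45000.0
print(alert_rows_per_day(1500))        # 129600000
```

At a sustained ~1500 matches/second that is roughly 130 million alert rows per day, which is why the purge side matters as much as the insert rate.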

Comment 1 Charles Crouch 2009-01-12 21:43:36 UTC

It seems generally unlikely that users will set up their alert systems in such a way as to generate this many alerts from measurements alone, but it becomes more feasible if you consider how rapidly Events can be inserted into the system. In this case it may be helpful to require users to specify a regex filter in the alert definition (they can use * if they want all alerts of a given priority). Making this field mandatory, and adding a warning about the possibly large number of alerts that unfiltered definitions could generate, may help reduce the likelihood of users running into problems here.
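A minimal Python sketch of the proposed validation, assuming the filter is checked when an event alert definition is saved. `compile_event_filter` is a hypothetical name, not an existing RHQ API. Note that a bare `*` is not a valid regular expression in Python's `re` module (it raises "nothing to repeat"), so the sketch treats `*` as an alias for `.*` and emits a warning when the user opts for an unfiltered definition.

```python
import re
import warnings
from typing import Optional

# Hypothetical aliases a user might enter to mean "match every event".
MATCH_ALL_ALIASES = {"*", ".*"}


def compile_event_filter(pattern: Optional[str]) -> "re.Pattern[str]":
    """Validate the mandatory regex filter of a (hypothetical) event alert definition."""
    if pattern is None or not pattern.strip():
        raise ValueError("an event alert definition must specify a regex filter "
                         "(use * to match all events)")
    pattern = pattern.strip()
    if pattern in MATCH_ALL_ALIASES:
        # Allowed, but warn: every event this definition receives will fire an alert.
        warnings.warn("unfiltered event alert definition: this may generate a very "
                      "large number of alerts", UserWarning)
        return re.compile(".*")
    try:
        return re.compile(pattern)
    except re.error as exc:
        raise ValueError(f"invalid regex filter {pattern!r}: {exc}") from exc


f = compile_event_filter("OutOfMemoryError")
print(bool(f.search("java.lang.OutOfMemoryError: Java heap space")))  # True
```

Rejecting an empty filter forces the user to make a deliberate choice, while the warning keeps the "match everything" option available for those who really want it.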

Comment 2 Red Hat Bugzilla 2009-11-10 20:31:04 UTC

This bug was previously known as http://jira.rhq-project.org/browse/RHQ-1347

Comment 3 wes hayutin 2010-02-16 15:44:48 UTC

Mass move off the QA triage list.  These are tasks for dev.

Comment 4 Corey Welton 2010-08-18 15:40:22 UTC

Closing per triage

*** This bug has been marked as a duplicate of bug 536155 ***