536155 – (RHQ-536) replacing db-based jbossmq with jboss messaging 2.0 with async journaling

Summary: replacing db-based jbossmq with jboss messaging 2.0 with async journaling

Keywords:
Status:	CLOSED NOTABUG
Alias:	RHQ-536
Product:	RHQ Project
Classification:	Other
Component:	Alerts
Sub Component:
Version:	1.0
Hardware:	All
OS:	All
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Joseph Marques
QA Contact:
Docs Contact:
URL:	http://jira.rhq-project.org/browse/RH...
Whiteboard:
Duplicates (1):	RHQ-1347
Depends On:
Blocks:

Reported:	2008-06-02 16:17 UTC by Joseph Marques
Modified:	2010-08-18 15:40 UTC
CC List:	1 user
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2010-08-10 17:59:56 UTC
Embargoed:

Description Joseph Marques 2008-06-02 16:17:00 UTC

JBoss Messaging 2.0.0.alpha was released recently, and has some promising performance improvements over JBoss MQ (which is the default JMS impl in JBoss 4.2):

http://www.jboss.org/file-access/default/members/jbossmessaging/freezone/docs/userguide-2.0.0.alpha1/html/performance.html#performance.results

so i'm thinking we should use the JBS configuration files to expose JMS-compliant destinations via JEE standard practices, namely JNDI. then use the special JBS deployer, which reads these config files, creates objects (as necessary) and plops them into JNDI.

even if we wanted to use the blocking persistence, we'd still double the capacity of the alerts subsystem with a simple drop-in replacement of the backing impl for JMS. however, async is really the way to go, because it'll give an additional 10-11x speedup on top of that. combined, that's 20-22x faster than our current impl.

aside from that, it's a much better choice architecturally because the in-band and out-band processing halves of the alerts engine won't need to go back to our database bottleneck for persistence of matched alerting data. each server instance in the rhq server-cluster can use non-database-based journaling, which gives us many more options; for instance, local journaling would allow the (current design of the) alerts engine to scale linearly with respect to the rhq server-cluster.

Comment 1 Joseph Marques 2008-06-02 16:57:35 UTC

the operative phrase in all of this is the parenthetical "(current design of the) alerts engine": it is the current design that makes this proposed improvement possible.

this solution has immediate and tremendous benefit for local-only alerts - where all of the alert conditions refer to the same resource in inventory. once we move into the realm of composite alerts, where you can aggregate different alert conditions across completely arbitrary resources (even those on different agents), the benefit decreases because data locality becomes important.

technically, as long as the conditions (across different resources) only involve resources that are co-located (managed by the same agent), the benefit still holds. but once a single composite alert has conditions that refer back to resources from different agents, its data can no longer be processed from a single journal. there would need to be another controller layer above the journals that knows how to process the aggregate conditions of composite alerts across the entire rhq server-cluster (not just a single server-collector node).
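
to make the three cases above concrete, here is a rough sketch of how an alert definition could be classified by data locality. this is illustrative only, not code from the rhq codebase: `Locality`, `Condition`, `classify`, `single_journal_suffices` and the `agent_of` lookup are all made-up names, and it assumes each resource is managed by exactly one agent.

```python
from dataclasses import dataclass
from enum import Enum


class Locality(Enum):
    LOCAL_ONLY = "local-only"      # every condition refers to one resource
    CO_LOCATED = "co-located"      # several resources, all on the same agent
    CROSS_AGENT = "cross-agent"    # resources managed by different agents


@dataclass(frozen=True)
class Condition:
    resource_id: int  # resource this alert condition is evaluated against


def classify(conditions, agent_of):
    """Classify an alert definition by where its condition data lives.

    agent_of is a hypothetical inventory lookup: resource_id -> agent name.
    """
    resources = {c.resource_id for c in conditions}
    if not resources:
        raise ValueError("an alert definition needs at least one condition")
    if len(resources) == 1:
        return Locality.LOCAL_ONLY
    agents = {agent_of[r] for r in resources}
    return Locality.CO_LOCATED if len(agents) == 1 else Locality.CROSS_AGENT


def single_journal_suffices(locality):
    # Only cross-agent composites would need the extra controller layer.
    return locality is not Locality.CROSS_AGENT
```

for example, with `agent_of = {1: "agent-a", 2: "agent-a", 3: "agent-b"}`, conditions on resources 1 and 2 classify as co-located, while conditions on resources 1 and 3 classify as cross-agent.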

but that reminds me of a question that Jay Shaughnessy asked a few months ago, about whether we really need to have durability of unmatched alert condition logs and alert events. I think the answer for local-only alerts is 'no', but for composite alerts with conditions that refer back to resources from different agents it is 'yes'. so, if we want to squeeze out as much performance as possible, we'll likely be dropping data into different journals that have different persistence guarantees depending on whether the matched alert data refers back to a local-only or composite alert.
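
a minimal sketch of that routing idea, using in-memory stand-ins rather than real JBoss Messaging journals. `Journal` and `JournalRouter` are hypothetical names. treating any alert whose conditions stay on a single agent as not needing durability is an assumption made for the example; the comment above only settles the local-only and cross-agent cases.

```python
from dataclasses import dataclass, field


@dataclass
class Journal:
    name: str
    durable: bool                      # persistence guarantee of this journal
    entries: list = field(default_factory=list)

    def append(self, entry):
        self.entries.append(entry)


class JournalRouter:
    """Pick a journal based on how many agents an alert's conditions span."""

    def __init__(self):
        self.fast = Journal("local-non-durable", durable=False)
        self.safe = Journal("composite-durable", durable=True)

    def route(self, entry, agents_involved):
        # Assumption: data confined to one agent can skip durability;
        # anything spanning several agents goes to the durable journal.
        spans_agents = len(set(agents_involved)) > 1
        journal = self.safe if spans_agents else self.fast
        journal.append(entry)
        return journal
```

usage would look like `JournalRouter().route(matched_data, ["agent-a", "agent-b"])`, which lands in the durable journal because two agents are involved.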

Comment 2 Red Hat Bugzilla 2009-11-10 21:11:21 UTC

This bug was previously known as http://jira.rhq-project.org/browse/RHQ-536

Comment 3 Corey Welton 2010-08-10 17:59:56 UTC

Closing this bug per triage.  If this is still considered an issue, it can be reopened.

Comment 4 Corey Welton 2010-08-18 15:40:22 UTC

*** Bug 534562 has been marked as a duplicate of this bug. ***