Red Hat Bugzilla – Bug 536155
replacing db-based jbossmq with jboss messaging 2.0 with async journaling
Last modified: 2010-08-18 11:40:22 EDT
JBoss Messaging 2.0.0.alpha was released recently, and has some promising performance improvements over JBoss MQ (which is the default JMS impl in JBoss 4.2):
so i'm thinking we should use the JBS configuration files to expose JMS-compliant destinations via standard JEE practice, namely JNDI, and then use the special JBS deployer that reads these config files, creates the objects (as necessary) and plops them into JNDI.
even if we only wanted to use the blocking persistence, a simple drop/replace of the backing impl for JMS would still double the capacity the alerts subsystem can handle. however, async is really the way to go, because it'll give an additional 10-11x speedup on top of that. 2x times 10-11x works out to 20-22x faster than our current impl.
aside from that, it's a much better choice architecturally because the in-band and out-band processing halves of the alerts engine won't need to go back to our database bottleneck for persistence of matched alerting data. each server instance in the rhq server-cluster can use non-database-based journaling, which gives us many more options; for instance, local journaling would allow the (current design of the) alerts engine to scale linearly with respect to the rhq server-cluster.
the operative phrase in all of this is the parenthetical "current design of the [alerts engine]": it is the current design that makes this proposed improvement possible.
this solution has immediate and tremendous benefit for local-only alerts - where all of the alert conditions refer to the same resource in inventory. once we move into the realm of composite alerts, where you can aggregate different alert conditions across completely arbitrary resources (even those on different agents), the benefit decreases because data locality becomes important.
technically, as long as the conditions (across different resources) only involve resources that are co-located (managed by the same agent), the benefit still holds. but once you introduce conditions that refer back to resources from different agents into a single composite alert, you can no longer process the data from a single journal. instead, there would need to be another controller layer above that, one that knows how to process the aggregate conditions of the composite alerts across the entire rhq server-cluster (not just a single server-collector node).
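to make the three cases above concrete, here is a rough sketch of how an alert definition could be classified by data locality. everything in it (the AlertLocalitySketch class, the Condition type, the agent names) is made up for illustration and is not taken from the rhq codebase; it only assumes that each condition can be traced back to one resource and one managing agent.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: none of these types come from RHQ itself. */
final class AlertLocalitySketch {

    /** How widely an alert's conditions are spread across inventory. */
    enum Locality { LOCAL_ONLY, CO_LOCATED, CROSS_AGENT }

    /** One alert condition, tagged with its resource and the agent managing it. */
    static final class Condition {
        final int resourceId;
        final String agentName;

        Condition(int resourceId, String agentName) {
            this.resourceId = resourceId;
            this.agentName = agentName;
        }
    }

    /** Classifies an alert by the resources and agents its conditions refer to. */
    static Locality classify(List<Condition> conditions) {
        if (conditions.isEmpty()) {
            throw new IllegalArgumentException("an alert needs at least one condition");
        }
        Set<Integer> resources = new HashSet<>();
        Set<String> agents = new HashSet<>();
        for (Condition c : conditions) {
            resources.add(c.resourceId);
            agents.add(c.agentName);
        }
        if (agents.size() > 1) {
            return Locality.CROSS_AGENT; // would need the cluster-wide controller layer
        }
        if (resources.size() > 1) {
            return Locality.CO_LOCATED;  // composite, but all data comes from one agent
        }
        return Locality.LOCAL_ONLY;      // every condition refers to the same resource
    }

    /** True when the alert can be evaluated from a single journal. */
    static boolean singleJournalSuffices(Locality locality) {
        return locality != Locality.CROSS_AGENT;
    }
}
```

under these assumptions, only the CROSS_AGENT case loses the single-journal benefit; the other two can still be handled on one server.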
but that reminds me of a question that Jay Shaughnessy asked a few months ago, about whether we really need to have durability of unmatched alert condition logs and alert events. I think the answer for local-only alerts is 'no', but for composite alerts with conditions that refer back to resources from different agents it's 'yes'. so, if we want to squeak as much performance out as possible, we'll likely be dropping data into different journals that have different persistence guarantees depending on whether the matched alert data refers back to a local-only or composite alert.
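a minimal sketch of that "different journals, different guarantees" idea follows. it is not the JBoss Messaging journal API: the Journal interface, MemoryJournal, FileJournal and Router are all hypothetical stand-ins, with a plain synced file playing the role of the durable journal and an in-memory list playing the role of the non-durable one.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: stand-in journals, not the JBoss Messaging journal. */
final class JournalRoutingSketch {

    interface Journal {
        void append(String record);
    }

    /** No durability: records are lost if the server process dies. */
    static final class MemoryJournal implements Journal {
        final List<String> records = new ArrayList<>();

        @Override
        public void append(String record) {
            records.add(record);
        }
    }

    /** Durable: each record is appended to a file opened with SYNC. */
    static final class FileJournal implements Journal {
        private final Path file;

        FileJournal(Path file) {
            this.file = file;
        }

        @Override
        public void append(String record) {
            try {
                Files.write(file, (record + "\n").getBytes(StandardCharsets.UTF_8),
                        StandardOpenOption.CREATE,
                        StandardOpenOption.APPEND,
                        StandardOpenOption.SYNC);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Picks a journal based on whether the alert spans multiple agents. */
    static final class Router {
        private final Journal localOnly;
        private final Journal composite;

        Router(Journal localOnly, Journal composite) {
            this.localOnly = localOnly;
            this.composite = composite;
        }

        Journal journalFor(boolean spansMultipleAgents) {
            return spansMultipleAgents ? composite : localOnly;
        }
    }
}
```

the point of the split is that only data for cross-agent composite alerts pays the cost of the durable write; local-only data takes the cheap path.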
This bug was previously known as http://jira.rhq-project.org/browse/RHQ-536
Closing this bug per triage. If this is still considered an issue, it can be reopened.
*** Bug 534562 has been marked as a duplicate of this bug. ***