Bug 535484 - (RHQ-2174) further reduce contention on agent tables
Status: CLOSED WONTFIX
Product: RHQ Project
Classification: Other
Component: Alerts
Version: unspecified
Hardware: All
OS: All
Priority: high
Severity: medium
Assigned To: Joseph Marques
QA Contact: Jeff Weiss
URL: http://jira.rhq-project.org/browse/RHQ-2174
Keywords: Improvement
Depends On:
Blocks:

Reported: 2009-06-25 03:17 EDT by Joseph Marques
Modified: 2014-11-09 17:49 EST
CC: 2 users

Fixed In Version: 1.3
Doc Type: Enhancement
Last Closed: 2010-11-22 14:52:54 EST

Attachments: None

Description Joseph Marques 2009-06-25 03:17:00 EDT
Some work has already been done to remove application hot spots:

rev4160 - [RHQ-2124][RHQ-1656][RHQ-1221] - removed hot spots and various other points of contention by shortening transaction times or using indexes as available for: a) uninventory work, b) cloud manager job, c) check for suspect agent job, d) dynagroup recalculation job, e) alerts cache in-band agent and server status bit setting, f) isAgentBackfilled checking

My concern is that applying changes to many templates will trickle down to hundreds if not thousands of alert definitions, heavily taxing the agent table to set the status bit.  So StatusManagerBean should be rewritten to only set the status when it's absolutely necessary, and to use a simple true/false semantic instead of the more complicated bit mask.  As an optional aid to remote debugging, the classic bitmask strategy should be kept when running in debug mode, so that it's easier to determine whether the backend alert cache consistency protocols are working as intended.
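
To illustrate the intent (a minimal sketch only; the class, constants, and method shown here are hypothetical, not the actual StatusManagerBean API), the normal path would collapse the status to a plain dirty flag, while debug mode would keep the per-event bitmask for traceability:

// Hypothetical sketch of the proposed semantic; names are illustrative only.
public final class AgentStatusFlags {

    // Per-event bits, kept only for debug mode so a stale cache can be traced
    // back to the kind of event that raised it.
    public static final int ALERT_DEFINITION = 1 << 0;
    public static final int ALERT_TEMPLATE   = 1 << 1;
    public static final int BASELINE         = 1 << 2;

    /** Computes the new status value for an agent row. */
    public static int markDirty(int currentStatus, int eventBit, boolean debugMode) {
        if (debugMode) {
            return currentStatus | eventBit; // classic bitmask: remember *why* caches are stale
        }
        return 1; // simple true/false semantic: remember only *that* caches are stale
    }

    /** In either mode, "needs reload" simply means "non-zero". */
    public static boolean isDirty(int status) {
        return status != 0;
    }
}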
Comment 1 Joseph Marques 2009-06-25 03:28:49 EDT
rev4180 - removed hot spots when updating the agent status field by requiring only the FIRST thread to set the status field; all update statements thereafter thus do not need to acquire the update lock on the row.
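
As a rough sketch of that guarded update (assuming an illustrative rhq_agent table with an integer status column; the real schema and the code in StatusManagerBean may differ), the WHERE clause is what lets every caller after the first skip the write:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class FirstWriterWins {

    /** Returns true only for the caller that actually flipped the flag. */
    public static boolean markAgentCachesStale(Connection conn, int agentId) throws SQLException {
        // Once the first update has committed status = 1, the "AND status = 0"
        // guard matches zero rows, so later callers perform no write and do not
        // queue up behind the row's update lock.
        String sql = "UPDATE rhq_agent SET status = 1 WHERE id = ? AND status = 0";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, agentId);
            return ps.executeUpdate() == 1;
        }
    }
}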

to test:

* update some alert definition, wait up to 30 seconds, make sure you see "reload caches info messages" and that no exceptions are thrown in the server log
* update some alert template, wait up to 30 seconds, make sure you see "reload caches info messages" and that no exceptions are thrown in the server log
* manually calculate some measurement baseline, wait up to 30 seconds, make sure you see "reload caches info messages" and that no exceptions are thrown in the server log
* assuming templates are set up, commit some new resources, wait up to 30 seconds, make sure you see "reload caches info messages" and that no exceptions are thrown in the server log
* let the hourly baseline job complete.  assuming that even 1 baseline was calculated, make sure you see "reload caches info messages" and that no exceptions are thrown in the server log
Comment 2 Jeff Weiss 2009-07-20 13:19:48 EDT
2009-07-20 12:22:27,002 INFO  [org.rhq.enterprise.server.cloud.instance.CacheConsistencyManagerBean] jweiss-rhel1.usersys.redhat.com took [80]ms to reload global cache
2009-07-20 12:22:27,142 INFO  [org.rhq.enterprise.server.cloud.instance.CacheConsistencyManagerBean] jweiss-rhel1.usersys.redhat.com took [140]ms to reload cache for 1 agents

I get the above when updating an alert, but it did not appear when manually calculating a baseline.  I also tried setting the high/low range to a value I typed in.  I got the 2nd line above when I updated the high, but nothing when I updated the low.

Templates have a regression currently so I will retest when the baseline problem is resolved.

rev4423
Comment 3 Red Hat Bugzilla 2009-11-10 15:59:17 EST
This bug was previously known as http://jira.rhq-project.org/browse/RHQ-2174
This bug relates to RHQ-2405
Comment 4 Corey Welton 2010-11-22 14:52:54 EST
Closing per GWT effort.
