Bug 536444 (RHQ-792) - asynchronous updating of metric templates / schedules
Summary: asynchronous updating of metric templates / schedules
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: RHQ-792
Product: RHQ Project
Classification: Other
Component: Monitoring
Version: 1.0.1
Hardware: All
OS: All
Priority: high
Severity: medium
Target Milestone: ---
Assignee: Joseph Marques
QA Contact: Corey Welton
URL: http://jira.rhq-project.org/browse/RH...
Whiteboard:
Depends On:
Blocks:
 
Reported: 2008-09-06 06:12 UTC by Joseph Marques
Modified: 2010-08-12 16:51 UTC (History)

Fixed In Version: 2.4
Clone Of:
Environment:
Last Closed: 2010-08-12 16:51:43 UTC
Embargoed:


Attachments

Description Joseph Marques 2008-09-06 06:12:00 UTC
today, an update to a metric schedule requires that the corresponding agent be online.  while this doesn't seem that restrictive, it becomes problematic at large scales.  in particular, updating a metric template (which updates all schedules for all resources of the corresponding type) requires that all agents for all resources be online.

improvement: make schedule updates asynchronous.  using the UI should return immediately, whether or not the agent is online.  if the agent is online, it should get the update (relatively) immediately.  if the agent is offline, there should be logic by which the agent, upon reconnecting, checks whether its server-side schedules have been updated.

Comment 1 Joseph Marques 2008-09-06 23:18:10 UTC
rev1373 - test that the corresponding AgentClient is up before interacting with it (using a 2000ms timeout); 
on the off chance that the AgentClient is available but sending the report fails, catch Throwable to make sure the caller's request will continue to completion; 
update the mtimes of resources whose MeasurementSchedules are being changed (directly on the resource, or indirectly through the metric template); 
while I was at it, improve the performance of the end-to-end flow for metric template updates by batching all ResourceMeasurementSchedulesRequests for a single agent into a single remote method call and, more importantly, a single check against the availability of the corresponding AgentClient; 

Comment 2 Joseph Marques 2008-09-06 23:52:12 UTC
test 1 - single resource, update schedule, agent online

1) go to some resource in inventory whose agent is up
2) navigate to monitor > configuration subtab
3) change some collection interval to something odd like 42s
4) go to that agent prompt and execute "inventory -x -e inv.dat"
5) then in a separate terminal execute "cat inv.dat | grep 42000 -c" and make sure the count is precisely 1
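
The check in step 5 can be scripted. A minimal sketch (the helper name and PASS/FAIL output are my own; the only thing taken from the steps above is the grep against the inv.dat export):

```shell
# Hypothetical helper for test 1, step 5: count how many schedules in the
# agent's inventory export carry the given interval (in milliseconds) and
# compare against the expected count.
check_interval_count() {
  file=$1; interval=$2; expected=$3
  count=$(grep -c "$interval" "$file")
  if [ "$count" -eq "$expected" ]; then
    echo "PASS: $count schedule(s) at ${interval}ms"
  else
    echo "FAIL: expected $expected match(es) for $interval, got $count"
  fi
}
```

For test 1 you would run `check_interval_count inv.dat 42000 1` after exporting the inventory.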

test 2 - single resource, disable schedule, agent online

1) go to some resource in inventory whose agent is up
2) navigate to monitor > configuration subtab
3) disable the collection interval that you previously marked with 42s
4) go to that agent prompt and execute "inventory -x -e inv.dat"
5) then in a separate terminal execute "cat inv.dat | grep 42000 -B 1" to make sure this schedule has been disabled agent-side
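
The eyeball check in step 5 can also be scripted. A sketch, assuming the export marks disabled schedules with an enabled="false" attribute on the line of context before the interval (that attribute spelling is an assumption; adjust it to whatever the agent's inventory export actually uses):

```shell
# Hypothetical helper for test 2, step 5: look at one line of context around
# the interval and confirm the schedule is marked disabled. The
# enabled="false" attribute name is an assumed spelling.
check_disabled() {
  file=$1; interval=$2
  if grep -B 1 "$interval" "$file" | grep -qi 'enabled="false"'; then
    echo "PASS: ${interval}ms schedule is disabled"
  else
    echo "FAIL: ${interval}ms schedule still appears enabled"
  fi
}
```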

Comment 3 Joseph Marques 2008-09-06 23:52:23 UTC
test 3 - single resource, update schedule, agent offline

1) with the agent shut down, repeat steps 1-3 from test 1...but change the time to, say, 31 seconds
2) turn the agent back on and wait 30-60 seconds for the first inventory report to be sent (you can confirm when this happens if you tail the server log)
3) repeat steps 4 & 5 from test 1

test 4 - single resource, disable schedule, agent offline

1) with the agent shut down, repeat steps 1-3 from test 2...but disable the one you just set to 31 seconds
2) turn the agent back on and wait 30-60 seconds for the first inventory report to be sent (you can confirm when this happens if you tail the server log)
3) repeat steps 4 & 5 from test 2

Comment 4 Joseph Marques 2008-09-06 23:52:34 UTC
test 5 - multi-resource, update schedule, agent online

1) go to admin > monitoring defaults > choose some resource type (and count the number of resources of that type in your inventory; we'll call this value X)
2) change some collection interval for a single schedule to something odd like 53s
3) go to that agent prompt and execute "inventory -x -e inv.dat"
4) then in a separate terminal execute "cat inv.dat | grep 53000 -c" and make sure the count is precisely X

test 6 - multi-resource, disable schedule, agent online

1) go to admin > monitoring defaults > choose some resource type (and count the number of resources of that type in your inventory; we'll call this value X)
2) disable the collection interval that you previously marked with 53s
3) go to that agent prompt and execute "inventory -x -e inv.dat"
4) then in a separate terminal execute "cat inv.dat | grep 53000 -B 1" and make sure that all X entries are disabled now
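
For the multi-resource case, step 4 amounts to counting how many of the X matching schedules are now disabled. A sketch (the function name is my own, and as in the single-resource tests, the enabled="false" attribute is an assumed spelling for the agent's export format):

```shell
# Hypothetical helper for test 6, step 4: count how many of the schedules at
# the given interval show a disabled marker in the line of context before
# them. The result should equal X, the number of resources of the type.
count_disabled() {
  file=$1; interval=$2
  grep -B 1 "$interval" "$file" | grep -ci 'enabled="false"'
}
```

You would then check `[ "$(count_disabled inv.dat 53000)" -eq "$X" ]` against your resource count.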

Comment 5 Joseph Marques 2008-09-06 23:58:02 UTC
test 7 - multi-resource, update schedule, agent offline

1) with the agent shut down, repeat steps 1 & 2 from test 5, but change the time to, say, 67 seconds
2) turn the agent back on and wait 30-60 seconds for the first inventory report to be sent (you can confirm when this happens if you tail the server log) 
3) repeat steps 3 & 4 from test 5

test 8 - multi-resource, disable schedule, agent offline

1) with the agent shut down, repeat steps 1 & 2 from test 6, but disable the one you just set to 67 seconds
2) turn the agent back on and wait 30-60 seconds for the first inventory report to be sent (you can confirm when this happens if you tail the server log) 
3) repeat steps 3 & 4 from test 6

Comment 6 Joseph Marques 2008-09-07 00:33:55 UTC
rev1375 - scale back logging on the server-side for measurement schedule and metric template updates; 

Comment 7 Corey Welton 2008-09-19 18:04:57 UTC
QA Verified.

Comment 8 Red Hat Bugzilla 2009-11-10 21:17:00 UTC
This bug was previously known as http://jira.rhq-project.org/browse/RHQ-792
This bug is related to RHQ-1996
This bug is related to RHQ-2246


Comment 9 wes hayutin 2010-02-16 21:09:30 UTC
Mass move to component = Monitoring

Comment 10 Joseph Marques 2010-07-01 05:20:47 UTC
This shouldn't be in assigned state.  It has been verified for nearly 9 months now.

Comment 11 Corey Welton 2010-07-01 12:58:41 UTC
QA Closing :)

Comment 12 Corey Welton 2010-08-12 16:51:43 UTC
Mass-closure of verified bugs against JON.

