Bug 534286 - (RHQ-1098) make availability report interval longer
Product: RHQ Project
Classification: Other
Component: Agent
Hardware: All
OS: All
Priority: medium
Severity: medium
Assigned To: John Mazzitelli
Keywords: SubTask
Depends On:
Blocks: RHQ-1092 741450
Reported: 2008-11-10 12:00 EST by John Mazzitelli
Modified: 2011-09-26 16:47 EDT
Fixed In Version: 1.3
Doc Type: Bug Fix
Description John Mazzitelli 2008-11-10 12:00:00 EST
Currently, the agent sends its availability reports every 60 seconds and the server expects to hear from the agent within 2 minutes.

I think we want to lengthen these times to something like 90 seconds and 4 minutes.  Note that the 90 seconds (on the agent side) is configurable, and we should be able to make the 4 minutes configurable on the server side as well.

This change will a) cause less traffic to hit the server (going from 60 to 90 seconds reduces the number of avail reports to be processed by about a third) and b) mean we only backfill agents when they have been silent for 4 minutes, giving the agent more time to get an avail report processed on the server side.  Backfilling is expensive if the agent is UP, so we only want to backfill when we are sure the agent is down.
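
As a rough check on the traffic arithmetic, the sketch below computes how many availability reports a single agent sends per hour at a given interval, and the relative drop between two intervals. The function names are made up for illustration and are not part of RHQ.

```python
def reports_per_hour(interval_secs: float) -> float:
    """Availability reports one agent sends per hour at the given interval."""
    return 3600 / interval_secs


def reduction(old_interval_secs: float, new_interval_secs: float) -> float:
    """Fraction by which report traffic drops when the interval is lengthened."""
    old = reports_per_hour(old_interval_secs)
    new = reports_per_hour(new_interval_secs)
    return (old - new) / old


# 60s -> 90s: 60 reports/hour drops to 40, a reduction of about one third.
print(reports_per_hour(60), reports_per_hour(90), round(reduction(60, 90), 2))
# A full 50% reduction would need the interval doubled to 120s.
print(round(reduction(60, 120), 2))
```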

Perhaps before we backfill, we should have the server try to ping the agent and if the ping succeeds, we shouldn't backfill.  Just another test we could do to avoid backfilling when possible.
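
A minimal sketch of that ping-before-backfill idea, assuming the server can call some function that reports whether the agent answered. `should_backfill` and the `ping_agent` callable are hypothetical names, not RHQ's actual API, and how the ping is performed is left open.

```python
from typing import Callable


def should_backfill(
    secs_since_last_avail_report: float,
    quiet_time_secs: float,
    ping_agent: Callable[[], bool],
) -> bool:
    """Return True only if the agent is past its quiet time AND does not answer a ping."""
    if secs_since_last_avail_report <= quiet_time_secs:
        return False  # agent reported recently enough; nothing to do
    try:
        if ping_agent():
            return False  # agent is reachable, so skip the expensive backfill
    except OSError:
        pass  # treat a failed ping attempt the same as no answer
    return True


# Agent silent for 5 minutes with a 4-minute quiet time:
print(should_backfill(300, 240, ping_agent=lambda: True))   # ping succeeds
print(should_backfill(300, 240, ping_agent=lambda: False))  # ping fails
```

The ping only runs for agents that are already suspect, so it adds no cost for agents that report on time.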
Comment 1 John Mazzitelli 2008-11-11 12:26:15 EST
We need to investigate what the proper interval should be.
Comment 2 John Mazzitelli 2008-11-12 02:21:11 EST
An alternative is to perform some additional checking after 2 minutes of quiet time but before we actually backfill.

Perhaps we can look in our DB for ANY activity from the agent right before we backfill. If we see that we have already processed (within the past 2 minutes) an inventory report, a measurement report, an operation result, a configuration change or any other agent-originating message, we can assume the agent is up and just hasn't been able to send us its avail report yet. In this case, we abort the backfill.

So it's:

1) checkSuspectAgents looks for an avail report that occurred within the past 2 minutes. If nothing then:
2) check to see if the agent has sent us any message in the previous 2m interval (like inventory report, measurement report, operation result, etc). If we DID get such a message from the agent, abort and do not backfill. Otherwise:
3) continue with the normal backfill processing

So step 2) would be new. 
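
The three steps above can be sketched roughly as follows. This is only an illustration under assumptions: `AgentActivity`, `check_suspect_agent` and the returned strings are hypothetical, and in the real server the timestamps would come from database queries rather than an in-memory object.

```python
from dataclasses import dataclass
from typing import Optional

QUIET_TIME_SECS = 120  # the 2-minute window discussed in this comment


@dataclass
class AgentActivity:
    """Hypothetical snapshot of when the server last heard from an agent (epoch seconds)."""
    last_avail_report: Optional[float]
    last_other_message: Optional[float]  # inventory/measurement report, operation result, ...


def _within_window(timestamp: Optional[float], now: float, window: float) -> bool:
    return timestamp is not None and now - timestamp <= window


def check_suspect_agent(activity: AgentActivity, now: float) -> str:
    # Step 1: a recent avail report means the agent is not suspect.
    if _within_window(activity.last_avail_report, now, QUIET_TIME_SECS):
        return "ok"
    # Step 2 (new): any other recent agent-originated message aborts the backfill.
    if _within_window(activity.last_other_message, now, QUIET_TIME_SECS):
        return "abort-backfill"
    # Step 3: nothing heard from the agent, continue with the normal backfill.
    return "backfill"


# No avail report for 200s, but a measurement report arrived 50s ago.
print(check_suspect_agent(AgentActivity(800.0, 950.0), now=1000.0))
```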
Comment 3 John Mazzitelli 2009-01-20 11:18:04 EST
Thoughts on making additional queries before backfilling to see if we have heard from the agent: we don't want to make more queries. That would defeat the purpose, since reducing the load on the database is what we are trying to do.

We could just change the quiet time. The agent would still send avail reports every minute (so we can still alert within a minute of resources going down), but the longer quiet period lets us delay the backfill, giving us more time to process avail reports if we need it.  This would delay alerting, but only in the case where the agent as a whole goes down.

We could increase the backfill quiet time default to 3 or 4 minutes.
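
To make the trade-off concrete, here is a small sketch of what a longer quiet time buys and costs. It assumes the backfill fires exactly when the quiet time has elapsed since the last avail report and ignores how often the server runs its check; both function names are illustrative only.

```python
from typing import Tuple


def report_slots_in_quiet_time(report_interval_secs: int, quiet_time_secs: int) -> int:
    """Scheduled avail reports that fall inside the quiet window.

    Each slot is a chance for the server to process a report before it
    backfills. A report due exactly at the end of the window is counted.
    """
    return quiet_time_secs // report_interval_secs


def agent_down_alert_delay_range(
    report_interval_secs: int, quiet_time_secs: int
) -> Tuple[int, int]:
    """(best, worst) seconds between the agent dying and the backfill.

    Best case: the agent dies just before its next report was due.
    Worst case: the agent dies right after sending a report.
    """
    return (quiet_time_secs - report_interval_secs, quiet_time_secs)


for quiet in (120, 180, 240):
    print(quiet, report_slots_in_quiet_time(60, quiet), agent_down_alert_delay_range(60, quiet))
```

With a 60-second report interval, moving the quiet time from 2 to 4 minutes doubles the number of report slots from 2 to 4, at the cost of up to 4 minutes before an agent-down backfill.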
Comment 4 Joseph Marques 2009-02-04 17:10:56 EST
Let's wait to see what information falls out of Charles' testing.  This might be pushed to a future release.
Comment 5 John Mazzitelli 2009-06-08 09:30:15 EDT
we decided against changing the interval. i'm sure we'll revisit this in the future :) but for now, we will leave this as is.
Comment 6 John Mazzitelli 2009-08-12 17:38:15 EDT
as expected, we are revisiting.

I think we will make the defaults 5 minutes for agent avail reporting and 15 minutes for the quiet period.
Comment 7 John Mazzitelli 2009-08-12 18:04:57 EDT
agent avail reporting is every 5 minutes.
server allows a quiet time of 15 minutes.
Comment 8 Red Hat Bugzilla 2009-11-10 15:23:53 EST
This bug was previously known as http://jira.rhq-project.org/browse/RHQ-1098
