Bug 534286 (RHQ-1098)
Summary: | make availability report interval longer | ||
---|---|---|---|
Product: | [Other] RHQ Project | Reporter: | John Mazzitelli <mazz> |
Component: | Agent | Assignee: | John Mazzitelli <mazz> |
Status: | CLOSED NEXTRELEASE | QA Contact: | |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | unspecified | CC: | jshaughn |
Target Milestone: | --- | Keywords: | SubTask |
Target Release: | --- | ||
Hardware: | All | ||
OS: | All | ||
URL: | http://jira.rhq-project.org/browse/RHQ-1098 | ||
Whiteboard: | |||
Fixed In Version: | 1.3 | Doc Type: | Bug Fix |
Bug Blocks: | 534281, 741450 |
Description
John Mazzitelli
2008-11-10 17:00:00 UTC
We need to investigate what the proper interval should be.

An alternative is to perform some additional checking after 2 minutes of quiet time but before we actually backfill. Perhaps we can look in our DB for ANY activity from the agent right before we backfill. If we see that we already processed (within the past 2 minutes) an inventory report, a measurement report, an operation result, a configuration change or any other agent-originating message, we can assume the agent is up and just hasn't been able to send us its avail report yet. In this case, we abort the backfill. So it's:

1) checkSuspectAgents looks for an avail report that occurred within the past 2 minutes. If nothing, then:
2) check to see if the agent has sent us any message in the previous 2m interval (like inventory report, measurement report, operation result, etc). If we DID get such a message from the agent, abort and do not backfill. Otherwise:
3) continue with the normal backfill processing.

So step 2) would be new.

Thoughts re: making additional queries before backfilling to see if we heard from the agent: we don't want to make more queries - that kinda defeats the purpose of reducing the load on the database, which is what we are trying to do.

We could just change the quiet time - this would mean the agent still sends avail reports every 1 minute (thus we are still able to alert within a minute of when resources go down), but the increased quiet period allows us to delay the backfill, giving us more time to process avail reports if we need it. This would delay alerting, but only in the case when the agent as a whole goes down. We could increase the backfill quiet time default to 3 or 4 minutes.

Let's wait to see what information falls out of charles' testing. This might be pushed to the future.

We decided against changing the interval. I'm sure we'll revisit this in the future :) but for now, we will leave this as is.

As expected, we are revisiting. I think we will make the defaults 5 minutes for agent avail reporting and 15 minutes for the quiet period.

Agent avail reporting is every 5 minutes. Server allows a quiet time of 15 minutes.

This bug was previously known as http://jira.rhq-project.org/browse/RHQ-1098
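
For readers following the thread, the backfill decision discussed above can be sketched roughly as follows. This is only an illustration in Python, not RHQ code: `AgentActivity`, `should_backfill`, and the constant names are hypothetical, and the only values taken from this bug are the 5 minute report interval and the 15 minute quiet time. The optional step 2 check is the one the thread argues against because of the extra database queries, so it is off by default here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Defaults quoted in the final comments of this bug; the names are hypothetical.
AVAIL_REPORT_INTERVAL = timedelta(minutes=5)   # how often the agent reports avail
BACKFILL_QUIET_TIME = timedelta(minutes=15)    # how long the server tolerates silence


@dataclass
class AgentActivity:
    """Hypothetical record of when the server last heard from one agent."""
    last_avail_report: Optional[datetime]
    last_other_message: Optional[datetime] = None  # inventory report, measurement report, etc.


def should_backfill(activity: AgentActivity,
                    now: datetime,
                    quiet_time: timedelta = BACKFILL_QUIET_TIME,
                    check_other_messages: bool = False) -> bool:
    """Return True if the agent looks down and its resources should be backfilled."""
    cutoff = now - quiet_time

    # Step 1: an avail report inside the quiet period means the agent is fine.
    if activity.last_avail_report is not None and activity.last_avail_report >= cutoff:
        return False

    # Step 2 (proposed, optional): any other agent-originating message inside
    # the quiet period also counts as a sign of life, so abort the backfill.
    if (check_other_messages
            and activity.last_other_message is not None
            and activity.last_other_message >= cutoff):
        return False

    # Step 3: nothing heard from the agent, continue with normal backfill.
    return True
```

With a 15 minute quiet time and a 5 minute report interval, an agent can miss two consecutive reports without being backfilled, which is the extra processing slack the later comments were after.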