Bug 534721 (RHQ-1490)
Summary: | Availability computation is wrong when rhq server is down and agent is spooling | ||
---|---|---|---|
Product: | [Other] RHQ Project | Reporter: | Heiko W. Rupp <hrupp> |
Component: | No Component | Assignee: | Jay Shaughnessy <jshaughn> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | |
Severity: | medium | Docs Contact: | |
Priority: | urgent | ||
Version: | 1.2 | CC: | cwelton, jshaughn |
Target Milestone: | --- | Keywords: | SubBug |
Target Release: | --- | ||
Hardware: | All | ||
OS: | All | ||
URL: | http://jira.rhq-project.org/browse/RHQ-1490 | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | | Environment: | rev 2940 |
Last Closed: | 2013-09-01 19:20:07 UTC | Type: | --- |
Bug Blocks: | 565628, 741450 |
Description
Heiko W. Rupp
2009-02-06 13:39:00 UTC
Availability reporting is not guaranteed to be delivered, and therefore it is never spooled.

From DiscoveryServerService:

    // GH: Disabled temporarily (JBNADM-2385) @Asynchronous( guaranteedDelivery = true )
    @LimitedConcurrency(CONCURRENCY_LIMIT_AVAILABILITY_REPORT)
    boolean mergeAvailabilityReport(AvailabilityReport availabilityReport);

This was the description of the issue and why avail reporting is not guaranteed/spooled:

"Slow processing of measurement reports causes blocks to the availability report handling. This allows the backfiller to come along and mark everything down even though the agent knows everything is fine. The change to one asynch sending thread for agent comm's appears to have been the local cause to the problem though we'd still likely hit it at a slightly larger scale even with more threads sending (plus that caused other problems). For now, we will try sending the avail reports synchronously (and not reliably)."

This is plain wrong: the customer will see no metrics for the resource, yet all lights are green, so he will just be confused.

We made availability a first-class citizen in RHQ on purpose. Writing a set of availability reports to the database (in batch, even) should not be more expensive than doing the same for metrics, which we did not disable.

If the concern is alerting on past unavailability, then we would need to disable alerting when we see that spooled data is coming in for the timeframe [start of spooling, now]. But we need to at least show the data to the user; he might need it for SLA computations or similar.

This bug was previously known as http://jira.rhq-project.org/browse/RHQ-1490

Temporarily adding the keyword "SubBug" so we can be sure we have accounted for all the bugs.
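The suggestion in the description (store spooled availability data, but do not alert on it) could be sketched roughly as follows. This is only an illustration under stated assumptions: `SpooledAvailabilitySketch`, `Sample`, `MergeResult` and `merge` are hypothetical names, not RHQ classes or APIs, and "now" from the description is modelled as the instant at which delivery to the server resumed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these types are RHQ classes. */
final class SpooledAvailabilitySketch {

    enum Availability { UP, DOWN }

    /** One availability sample as an agent might report it (assumed shape). */
    static final class Sample {
        final int resourceId;
        final Availability availability;
        final long timestampMillis;

        Sample(int resourceId, Availability availability, long timestampMillis) {
            this.resourceId = resourceId;
            this.availability = availability;
            this.timestampMillis = timestampMillis;
        }
    }

    /** What a merge did: samples persisted and samples handed to alerting. */
    static final class MergeResult {
        final List<Sample> stored = new ArrayList<>();
        final List<Sample> alerted = new ArrayList<>();
    }

    /**
     * Stores every sample so the history stays complete (for example for SLA
     * computations), but skips alerting for samples taken inside the window
     * [spoolStartMillis, deliveryResumedMillis), i.e. data that was spooled
     * while the server was unreachable.
     */
    static MergeResult merge(List<Sample> report, long spoolStartMillis, long deliveryResumedMillis) {
        MergeResult result = new MergeResult();
        for (Sample sample : report) {
            // Always keep the data point, whether it is live or spooled.
            result.stored.add(sample);

            boolean spooled = sample.timestampMillis >= spoolStartMillis
                    && sample.timestampMillis < deliveryResumedMillis;
            if (!spooled) {
                // Only samples outside the spooling window may trigger alerts.
                result.alerted.add(sample);
            }
        }
        return result;
    }
}
```

With a spooling window from 500 ms to 5000 ms, a DOWN sample stamped 1000 ms would be stored but not alerted on, while an UP sample stamped 5000 ms would be both stored and passed to alerting.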
keyword: new = Tracking + FutureFeature + SubBug

Making sure we're not missing any bugs in rhq_triage.

I'm not sure, but I think the fact that agent avail is no longer tied to avail reporting, or other changes made and described here [1], may take care of this issue. Asking Heiko to review and see if this can be closed.

[1] http://rhq-project.org/display/RHQ/Design-Availability+Checking

I think the changes mentioned in the wiki document will address this issue.

This is in master and can likely be closed; testing is somewhat implicit.

Bulk closing of BZs that have no target version set, but which have been ON_QA for more than a year and thus have been in production for a long time.