Bug 918205
Summary: | Agent appears UNKNOWN in inventory but still active; possibly caused by drop of availability report | | |
---|---|---|---|
Product: | [Other] RHQ Project | Reporter: | Elias Ross <genman> |
Component: | Agent | Assignee: | Jay Shaughnessy <jshaughn> |
Status: | CLOSED DUPLICATE | QA Contact: | Mike Foley <mfoley> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | | |
Version: | 4.5 | CC: | hrupp, loleary |
Target Milestone: | --- | | |
Target Release: | --- | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2014-07-18 18:39:31 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | 1094540 | | |
Bug Blocks: | | | |
Description
Elias Ross
2013-03-05 17:30:30 UTC
What reproduces this fairly reliably is the database going down for a period (10-15 minutes) and then coming back. The server will reconnect to the database and recover, but platforms appear down. You do still see metrics being inserted.

I've also seen this in 4.9, to a lesser degree. About 10% of the servers after a fairly long network outage (30 minutes) still appeared down, even though they were clearly sending traffic to the server and functioning fine. The fix as described above still works.

I failed in an initial attempt to reproduce this using master (4.10+), but I tried with only one agent. It came up immediately after the server reconnected to the database.

I'd be surprised to see an avail report get dropped. I don't think that is the issue. It may be more along the lines of a full avail report not getting requested on re-connect, or something like that. Since the agent has been running, its avail checks may not have resulted in any changed avail.

There has been a bunch of sync work done in 4.10. Wondering if this is still seen in 4.10.

As suggested by https://bugzilla.redhat.com/show_bug.cgi?id=1094540#c17 this issue appears to have been resolved in downstream and was committed to master in:
https://github.com/rhq-project/rhq/commit/94008542694eef157289e6f9884669480021b565
and
https://github.com/rhq-project/rhq/commit/dcc27a2c1f1acbf9fb818c92eb27ce278ef6db99

@Jay, do you concur?

Larry, +1. I think this is resolved.

This is resolved in Bug 1094540, marking as duplicate.

*** This bug has been marked as a duplicate of bug 1094540 ***
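
For anyone reading this later, the hypothesis discussed in the comments (a full avail report not being requested on re-connect, so changes-only reports carry nothing new) can be illustrated with a toy model. This is only a sketch and not RHQ code: the `Agent` and `Server` classes, their method names, and the resource names are all hypothetical, and it assumes the server marked the resources down on its own during the outage.

```python
from dataclasses import dataclass, field

UP, DOWN = "UP", "DOWN"


@dataclass
class Agent:
    """Hypothetical agent: knows the real availability of its resources."""
    current: dict
    last_reported: dict = field(default_factory=dict)

    def changes_only_report(self):
        # Report only resources whose availability differs from what was last sent.
        report = {r: a for r, a in self.current.items()
                  if self.last_reported.get(r) != a}
        self.last_reported.update(report)
        return report

    def full_report(self):
        # Report every resource, whether or not it changed.
        self.last_reported = dict(self.current)
        return dict(self.current)


@dataclass
class Server:
    """Hypothetical server: stores the last availability it was told about."""
    inventory: dict = field(default_factory=dict)

    def merge(self, report):
        self.inventory.update(report)

    def mark_all_down(self):
        # Assumption: during the outage the server flags everything as down.
        self.inventory = {r: DOWN for r in self.inventory}


def simulate():
    agent = Agent(current={"platform": UP, "app-server": UP})
    server = Server()

    server.merge(agent.changes_only_report())  # first report contains everything
    server.mark_all_down()                     # outage: server-side state goes stale

    stale = agent.changes_only_report()        # nothing changed on the agent side
    server.merge(stale)
    after_changes_only = dict(server.inventory)

    server.merge(agent.full_report())          # a full report repairs the state
    after_full = dict(server.inventory)
    return stale, after_changes_only, after_full
```

In this model the changes-only report after the outage is empty, so the server keeps showing the resources as down until a full report arrives. That would match the symptom of agents that are clearly sending traffic but still appear down.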