Bug 588383
Summary: | make sure server sleeps a given amount of time at startup to ensure agents know it was down | ||
---|---|---|---|
Product: | [Other] RHQ Project | Reporter: | John Mazzitelli <mazz> |
Component: | Core Server | Assignee: | John Mazzitelli <mazz> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Corey Welton <cwelton> |
Severity: | low | Docs Contact: | |
Priority: | low | ||
Version: | 1.3 | ||
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | All | ||
Whiteboard: | |||
Fixed In Version: | 2.4 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2010-08-12 16:57:01 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
Bug Depends On: | |||
Bug Blocks: | 584435 |
Description
John Mazzitelli
2010-05-03 15:50:14 UTC
gwt branch - commit 28f0e62acee77b6f439ab2b114cdf9ee2fc767ae - we need to cherry-pick that over to master.

Note that the default is 70s, but it's configurable by setting the system property "rhq.server.ensure-down-time-secs" - you can add this to rhq-server.properties.

We need to confirm that this sleep happens in the proper place. For example, just because our comm layer isn't up doesn't mean the Tomcat connector isn't accepting connections. The Tomcat connectors may be up, and therefore the agents may still be able to connect; they just won't be able to successfully get their messages processed. If that is the case, this solution won't work (because IIRC the agent will only think the server is truly down if it receives a CannotConnect exception, and that would not be the exception it would have received, since it did connect; its message just failed to be processed).

BTW: this has been cherry-picked to master - commit 30f79902c4ff5f0ef03a16c5c0319befc697bcce

(In reply to comment #2)
> for example, just because our comm layer isn't up, doesn't mean the tomcat
> connector isn't accepting connections. The tomcat connectors may be up and
> therefore the agents may still be able to connect, they just won't be able to
> successfully get their messages processed. If that is the case, this solution
> won't work (because IIRC the agent will only think the server is truly down if
> it receives a CannotConnect exception, and that would not be the exception it
> would have received since it did connect, its message just failed to be
> processed).

We should be OK. I just ran a test and looked at the code, and a "failover-able" exception includes the exception you get if our comm layer isn't started but the Tomcat connectors are - that being a WebServerError exception.
See CommUtils:

    public static boolean isExceptionFailoverable(Throwable t) {
        return (t instanceof CannotConnectException || t instanceof ConnectException
            || t instanceof NotProcessedException || t instanceof WebServerError);
    }

Therefore, I think this solution will be OK. Of course, need more QA testing to confirm.

QA verified.

Mass-closure of verified bugs against JON.
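
For anyone reading this later, here is a rough sketch of how a startup delay driven by the "rhq.server.ensure-down-time-secs" property could look. This is not the code from the commits referenced above. The class name StartupDelay, the method names, and the fallback to the default for missing, non-numeric, or negative values are all assumptions made for illustration; only the property name and the 70s default come from this report.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for illustration; not the actual RHQ implementation. */
final class StartupDelay {
    // Property name and 70-second default are taken from this bug report.
    static final String PROP_NAME = "rhq.server.ensure-down-time-secs";
    static final long DEFAULT_SECS = 70L;

    /**
     * Parses the raw property value. Falling back to the default for
     * missing, non-numeric, or negative input is an assumed policy.
     */
    static long resolveDownTimeSecs(String rawValue) {
        if (rawValue == null) {
            return DEFAULT_SECS;
        }
        try {
            long secs = Long.parseLong(rawValue.trim());
            return secs >= 0 ? secs : DEFAULT_SECS;
        } catch (NumberFormatException e) {
            return DEFAULT_SECS;
        }
    }

    /** Blocks the calling thread for the configured number of seconds. */
    static void sleepBeforeAcceptingAgents() throws InterruptedException {
        long secs = resolveDownTimeSecs(System.getProperty(PROP_NAME));
        TimeUnit.SECONDS.sleep(secs);
    }

    public static void main(String[] args) {
        // Only reports the resolved value; it does not actually sleep.
        long secs = resolveDownTimeSecs(System.getProperty(PROP_NAME));
        System.out.println("Configured down time: " + secs + "s");
    }
}
```

With something like this, a line such as rhq.server.ensure-down-time-secs=120 in rhq-server.properties (120 is just an example value) would lengthen the delay, provided the server exposes that file's entries as system properties.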