Bug 810302

Summary: have agent config setting to test or not to test failover list on startup
Product: [Other] RHQ Project
Reporter: John Mazzitelli <mazz>
Component: Agent
Assignee: John Mazzitelli <mazz>
Status: CLOSED CURRENTRELEASE
QA Contact: Mike Foley <mfoley>
Severity: medium
Docs Contact:
Priority: medium
Version: 4.4
CC: hrupp
Target Milestone: ---
Target Release: RHQ 4.4.0
Hardware: Unspecified
OS: Unspecified
See Also: https://bugzilla.redhat.com/show_bug.cgi?id=810124
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-09-01 06:18:44 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Description John Mazzitelli 2012-04-05 10:59:25 EDT
We need to add a new agent configuration property:

rhq.agent.test-failover-list-at-startup

By default it is true (since that is what we do today).

If it is set to false, the agent will skip the call to AgentMain.testFailoverList.

This could help speed up agent startup.
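
The gating described above could look roughly like the following sketch. This is not the RHQ source: the class name FailoverStartupCheck, the use of java.util.Properties as the configuration store, and the Runnable standing in for the call to testFailoverList are all assumptions made for illustration. Only the property name comes from this report. The default is passed in as a parameter rather than hard-coded.

```java
import java.util.Properties;

public class FailoverStartupCheck {
    // Property name taken from this bug report.
    public static final String TEST_FAILOVER_PROP = "rhq.agent.test-failover-list-at-startup";

    // Returns the configured value, or defaultValue when the property is unset.
    public static boolean shouldTestFailoverList(Properties config, boolean defaultValue) {
        String raw = config.getProperty(TEST_FAILOVER_PROP);
        return (raw == null) ? defaultValue : Boolean.parseBoolean(raw.trim());
    }

    // Runs the (hypothetical) failover test only when the setting allows it.
    // Returns true if the test was run, false if it was skipped.
    public static boolean runStartupCheck(Properties config, boolean defaultValue, Runnable failoverTest) {
        if (!shouldTestFailoverList(config, defaultValue)) {
            return false;
        }
        failoverTest.run();
        return true;
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty(TEST_FAILOVER_PROP, "false");
        // The lambda is a placeholder for the real connectivity test.
        boolean ran = runStartupCheck(config, true, () -> System.out.println("testing failover list"));
        System.out.println("test ran: " + ran);
    }
}
```

With the property explicitly set to false, the placeholder test is never invoked, which is the startup saving this report is after.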
Comment 1 John Mazzitelli 2012-04-05 11:15:04 EDT
We also need to add some debug log messages to testFailoverList. If that is what is slowing things down, we will at least see which connection attempts are causing problems.
Comment 2 John Mazzitelli 2012-04-05 14:33:00 EDT
If the new setting is set to true (which is the default), the agent log will now show one log message per failover list entry that looks like this:

2012-04-05 14:27:37,624 DEBUG [RHQ Agent Registration Thread] (org.rhq.enterprise.agent.AgentMain)- {AgentMain.test-failover-list-entry}Testing failover connectivity to server [myserver:7080]

If the connectivity test fails, you will see error messages (this has always been the case; I just added the debug message documented above).

If the new setting is set to false, the testing of failover entries is skipped and you will instead see this message in the log file:

   "Testing connectivity to servers found in the failover list has been DISABLED and will be skipped."

NOTE: if all servers in the failover list fail the connectivity check, you will see this error message dumped on the agent console window (this has always been the case, I'm just documenting it here):

"!!! There are [{0}] servers that are potentially unreachable by this agent.
Please double check all public endpoints of your servers and ensure
they are all reachable by this agent. The failed server endpoints are:
{the failover list is shown here}
See the Administration (Topology) > Servers in the server GUI
to change the public endpoint of a server.
THIS AGENT WILL WAIT UNTIL ONE OF ITS SERVERS BECOMES REACHABLE!"
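
For illustration only, the per-entry test and the failure summary described in this comment could be sketched as below. This is not the agent's implementation: the class FailoverListTester, its method names, and the plain TCP connect used as the probe are assumptions (the real agent may check connectivity differently). The debug text and the first line of the warning are copied from this comment.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.text.MessageFormat;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class FailoverListTester {
    // Assumption: a plain TCP connect stands in for the agent's real connectivity check.
    public static boolean tcpProbe(String hostPort, int timeoutMillis) {
        int colon = hostPort.lastIndexOf(':');
        if (colon <= 0) {
            return false; // no "host:port" separator
        }
        try (Socket socket = new Socket()) {
            int port = Integer.parseInt(hostPort.substring(colon + 1));
            socket.connect(new InetSocketAddress(hostPort.substring(0, colon), port), timeoutMillis);
            return true;
        } catch (IOException | IllegalArgumentException e) {
            // NumberFormatException is an IllegalArgumentException, so bad ports land here too.
            return false;
        }
    }

    // Logs one debug line per entry and returns the entries that failed the probe.
    public static List<String> findUnreachable(List<String> servers, Predicate<String> probe) {
        List<String> failed = new ArrayList<>();
        for (String server : servers) {
            System.out.println("DEBUG Testing failover connectivity to server [" + server + "]");
            if (!probe.test(server)) {
                failed.add(server);
            }
        }
        return failed;
    }

    // Fills the {0} placeholder of the warning's first line with the failure count.
    public static String unreachableHeader(int failedCount) {
        return MessageFormat.format(
            "!!! There are [{0}] servers that are potentially unreachable by this agent.", failedCount);
    }

    public static void main(String[] args) {
        List<String> servers = Arrays.asList("myserver:7080", "otherserver:7080");
        // A fake probe that fails every entry, so the example runs without a network.
        List<String> failed = findUnreachable(servers, s -> false);
        if (failed.size() == servers.size()) {
            System.out.println(unreachableHeader(failed.size()));
            failed.forEach(System.out::println);
        }
    }
}
```

Passing the probe in as a Predicate keeps the loop testable without a network; tcpProbe can be swapped in with a short timeout when real connectivity matters.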
Comment 3 John Mazzitelli 2012-04-05 14:48:50 EDT
git commit to master a8f774500b02d7adbf59a0ed4995d8cdd8a4b9c6
Comment 4 John Mazzitelli 2012-05-03 10:26:05 EDT
Will make the default "false" to avoid any potential problems, leaving it open for the user to flip this to true so they can do some debugging if need be.
Comment 5 John Mazzitelli 2012-05-03 12:20:16 EDT
(In reply to comment #4)
> will make the default "false" to avoid any potential problems. leaving it open
> for the user to flip this to true so they can do some debugging if need be

git commit master : 3c3b4e0

To test this feature, you now have to set the config preference "rhq.agent.test-failover-list-at-startup" to "true".
Comment 6 John Mazzitelli 2012-08-03 10:29:49 EDT
FYI: a8f774500b02d7adbf59a0ed4995d8cdd8a4b9c6 and 3c3b4e0 were already cherry-picked to the release/jon3.1.0 branch.
Comment 7 Heiko W. Rupp 2013-09-01 06:18:44 EDT
Bulk closing of items that are on_qa and in old RHQ releases, which are out for a long time and where the issue has not been re-opened since.