Description of problem:
Providing high availability for the Query Server through Red Hat HA requires that the Query Server's published endpoint be updated so the server can be located after a failover. Without this, an Aviary client would likely lose track of which machine is running the Query Server once the daemon fails over to another machine in the cluster.
How can I test this, please?
Verification advice: In 2.2, HA process groups will include the Schedd and the Query Server (QS), which can be configured to publish their SOAP endpoints using the new location feature.

1) Do these endpoints correctly relocate in a failover scenario, as evidenced by a SOAP tool such as locator.py, found in /usr/share/condor/aviary?
2) Are there multiple redundant entries for a particular endpoint when there should only be one?
3) Is the endpoint listed after failover actually reachable, or is it a stale reference to the old location (host:port)? (See the reachability sketch after this list.)
4) Are old references from crashed (e.g., kill -9) process endpoints removed or replaced in a timely manner, i.e., within (AVIARY_LOCATOR_MISSED_UPDATES + 1) * AVIARY_LOCATOR_PRUNE_INTERVAL seconds? (See the config sketch after this list.)
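For question 3, a quick reachability probe against the published URL can distinguish a live endpoint from a stale one. The host:port below is a hypothetical value in the style of the transcript further down; without -f, curl exits 0 on any completed HTTP exchange and non-zero on a connection failure, which is enough to tell a live endpoint from a dead one:

# curl -s -o /dev/null http://node1:50697/services/query/ && echo reachable || echo stale

For question 4, the prune window is driven by the two Locator settings named above. A minimal config sketch, with illustrative values (the knob names come from this report; the values are assumptions, not shipped defaults):

# condor_config on the node running the Locator -- illustrative values only
AVIARY_LOCATOR_MISSED_UPDATES = 2
AVIARY_LOCATOR_PRUNE_INTERVAL = 20
# worst-case stale-endpoint window: (2 + 1) * 20 = 60 seconds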
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause: Aviary Locator behaviour when the Aviary Schedd plug-in and Query Server are deployed in an HA group.
Consequence: Aviary clients would have experienced stale endpoint references for a longer duration than necessary.
Fix: Adjustments were made in the Locator implementation to quickly replace a failed endpoint reference with its new one.
Result: An Aviary client using a Schedd or Query Server endpoint will now be able to retrieve the new endpoint faster.
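As a rough illustration of the client-side effect, a hedged polling sketch: it shells out to the locator.py tool and waits until the HASchedd1 Query Server endpoint answers again. The locator.py flags, cert paths, and pipe-separated output format are taken from the transcript below; the loop itself and the HASchedd1 filter are assumptions for illustration, not an official client API:

while true; do
  URL=$(/usr/share/condor/aviary/locator.py --type=ANY -s \
        -r=/etc/condor/certs/ca.crt -k=/etc/condor/certs/client.key \
        -c=/etc/condor/certs/client.crt |
        awk -F'|' '$2 ~ /QUERY_SERVER/ && $3 ~ /HASchedd1/ {gsub(/ /,"",$4); print $4}')
  [ -n "$URL" ] && curl -s -o /dev/null "$URL" && break  # endpoint is live again
  sleep 5
done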
# ./locator.py --type=ANY -s -r=/etc/condor/certs/ca.crt -k=/etc/condor/certs/client.key -c=/etc/condor/certs/client.crt
CUSTOM | QUERY_SERVER | ha-schedd-HASchedd1@ | http://node2:45039/services/query/
CUSTOM | QUERY_SERVER | ha-schedd-HASchedd2@ | http://node2:37425/services/query/
CUSTOM | QUERY_SERVER | ha-schedd-HASchedd3@ | http://node2:48322/services/query/

# clusvcadm -r "HA Schedd HASchedd1" -m node1
Trying to relocate service:HA Schedd HASchedd1 to node1...Success
service:HA Schedd HASchedd1 is now running on node1

# ./locator.py --type=ANY -s -r=/etc/condor/certs/ca.crt -k=/etc/condor/certs/client.key -c=/etc/condor/certs/client.crt
CUSTOM | QUERY_SERVER | ha-schedd-HASchedd1@ | http://node1:50697/services/query/
CUSTOM | QUERY_SERVER | ha-schedd-HASchedd2@ | http://node2:37425/services/query/
CUSTOM | QUERY_SERVER | ha-schedd-HASchedd3@ | http://node2:48322/services/query/

# ps ax | grep -i aviary
25232 ?  S<  0:00 aviary_query_server -pidfile /var/run/condor/aviary_query_server-HASchedd1_query_server.pid -local-name HASchedd1_query_server

# kill -9 25232

# ./locator.py --type=ANY -s -r=/etc/condor/certs/ca.crt -k=/etc/condor/certs/client.key -c=/etc/condor/certs/client.crt
CUSTOM | QUERY_SERVER | ha-schedd-HASchedd1@ | http://node1:47936/services/query/
CUSTOM | QUERY_SERVER | ha-schedd-HASchedd2@ | http://node2:37425/services/query/
CUSTOM | QUERY_SERVER | ha-schedd-HASchedd3@ | http://node2:48322/services/query/

Locator endpoint updated after service relocation and after a crashed (killed) service.

>>> VERIFIED
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2013-0564.html