
Bug 807398

Summary: Endpoint updating for HA configurations
Product: Red Hat Enterprise MRG
Component: condor-aviary
Version: Development
Status: CLOSED ERRATA
Severity: medium
Priority: high
Reporter: Robert Rati <rrati>
Assignee: Pete MacKinnon <pmackinn>
QA Contact: Tomas Rusnak <trusnak>
Docs Contact:
CC: ltoscano, matt, mkudlej, pmackinn, trusnak, tstclair
Target Milestone: 2.3
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version: condor-7.8.2-0.1
Doc Type: Bug Fix
Doc Text:
Cause: Aviary Locator behaviour when the Aviary Schedd plug-in and Query Server are deployed in a HA group.
Consequence: Aviary clients would have experienced stale endpoint references for a longer duration than necessary.
Fix: Adjustments were made in the Locator implementation to quickly replace a failed endpoint reference with its new one.
Result: An Aviary client using a Schedd or Query Server endpoint will now be able to retrieve the new endpoint faster.
Last Closed: 2013-03-06 18:43:16 UTC
Bug Depends On: 871080
Bug Blocks:

Description Robert Rati 2012-03-27 16:58:20 UTC
Description of problem:
Providing high availability for the Query Server through Red Hat HA requires endpoint updating, so that a Query Server can still be located after a failover. Without this, an Aviary client would likely lose track of which machine is running the Query Server once the daemon fails over to another machine in the cluster.

Comment 2 Martin Kudlej 2012-04-04 08:38:20 UTC
How can I test this, please?

Comment 6 Pete MacKinnon 2012-04-26 12:49:32 UTC
Verification advice:

2.2 HA process groups will include the Schedd and the Query Server (QS), both of which can be configured to publish their SOAP endpoints using the new location feature.

1) Do these endpoints correctly re-locate in a failover scenario as evidenced by a SOAP tool such as locator.py found in /usr/share/condor/aviary?

2) Are there multiple redundant entries for a particular endpoint when there should only be one?

3) Is the endpoint listed after failover actually reachable, or is it a stale reference to the old location (host:port)?

4) Are old references from crashed (e.g., kill -9) process endpoints removed or replaced in a timely manner, that is, within (AVIARY_LOCATOR_MISSED_UPDATES+1) * AVIARY_LOCATOR_PRUNE_INTERVAL seconds?
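
The time bound in item 4 can be worked out ahead of a test run. The following is a minimal sketch of that arithmetic only; the function name is hypothetical and the sample values are illustrative assumptions, not the shipped defaults of AVIARY_LOCATOR_MISSED_UPDATES or AVIARY_LOCATOR_PRUNE_INTERVAL.

```python
def max_stale_seconds(missed_updates: int, prune_interval: int) -> int:
    """Upper bound from item 4:
    (AVIARY_LOCATOR_MISSED_UPDATES + 1) * AVIARY_LOCATOR_PRUNE_INTERVAL.

    Hypothetical helper for test planning; it is not part of Aviary.
    """
    if missed_updates < 0 or prune_interval <= 0:
        raise ValueError("missed_updates must be >= 0 and prune_interval > 0")
    return (missed_updates + 1) * prune_interval


# Illustrative values only (assumptions, not the configured defaults):
# 2 missed updates and a 20-second prune interval.
print(max_stale_seconds(2, 20))
```

A tester could wait this many seconds after killing a daemon before treating a leftover endpoint as a failure.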

Comment 7 Pete MacKinnon 2012-05-02 19:00:21 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause: Aviary Locator behaviour when the Aviary Schedd plug-in and Query Server are deployed in a HA group.
Consequence: Aviary clients would have experienced stale endpoint references for a longer duration than necessary.
Fix: Adjustments were made in the Locator implementation to quickly replace a failed endpoint reference with its new one.
Result: An Aviary client using a Schedd or Query Server endpoint will now be able to retrieve the new endpoint faster.

Comment 10 Tomas Rusnak 2013-01-09 10:58:34 UTC
# ./locator.py --type=ANY -s -r=/etc/condor/certs/ca.crt -k=/etc/condor/certs/client.key -c=/etc/condor/certs/client.crt 
CUSTOM | QUERY_SERVER | ha-schedd-HASchedd1@ | http://node2:45039/services/query/
CUSTOM | QUERY_SERVER | ha-schedd-HASchedd2@ | http://node2:37425/services/query/
CUSTOM | QUERY_SERVER | ha-schedd-HASchedd3@ | http://node2:48322/services/query

# clusvcadm -r "HA Schedd HASchedd1" -m node1
Trying to relocate service:HA Schedd HASchedd1 to node1...Success
service:HA Schedd HASchedd1 is now running on node1

# ./locator.py --type=ANY -s -r=/etc/condor/certs/ca.crt -k=/etc/condor/certs/client.key -c=/etc/condor/certs/client.crt 
CUSTOM | QUERY_SERVER | ha-schedd-HASchedd1@ | http://node1:50697/services/query/
CUSTOM | QUERY_SERVER | ha-schedd-HASchedd2@ | http://node2:37425/services/query/
CUSTOM | QUERY_SERVER | ha-schedd-HASchedd3@ | http://node2:48322/services/query/


# ps ax | grep -i aviary
25232 ?        S<     0:00 aviary_query_server -pidfile /var/run/condor/aviary_query_server-HASchedd1_query_server.pid -local-name HASchedd1_query_server

# kill -9 25232

# ./locator.py --type=ANY -s -r=/etc/condor/certs/ca.crt -k=/etc/condor/certs/client.key -c=/etc/condor/certs/client.crt 
CUSTOM | QUERY_SERVER | ha-schedd-HASchedd1@ | http://node1:47936/services/query/
CUSTOM | QUERY_SERVER | ha-schedd-HASchedd2@ | http://node2:37425/services/query/
CUSTOM | QUERY_SERVER | ha-schedd-HASchedd3@ | http://node2:48322/services/query/

The Locator endpoint was updated both after service relocation and after the service crashed (was killed).

>>> VERIFIED
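
The manual comparison above could also be scripted. The following is a rough sketch, assuming locator.py prints one pipe-separated line per endpoint with the name in the third column and the URL in the fourth, as in the transcripts above. The function names are hypothetical, the meaning of the first two columns is not interpreted, and trailing slashes are stripped before comparing, since one of the captured lines lacks one.

```python
from collections import Counter

# Sample data copied from the first two locator.py runs in this comment.
BEFORE = """\
CUSTOM | QUERY_SERVER | ha-schedd-HASchedd1@ | http://node2:45039/services/query/
CUSTOM | QUERY_SERVER | ha-schedd-HASchedd2@ | http://node2:37425/services/query/
CUSTOM | QUERY_SERVER | ha-schedd-HASchedd3@ | http://node2:48322/services/query
"""
AFTER = """\
CUSTOM | QUERY_SERVER | ha-schedd-HASchedd1@ | http://node1:50697/services/query/
CUSTOM | QUERY_SERVER | ha-schedd-HASchedd2@ | http://node2:37425/services/query/
CUSTOM | QUERY_SERVER | ha-schedd-HASchedd3@ | http://node2:48322/services/query/
"""


def parse_locator_output(text):
    """Return (name, url) pairs; lines without four '|' fields are skipped."""
    rows = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 4:
            # Normalize the URL so a missing trailing slash is not a change.
            rows.append((parts[2], parts[3].rstrip("/")))
    return rows


def duplicate_names(rows):
    """Names listed more than once (verification item 2)."""
    counts = Counter(name for name, _ in rows)
    return sorted(name for name, n in counts.items() if n > 1)


def changed_endpoints(before, after):
    """Map each name whose URL differs to its (old, new) URL pair."""
    old, new = dict(before), dict(after)
    return {n: (old[n], new[n]) for n in old if n in new and old[n] != new[n]}


before_rows = parse_locator_output(BEFORE)
after_rows = parse_locator_output(AFTER)
print(duplicate_names(after_rows))
print(changed_endpoints(before_rows, after_rows))
```

This only checks what the Locator reports; whether the new endpoint is actually reachable still has to be confirmed separately.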

Comment 12 errata-xmlrpc 2013-03-06 18:43:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0564.html