Bug 807398
| Summary: | Endpoint updating for HA configurations | ||
|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | Robert Rati <rrati> |
| Component: | condor-aviary | Assignee: | Pete MacKinnon <pmackinn> |
| Status: | CLOSED ERRATA | QA Contact: | Tomas Rusnak <trusnak> |
| Severity: | medium | Docs Contact: | |
| Priority: | high | ||
| Version: | Development | CC: | ltoscano, matt, mkudlej, pmackinn, trusnak, tstclair |
| Target Milestone: | 2.3 | ||
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | condor-7.8.2-0.1 | Doc Type: | Bug Fix |
| Doc Text: |
Cause: The Aviary Locator did not promptly update endpoint references when the Aviary Schedd plug-in and Query Server were deployed in an HA group.
Consequence: Aviary clients experienced stale endpoint references for longer than necessary.
Fix: The Locator implementation was adjusted to quickly replace a failed endpoint reference with its new one.
Result: An Aviary client using a Schedd or Query Server endpoint can now retrieve the new endpoint faster.
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2013-03-06 18:43:16 UTC | Type: | --- |
| Bug Depends On: | 871080 | ||
| Bug Blocks: | |||
|
Description
Robert Rati
2012-03-27 16:58:20 UTC
How can I test this, please?

Verification advice: 2.2 HA process groups will include the Schedd and the Query Server (QS), which can be configured to publish their SOAP endpoints using the new location feature.
1) Do these endpoints correctly re-locate in a failover scenario as evidenced by a SOAP tool such as locator.py found in /usr/share/condor/aviary?
2) Are there multiple redundant entries for a particular endpoint when there should only be one?
3) Is the endpoint listed after failover actually reachable, or is it a stale reference to the old location (host:port)?
4) Are old references from crashed (e.g., kill -9) process endpoints removed or replaced in a timely manner? Within (AVIARY_LOCATOR_MISSED_UPDATES+1) * AVIARY_LOCATOR_PRUNE_INTERVAL seconds?
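The time bound in question 4 can be worked through numerically. This is a minimal sketch that assumes hypothetical values for the two configuration parameters; the actual values depend on the deployment's Condor configuration and are not stated in this report.

```python
def max_stale_seconds(missed_updates, prune_interval):
    """Upper bound in seconds from the formula in the verification advice:
    (AVIARY_LOCATOR_MISSED_UPDATES + 1) * AVIARY_LOCATOR_PRUNE_INTERVAL."""
    return (missed_updates + 1) * prune_interval


# Hypothetical settings, for illustration only.
AVIARY_LOCATOR_MISSED_UPDATES = 2
AVIARY_LOCATOR_PRUNE_INTERVAL = 20  # seconds

bound = max_stale_seconds(AVIARY_LOCATOR_MISSED_UPDATES,
                          AVIARY_LOCATOR_PRUNE_INTERVAL)
print("a stale reference should be gone within", bound, "seconds")
```

With these assumed values, a killed endpoint that is still listed after the computed number of seconds would count as a failure of check 4.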
# ./locator.py --type=ANY -s -r=/etc/condor/certs/ca.crt -k=/etc/condor/certs/client.key -c=/etc/condor/certs/client.crt
CUSTOM | QUERY_SERVER | ha-schedd-HASchedd1@ | http://node2:45039/services/query/
CUSTOM | QUERY_SERVER | ha-schedd-HASchedd2@ | http://node2:37425/services/query/
CUSTOM | QUERY_SERVER | ha-schedd-HASchedd3@ | http://node2:48322/services/query

# clusvcadm -r "HA Schedd HASchedd1" -m node1
Trying to relocate service:HA Schedd HASchedd1 to node1...Success
service:HA Schedd HASchedd1 is now running on node1

# ./locator.py --type=ANY -s -r=/etc/condor/certs/ca.crt -k=/etc/condor/certs/client.key -c=/etc/condor/certs/client.crt
CUSTOM | QUERY_SERVER | ha-schedd-HASchedd1@ | http://node1:50697/services/query/
CUSTOM | QUERY_SERVER | ha-schedd-HASchedd2@ | http://node2:37425/services/query/
CUSTOM | QUERY_SERVER | ha-schedd-HASchedd3@ | http://node2:48322/services/query/

# ps ax | grep -i aviary
25232 ? S< 0:00 aviary_query_server -pidfile /var/run/condor/aviary_query_server-HASchedd1_query_server.pid -local-name HASchedd1_query_server

# kill -9 25232

# ./locator.py --type=ANY -s -r=/etc/condor/certs/ca.crt -k=/etc/condor/certs/client.key -c=/etc/condor/certs/client.crt
CUSTOM | QUERY_SERVER | ha-schedd-HASchedd1@ | http://node1:47936/services/query/
CUSTOM | QUERY_SERVER | ha-schedd-HASchedd2@ | http://node2:37425/services/query/
CUSTOM | QUERY_SERVER | ha-schedd-HASchedd3@ | http://node2:48322/services/query/

Locator endpoint updated after service relocation and/or crashed (killed) service.

>>> VERIFIED

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0564.html
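The manual comparison of locator.py listings in the verification transcript could be scripted. The following is a rough sketch, not part of the Aviary tooling: it assumes the four-column, pipe-separated output format seen in the transcript, and the helper names (`parse_locator_output`, `duplicate_names`, `changed_endpoints`) are hypothetical. The sample data is copied from the listings taken after the relocation and after the kill -9.

```python
from collections import Counter

# Listings copied from the transcript (after relocation, then after kill -9).
AFTER_RELOCATE = """\
CUSTOM | QUERY_SERVER | ha-schedd-HASchedd1@ | http://node1:50697/services/query/
CUSTOM | QUERY_SERVER | ha-schedd-HASchedd2@ | http://node2:37425/services/query/
CUSTOM | QUERY_SERVER | ha-schedd-HASchedd3@ | http://node2:48322/services/query/
"""
AFTER_KILL = """\
CUSTOM | QUERY_SERVER | ha-schedd-HASchedd1@ | http://node1:47936/services/query/
CUSTOM | QUERY_SERVER | ha-schedd-HASchedd2@ | http://node2:37425/services/query/
CUSTOM | QUERY_SERVER | ha-schedd-HASchedd3@ | http://node2:48322/services/query/
"""


def parse_locator_output(text):
    """Return (type, subtype, name, url) tuples; skip lines without 4 fields."""
    entries = []
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) == 4:
            entries.append(tuple(fields))
    return entries


def duplicate_names(entries):
    """Names listed more than once (check 2: redundant entries)."""
    counts = Counter(name for _, _, name, _ in entries)
    return sorted(name for name, n in counts.items() if n > 1)


def changed_endpoints(before, after):
    """Map name -> (old_url, new_url) for endpoints whose URL differs."""
    old = {name: url for _, _, name, url in before}
    new = {name: url for _, _, name, url in after}
    return {name: (old[name], new[name])
            for name in old if name in new and old[name] != new[name]}


before = parse_locator_output(AFTER_RELOCATE)
after = parse_locator_output(AFTER_KILL)
print("duplicates:", duplicate_names(after))
print("changed:", changed_endpoints(before, after))
```

Only the endpoint of the killed Query Server should appear as changed, and the duplicate list should be empty. The comparison is an exact string match, so listings that differ only in a trailing slash would be reported as changed.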