Bug 844971
Summary: Central Manager High Availability - the negotiator doesn't run on any node
Product: Red Hat Enterprise MRG
Component: condor
Version: 2.2
Hardware: All
OS: Linux
Status: CLOSED WORKSFORME
Severity: high
Priority: medium
Target Milestone: 2.3
Reporter: Lubos Trilety <ltrilety>
Assignee: Robert Rati <rrati>
QA Contact: MRG Quality Engineering <mrgqe-bugs>
CC: iboverma, matt, mkudlej, rrati, tstclair
Doc Type: Bug Fix
Type: Bug
Last Closed: 2012-10-10 15:00:23 UTC
I've been unable to reproduce this issue with 3- and 4-node pools, including on the original node that saw the problem.
Created attachment 601716
condor logs

Description of problem:
If high availability of the central manager is configured using wallaby and the negotiator is stopped with the condor_off command after it has started, HAD does not stop. This leads to a situation in which no negotiator is running in the pool.

Version-Release number of selected component (if applicable):
condor-7.6.5-0.19.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Configure the pool:
   $ wallaby add-group allnodes
   Adding the following group: allnodes
   Console Connection Established...
   # Repeat the following command for all machines
   $ wallaby add-node-memberships <node> allnodes
   Console Connection Established...
   $ wallaby add-params-to-group allnodes ALLOW_WRITE=* ALLOW_READ=* ALL_DEBUG=D_FULLDEBUG REPLICATION_LIST='<node1>:$(REPLICATION_PORT),<node2>:$(REPLICATION_PORT)' HAD_LIST='<node1>:$(HAD_PORT),<node2>:$(HAD_PORT)' HAD_UPDATE_INTERVAL=30 REPLICATION_INTERVAL=30 CONDOR_HOST='<node1>, <node2>'
   Console Connection Established...
   $ wallaby add-features-to-group allnodes NodeAccess Master HACentralManager
   Console Connection Established...
   $ wallaby activate
   Console Connection Established...
2. Wait until the negotiator is running, then run "condor_off -subsystem negotiator (-fast)".
3. Check the condor daemons.

Actual results:
Only the negotiator stops; HAD is still running on the master node (<node1>).

Expected results:
Both HAD and the negotiator should stop.

Additional info:
After condor is restarted on all machines, it works correctly.

After some checking I found that this happens only on a RHEL 6 64-bit machine; the logs from that machine are in the attachment.
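
Step 3 of the reproduction steps ("check the condor daemons") could be scripted roughly as follows. This is only a sketch and not part of the original report: the function names is_running and classify_ha_state are hypothetical, and it assumes the HAD and negotiator daemons appear in the process list as condor_had and condor_negotiator. It compares the observed state with the actual and expected results described above.

```shell
#!/bin/sh
# Hypothetical helper for step 3: report whether HAD and the negotiator
# are running on the local node. Assumes the daemons show up in the
# process list as condor_had and condor_negotiator.

# Print "yes" if a process whose command line matches $1 exists, else "no".
# pgrep -f is used because plain pgrep matches only the first 15 characters
# of the process name, and "condor_negotiator" is longer than that.
is_running() {
    if pgrep -f "$1" >/dev/null 2>&1; then
        echo yes
    else
        echo no
    fi
}

# classify_ha_state HAD_RUNNING NEGOTIATOR_RUNNING  (each "yes" or "no")
classify_ha_state() {
    case "$1:$2" in
        yes:yes) echo "OK: HAD and negotiator are both running" ;;
        no:no)   echo "OK: HAD and negotiator are both stopped" ;;
        yes:no)  echo "BUG: HAD is running but the negotiator is stopped" ;;
        no:yes)  echo "UNEXPECTED: negotiator is running without HAD" ;;
        *)       echo "usage: classify_ha_state yes|no yes|no" >&2; return 2 ;;
    esac
}

# Run the check only when the script is invoked with the argument "check".
if [ "${1:-}" = "check" ]; then
    classify_ha_state "$(is_running condor_had)" "$(is_running condor_negotiator)"
fi
```

Saved under a name such as check_ha.sh (also hypothetical), it would be run on <node1> with "sh check_ha.sh check" after the condor_off command in step 2. The "BUG" line corresponds to the actual results reported here, and "both stopped" to the expected results.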