Created attachment 601716
condor logs

Description of problem:
High availability of the central manager was configured using wallaby. When the negotiator was stopped with the condor_off command after it had started, the HAD did not stop. As a result, no negotiator was running in the pool.

Version-Release number of selected component (if applicable):
condor-7.6.5-0.19.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Configure the pool:

$ wallaby add-group allnodes
Adding the following group: allnodes
Console Connection Established...

# Repeat the following command for all machines
$ wallaby add-node-memberships <node> allnodes
Console Connection Established...

$ wallaby add-params-to-group allnodes ALLOW_WRITE=* ALLOW_READ=* ALL_DEBUG=D_FULLDEBUG REPLICATION_LIST='<node1>:$(REPLICATION_PORT),<node2>:$(REPLICATION_PORT)' HAD_LIST='<node1>:$(HAD_PORT),<node2>:$(HAD_PORT)' HAD_UPDATE_INTERVAL=30 REPLICATION_INTERVAL=30 CONDOR_HOST='<node1>, <node2>'
Console Connection Established...

$ wallaby add-features-to-group allnodes NodeAccess Master HACentralManager
Console Connection Established...

$ wallaby activate
Console Connection Established...

2. Wait until the negotiator is running, then run "condor_off -subsystem negotiator" (with or without -fast).
3. Check the condor daemons.

Actual results:
Only the negotiator stops; the HAD is still running on the master node (<node1>).

Expected results:
Both the HAD and the negotiator should stop.

Additional info:
After condor is restarted on all machines, it works correctly.
After some checking I found that this happens only on the RHEL 6 64-bit machine; the attached logs are from that machine.
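For step 3, a minimal sketch of a local check is shown below. The function name report_ha_daemons is hypothetical (it is not part of condor or wallaby), and it assumes the daemons run under their usual binary names, condor_negotiator and condor_had. It reads process command lines on stdin and matches the last path component of argv[0]. This avoids pgrep -x, because Linux truncates process names to 15 characters and "condor_negotiator" is longer than that.

```shell
#!/bin/sh
# Hypothetical helper: reads one process command line per line on stdin
# and reports whether the negotiator and HAD daemons appear among them.
# Assumes the daemon binaries are named condor_negotiator and condor_had.
report_ha_daemons() {
    procs=$(cat)
    for daemon in condor_negotiator condor_had; do
        # Compare the basename of the first word (argv[0]) with the daemon name;
        # awk exits 0 only if a matching line was found.
        if printf '%s\n' "$procs" | awk -v d="$daemon" '
            { n = split($1, p, "/"); if (n > 0 && p[n] == d) found = 1 }
            END { exit !found }'; then
            echo "$daemon: running"
        else
            echo "$daemon: stopped"
        fi
    done
}
```

On the node in question, `ps -eo args= | report_ha_daemons` can be run after step 2. With the behaviour reported above, it would show the negotiator as stopped and the HAD as still running.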
I've been unable to reproduce this issue with 3- and 4-node pools, including the original node that saw the problem.