Created attachment 545715 [details]
configuration of cluster nodes

Description of problem:
There are 2 schedulers (one live and one dead) in the pool after killing the node that was running the scheduler.

Version-Release number of selected component (if applicable):
condor-7.6.5-0.8.el6

How reproducible:
100%

Steps to Reproduce:
1. Set up the HA scheduler with RH HA so the scheduler runs on one node of the cluster.
2. Kill the node running the scheduler:
   - virsh destroy (for a cluster with nodes in KVM virtualization), or
   - for a cluster of physical machines:
     iptables --flush
     iptables -P INPUT DROP
     iptables -P FORWARD DROP
     iptables -P OUTPUT DROP
   In this case this was run on the _202_ node.
3. The cluster moves the service to another node.
4. Check how many schedulers are in the pool:
   $ condor_status -schedd -l | grep -i machine
   Machine = "_202_"  <- this machine has been destroyed and should not be there
   Machine = "_205_"  <- this is the currently running scheduler
   $ condor_q -name ha_schedd@
   -- Schedd: ha_schedd@ : <_205_:52784>
    ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
   0 jobs; 0 idle, 0 running, 0 held
   <waiting for timeout of query on dead _202_ node>
   ...
   -- Failed to fetch ads from: <_202_:54520> : _202_
   CEDAR:6001:Failed to connect to <_202_:54520>

Actual results:
The dead scheduler still appears in condor_status output, and condor_q also attempts to query it (hanging until the connection to the dead node times out).

Expected results:
There is just one live scheduler in the pool.

Additional info:
This test case passes for an HA scheduler configuration that does not use RH HA.
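The stale-ad condition in step 4 can be checked mechanically by counting the `Machine` attributes in `condor_status -schedd -l` output. Below is a minimal sketch: the sample ClassAd text and the hostnames in it are hypothetical stand-ins for the redacted `_202_`/`_205_` nodes, and `schedd_machines` is a helper written for this illustration, not part of any Condor tooling.

```python
import re

# Hypothetical sample of `condor_status -schedd -l` output; a real ad
# contains many more attributes, but only Machine matters here.
SAMPLE_OUTPUT = '''\
Machine = "node202.example.com"
Name = "ha_schedd@node202.example.com"

Machine = "node205.example.com"
Name = "ha_schedd@node205.example.com"
'''

def schedd_machines(classad_text):
    """Extract the Machine attribute from each schedd ClassAd."""
    return re.findall(r'^Machine\s*=\s*"([^"]+)"', classad_text, re.MULTILINE)

machines = schedd_machines(SAMPLE_OUTPUT)
# With an HA schedd, exactly one schedd ad is expected in the pool;
# more than one means a stale ad from a dead node has not expired.
print(machines)
print("stale ads present:", len(machines) > 1)
```

Running this against live output (e.g. piping `condor_status -schedd -l` into the script) would flag the bug whenever more than one schedd ad survives the failover.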
MRG-Grid is in maintenance and only customer escalations will be considered. This issue can be reopened if a customer escalation associated with it occurs.