Bug 766629 - RH HA + HA Scheduler - 2 schedulers in pool after node with scheduler died
Summary: RH HA + HA Scheduler - 2 schedulers in pool after node with scheduler died
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor
Version: 2.1
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: ---
Assignee: grid-maint-list
QA Contact: MRG Quality Engineering
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-12-12 13:42 UTC by Martin Kudlej
Modified: 2016-05-26 20:23 UTC
3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-05-26 20:23:33 UTC
Target Upstream Version:


Attachments (Terms of Use)
configuration of cluster nodes (1.55 KB, application/octet-stream)
2011-12-12 13:42 UTC, Martin Kudlej
no flags

Description Martin Kudlej 2011-12-12 13:42:31 UTC
Created attachment 545715 [details]
configuration of cluster nodes

Description of problem:
There are 2 schedulers (one live, one dead) in the pool after killing the node that was running the scheduler.

Version-Release number of selected component (if applicable):
condor-7.6.5-0.8.el6

How reproducible:
100%

Steps to Reproduce:
1. Set up the HA scheduler with RH HA so that the scheduler runs on one node of the cluster.

2. For the node running the scheduler, run virsh destroy (for a cluster with nodes in KVM virtualization), or
iptables --flush  
iptables -P INPUT DROP  
iptables -P FORWARD DROP  
iptables -P OUTPUT DROP 
for a cluster with physical machines.
In this case, this was run on node _202_.

3. The cluster moves the service to another node.

4. Check how many schedulers are in the pool:

$ condor_status -schedd -l | grep -i machine
Machine = "_202_" <- this machine has been destroyed and should not be there
Machine = "_205_" <- this is the currently running scheduler

$ condor_q -name ha_schedd@


-- Schedd: ha_schedd@ : <_205_:52784>
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               

0 jobs; 0 idle, 0 running, 0 held
<waiting for timeout of query on dead _202_ node>
...
-- Failed to fetch ads from: <_202_:54520> : _202_
CEDAR:6001:Failed to connect to <_202_:54520>
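The check in step 4 can be scripted. A minimal sketch of counting schedd ads, fed canned output here because the machine names in this report are placeholders (a live check would pipe `condor_status -schedd -l | grep -i machine` into the same count):

```shell
#!/bin/sh
# Canned output matching what `condor_status -schedd -l | grep -i machine`
# produced in this report; _202_ and _205_ are placeholder host names.
sample='Machine = "_202_"
Machine = "_205_"'

# Count schedd ads in the pool; after a clean failover only one should remain.
count=$(printf '%s\n' "$sample" | grep -c '^Machine')
echo "schedd ads in pool: $count"
```

Run against the sample above, this reports 2 ads, reproducing the stale-ad symptom; the expected post-failover result is 1.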
  
Actual results:
A dead scheduler remains in the pool in the condor_status output, and condor_q also tries to query it.

Expected results:
Only the one live scheduler is in the pool.

Additional info: This test case passes when the HA scheduler is configured without RH HA.
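One mitigating factor (an assumption based on general Condor collector behavior, not stated in this report): the collector drops ads that are not refreshed within the ad lifetime, so the stale schedd ad from the destroyed node should age out eventually. A condor_config sketch with an illustrative value:

```
# condor_config fragment (illustrative sketch, not from this report).
# CLASSAD_LIFETIME is the number of seconds the collector keeps an ad
# without receiving an update; a stale schedd ad from a dead node
# should disappear from condor_status once this lifetime expires.
CLASSAD_LIFETIME = 900
```

This only shortens the window during which condor_status and condor_q see the dead scheduler; it does not remove the ad at failover time.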

Comment 1 Anne-Louise Tangring 2016-05-26 20:23:33 UTC
MRG-Grid is in maintenance and only customer escalations will be considered. This issue can be reopened if a customer escalation associated with it occurs.

