Bug 766629

Summary: RH HA + HA Scheduler - 2 schedulers in pool after node with scheduler died
Product: Red Hat Enterprise MRG
Reporter: Martin Kudlej <mkudlej>
Component: condor
Assignee: grid-maint-list <grid-maint-list>
Status: CLOSED WONTFIX
QA Contact: MRG Quality Engineering <mrgqe-bugs>
Severity: medium
Priority: low
Version: 2.1
CC: matt, trusnak, tstclair
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2016-05-26 20:23:33 UTC
Attachments:
configuration of cluster nodes

Description Martin Kudlej 2011-12-12 13:42:31 UTC
Created attachment 545715 [details]
configuration of cluster nodes

Description of problem:
There are two schedulers (one live and one dead) in the pool after killing the node that was running the scheduler.

Version-Release number of selected component (if applicable):
condor-7.6.5-0.8.el6

How reproducible:
100%

Steps to Reproduce:
1. Set up the HA scheduler with RH HA so that the scheduler runs on one node of the cluster.

2. Kill the node running the scheduler: virsh destroy (for a cluster with nodes in KVM virtualization), or
iptables --flush
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT DROP
for a cluster of physical machines.
In this case this was run on the _202_ node.

3. The cluster moves the service to another node.

4. Check how many schedulers are in the pool:

$ condor_status -schedd -l | grep -i machine
Machine = "_202_" <- this machine has been destroyed and should not be there
Machine = "_205_" <- this is the currently running scheduler

$ condor_q -name ha_schedd@


-- Schedd: ha_schedd@ : <_205_:52784>
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               

0 jobs; 0 idle, 0 running, 0 held
<waiting for timeout of query on dead _202_ node>
...
-- Failed to fetch ads from: <_202_:54520> : _202_
CEDAR:6001:Failed to connect to <_202_:54520>
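The check in step 4 can be scripted. A minimal sketch, assuming the `condor_status -schedd -l` output format shown above (one `Machine = "..."` attribute per schedd ad); the `check_single_schedd` helper name is mine, not part of condor:

```python
import subprocess

def live_schedd_machines(output):
    """Extract machine names from `Machine = "..."` lines in classad output."""
    machines = []
    for line in output.splitlines():
        line = line.strip()
        if line.startswith('Machine ='):
            # Line looks like: Machine = "hostname"
            machines.append(line.split('"')[1])
    return machines

def check_single_schedd():
    """Fail if more than one schedd ad (e.g. a dead one) is still in the pool."""
    out = subprocess.run(
        ["condor_status", "-schedd", "-l"],
        capture_output=True, text=True, check=True
    ).stdout
    machines = live_schedd_machines(out)
    if len(machines) != 1:
        raise RuntimeError("expected 1 schedd ad, found %d: %s"
                           % (len(machines), machines))
    return machines[0]
```

Against the output above this would report two ads (`_202_` and `_205_`) and fail, matching the observed behavior.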
  
Actual results:
A dead scheduler remains in the pool in the condor_status output, and condor_q also tries to query it.

Expected results:
There is just one live scheduler in the pool.

Additional info: This test case passes when the HA scheduler is configured without RH HA.
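For comparison, condor's own HA schedd setup (the configuration that passes this test case) uses a lock file on shared storage so that only one node runs the schedd at a time. A sketch of the relevant condor_config entries; the shared-filesystem path /share/spool and the timing values are assumed examples, not taken from the attached configuration:

```
# Let the condor_master manage schedd failover itself
MASTER_HA_LIST = SCHEDD
# Job queue and lock must live on storage shared by all HA nodes (example path)
SPOOL = /share/spool
HA_LOCK_URL = file:/share/spool
VALID_SPOOL_FILES = $(VALID_SPOOL_FILES) SCHEDD.lock
# Fixed name so the schedd ad is the same regardless of which node runs it
SCHEDD_NAME = ha_schedd@
# Example lease timings: how long a lock is held and how often it is polled
HA_LOCK_HOLD_TIME = 300
HA_POLL_PERIOD = 60
```

With RH HA instead managing the service, this lock-based handover is bypassed, which may be why the dead node's schedd ad is never invalidated in the collector.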

Comment 1 Anne-Louise Tangring 2016-05-26 20:23:33 UTC
MRG-Grid is in maintenance and only customer escalations will be considered. This issue can be reopened if a customer escalation associated with it occurs.