Bug 844971

Summary:

Central Manager High Availability - the negotiator doesn't run on any node

Product:

Red Hat Enterprise MRG

Reporter:

Lubos Trilety <ltrilety>

Component:

condor

Assignee:

Robert Rati <rrati>

Status:

CLOSED WORKSFORME

QA Contact:

MRG Quality Engineering <mrgqe-bugs>

Severity:

high

Priority:

medium

Version:

2.2

CC:

iboverma, matt, mkudlej, rrati, tstclair

Target Milestone:

2.3

Hardware:

All

OS:

Linux

Doc Type:

Bug Fix

Last Closed:

2012-10-10 15:00:23 UTC

Type:

Bug

Attachments:

Description	Flags
condor logs	none

Description Lubos Trilety 2012-08-01 11:37:07 UTC

Created attachment 601716
condor logs

Description of problem:
When central manager high availability is configured using wallaby and the negotiator is stopped with the condor_off command after it has started, the HAD does not stop. As a result, no negotiator is running in the pool.

Version-Release number of selected component (if applicable):
condor-7.6.5-0.19.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Configure the pool
$ wallaby add-group allnodes
Adding the following group: allnodes
Console Connection Established...

# Repeat the following command for all machines
$ wallaby add-node-memberships <node> allnodes
Console Connection Established...

$ wallaby add-params-to-group allnodes ALLOW_WRITE=* ALLOW_READ=* ALL_DEBUG=D_FULLDEBUG REPLICATION_LIST='<node1>:$(REPLICATION_PORT),<node2>:$(REPLICATION_PORT)' HAD_LIST='<node1>:$(HAD_PORT),<node2>:$(HAD_PORT)' HAD_UPDATE_INTERVAL=30 REPLICATION_INTERVAL=30 CONDOR_HOST='<node1>, <node2>'
Console Connection Established...

$ wallaby add-features-to-group allnodes NodeAccess Master HACentralManager
Console Connection Established...

$ wallaby activate
Console Connection Established...

2. Wait until the negotiator is running, then run "condor_off -subsystem negotiator" (optionally with -fast)

3. Check which condor daemons are running
  
Actual results:
Only the negotiator stops; the HAD is still running on the master node (<node1>).

Expected results:
Both the HAD and the negotiator should stop.
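
The daemon check in step 3 could be scripted roughly as follows. This is only a sketch: the function name ha_state and its output labels are made up for illustration, and it assumes the daemons appear in the process list under the binary names condor_had and condor_negotiator. It reads one command line per row on stdin and reports which of the two daemons it saw.

```shell
#!/bin/sh
# Hypothetical helper: classify a node by which HA-related daemons are present.
# Input: process command lines on stdin, one per line.
# Output labels (illustrative only): both, had-only, negotiator-only, neither.
ha_state() {
    procs=$(cat)
    had=no
    neg=no
    # Match the binary name at the start of the line or after a path separator.
    if printf '%s\n' "$procs" | grep -Eq '(^|/)condor_had( |$)'; then
        had=yes
    fi
    if printf '%s\n' "$procs" | grep -Eq '(^|/)condor_negotiator( |$)'; then
        neg=yes
    fi
    case "$had/$neg" in
        yes/yes) echo "both" ;;
        yes/no)  echo "had-only" ;;
        no/yes)  echo "negotiator-only" ;;
        *)       echo "neither" ;;
    esac
}

# Usage on a live node (left commented out so the sketch runs anywhere):
# ps -e -o args= | ha_state
```

Under this sketch, the actual result reported above would show up as "had-only" on <node1> after step 2, while the expected result would be "neither".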

Additional info:
After condor is restarted on all machines, it works correctly.
After some checking, I found that it happens only on a RHEL 6 64-bit machine; logs from that machine are attached.
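
For anyone trying to reproduce this, the "wait until the negotiator is running" part of step 2 could be automated with a small polling helper. The sketch below is an assumption, not part of the original reproduction: wait_for is a hypothetical function, and the 120-second timeout in the usage line is arbitrary.

```shell
#!/bin/sh
# Hypothetical helper: retry a command once per second until it succeeds
# or the timeout (in seconds) is reached.
# Usage: wait_for <timeout_seconds> <command> [args...]
wait_for() {
    timeout=$1
    shift
    elapsed=0
    until "$@"; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1    # gave up: the command never succeeded
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Usage on the central manager (left commented out so the sketch runs anywhere):
# wait_for 120 pgrep -f condor_negotiator && condor_off -subsystem negotiator
```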

Comment 1 Robert Rati 2012-10-10 15:00:23 UTC

I've been unable to reproduce this issue with 3 and 4 node pools, including the original node that saw the problem.