844971 – Central Manager High Availability - the negotiator doesn't run on any node

Keywords:
Status:	CLOSED WORKSFORME
Alias:	None
Product:	Red Hat Enterprise MRG
Classification:	Red Hat
Component:	condor
Sub Component:
Version:	2.2
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	2.3
Target Release:	---
Assignee:	Robert Rati
QA Contact:	MRG Quality Engineering
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2012-08-01 11:37 UTC by Lubos Trilety
Modified:	2013-01-04 15:39 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2012-10-10 15:00:23 UTC
Target Upstream Version:
Embargoed:

Attachments
condor logs (18.06 KB, application/x-gzip) 2012-08-01 11:37 UTC, Lubos Trilety

Description Lubos Trilety 2012-08-01 11:37:07 UTC

Created attachment 601716
condor logs

Description of problem:
When central manager high availability is configured using wallaby and the negotiator is stopped with the condor_off command after it has started, HAD does not stop. As a result, no negotiator is running in the pool.

Version-Release number of selected component (if applicable):
condor-7.6.5-0.19.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Configure pool
$ wallaby add-group allnodes
Adding the following group: allnodes
Console Connection Established...

# Repeat the following command for every machine
$ wallaby add-node-memberships <node> allnodes
Console Connection Established...

$ wallaby add-params-to-group allnodes ALLOW_WRITE=* ALLOW_READ=* ALL_DEBUG=D_FULLDEBUG REPLICATION_LIST='<node1>:$(REPLICATION_PORT),<node2>:$(REPLICATION_PORT)' HAD_LIST='<node1>:$(HAD_PORT),<node2>:$(HAD_PORT)' HAD_UPDATE_INTERVAL=30 REPLICATION_INTERVAL=30 CONDOR_HOST='<node1>, <node2>'
Console Connection Established...

$ wallaby add-features-to-group allnodes NodeAccess Master HACentralManager
Console Connection Established...

$ wallaby activate
Console Connection Established...

2. Wait until the negotiator is running, then run "condor_off -subsystem negotiator" (optionally with -fast).

3. Check which condor daemons are running.
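
The repeated add-node-memberships call in step 1 can be scripted. This is only a sketch: the function name add_memberships and the WALLABY override variable are hypothetical helpers introduced here, not part of wallaby. The wallaby subcommand and its argument order are taken from step 1.

```shell
# Hypothetical helper: add every listed node to one wallaby group.
# Usage: add_memberships <group> <node>...
# WALLABY is an assumed override (for example WALLABY=echo for a dry run);
# it defaults to the real wallaby command.
add_memberships() {
    group=$1
    shift
    for node in "$@"; do
        ${WALLABY:-wallaby} add-node-memberships "$node" "$group"
    done
}
```

For example, "add_memberships allnodes <node1> <node2>" issues one add-node-memberships call per node, in the order given.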
  
Actual results:
Only the negotiator stops; HAD is still running on the master node (<node1>).

Expected results:
Both HAD and the negotiator should stop.
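
One way to perform the check from step 3 on a node is to look for the daemon binaries in the process list. The sketch below assumes the daemons appear as condor_had and condor_negotiator in the first word of each command line; the function names daemon_state and check_ha_daemons are hypothetical. It matches on the full command line because the short command name reported by ps on Linux is truncated and would cut off "condor_negotiator".

```shell
# Hypothetical helper: read command lines on stdin (one per line) and report
# whether any of them runs the daemon named in $1. Only the basename of the
# first word is compared, so "/usr/sbin/condor_had -f" matches "condor_had".
daemon_state() {
    awk -v name="$1" '
        { n = split($1, parts, "/"); if (parts[n] == name) found = 1 }
        END { print name ": " (found ? "running" : "not running") }
    '
}

# Hypothetical helper: check both daemons against one snapshot of the
# local process list.
check_ha_daemons() {
    snapshot=$(ps -e -o args=)
    for daemon in condor_had condor_negotiator; do
        printf '%s\n' "$snapshot" | daemon_state "$daemon"
    done
}
```

Running check_ha_daemons on <node1> after step 2 would, in the failure described here, report condor_had as running and condor_negotiator as not running.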

Additional info:
After condor is restarted on all machines, it works correctly.
After further checking, I found that this happens only on a RHEL 6 64-bit machine; logs from that machine are attached.

Comment 1 Robert Rati 2012-10-10 15:00:23 UTC

I've been unable to reproduce this issue with 3 and 4 node pools, including the original node that saw the problem.
