Bug 844971 - Central Manager High Availability - the negotiator doesn't run on any node
Summary: Central Manager High Availability - the negotiator doesn't run on any node
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor
Version: 2.2
Hardware: All
OS: Linux
Priority: medium
Severity: high
Target Milestone: 2.3
Target Release: ---
Assignee: Robert Rati
QA Contact: MRG Quality Engineering
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-08-01 11:37 UTC by Lubos Trilety
Modified: 2013-01-04 15:39 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-10-10 15:00:23 UTC
Target Upstream Version:
Embargoed:


Attachments
condor logs (18.06 KB, application/x-gzip)
2012-08-01 11:37 UTC, Lubos Trilety

Description Lubos Trilety 2012-08-01 11:37:07 UTC
Created attachment 601716 [details]
condor logs

Description of problem:
When central manager high availability is configured using wallaby and the negotiator is stopped with the condor_off command after it has started, HAD does not stop. As a result, no negotiator is running anywhere in the pool.

Version-Release number of selected component (if applicable):
condor-7.6.5-0.19.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Configure pool
$ wallaby add-group allnodes
Adding the following group: allnodes
Console Connection Established...

# Repeat the following command for all machines
$ wallaby add-node-memberships <node> allnodes
Console Connection Established...

$ wallaby add-params-to-group allnodes ALLOW_WRITE=* ALLOW_READ=* ALL_DEBUG=D_FULLDEBUG REPLICATION_LIST='<node1>:$(REPLICATION_PORT),<node2>:$(REPLICATION_PORT)' HAD_LIST='<node1>:$(HAD_PORT),<node2>:$(HAD_PORT)' HAD_UPDATE_INTERVAL=30 REPLICATION_INTERVAL=30 CONDOR_HOST='<node1>, <node2>'
Console Connection Established...

$ wallaby add-features-to-group allnodes NodeAccess Master HACentralManager
Console Connection Established...

$ wallaby activate
Console Connection Established...

2. Wait until the negotiator is running, then run "condor_off -subsystem negotiator" (with or without -fast)

3. Check the condor daemons
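
One possible way to do the check in step 3 is sketched below. It is not taken from the original report: the helper name daemon_status is hypothetical, and it assumes the daemons show up in the process list under the usual binary names (condor_had, condor_negotiator). It reads a process listing on stdin and prints one status line per daemon name given as an argument.

```shell
# daemon_status NAME... : read a process listing on stdin and report,
# for each NAME, whether it appears in the listing as a whole word.
# (Hypothetical helper; the daemon names passed in are assumptions.)
daemon_status() {
    listing=$(cat)
    for d in "$@"; do
        if printf '%s\n' "$listing" | grep -qw -- "$d"; then
            echo "$d running"
        else
            echo "$d stopped"
        fi
    done
}

# Example use on a pool node (full command lines are listed, because
# short process names may be truncated):
#   ps -e -o args= | daemon_status condor_had condor_negotiator
```

With the behaviour described in this report, such a check on <node1> would be expected to show condor_had as running and condor_negotiator as stopped.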
  
Actual results:
Only the negotiator stops; HAD is still running on the master node (<node1>).

Expected results:
Both HAD and the negotiator should stop.

Additional info:
After condor is restarted on all machines, it works correctly.
After some checking I found that this happens only on a RHEL 6 64-bit machine; logs from that machine are in the attachment.

Comment 1 Robert Rati 2012-10-10 15:00:23 UTC
I've been unable to reproduce this issue with 3 and 4 node pools, including the original node that saw the problem.

