Bug 844971 - Central Manager High Availability - the negotiator doesn't run on any node
Status: CLOSED WORKSFORME
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor
Version: 2.2
Hardware: All Linux
Priority: medium
Severity: high
Target Milestone: 2.3
Assigned To: Robert Rati
QA Contact: MRG Quality Engineering
Reported: 2012-08-01 07:37 EDT by Lubos Trilety
Modified: 2013-01-04 10:39 EST
Doc Type: Bug Fix
Last Closed: 2012-10-10 11:00:23 EDT
Type: Bug

Attachments
condor logs (18.06 KB, application/x-gzip)
2012-08-01 07:37 EDT, Lubos Trilety

Description Lubos Trilety 2012-08-01 07:37:07 EDT
Created attachment 601716
condor logs

Description of problem:
When high availability of the central manager is configured using wallaby and the negotiator is stopped with the condor_off command after it has started, the HAD does not stop. As a result, no negotiator is running in the pool.

Version-Release number of selected component (if applicable):
condor-7.6.5-0.19.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Configure pool
$ wallaby add-group allnodes
Adding the following group: allnodes
Console Connection Established...

# Repeat the following command for all machines
$ wallaby add-node-memberships <node> allnodes
Console Connection Established...

$ wallaby add-params-to-group allnodes ALLOW_WRITE=* ALLOW_READ=* ALL_DEBUG=D_FULLDEBUG REPLICATION_LIST='<node1>:$(REPLICATION_PORT),<node2>:$(REPLICATION_PORT)' HAD_LIST='<node1>:$(HAD_PORT),<node2>:$(HAD_PORT)' HAD_UPDATE_INTERVAL=30 REPLICATION_INTERVAL=30 CONDOR_HOST='<node1>, <node2>'
Console Connection Established...

$ wallaby add-features-to-group allnodes NodeAccess Master HACentralManager
Console Connection Established...

$ wallaby activate
Console Connection Established...

2. Wait until the negotiator is running, then run "condor_off -subsystem negotiator" (optionally with -fast)

3. Check which condor daemons are running
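
One way to carry out step 3 on a node, as a minimal sketch rather than the procedure the reporter actually used. The helper name daemon_state is hypothetical and not part of condor. The sketch assumes a Linux ps that accepts "-eo args=" and that the daemons show up under their usual binary names. It matches on the full command line because names such as condor_negotiator are longer than the 15 characters that the short process name keeps on Linux.

```shell
#!/bin/sh
# Hypothetical helper (not part of condor): report whether any process whose
# executable basename equals $1 is present in the process table.
daemon_state() {
    if ps -eo args= | awk -v name="$1" '
        # Compare the basename of the first word of each command line.
        { n = split($1, parts, "/"); if (parts[n] == name) found = 1 }
        END { exit !found }'; then
        echo "$1: running"
    else
        echo "$1: not running"
    fi
}

# Daemons relevant to a highly available central manager.
for d in condor_master condor_had condor_replication condor_negotiator; do
    daemon_state "$d"
done
```

With the behavior described in this report, running this on <node1> after step 2 would be expected to show condor_had as running and condor_negotiator as not running.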
  
Actual results:
Only the negotiator stops; the HAD is still running on the master node (<node1>).

Expected results:
Both the HAD and the negotiator should stop.

Additional info:
After condor is restarted on all machines, it works correctly.
After some checking, I found that it happens only on a RHEL 6 64-bit machine; logs from that machine are in the attachment.
Comment 1 Robert Rati 2012-10-10 11:00:23 EDT
I've been unable to reproduce this issue with 3 and 4 node pools, including the original node that saw the problem.
