Bug 587051 - HA schedd system needs fencing
Summary: HA schedd system needs fencing
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor
Version: 1.2
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: 2.1
Assignee: Robert Rati
QA Contact: MRG Quality Engineering
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2010-04-28 18:35 UTC by Scott Spurrier
Modified: 2018-11-14 19:37 UTC
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-07-05 14:36:00 UTC
Target Upstream Version:



Description Scott Spurrier 2010-04-28 18:35:07 UTC
Description of problem:

The high availability condor_schedd system relies on the active condor_master periodically updating a timestamp on a file in a shared filesystem.  If the delta between the current time and that timestamp exceeds a configured threshold, a secondary condor_master starts an additional set of condor_schedd processes.

There is nothing in place to fence / kill the original set of condor_schedd processes.
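
For illustration, here is a minimal sketch of the takeover logic described above (not HTCondor source; the lock path and timeout are hypothetical placeholders).  It shows where the fencing gap sits:

  # Minimal sketch of the timestamp-based takeover -- not HTCondor code.
  # LOCK_FILE and TAKEOVER_TIMEOUT are hypothetical placeholders.
  import os
  import time

  LOCK_FILE = "/shared/spool/schedd.lock"   # file on the shared filesystem
  TAKEOVER_TIMEOUT = 300                    # seconds of staleness tolerated

  def primary_heartbeat():
      """Active condor_master side: periodically refresh the timestamp."""
      os.utime(LOCK_FILE, None)             # set the file's mtime to "now"

  def secondary_should_take_over():
      """Secondary condor_master side: take over once the timestamp is stale."""
      age = time.time() - os.stat(LOCK_FILE).st_mtime
      return age > TAKEOVER_TIMEOUT

  # Note the gap: once the secondary decides to take over, nothing here stops
  # the primary's condor_schedd processes -- the fencing this bug asks for.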

Consider the following test:
1.  Schedd HA node #1 is running with $(SPOOL) on an NFS hard mount.
2.  The NFS server hangs for longer than the secondary condor_master's configured tolerance.
3.  The secondary condor_master starts a duplicate set of condor_schedd processes.
4.  The NFS server stops hanging.
5.  At this point there is no means to stop either the original or the duplicate condor_schedd processes; both sets are concurrently (over)writing $(SPOOL)/job_queue.log.

A mechanism needs to be added to fence the condor_schedd processes on the original node, or something equivalent.

How reproducible:

100%.  An event similar to the one above happened during our preventative maintenance period yesterday.

Steps to Reproduce:

See above.

Actual results:

See above.

Expected results:

Only one condor_schedd process running per configured $(SPOOL).

Comment 1 Matthew Farrellee 2010-08-30 00:27:24 UTC
Locating the lock on the same mount as the job_queue.log should avoid any overwriting.
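
One way to read this suggestion (an assumption on my part, not necessarily what Condor itself does) is a kernel-mediated POSIX lock on a file that lives on the same mount as job_queue.log, so a second writer fails or blocks rather than silently overwriting.  A rough sketch with hypothetical paths:

  import fcntl
  import os

  SPOOL = "/shared/spool"                       # hypothetical $(SPOOL)
  LOCK_PATH = os.path.join(SPOOL, "job_queue.lock")

  def acquire_spool_lock():
      """Try to take an exclusive POSIX record lock co-located with job_queue.log."""
      fd = os.open(LOCK_PATH, os.O_CREAT | os.O_RDWR, 0o644)
      try:
          # lockf uses fcntl record locks, which NFS supports via lockd.
          fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
      except OSError:
          os.close(fd)
          return None                           # another master already holds it
      return fd                                 # keep open while the lock is needed

Even so, the race below shows why a lock by itself may not be sufficient.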

Theoretically (needs verification), hard mount semantics may not prevent a race between Masters.

 t0 - MasterOne obtains lock
 t1 - MasterTwo fails to obtain lock
 t2 - NFS server fails
 t3 - MasterTwo tries to obtain lock, MasterOne tries to update lock, both block
 t4 - NFS server returns, after lock has expired
 t5 - MasterTwo unblocks and obtains lock
 t6 - MasterOne updates lock

At t6 both masters think they own the lock. Introducing an identifier that lets MasterOne notice it has lost the lock would improve the situation. However, even then, between t5 and t6 there may be multiple copies of the managed daemon running. An acknowledged fence event from MasterTwo to MasterOne, sent before MasterTwo starts the daemon, would address this.
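
A conceptual sketch of the identifier idea (file layout and ids are hypothetical, not Condor's): each master stamps the lock with its own id, and a master that finds a foreign id at renewal time knows it has lost the lock and fences the daemons it manages.

  import json
  import os
  import time

  LOCK_FILE = "/shared/spool/schedd.lock"       # hypothetical path

  def write_lock(owner_id):
      """Stamp the lock with this master's identity and the current time."""
      tmp = LOCK_FILE + ".tmp"
      with open(tmp, "w") as f:
          json.dump({"owner": owner_id, "stamp": time.time()}, f)
      os.rename(tmp, LOCK_FILE)                 # atomic replace on one filesystem

  def renew_or_fence(owner_id, stop_local_schedds):
      """Run periodically by the master that believes it holds the lock."""
      with open(LOCK_FILE) as f:
          lock = json.load(f)
      if lock["owner"] != owner_id:
          # Another master took over while we were blocked (e.g. an NFS hang):
          # fence ourselves rather than keep writing job_queue.log.
          stop_local_schedds()
          return False
      write_lock(owner_id)                      # still the owner: refresh the stamp
      return True

Even with this, duplicates can run between t5 and the next renewal check, which is why an acknowledged fence event before MasterTwo starts the daemon is still needed.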

As an aside, fencing is not desirable where multiple managed daemons exist on a single node: an NFS failure affecting one daemon would trigger the fencing of all of them.

Comment 3 Robert Rati 2011-07-05 14:44:34 UTC
Using Red Hat Cluster Suite to manage the HA Schedd will address this issue, since Cluster Suite provides its own fencing.

