461986 – condor_hold and condor_rm seem to shutdown qpidd ungracefully .. sometimes.

Summary: condor_hold and condor_rm seem to shutdown qpidd ungracefully .. sometimes.

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Enterprise MRG
Classification:	Red Hat
Component:	condor
Sub Component:
Version:	1.0
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	1.0
Target Release:	---
Assignee:	grid-maint-list
QA Contact:	Kim van der Riet
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2008-09-11 18:22 UTC by William Henry
Modified:	2010-10-26 18:18 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2009-10-27 21:28:15 UTC
Target Upstream Version:
Embargoed:

Description William Henry 2008-09-11 18:22:55 UTC

Description of problem:

A qpidd process was started as a Condor job with the following submit file:
Executable     = /usr/sbin/qpidd 
Universe       = vanilla
arguments      = -t --auth no --data-dir /home/whenry/.qpidd
Log            = qpidd.log
Output         = output.log

Queue
 
Using condor_hold or condor_rm to shut down the qpidd job often leaves the lock file behind in the data-dir, so the lock has to be removed explicitly from the command line. A graceful shutdown was witnessed only once, using the combination of condor_hold and condor_rm and then resubmitting.

An explicit "kill" on the process does cause a graceful shutdown.

Version-Release number of selected component (if applicable):

How reproducible:

Use the job file above. 
condor_submit my_qpid_job
Run condor_q a few times until you see that the job is running.
Using the job ID, run either condor_rm or the combination of condor_hold and condor_rm.
Resubmit the job. Watch as the job starts to run and then drops off the running queue. In fact, you don't even have to resubmit: running "ls" on the data-dir shows the lock file. qpidd will not run while that lock file is still there.


Steps to Reproduce:
See above
  
Actual results:


Expected results:


Additional info:

Comment 1 Ted Ross 2008-11-11 17:26:04 UTC

William,

The lock file in the data-dir is not deleted on a clean qpidd shutdown, so the fact that the file is still there is not in itself a problem. Furthermore, killing the qpidd process (even with kill -9) properly cleans up the lock.

Did you ever see the following message?

  Cannot lock <data-dir>/lock: Resource temporarily unavailable

This is the only indication you will get that there is a lock contention problem, and it only happens when two qpidd processes are vying for the same data directory.

-Ted
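
The distinction drawn in the comment above, between the lock file existing and the lock actually being held, can be illustrated with a minimal sketch. It is not qpidd's code: it assumes a POSIX advisory lock and uses Python's fcntl.flock purely for illustration, and the helper name try_lock and the temporary directory are hypothetical. On Linux, a failed non-blocking lock attempt reports EAGAIN, whose message text is "Resource temporarily unavailable".

```python
import errno
import fcntl
import os
import tempfile


def try_lock(path):
    """Open path (creating it if needed) and attempt a non-blocking
    exclusive advisory lock. Return the open fd, or None if the lock
    is already held through another open of the same file."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as e:
        os.close(fd)
        if e.errno in (errno.EAGAIN, errno.EACCES):
            return None  # contention: someone else holds the lock
        raise
    return fd


# Hypothetical stand-in for the broker's --data-dir.
data_dir = tempfile.mkdtemp()
lock_path = os.path.join(data_dir, "lock")

first = try_lock(lock_path)    # first holder acquires the lock
second = try_lock(lock_path)   # second attempt is refused while it is held

# Closing the descriptor releases the lock. The kernel does the same
# when the holding process exits, including after kill -9.
os.close(first)

file_still_exists = os.path.exists(lock_path)  # the file itself is left behind
third = try_lock(lock_path)    # the leftover file does not block a new holder
```

In this sketch the leftover file never prevents a new lock from being taken; only a live holder does, which matches the claim that the contention message appears only when two processes use the same data directory.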

Comment 2 William Henry 2008-11-12 23:46:22 UTC

I need to retest this and see what happens (it seems so long ago now).

Comment 3 William Henry 2008-11-25 15:19:14 UTC

This is not critical for 1.1, as I don't know of any customer that is actually running brokers as a job, so I've pushed it to 1.1.1. I'll retest then.
