Bug 461986 - condor_hold and condor_rm seem to shutdown qpidd ungracefully .. sometimes.
Status: CLOSED NOTABUG
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor
Version: 1.0
Hardware: All Linux
Priority: medium
Severity: medium
Target Milestone: 1.0
Assigned To: grid-maint-list
QA Contact: Kim van der Riet
Reported: 2008-09-11 14:22 EDT by William Henry
Modified: 2010-10-26 14:18 EDT
Doc Type: Bug Fix
Last Closed: 2009-10-27 17:28:15 EDT

Description William Henry 2008-09-11 14:22:55 EDT
Description of problem:

A qpidd process started as a job with:
Executable     = /usr/sbin/qpidd 
Universe       = vanilla
arguments      = -t --auth no --data-dir /home/whenry/.qpidd
Log            = qpidd.log
Output         = output.log

Queue
 
Using condor_hold or condor_rm to shut down the qpidd job often leaves the lock file behind in the data-dir, so the lock has to be removed explicitly from the command line. A graceful shutdown was witnessed only once, after using the condor_hold and condor_rm combination and resubmitting.

An explicit "kill" on the process does cause a graceful shutdown.

Version-Release number of selected component (if applicable):

How reproducible:

Use the job file above.
Run condor_submit my_qpid_job.
Run condor_q a few times until the job shows as running.
Using the job ID, run either condor_rm or the combination of condor_hold and condor_rm.
Resubmit the job and watch as it starts to run and then drops off the running queue. In fact, you don't even have to resubmit: running "ls" on the data-dir shows the lock file. qpidd will not run while that lock file is still there.


Steps to Reproduce:
See above
  
Actual results:


Expected results:


Additional info:
Comment 1 Ted Ross 2008-11-11 12:26:04 EST
William,

The lock file in the data-dir is not deleted on a clean qpidd shutdown. The fact that the file is still there is not in itself a problem. Furthermore, killing the qpidd process (even with kill -9) properly cleans up the lock.

Did you ever see the following message?

  Cannot lock <data-dir>/lock: Resource temporarily unavailable

This is the only indication you will get that there is a lock contention problem, and it only happens when two qpidd processes are vying for the same data directory.

-Ted
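
The distinction Ted draws between the lock file and the lock itself can be illustrated with a small sketch. It assumes the data-dir lock is an advisory POSIX lock of the kind taken with lockf; the bug does not show how qpidd actually implements it, so the helper names (try_lock, demo) and the stand-in holder process below are hypothetical. The sketch starts a child that locks a file named "lock", probes the lock from a second process, kills the child with SIGKILL (the equivalent of kill -9), and probes again.

```python
import errno
import fcntl
import os
import signal
import subprocess
import sys
import tempfile

# Stand-in for a broker holding its data-dir lock (an assumption, not qpidd code):
# take an exclusive advisory lock, report readiness, then sleep.
HOLDER = """
import fcntl, sys, time
f = open(sys.argv[1], "w")
fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
print("locked", flush=True)
time.sleep(60)
"""


def try_lock(path):
    """Try a non-blocking exclusive lock; return None on success, else the errno."""
    with open(path, "a") as f:
        try:
            fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as e:
            return e.errno
        return None  # the lock is dropped again when f is closed


def demo():
    """Show that a killed holder leaves the file behind but not the lock."""
    data_dir = tempfile.mkdtemp()
    lock_path = os.path.join(data_dir, "lock")
    holder = subprocess.Popen(
        [sys.executable, "-c", HOLDER, lock_path],
        stdout=subprocess.PIPE,
        text=True,
    )
    holder.stdout.readline()  # wait until the child holds the lock
    errno_while_running = try_lock(lock_path)
    holder.send_signal(signal.SIGKILL)  # equivalent of kill -9
    holder.wait()
    holder.stdout.close()
    return {
        "errno_while_running": errno_while_running,
        "file_exists_after_kill": os.path.exists(lock_path),
        "errno_after_kill": try_lock(lock_path),
    }
```

Calling demo() on Linux should report a contention errno while the holder is alive (typically EAGAIN, whose message text is "Resource temporarily unavailable"), a lock file that still exists after the kill, and a successful lock attempt afterwards. Under the stated assumption, a leftover lock file in "ls" output does not by itself mean the lock is still held.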
Comment 2 William Henry 2008-11-12 18:46:22 EST
I need to retest this and see what happens (it seems so long ago now).
Comment 3 William Henry 2008-11-25 10:19:14 EST
This is not critical for 1.1, as I don't know of any customer that is actually running brokers as a job. So I've pushed it to 1.1.1. I'll retest then.
