Red Hat Bugzilla – Bug 461986
condor_hold and condor_rm seem to shut down qpidd ungracefully ... sometimes.
Last modified: 2010-10-26 14:18:52 EDT
Description of problem:
A qpidd process started as a job with:
Executable = /usr/sbin/qpidd
Universe = vanilla
arguments = -t --auth no --data-dir /home/whenry/.qpidd
Log = qpidd.log
Output = output.log
Using condor_hold or condor_rm to shut down the qpidd job often leaves the lock file behind in the data-dir, which means the lock has to be removed explicitly from the command line before the broker can run again. A graceful shutdown was witnessed once using the condor_hold and condor_rm combination and resubmitting, but not reliably - in fact a graceful shutdown was witnessed only once.
An explicit "kill" on the process does cause a graceful shutdown.
Steps to Reproduce:
1. Submit the job file above.
2. Run condor_q a few times until you see that the job is running.
3. Using the job ID, run either condor_rm or the combination of condor_hold followed by condor_rm.
4. Resubmit the job and watch as it starts to run and then drops off the running queue. In fact, you do not even need to resubmit: an "ls" on the data-dir shows the lock file, and qpidd will not run while that lock file is still there.
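As a sketch, the reproduction looks like this from the shell. This assumes Condor and qpidd are installed; the submit-file name qpidd.sub and the job ID 12.0 are examples, not taken from the report.

```shell
#!/bin/sh
# Hedged sketch of the reproduction steps above. qpidd.sub and the job ID
# 12.0 are example names, not taken from the original report.
DATA_DIR=$HOME/.qpidd

if command -v condor_submit >/dev/null 2>&1; then
    condor_submit qpidd.sub   # submit the job file above
    condor_q                  # repeat until the job shows status R (running)
    condor_rm 12.0            # or: condor_hold 12.0; condor_rm 12.0
fi

# After removal, the stale lock blocks any restart; clean it up by hand:
if [ -e "$DATA_DIR/lock" ]; then
    rm "$DATA_DIR/lock"
fi
```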
Additional info:
The lock file in the data-dir is not deleted on a clean qpidd shutdown. The fact that the file is still there is not in itself a problem. Furthermore, killing the qpidd process (even with kill -9) properly cleans up the lock.
Did you ever see the following message?
Cannot lock <data-dir>/lock: Resource temporarily unavailable
This is the only indication you will get that there is a lock contention problem, and it only happens when two qpidd processes are vying for the same data directory.
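The "Resource temporarily unavailable" wording is the EAGAIN message from a failed non-blocking lock attempt. The following is not qpidd's actual locking code, just a generic illustration of the same contention pattern using flock(1): a second non-blocking locker fails while the first still holds the lock.

```shell
#!/bin/sh
# Generic illustration with flock(1) (NOT qpidd's actual locking code):
# a second non-blocking lock attempt fails while the first process
# still holds the lock.
LOCKFILE=$(mktemp)

# Holder: take the lock and keep it for a few seconds.
flock -n "$LOCKFILE" -c 'sleep 3' &
sleep 1                       # give the holder time to acquire the lock

# Contender: mirrors two brokers vying for the same data directory.
MSG=""
if ! flock -n "$LOCKFILE" -c true; then
    MSG="Cannot lock $LOCKFILE: Resource temporarily unavailable"
    echo "$MSG"
fi

wait                          # let the holder finish
rm -f "$LOCKFILE"
```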
I need to retest this and see what happens (it seems so long ago now).
This is not critical for 1.1, as I don't know of any customer that is actually running brokers as a job, so I've pushed it to 1.1.1. I'll retest then.