Bug 549389 - condor_master -pidfile will stomp pidfile of running master
Status: CLOSED ERRATA
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor
Version: 1.0
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: 1.3
Target Release: ---
Assigned To: Matthew Farrellee
QA Contact: Luigi Toscano
Depends On:
Blocks:
Reported: 2009-12-21 09:58 EST by Matthew Farrellee
Modified: 2010-10-14 11:57 EDT (History)
See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
When running "condor_master -pidfile /tmp/master.pid" twice in a row, "/tmp/master.pid" would contain the PID of the second condor_master, the one that exited immediately because it failed to get the 'InstanceLock'. With this update, "/tmp/master.pid" contains the PID of the first, still running condor_master.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-10-14 11:57:41 EDT
External Trackers
Tracker: Red Hat Product Errata
ID: RHSA-2010:0773
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Red Hat Enterprise MRG Messaging and Grid Version 1.3
Last Updated: 2010-10-14 11:56:44 EDT
Description Matthew Farrellee 2009-12-21 09:58:14 EST
If you run "condor_master -pidfile /tmp/master.pid" twice in a row, /tmp/master.pid will contain the PID of the second condor_master, the one that exited immediately because it failed to get the InstanceLock.

    To reproduce:

       1. Install and configure Condor so it can run.
       2. condor_master -pidfile /tmp/master.pid
       3. cat /tmp/master.pid - Note the PID.
       4. condor_master -pidfile /tmp/master.pid
       5. cat /tmp/master.pid - Note the PID.
       6. Check MasterLog. Note that the second instance exited immediately, but that its PID matches the PID from step 5. 

    Observed behavior: /tmp/master.pid contains the PID of the second, exited condor_master.

    Expected behavior: /tmp/master.pid contains the PID of the first, still running condor_master.
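
The comparison in step 6 amounts to asking whether the PID recorded in the pidfile still names a live process. A minimal sketch of that check in Python follows; `pidfile_is_stale` is a hypothetical helper written for illustration and is not part of Condor.

```python
import os


def pidfile_is_stale(pid_path):
    """Return True if the pidfile names a process that no longer exists.

    Hypothetical helper for illustration; not part of Condor.
    """
    with open(pid_path) as f:
        pid = int(f.read().strip())
    try:
        os.kill(pid, 0)  # signal 0 performs the existence check only
    except ProcessLookupError:
        return True  # no such process: the pidfile is stale
    except PermissionError:
        return False  # process exists but belongs to another user
    return False
```

With the bug present, this check would likely report the pidfile from step 5 as stale, since the second condor_master has already exited.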

Remarks:

    2009-May-26 13:55:21 by adesmet:
    #494 is a duplicate of this ticket. Contents duplicated below:

    Condor with --pidfile will write its pid before checking Instance lock

    Condor with --pidfile will write its pid before checking the instance lock. This ends up writing over the pidfile created by the original Condor instance, which leads init scripts to try to kill the wrong pid when shutting Condor down.

    I believe Condor should not write the pidfile until it knows that it is the one true instance.
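
The ordering suggested here, take the instance lock first and write the pidfile only on success, can be sketched in Python as below. This is an illustration under assumptions, not Condor's implementation: it uses `fcntl.flock` as the locking primitive, and the names `acquire_instance_lock` and `start_instance` are hypothetical.

```python
import fcntl
import os


def acquire_instance_lock(lock_path):
    """Return an fd holding an exclusive lock, or None if it is already held.

    Assumption: flock() stands in for whatever locking Condor uses.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def start_instance(lock_path, pid_path, pid=None):
    """Write the pidfile only after the instance lock has been obtained."""
    lock_fd = acquire_instance_lock(lock_path)
    if lock_fd is None:
        return None  # not the one true instance: leave the pidfile untouched
    with open(pid_path, "w") as f:
        f.write(f"{os.getpid() if pid is None else pid}\n")
    return lock_fd  # caller keeps this open for the life of the process
```

The optional `pid` argument exists only so the ordering can be exercised from a single process; a real daemon would always record its own PID.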

    2009-Dec-21 08:33:00 by matt:
    To reproduce...

    $ _CONDOR_LOG=$PWD _CONDOR_MASTER_INSTANCE_LOCK=$PWD/InstanceLock condor_master -pidfile $PWD/PidFile
    $ cat PidFile
    11061
    $ _CONDOR_LOG=$PWD _CONDOR_MASTER_INSTANCE_LOCK=$PWD/InstanceLock ./condor_master -pidfile $PWD/PidFile -t -f
    ...
    12/21 09:26:59 ** PID = 11088
    ...
    12/21 09:26:59 FileLock::obtain(1) failed - errno 11 (Resource temporarily unavailable)
    12/21 09:26:59 ERROR "Can't get lock on "/home/matt/Documents/Condor/CONDOR_SRC/src/condor_master.V6/InstanceLock"" at line 955 in file master.cpp
    $ cat PidFile
    11088
    $ ps 11061 11088
      PID TTY      STAT   TIME COMMAND
    11061 ?        Ss     0:00 ./condor_master -pidfile ...

    2009-Dec-21 08:41:48 by matt:
    Desired output...

    $ _CONDOR_LOG=$PWD _CONDOR_MASTER_INSTANCE_LOCK=$PWD/InstanceLock ./condor_master -pidfile $PWD/PidFile
    $ cat PidFile
    12311
    $ _CONDOR_LOG=$PWD _CONDOR_MASTER_INSTANCE_LOCK=$PWD/InstanceLock ./condor_master -pidfile $PWD/PidFile -t -f
    ...
    12/21 09:40:54 ** PID = 12338
    ...
    12/21 09:40:54 FileLock::obtain(1) failed - errno 11 (Resource temporarily unavailable)
    12/21 09:40:54 ERROR "Can't get lock on "/home/matt/Documents/Condor/CONDOR_SRC/src/condor_master.V6/InstanceLock"" at line 955 in file master.cpp
    $ cat PidFile
    12311

    2009-Dec-21 08:55:09 by matt:
    FYI, a problem I'm not fixing: TRUNC_MASTER_LOG_ON_OPEN=TRUE will truncate the MASTER_LOG before the master gets a chance to check the INSTANCE_LOCK.
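
The same ordering concern applies to the log. One possible arrangement, sketched in Python under the assumption of `flock`-style locking, is to open the log for truncation only once the lock is held. `open_log_after_lock` is a hypothetical name, and nothing here reflects how Condor actually opens MASTER_LOG.

```python
import fcntl
import os


def open_log_after_lock(lock_path, log_path, truncate):
    """Take the instance lock first, then open (and maybe truncate) the log.

    Returns (lock_fd, log_file), or None if another instance holds the lock.
    Hypothetical sketch; not Condor's code.
    """
    lock_fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(lock_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(lock_fd)
        return None  # the running instance's log is left intact
    # Mode "w" truncates; it is reached only when this process holds the lock.
    log_file = open(log_path, "w" if truncate else "a")
    return lock_fd, log_file
```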
Comment 1 Matthew Farrellee 2009-12-21 09:59:11 EST
This is an issue through at least 7.4.1-0.7.1
Comment 2 Matthew Farrellee 2010-01-04 13:24:50 EST
Fixed in 7.4.2-0.1
Comment 3 Luigi Toscano 2010-06-01 13:41:54 EDT
The new instance of condor_master does not overwrite the pidfile anymore if condor_master is already running. 

Verified on condor 7.4.3-0.16, RHEL 4.8/5.5, i386/x86_64.
Comment 4 Martin Prpič 2010-10-07 11:16:41 EDT
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
When running "condor_master -pidfile /tmp/master.pid" twice in a row, "/tmp/master.pid" would contain the PID of the second condor_master, the one that exited immediately because it failed to get the 'InstanceLock'. With this update, "/tmp/master.pid" contains the PID of the first, still running condor_master.
Comment 6 errata-xmlrpc 2010-10-14 11:57:41 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0773.html