Bug 471902 - condor_master abandons pidfile on -restart
Status: CLOSED ERRATA
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: grid
Version: 1.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: 1.1
Target Release: ---
Assigned To: Matthew Farrellee
QA Contact: Kim van der Riet
Reported: 2008-11-17 11:28 EST by Matthew Farrellee
Modified: 2009-02-04 11:05 EST (History)
Doc Type: Bug Fix
Last Closed: 2009-02-04 11:05:39 EST


Description Matthew Farrellee 2008-11-17 11:28:17 EST
Description of problem:

The master appears to forget about its pidfile when it is restarted via condor_restart -master.


Version-Release number of selected component (if applicable):

7.2.0-0.1 and before


How reproducible:

Always


Steps to Reproduce:
1. Run condor_master via the init script.
2. Run condor_restart -master.
3. Notice that the pidfile remains but ps auxwww | grep condor_master no longer does.
4. When the master is shut down, the pidfile is not cleaned up.
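
The end state in step 4 (a pidfile left behind after the master has exited) can be detected with a small check. This is a minimal sketch, not part of Condor: the helper names read_pid, pid_is_alive and pidfile_state are hypothetical, and it assumes the pidfile holds a single decimal PID, as the 6-byte file in this report suggests.

```python
import os


def read_pid(pidfile):
    """Return the PID stored in pidfile, or None if it is missing or malformed."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return None
    # Reject non-positive values; os.kill() treats them as process groups.
    return pid if pid > 0 else None


def pid_is_alive(pid):
    """Probe with signal 0, which checks for existence without delivering a signal."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def pidfile_state(pidfile):
    """Classify a pidfile as 'absent', 'live' (process exists) or 'stale'."""
    pid = read_pid(pidfile)
    if pid is None:
        return "absent"
    return "live" if pid_is_alive(pid) else "stale"
```

For example, pidfile_state("/var/lib/condor/condor_master.pid") returning "stale" after service condor stop would indicate the abandoned-pidfile condition. The check only says whether some process with that PID exists; it does not confirm that the process is condor_master.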
Comment 1 Dan Bradley 2008-11-20 17:25:45 EST
This has been fixed for Condor 7.2.0.  --Dan
Comment 2 Matthew Farrellee 2008-11-20 17:31:31 EST
Thanks Dan.

The fix will appear in 7.2.0-0.3
Comment 4 Pete MacKinnon 2008-12-11 13:12:42 EST
[root@north-13 ~]# ll /var/lib/condor/condor_master.pid 
ls: /var/lib/condor/condor_master.pid: No such file or directory
[root@north-13 ~]# service condor start
Starting Condor daemons:                                   [  OK  ]
[root@north-13 ~]# ll /var/lib/condor/condor_master.pid 
-rw-r--r-- 1 condor condor 6 Dec 11 13:10 /var/lib/condor/condor_master.pid
[root@north-13 ~]# condor_restart -master
Sent "Restart" command to local master
[root@north-13 ~]# ll /var/lib/condor/condor_master.pid 
-rw-r--r-- 1 condor condor 6 Dec 11 13:10 /var/lib/condor/condor_master.pid
[root@north-13 ~]# ps auxwww | grep condor_master
condor   13555  0.0  0.0 113596  7108 ?        Ssl  13:10   0:00 condor_master -pidfile /var/lib/condor/condor_master.pid
root     13806  0.0  0.0  61172   728 pts/1    S+   13:11   0:00 grep condor_master
[root@north-13 ~]# service condor stop
Stopping Condor daemons:                                   [  OK  ]
[root@north-13 ~]# ps auxwww | grep condor_master
root     13824  0.0  0.0  61172   728 pts/1    S+   13:12   0:00 grep condor_master
[root@north-13 ~]# ll /var/lib/condor/condor_master.pid 
ls: /var/lib/condor/condor_master.pid: No such file or directory
[root@north-13 ~]#
Comment 5 Pete MacKinnon 2008-12-11 13:39:26 EST
The pidfile-abandoning issue comes about when the master restarts itself.

The master monitors the executables on disk that are listed in DAEMON_LIST. When one of them is new, it re-execs that daemon; it even does this for itself. In the past it did not keep its argv when re-execing, so it would not clean up the pidfile. So there is still more testing to do on 471902.
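
The mechanism described above (watch an executable on disk, then re-exec with the original arguments so options such as -pidfile survive) can be sketched as follows. This is an illustration only, not Condor's code: the function names are hypothetical, and the modification-time comparison is an assumption about how a changed binary might be detected.

```python
import os
import sys


def executable_changed(path, last_mtime):
    """Return True if the file at path is newer than the recorded mtime."""
    return os.stat(path).st_mtime > last_mtime


def restart_command(argv=None):
    """Build the argument vector for a re-exec, keeping every original argument.

    Dropping the arguments here is the kind of mistake that would make the
    new process image forget an option like -pidfile.
    """
    argv = sys.argv if argv is None else argv
    return [sys.executable] + list(argv)


def reexec_self():
    """Replace the current process image; does not return on success."""
    cmd = restart_command()
    os.execv(cmd[0], cmd)
```

A monitoring loop would record the mtime at startup, call executable_changed() periodically, and call reexec_self() once it returns True.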
Comment 6 Pete MacKinnon 2008-12-11 17:14:42 EST
Everything seems OK. The PID file gets cleaned up after a self-restart followed by a shutdown sometime later. However, the PID never changes, since the master is restarted with execv.

"Like all of the exec functions, execv replaces the calling process image with a new process image. This has the effect of running a new program with the process ID of the calling process. Note that a new process is not started; the new process image simply overlays the original process image. The execv function is most commonly used to overlay a process image that has been created by a call to the fork function."

[root@north-13 condor]# cat condor_master.pid
17079
[root@north-13 condor]# !ps
ps -ef | grep condor_master
condor   17079     1  0 16:25 ?        00:00:00 condor_master -pidfile /var/lib/condor/condor_master.pid
[root@north-13 condor]# grep modified log/MasterLog 
12/11 16:31:00 /usr/sbin/condor_master was modified, restarting /usr/sbin/condor_master.
[root@north-13 condor]# !ps
ps -ef | grep condor_master
condor   17079     1  0 16:25 ?        00:00:00 condor_master -pidfile /var/lib/condor/condor_master.pid
[root@north-13 condor]# !cat
cat condor_master.pid
17079
[root@north-13 condor]# service condor stop
Stopping Condor daemons:                                   [  OK  ]
[root@north-13 condor]# ll
total 56
-rw-r--r-- 1 root   root    400 Nov 14 11:43 condor_config.local
-rw-r--r-- 1 root   root    106 Nov 24 18:07 condor_config.overrides
-rw-r--r-- 1 root   root     61 Nov 21 23:07 condor_config.overrides~
drwxr-xr-x 2 condor condor 4096 Dec 11 16:33 execute
drwxr-xr-x 2 root   root   4096 Dec  4 15:21 feature_configs
drwxr-xr-x 2 condor condor 4096 Dec 11 16:43 log
drwxr-xr-x 2 condor condor 4096 Dec 10 09:29 spool
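
The unchanged PID seen in the transcript above is the expected behaviour of execv and can be reproduced outside Condor. The following is a minimal sketch, assuming a POSIX system; pids_across_exec is a hypothetical helper that starts a child Python process, has it print its PID, exec a fresh interpreter over itself, and print the PID again.

```python
import subprocess
import sys


def pids_across_exec():
    """Return (pid_before_exec, pid_after_exec) for a child that re-execs itself."""
    script = (
        "import os, sys\n"
        # Flush before exec: buffered output would be lost with the old image.
        "print(os.getpid(), flush=True)\n"
        "os.execv(sys.executable, "
        "[sys.executable, '-c', 'import os; print(os.getpid())'])\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", script],
        capture_output=True,
        text=True,
        check=True,
    )
    before, after = result.stdout.split()
    return int(before), int(after)


if __name__ == "__main__":
    before, after = pids_across_exec()
    print("before exec:", before, "after exec:", after)
```

Because the new image overlays the old process rather than starting a new one, both values are the same, which is why a pidfile written before the self-restart remains valid afterwards.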
Comment 8 errata-xmlrpc 2009-02-04 11:05:39 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0036.html
