Bug 471902
| Summary: | condor_master abandons pidfile on -restart | | |
|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | Matthew Farrellee <matt> |
| Component: | grid | Assignee: | Matthew Farrellee <matt> |
| Status: | CLOSED ERRATA | QA Contact: | Kim van der Riet <kim.vdriet> |
| Severity: | medium | Priority: | medium |
| Version: | 1.0 | CC: | dan, pmackinn |
| Target Milestone: | 1.1 | Doc Type: | Bug Fix |
| Hardware: | All | OS: | Linux |
| Last Closed: | 2009-02-04 16:05:39 UTC | | |
Description
Matthew Farrellee
2008-11-17 16:28:17 UTC
This has been fixed for Condor 7.2.0. --Dan

Thanks Dan. The fix will appear in 7.2.0-0.3

    [root@north-13 ~]# ll /var/lib/condor/condor_master.pid
    ls: /var/lib/condor/condor_master.pid: No such file or directory
    [root@north-13 ~]# service condor start
    Starting Condor daemons: [ OK ]
    [root@north-13 ~]# ll /var/lib/condor/condor_master.pid
    -rw-r--r-- 1 condor condor 6 Dec 11 13:10 /var/lib/condor/condor_master.pid
    [root@north-13 ~]# condor_restart -master
    Sent "Restart" command to local master
    [root@north-13 ~]# ll /var/lib/condor/condor_master.pid
    -rw-r--r-- 1 condor condor 6 Dec 11 13:10 /var/lib/condor/condor_master.pid
    [root@north-13 ~]# ps auxwww | grep condor_master
    condor 13555 0.0 0.0 113596 7108 ? Ssl 13:10 0:00 condor_master -pidfile /var/lib/condor/condor_master.pid
    root 13806 0.0 0.0 61172 728 pts/1 S+ 13:11 0:00 grep condor_master
    [root@north-13 ~]# service condor stop
    Stopping Condor daemons: [ OK ]
    [root@north-13 ~]# ps auxwww | grep condor_master
    root 13824 0.0 0.0 61172 728 pts/1 S+ 13:12 0:00 grep condor_master
    [root@north-13 ~]# ll /var/lib/condor/condor_master.pid
    ls: /var/lib/condor/condor_master.pid: No such file or directory
    [root@north-13 ~]#

The pidfile-abandoning issue comes about when the master restarts itself. The master monitors the on-disk executables listed in DAEMON_LIST. When one of them is new, it re-execs that daemon. It even does this for itself. In the past it did not keep its argv when re-execing, so the re-exec'd master no longer had its -pidfile argument and would not clean up the pidfile. So, still more testing to do on 471902.

Everything seems OK. The PID file gets cleaned up after a self-restart followed by a shutdown sometime later. However, the PID never changes, since the master is restarted with execv.

"Like all of the exec functions, execv replaces the calling process image with a new process image. This has the effect of running a new program with the process ID of the calling process. Note that a new process is not started; the new process image simply overlays the original process image. The execv function is most commonly used to overlay a process image that has been created by a call to the fork function."

    [root@north-13 condor]# cat condor_master.pid
    17079
    [root@north-13 condor]# !ps
    ps -ef | grep condor_master
    condor 17079 1 0 16:25 ? 00:00:00 condor_master -pidfile /var/lib/condor/condor_master.pid
    [root@north-13 condor]# grep modified log/MasterLog
    12/11 16:31:00 /usr/sbin/condor_master was modified, restarting /usr/sbin/condor_master.
    [root@north-13 condor]# !ps
    ps -ef | grep condor_master
    condor 17079 1 0 16:25 ? 00:00:00 condor_master -pidfile /var/lib/condor/condor_master.pid
    [root@north-13 condor]# !cat
    cat condor_master.pid
    17079
    [root@north-13 condor]# service condor stop
    Stopping Condor daemons: [ OK ]
    [root@north-13 condor]# ll
    total 56
    -rw-r--r-- 1 root root 400 Nov 14 11:43 condor_config.local
    -rw-r--r-- 1 root root 106 Nov 24 18:07 condor_config.overrides
    -rw-r--r-- 1 root root 61 Nov 21 23:07 condor_config.overrides~
    drwxr-xr-x 2 condor condor 4096 Dec 11 16:33 execute
    drwxr-xr-x 2 root root 4096 Dec 4 15:21 feature_configs
    drwxr-xr-x 2 condor condor 4096 Dec 11 16:43 log
    drwxr-xr-x 2 condor condor 4096 Dec 10 09:29 spool

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0036.html
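
For anyone who wants to reproduce the execv behaviour noted above without a Condor install, here is a minimal Python sketch. It is not Condor code: the function name `pid_and_argv_after_exec` and the path `/tmp/example.pid` are made up for illustration. It forks a child, has the child re-exec with an explicit argv, and reports that the PID is unchanged and that the arguments only survive because they are passed to execv again.

```python
import os
import sys


def pid_and_argv_after_exec(extra_args):
    """Fork a child that calls execv; return (pid before exec, pid after exec, argv seen after exec).

    Hypothetical helper for illustration only; it is not part of Condor.
    """
    read_fd, write_fd = os.pipe()
    child = os.fork()
    if child == 0:
        # Child: send stdout to the pipe, then replace this process image.
        os.close(read_fd)
        os.dup2(write_fd, 1)
        code = "import os, sys; print(os.getpid(), *sys.argv[1:])"
        # The new image only sees the arguments handed to execv here.
        argv = [sys.executable, "-c", code] + list(extra_args)
        try:
            os.execv(sys.executable, argv)
        except OSError:
            os._exit(127)  # exec failed; do not fall through into parent code
    # Parent: read what the re-exec'd child printed, then reap it.
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        fields = pipe.read().split()
    os.waitpid(child, 0)
    return child, int(fields[0]), fields[1:]


if __name__ == "__main__":
    before, after, args = pid_and_argv_after_exec(["-pidfile", "/tmp/example.pid"])
    print("pid before exec:", before)
    print("pid after exec: ", after)
    print("argv after exec:", args)
```

The two PIDs should match, which is consistent with the unchanged 17079 in the transcript. Passing an empty list instead shows the other half of the bug: a re-exec that does not forward its argv leaves the new image with no -pidfile argument at all.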