Bug 711456 - Condor fails to start up properly (cannot write its pid file in /var/run/...)
Summary: Condor fails to start up properly (cannot write its pid file in /var/run/...)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: condor
Version: 15
Hardware: All
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Matthew Farrellee
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-06-07 14:43 UTC by Bert DeKnuydt
Modified: 2011-12-08 13:04 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-12-08 13:04:28 UTC
Type: ---
Embargoed:



Description Bert DeKnuydt 2011-06-07 14:43:56 UTC
Description of problem:

Condor does not start up properly and sends nagging 'obituary' mails.

Version-Release number of selected component (if applicable):

condor-7.5.5-2.fc15

How reproducible:

Always

Steps to Reproduce:
1. yum install condor
2. service condor start
3. reboot
  
Actual results:

Nag mails, and errors in /var/log/condor/MasterLog (and other logs) similar to:

06/07/11 15:52:36 DaemonCore: ERROR: Can't open pid file /var/run/condor/condor_master.pid

Expected results:

No nagging.

Additional info:

The reason is that condor fails to create the directory /var/run/condor. The
directory used to be installed by the rpm, but /var/run is now on tmpfs, so its contents disappear on every reboot.

Suggested fix:

Add a file condor.conf in /etc/tmpfiles.d with contents:

--
# Condor needs directory in /var/run
d /var/run/condor 0775 condor condor
--
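For anyone who wants to try the suggested fix by hand, here is a minimal sketch. The function name write_condor_tmpfiles and its optional directory argument are illustrative only and are not part of the condor package; the file contents are the two lines quoted above.

```shell
#!/bin/sh
# Hypothetical helper: write the tmpfiles.d entry from the suggested fix.
# $1 is the target directory; it defaults to /etc/tmpfiles.d.
write_condor_tmpfiles() {
    dir="${1:-/etc/tmpfiles.d}"
    mkdir -p "$dir" || return 1
    # Quoted heredoc delimiter: the content is written literally.
    cat > "$dir/condor.conf" <<'EOF'
# Condor needs directory in /var/run
d /var/run/condor 0775 condor condor
EOF
}
```

Run as root, write_condor_tmpfiles followed by "systemd-tmpfiles --create /etc/tmpfiles.d/condor.conf" should create the directory without waiting for a reboot, assuming a systemd-based system.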

Comment 1 Matthew Farrellee 2011-06-07 15:26:58 UTC
Brian reports he has a patch for this (and systemd).

Comment 2 Matthew Farrellee 2011-06-07 17:13:24 UTC
Confirmed on -

$CondorVersion: 7.7.0 Jun 07 2011 PRE-RELEASE-UWCS $
$CondorPlatform: X86_64-Fedora_15 $

Only present after a system reboot.

Currently seeing,

MasterLog -

06/07/11 13:01:02 DaemonCore: ERROR: Can't open pid file /var/run/condor/condor_master.pid

This is not a fatal issue, just an indication of a problem.

The fatal issue is in the SchedLog -

06/07/11 13:02:20 (pid:1160) error opening watchdog pipe /var/run/condor/procd_pipe.SCHEDD.watchdog: No such file or directory (2)
06/07/11 13:02:20 (pid:1160) ProcFamilyClient: error initializing LocalClient
06/07/11 13:02:20 (pid:1160) ProcFamilyProxy: error initializing ProcFamilyClient
06/07/11 13:02:20 (pid:1160) ERROR "ProcD has failed" at line 587 in file /root/rpmbuild/BUILD/condor-7.6.0/src/condor_utils/proc_family_proxy.cpp
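To check quickly whether a machine shows the same two symptoms, something like the following sketch could be used. The function name check_condor_logs, its messages, and its return codes are made up for illustration; only the log file names and the matched strings come from the log excerpts in this report.

```shell
#!/bin/sh
# Hypothetical helper: report which of the two errors from this bug
# appear in the Condor logs under $1 (defaults to /var/log/condor).
# Returns 0 if neither is found, 1 if only the pid file error is found,
# 2 if the fatal ProcD error is found.
check_condor_logs() {
    logdir="${1:-/var/log/condor}"
    status=0
    # Non-fatal symptom, logged by the master.
    if grep -q "Can't open pid file" "$logdir/MasterLog" 2>/dev/null; then
        echo "MasterLog: pid file could not be written (non-fatal)"
        status=1
    fi
    # Fatal error, logged by the schedd.
    if grep -q 'ProcD has failed' "$logdir/SchedLog" 2>/dev/null; then
        echo "SchedLog: ProcD has failed (fatal)"
        status=2
    fi
    return "$status"
}
```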

--

# rpm -qV condor
.......M.    /var/lib/condor/execute
missing     /var/run/condor

-

An immediate workaround for the missing /var/run/condor is to mkdir it and chown it to condor.condor.
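The workaround can be sketched as below. The function name ensure_run_dir and its arguments are illustrative. The 0775 mode is an assumption borrowed from the tmpfiles.d line in the description, and the owner is written in the user:group form, which chown prefers over the older user.group spelling.

```shell
#!/bin/sh
# Hypothetical helper: recreate a runtime directory with a given owner.
# $1 = directory (defaults to /var/run/condor)
# $2 = owner in user:group form (defaults to condor:condor)
ensure_run_dir() {
    dir="${1:-/var/run/condor}"
    owner="${2:-condor:condor}"
    # Mode 0775 is assumed from the tmpfiles.d entry, not from this comment.
    mkdir -p "$dir" && chown "$owner" "$dir" && chmod 0775 "$dir"
}
```

Called as root with no arguments, it targets /var/run/condor. Because /var/run is on tmpfs, the directory is lost again at the next reboot, so this does not replace the tmpfiles.d fix.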

The change reported for /var/lib/condor/execute is a mode change from drwxrwxrwt to drwxr-xr-x; it does not appear to hinder execution of jobs.

Comment 3 Matthew Farrellee 2011-06-07 17:22:07 UTC
Notes -
 0) execute dir perms need validation
 1) /tmp/condorLocks is being used instead of /var/lock/condor/local

Comment 4 Matthew Farrellee 2011-06-07 17:23:01 UTC
Confirmed tmpfiles.d fix,

# ls -al /etc/tmpfiles.d/condor.conf 
-rw-r--r--. 1 root root 74 Jun  7 13:15 /etc/tmpfiles.d/condor.conf

# cat /etc/tmpfiles.d/condor.conf 
# Condor needs directory in /var/run
d /var/run/condor 0775 condor condor

Comment 5 Fedora Update System 2011-06-07 20:52:17 UTC
condor-7.7.0-0.3.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/condor-7.7.0-0.3.fc15

Comment 6 Bert DeKnuydt 2011-06-08 09:18:27 UTC
condor-7.7.0-0.3.fc15 does NOT solve the problem, as the file /etc/tmpfiles.d/condor.conf is not what I suggested or what is in comment #4, but a kind of local install/test script. 

Also, I think it should not carry the file /var/lib/condor/condor_config.local
as that file is replaced by the /etc/condor/config.d/* entries.

Comment 7 Matthew Farrellee 2011-06-08 12:35:22 UTC
Typo in SOURCE2 (looked like SOURCE1).

I'll also remove condor_config.local.

Comment 8 Brian Bockelman 2011-06-08 13:00:22 UTC
About condor_config.local: note that if it doesn't exist, then you need to throw a switch in the condor_config:

REQUIRE_LOCAL_CONFIG_FILE = FALSE

Otherwise it will choke on startup.

Comment 9 Matthew Farrellee 2011-06-08 13:09:54 UTC
LOCAL_CONFIG_FILE is not defined, eliminating the need for REQUIRE_LOCAL_CONFIG_FILE=FALSE

Comment 10 Fedora Update System 2011-06-08 13:42:19 UTC
condor-7.7.0-0.4.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/condor-7.7.0-0.4.fc15

Comment 11 Fedora Update System 2011-06-10 13:32:37 UTC
Package condor-7.7.0-0.4.fc15:
* should fix your issue,
* was pushed to the Fedora 15 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing condor-7.7.0-0.4.fc15'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/condor-7.7.0-0.4.fc15
then log in and leave karma (feedback).

Comment 12 Fedora Update System 2011-06-14 19:22:02 UTC
condor-7.7.0-0.5.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/condor-7.7.0-0.5.fc15

