Description of problem:
The condor_configd was restarted every hour without any apparent reason.

Version-Release number of selected component (if applicable):

How reproducible:
Start condor and take a look at /var/log/condor/MasterLog.

Steps to Reproduce:
1. service condor start
2. tail -f /var/log/condor/MasterLog
3. wait a couple of hours

Actual results:
condor_configd restarted every hour

Expected results:
no restart

Additional info:
07/28 17:20:50 Started process "/usr/sbin/condor_configd", pid and pgroup = 8215
07/28 17:20:50 The QMF_CONFIGD (pid 8215) exited with status 1
07/28 17:20:50 restarting /usr/sbin/condor_configd in 3600 seconds
07/28 18:20:50 Started process "/usr/sbin/condor_configd", pid and pgroup = 9017
07/28 18:20:50 The QMF_CONFIGD (pid 9017) exited with status 1
07/28 18:20:50 restarting /usr/sbin/condor_configd in 3600 seconds
07/28 19:20:50 Started process "/usr/sbin/condor_configd", pid and pgroup = 9783
07/28 19:20:50 The QMF_CONFIGD (pid 9783) exited with status 1
07/28 19:20:50 restarting /usr/sbin/condor_configd in 3600 seconds
07/28 20:20:50 Started process "/usr/sbin/condor_configd", pid and pgroup = 10549
07/28 20:20:50 The QMF_CONFIGD (pid 10549) exited with status 1
07/28 20:20:50 restarting /usr/sbin/condor_configd in 3600 seconds
07/28 21:20:50 Started process "/usr/sbin/condor_configd", pid and pgroup = 11315
07/28 21:20:50 The QMF_CONFIGD (pid 11315) exited with status 1
07/28 21:20:50 restarting /usr/sbin/condor_configd in 3600 seconds
07/28 22:20:50 Started process "/usr/sbin/condor_configd", pid and pgroup = 12081
07/28 22:20:50 The QMF_CONFIGD (pid 12081) exited with status 1
07/28 22:20:50 restarting /usr/sbin/condor_configd in 3600 seconds
07/28 23:20:50 Started process "/usr/sbin/condor_configd", pid and pgroup = 12847
07/28 23:20:50 The QMF_CONFIGD (pid 12847) exited with status 1
07/28 23:20:50 restarting /usr/sbin/condor_configd in 3600 seconds
07/29 00:20:50 Started process "/usr/sbin/condor_configd", pid and pgroup = 13613
07/29 00:20:50 The QMF_CONFIGD (pid 13613) exited with status 1
07/29 00:20:50 restarting /usr/sbin/condor_configd in 3600 seconds
Affected version: $CondorVersion: 7.4.4 Jul 23 2010 BuildID: RH-7.4.4-0.6.el5 PRE-RELEASE $ $CondorPlatform: I386-LINUX_RHEL5 $
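
To see how the restart interval evolved over the whole MasterLog rather than just its tail, the "restarting ... in N seconds" lines can be extracted with a short script. This is only a sketch: the function name `scheduled_restarts` and the regular expression are illustrative and assume the line format shown in the excerpt above; other Condor versions or log settings may format these lines differently.

```python
import re

# Matches lines such as:
#   07/28 17:20:50 restarting /usr/sbin/condor_configd in 3600 seconds
# The pattern is an assumption based on the excerpt in this report.
RESTART_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"restarting (?P<path>\S+) in (?P<secs>\d+) seconds"
)


def scheduled_restarts(lines, daemon_path="/usr/sbin/condor_configd"):
    """Return (timestamp, delay_in_seconds) for each scheduled restart of daemon_path."""
    found = []
    for line in lines:
        match = RESTART_RE.match(line.strip())
        if match and match.group("path") == daemon_path:
            found.append((match.group("ts"), int(match.group("secs"))))
    return found


# Sample input copied from the log excerpt in this report.
SAMPLE_LOG = [
    '07/28 17:20:50 Started process "/usr/sbin/condor_configd", pid and pgroup = 8215',
    "07/28 17:20:50 The QMF_CONFIGD (pid 8215) exited with status 1",
    "07/28 17:20:50 restarting /usr/sbin/condor_configd in 3600 seconds",
    '07/28 18:20:50 Started process "/usr/sbin/condor_configd", pid and pgroup = 9017',
    "07/28 18:20:50 The QMF_CONFIGD (pid 9017) exited with status 1",
    "07/28 18:20:50 restarting /usr/sbin/condor_configd in 3600 seconds",
]

for timestamp, delay in scheduled_restarts(SAMPLE_LOG):
    print(timestamp, delay)
```

To run it against the real log, pass an open file instead of the sample list, for example `scheduled_restarts(open("/var/log/condor/MasterLog"))`.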
The QMF_CONFIGD is failing and exiting with status 1. After each exit the condor_master reports that it will restart the configd in 3600 seconds (1 hour), which it does. The configd fails again and the cycle repeats.

The hourly interval is the master's restart backoff sitting at its ceiling: MASTER_BACKOFF_CEILING defaults to 1 hour (3600 seconds).
http://www.cs.wisc.edu/condor/manual/v7.4/3_3Configuration.html#15663

If you look earlier in the file you'll see the configd starting more frequently.
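
A rough sketch of why the earlier restarts are closer together, assuming the master computes the delay before the n-th consecutive restart as MASTER_BACKOFF_CONSTANT + MASTER_BACKOFF_FACTOR^n, capped at MASTER_BACKOFF_CEILING. Only the ceiling default of 3600 seconds is confirmed above; the formula and the defaults of 9 and 2.0 are my recollection of the 7.4 manual and should be checked against the linked section. The function name `backoff_schedule` is illustrative.

```python
def backoff_schedule(restarts, constant=9, factor=2.0, ceiling=3600):
    """Delay in seconds before each of the first `restarts` consecutive restarts.

    Assumed formula (verify against the Condor manual):
        delay(n) = min(constant + factor ** n, ceiling)
    constant and factor defaults are assumptions; ceiling is the
    MASTER_BACKOFF_CEILING default quoted in this report.
    """
    return [int(min(constant + factor ** n, ceiling)) for n in range(1, restarts + 1)]


for n, delay in enumerate(backoff_schedule(13), start=1):
    print(f"restart {n:2d}: wait {delay} seconds")
```

Under these assumed defaults the delay starts at a few seconds, roughly doubles with each failure, and is clamped to 3600 seconds from the 12th consecutive restart onward, which would match the hourly pattern in the log.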