$ condor_version
$CondorVersion: 7.6.1 Apr 27 2011 BuildID: RH-7.6.1-0.4.el6 $

Submitting:
echo -e 'cmd=/bin/true\nqueue' | condor_submit

Watching the schedd:
strace -e trace=file -p $(pidof condor_schedd)

The job does not use a spool directory, yet strace logs:
stat("/var/lib/condor/spool/1/0/cluster1.proc0.subproc0", 0xd46a70) = -1 ENOENT (No such file or directory)
rmdir("/var/lib/condor/spool/1/0/cluster1.proc0.subproc0") = -1 ENOENT (No such file or directory)
stat("/var/lib/condor/spool/1/0/cluster1.proc0.subproc0.tmp", 0xd55840) = -1 ENOENT (No such file or directory)
rmdir("/var/lib/condor/spool/1/0/cluster1.proc0.subproc0.tmp") = -1 ENOENT (No such file or directory)
stat("/var/lib/condor/spool/1/0/cluster1.proc0.subproc0.swap", 0xd3a230) = -1 ENOENT (No such file or directory)
rmdir("/var/lib/condor/spool/1/0/cluster1.proc0.subproc0.swap") = -1 ENOENT (No such file or directory)
rmdir("/var/lib/condor/spool/1/0") = -1 ENOENT (No such file or directory)
open("/var/lib/condor/spool/history", O_RDWR|O_APPEND) = -1 ENOENT (No such file or directory)
open("/var/lib/condor/spool/history", O_RDWR|O_CREAT|O_EXCL|O_APPEND, 0644) = 13
unlink("/var/lib/condor/spool/1/cluster1.ickpt.subproc0") = -1 ENOENT (No such file or directory)
rmdir("/var/lib/condor/spool/1") = -1 ENOENT (No such file or directory)

The stat() calls are unfortunate. The rmdir() and unlink() calls on targets that are known not to exist are bad.
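To put a number on the wasted work, the trace can be saved to a file (for example by adding `-o trace.log` to the strace command) and the failed calls tallied per system call. This is a minimal sketch; the function name count_enoent_by_syscall is hypothetical, and it assumes strace's default one-call-per-line output, optionally prefixed with `[pid N]` when several threads are traced.

```shell
# Hypothetical helper: count system calls on the spool that failed with ENOENT.
# Usage: count_enoent_by_syscall LOGFILE [SPOOL_DIR]
count_enoent_by_syscall() {
    spool=${2:-/var/lib/condor/spool}
    awk -v spool="$spool" '
        # Lines whose first path argument is under the spool and that failed with ENOENT
        index($0, "(\"" spool) && / = -1 ENOENT/ {
            line = $0
            sub(/^\[pid +[0-9]+\] +/, "", line)   # drop the per-thread prefix, if any
            name = substr(line, 1, index(line, "(") - 1)
            count[name]++
        }
        END { for (n in count) print n, count[n] }
    ' "$1" | sort
}
```

Run against the trace above, this should report one failed open, five failed rmdir, three failed stat and one failed unlink, none of which touch anything that exists.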
Upstream on 28a0d32b
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
C: The way that the Condor schedd handles cleaning up job spool directories changed between version 7.4 and 7.6.
C: Even if job spool directories are not enabled, the cleanup code in the schedd could have a significant performance impact in certain environments, including those using shared filesystems.
F: The schedd now interacts with the filesystem minimally in cases where no job spool directories exist.
R: As a result, filesystem overhead should be minimal if job spool directories are not enabled.
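The "interact minimally" behaviour amounts to checking once whether anything was spooled and doing nothing further if not. The sketch below illustrates that pattern only; it is not the schedd's implementation. Both function names are hypothetical, and the layout SPOOL/CLUSTER/PROC/clusterC.procP.subproc0 is an assumption generalized from the cluster 1, proc 0 paths in the original trace; the real mapping for other IDs may differ.

```shell
# Hypothetical sketch of "check once, then act" spool cleanup.
# Assumed layout (from the cluster 1 / proc 0 trace): SPOOL/CLUSTER/PROC/clusterC.procP.subproc0
cleanup_proc_spool() {
    spool=$1 cluster=$2 proc=$3
    proc_dir="$spool/$cluster/$proc"
    job_dir="$proc_dir/cluster$cluster.proc$proc.subproc0"
    # A single existence check: if the job never spooled anything, touch nothing else.
    [ -e "$job_dir" ] || return 0
    rm -rf -- "$job_dir" "$job_dir.tmp" "$job_dir.swap"
    # Remove the per-proc directory only if it is now empty.
    rmdir -- "$proc_dir" 2>/dev/null || true
}

cleanup_cluster_spool() {
    spool=$1 cluster=$2
    cluster_dir="$spool/$cluster"
    # Same idea for the cluster: one check, then skip unlink()/rmdir() entirely.
    [ -d "$cluster_dir" ] || return 0
    rm -f -- "$cluster_dir/cluster$cluster.ickpt.subproc0"
    rmdir -- "$cluster_dir" 2>/dev/null || true
}
```

With no spool directories present, each function performs one lookup and returns, so a job like the /bin/true example costs two stat-style calls instead of the string of failed stat(), rmdir() and unlink() calls in the original trace.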
Successfully reproduced on:
$CondorVersion: 7.6.3 Jul 27 2011 BuildID: RH-7.6.3-0.3.el5 $
$CondorPlatform: X86_64-RedHat_5.6 $

# echo -e 'cmd=/bin/true\nqueue' | runuser condor_user -s /bin/bash -c condor_submit
Submitting job(s).
1 job(s) submitted to cluster 1.

[root@lt-rhel5_64-old ~]# strace -e trace=file -p $(pidof condor_schedd) 2>&1 | grep "/var/lib/condor/spool"
open("/var/lib/condor/spool/.schedd_classad.new", O_WRONLY|O_TRUNC) = -1 ENOENT (No such file or directory)
open("/var/lib/condor/spool/.schedd_classad.new", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0644) = 13
rename("/var/lib/condor/spool/.schedd_classad.new", "/var/lib/condor/spool/.schedd_classad") = 0
stat("/var/lib/condor/spool/1/0/cluster1.proc0.subproc0", 0x62eb510) = -1 ENOENT (No such file or directory)
rmdir("/var/lib/condor/spool/1/0/cluster1.proc0.subproc0") = -1 ENOENT (No such file or directory)
stat("/var/lib/condor/spool/1/0/cluster1.proc0.subproc0.tmp", 0x62eb510) = -1 ENOENT (No such file or directory)
rmdir("/var/lib/condor/spool/1/0/cluster1.proc0.subproc0.tmp") = -1 ENOENT (No such file or directory)
stat("/var/lib/condor/spool/1/0/cluster1.proc0.subproc0.swap", 0x62eb510) = -1 ENOENT (No such file or directory)
rmdir("/var/lib/condor/spool/1/0/cluster1.proc0.subproc0.swap") = -1 ENOENT (No such file or directory)
rmdir("/var/lib/condor/spool/1/0") = -1 ENOENT (No such file or directory)
open("/var/lib/condor/spool/history", O_RDWR|O_APPEND) = -1 ENOENT (No such file or directory)
open("/var/lib/condor/spool/history", O_RDWR|O_CREAT|O_EXCL|O_APPEND, 0644) = 13
unlink("/var/lib/condor/spool/1/cluster1.ickpt.subproc0") = -1 ENOENT (No such file or directory)
rmdir("/var/lib/condor/spool/1") = -1 ENOENT (No such file or directory)
Tested on:
$CondorVersion: 7.6.5 Oct 28 2011 BuildID: RH-7.6.5-0.4.el5 $
$CondorPlatform: I686-RedHat_5.7 $
$CondorVersion: 7.6.5 Oct 28 2011 BuildID: RH-7.6.5-0.4.el5 $
$CondorPlatform: X86_64-RedHat_5.7 $
$CondorVersion: 7.6.5 Oct 28 2011 BuildID: RH-7.6.5-0.4.el6 $
$CondorPlatform: I686-RedHat_6.1 $
$CondorVersion: 7.6.5 Oct 28 2011 BuildID: RH-7.6.5-0.4.el6 $
$CondorPlatform: X86_64-RedHat_6.1 $

# strace -e trace=file -p $(pidof condor_schedd) 2>&1 | grep "/var/lib/condor/spool"
open("/var/lib/condor/spool/.schedd_classad.new", O_WRONLY|O_TRUNC|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/var/lib/condor/spool/.schedd_classad.new", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0644) = 21
rename("/var/lib/condor/spool/.schedd_classad.new", "/var/lib/condor/spool/.schedd_classad") = 0
open("/var/lib/condor/spool/.schedd_classad.new", O_WRONLY|O_TRUNC|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/var/lib/condor/spool/.schedd_classad.new", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0644) = 23
rename("/var/lib/condor/spool/.schedd_classad.new", "/var/lib/condor/spool/.schedd_classad") = 0
stat64("/var/lib/condor/spool/1/0/cluster1.proc0.subproc0", 0x869e5e8) = -1 ENOENT (No such file or directory)
open("/var/lib/condor/spool/history", O_RDWR|O_APPEND|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/var/lib/condor/spool/history", O_RDWR|O_CREAT|O_EXCL|O_APPEND|O_LARGEFILE, 0644) = 22
stat64("/var/lib/condor/spool/1", 0x869e5e8) = -1 ENOENT (No such file or directory)

No rmdir() or unlink() calls on the spool directory.

>>> VERIFIED
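The pass criterion used in this verification (no rmdir() or unlink() on the spool directory) can be checked mechanically against a saved trace. A minimal sketch; the function name no_spool_removals is hypothetical, and matching unlinkat() is an added assumption for newer systems where strace reports file and directory removal through that call.

```shell
# Hypothetical check: succeed only if the trace shows no removals under the spool.
# Usage: no_spool_removals LOGFILE [SPOOL_DIR]
# Offending lines are printed; a non-zero status means a removal was found
# (or the log could not be read).
no_spool_removals() {
    spool=${2:-/var/lib/condor/spool}
    ! grep -F \
        -e "rmdir(\"$spool" \
        -e "unlink(\"$spool" \
        -e "unlinkat(AT_FDCWD, \"$spool" \
        "$1"
}
```

Saving the 7.6.5 output above to a file and running `no_spool_removals trace.log` should succeed, while the 7.6.3 reproduction output should fail and print its rmdir() and unlink() lines.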
Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,4 +1 @@
-C: The way that the Condor schedd handles cleaning up job spool directories changed between version 7.4 and 7.6.
+Even when job spool directories were disabled, the way that the schedd daemon attempts to clean them had a significant performance impact in certain environments, including those using a shared file system. This update improves the cleanup code so that schedd interacts only minimally in cases where no job spool directories exist, which reduces file system overhead when job spool directories are not enabled.
-C: Even if job spool directories are not enabled, the cleanup code in the schedd could have a significant performance impact in certain environments, including those using shared filesystems.
-F: The schedd now interacts with the filesystem minimally in cases where no job spool directories exist.
-R: As a result, filesystem overhead should be minimal if job spool directories are not enabled.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2012-0045.html