Bug 703992

Summary: unnecessary spool interaction during job removal/completion
Product: Red Hat Enterprise MRG
Reporter: Matthew Farrellee <matt>
Component: condor
Assignee: Will Benton <willb>
Status: CLOSED ERRATA
QA Contact: Lubos Trilety <ltrilety>
Severity: low
Docs Contact:
Priority: low
Version: 2.0
CC: ltrilety, matt, mkudlej, tstclair
Target Milestone: 2.1
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: condor-7.6.4-0.8
Doc Type: Bug Fix
Doc Text:
Even when job spool directories were disabled, the way that the schedd daemon attempts to clean them had a significant performance impact in certain environments, including those using a shared file system. This update improves the cleanup code so that schedd interacts only minimally in cases where no job spool directories exist, which reduces file system overhead when job spool directories are not enabled.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-01-23 17:26:44 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On:    
Bug Blocks: 743350    

Description Matthew Farrellee 2011-05-11 19:51:36 UTC
$ condor_version
$CondorVersion: 7.6.1 Apr 27 2011 BuildID: RH-7.6.1-0.4.el6 $

Submitting: echo -e 'cmd=/bin/true\nqueue' | condor_submit

Watching schedd: strace -e trace=file -p $(pidof condor_schedd)

The job does not use a spool directory, yet strace logs the following:

stat("/var/lib/condor/spool/1/0/cluster1.proc0.subproc0", 0xd46a70) = -1 ENOENT (No such file or directory)
rmdir("/var/lib/condor/spool/1/0/cluster1.proc0.subproc0") = -1 ENOENT (No such file or directory)
stat("/var/lib/condor/spool/1/0/cluster1.proc0.subproc0.tmp", 0xd55840) = -1 ENOENT (No such file or directory)
rmdir("/var/lib/condor/spool/1/0/cluster1.proc0.subproc0.tmp") = -1 ENOENT (No such file or directory)
stat("/var/lib/condor/spool/1/0/cluster1.proc0.subproc0.swap", 0xd3a230) = -1 ENOENT (No such file or directory)
rmdir("/var/lib/condor/spool/1/0/cluster1.proc0.subproc0.swap") = -1 ENOENT (No such file or directory)
rmdir("/var/lib/condor/spool/1/0")      = -1 ENOENT (No such file or directory)
open("/var/lib/condor/spool/history", O_RDWR|O_APPEND) = -1 ENOENT (No such file or directory)
open("/var/lib/condor/spool/history", O_RDWR|O_CREAT|O_EXCL|O_APPEND, 0644) = 13
unlink("/var/lib/condor/spool/1/cluster1.ickpt.subproc0") = -1 ENOENT (No such file or directory)
rmdir("/var/lib/condor/spool/1")        = -1 ENOENT (No such file or directory)

The stat() calls are unfortunate. The rmdir() and unlink() calls are worse: they are issued even though the targets are known not to exist.
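
For illustration only, the difference between unconditional and guarded cleanup can be sketched in Python. This is not the schedd's code (Condor is written in C++). The function names and the path layout are hypothetical, modelled on the paths in the trace above. The sketch counts attempted filesystem calls when nothing was ever spooled.

```python
import os
import tempfile

def spool_paths(spool, cluster, proc):
    # Hypothetical layout, modelled on the paths seen in the strace output.
    cluster_dir = os.path.join(spool, str(cluster))
    proc_dir = os.path.join(cluster_dir, str(proc))
    job_dir = os.path.join(proc_dir, f"cluster{cluster}.proc{proc}.subproc0")
    ickpt = os.path.join(cluster_dir, f"cluster{cluster}.ickpt.subproc0")
    return cluster_dir, proc_dir, job_dir, ickpt

def try_call(name, fn, path, log):
    # Record the attempted call; report failure (e.g. ENOENT) as False.
    log.append((name, path))
    try:
        fn(path)
        return True
    except OSError:
        return False

def cleanup_unconditional(spool, cluster, proc):
    # Mimics the traced behaviour: every removal is attempted regardless.
    log = []
    cluster_dir, proc_dir, job_dir, ickpt = spool_paths(spool, cluster, proc)
    for path in (job_dir, job_dir + ".tmp", job_dir + ".swap"):
        try_call("stat", os.stat, path, log)
        try_call("rmdir", os.rmdir, path, log)
    try_call("rmdir", os.rmdir, proc_dir, log)
    try_call("unlink", os.unlink, ickpt, log)
    try_call("rmdir", os.rmdir, cluster_dir, log)
    return log

def cleanup_guarded(spool, cluster, proc):
    # One possible guard (an assumption, not the upstream fix): a single
    # stat() on the cluster directory; if it is absent, nothing was spooled.
    log = []
    cluster_dir = spool_paths(spool, cluster, proc)[0]
    if not try_call("stat", os.stat, cluster_dir, log):
        return log
    return log + cleanup_unconditional(spool, cluster, proc)

with tempfile.TemporaryDirectory() as spool:
    print(len(cleanup_unconditional(spool, 1, 0)))  # 9 attempted calls
    print(len(cleanup_guarded(spool, 1, 0)))        # 1 attempted call
```

On a shared file system each of those attempted calls can be a network round trip, which is why skipping them matters even though every one of them fails quickly with ENOENT.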

Comment 1 Will Benton 2011-10-14 22:06:26 UTC
Fixed upstream in 28a0d32b.

Comment 2 Will Benton 2011-10-15 03:29:11 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
C:  The way that the Condor schedd handles cleaning up job spool directories changed between version 7.4 and 7.6.
C:  Even if job spool directories are not enabled, the cleanup code in the schedd could have a significant performance impact in certain environments, including those using shared filesystems.
F:  The schedd now interacts with the filesystem minimally in cases where no job spool directories exist.
R:  As a result, filesystem overhead should be minimal if job spool directories are not enabled.

Comment 4 Lubos Trilety 2011-11-01 10:18:14 UTC
Successfully reproduced on:
$CondorVersion: 7.6.3 Jul 27 2011 BuildID: RH-7.6.3-0.3.el5 $
$CondorPlatform: X86_64-RedHat_5.6 $

# echo -e 'cmd=/bin/true\nqueue' | runuser condor_user -s /bin/bash -c condor_submit
Submitting job(s).
1 job(s) submitted to cluster 1.
[root@lt-rhel5_64-old ~]# strace -e trace=file -p $(pidof condor_schedd) 2>&1 | grep "/var/lib/condor/spool"
open("/var/lib/condor/spool/.schedd_classad.new", O_WRONLY|O_TRUNC) = -1 ENOENT (No such file or directory)
open("/var/lib/condor/spool/.schedd_classad.new", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0644) = 13
rename("/var/lib/condor/spool/.schedd_classad.new", "/var/lib/condor/spool/.schedd_classad") = 0
stat("/var/lib/condor/spool/1/0/cluster1.proc0.subproc0", 0x62eb510) = -1 ENOENT (No such file or directory)
rmdir("/var/lib/condor/spool/1/0/cluster1.proc0.subproc0") = -1 ENOENT (No such file or directory)
stat("/var/lib/condor/spool/1/0/cluster1.proc0.subproc0.tmp", 0x62eb510) = -1 ENOENT (No such file or directory)
rmdir("/var/lib/condor/spool/1/0/cluster1.proc0.subproc0.tmp") = -1 ENOENT (No such file or directory)
stat("/var/lib/condor/spool/1/0/cluster1.proc0.subproc0.swap", 0x62eb510) = -1 ENOENT (No such file or directory)
rmdir("/var/lib/condor/spool/1/0/cluster1.proc0.subproc0.swap") = -1 ENOENT (No such file or directory)
rmdir("/var/lib/condor/spool/1/0")      = -1 ENOENT (No such file or directory)
open("/var/lib/condor/spool/history", O_RDWR|O_APPEND) = -1 ENOENT (No such file or directory)
open("/var/lib/condor/spool/history", O_RDWR|O_CREAT|O_EXCL|O_APPEND, 0644) = 13
unlink("/var/lib/condor/spool/1/cluster1.ickpt.subproc0") = -1 ENOENT (No such file or directory)
rmdir("/var/lib/condor/spool/1")        = -1 ENOENT (No such file or directory)

Comment 5 Lubos Trilety 2011-11-01 10:23:56 UTC
Tested on:
$CondorVersion: 7.6.5 Oct 28 2011 BuildID: RH-7.6.5-0.4.el5 $
$CondorPlatform: I686-RedHat_5.7 $

$CondorVersion: 7.6.5 Oct 28 2011 BuildID: RH-7.6.5-0.4.el5 $
$CondorPlatform: X86_64-RedHat_5.7 $

$CondorVersion: 7.6.5 Oct 28 2011 BuildID: RH-7.6.5-0.4.el6 $
$CondorPlatform: I686-RedHat_6.1 $

$CondorVersion: 7.6.5 Oct 28 2011 BuildID: RH-7.6.5-0.4.el6 $
$CondorPlatform: X86_64-RedHat_6.1 $


# strace -e trace=file -p $(pidof condor_schedd) 2>&1 | grep "/var/lib/condor/spool"
open("/var/lib/condor/spool/.schedd_classad.new", O_WRONLY|O_TRUNC|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/var/lib/condor/spool/.schedd_classad.new", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0644) = 21
rename("/var/lib/condor/spool/.schedd_classad.new", "/var/lib/condor/spool/.schedd_classad") = 0
open("/var/lib/condor/spool/.schedd_classad.new", O_WRONLY|O_TRUNC|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/var/lib/condor/spool/.schedd_classad.new", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0644) = 23
rename("/var/lib/condor/spool/.schedd_classad.new", "/var/lib/condor/spool/.schedd_classad") = 0
stat64("/var/lib/condor/spool/1/0/cluster1.proc0.subproc0", 0x869e5e8) = -1 ENOENT (No such file or directory)
open("/var/lib/condor/spool/history", O_RDWR|O_APPEND|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/var/lib/condor/spool/history", O_RDWR|O_CREAT|O_EXCL|O_APPEND|O_LARGEFILE, 0644) = 22
stat64("/var/lib/condor/spool/1", 0x869e5e8) = -1 ENOENT (No such file or directory)

No rmdir() or unlink() calls are made on the spool directory.
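
The same check can be scripted rather than read off by eye. The following is a minimal sketch, assuming strace output in the format shown above. The helper names `tally_syscalls` and `has_removal_calls` are made up for this example and are not part of Condor or strace.

```python
import re
from collections import Counter

# Syscall name and first quoted path argument at the start of a strace line,
# e.g. rmdir("/var/lib/condor/spool/1") = -1 ENOENT (...)
SYSCALL_RE = re.compile(r'^(\w+)\("([^"]*)"')

def tally_syscalls(lines, prefix="/var/lib/condor/spool"):
    """Count syscalls whose first path argument lies under `prefix`."""
    counts = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line.strip())
        if match and match.group(2).startswith(prefix):
            counts[match.group(1)] += 1
    return counts

def has_removal_calls(lines, prefix="/var/lib/condor/spool"):
    """True if any rmdir() or unlink() touched a path under `prefix`."""
    counts = tally_syscalls(lines, prefix)
    return counts["rmdir"] + counts["unlink"] > 0

# Two lines copied from the 7.6.5 trace above.
fixed_trace = [
    'stat64("/var/lib/condor/spool/1/0/cluster1.proc0.subproc0", 0x869e5e8) = -1 ENOENT (No such file or directory)',
    'stat64("/var/lib/condor/spool/1", 0x869e5e8) = -1 ENOENT (No such file or directory)',
]
print(has_removal_calls(fixed_trace))  # False
```

To use it against a live schedd, the strace output would be saved to a file and its lines passed to `has_removal_calls`.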

>>> VERIFIED

Comment 6 Douglas Silas 2011-11-17 13:08:53 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,4 +1 @@
-C:  The way that the Condor schedd handles cleaning up job spool directories changed between version 7.4 and 7.6.
+Even when job spool directories were disabled, the way that the schedd daemon attempts to clean them had a significant performance impact in certain environments, including those using a shared file system. This update improves the cleanup code so that schedd interacts only minimally in cases where no job spool directories exist, which reduces file system overhead when job spool directories are not enabled.
-C:  Even if job spool directories are not enabled, the cleanup code in the schedd could have a significant performance impact in certain environments, including those using shared filesystems.
-F:  The schedd now interacts with the filesystem minimally in cases where no job spool directories exist.
-R:  As a result, filesystem overhead should be minimal if job spool directories are not enabled.

Comment 7 errata-xmlrpc 2012-01-23 17:26:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2012-0045.html