Bug 703992 - unnecessary spool interaction during job removal/completion
Summary: unnecessary spool interaction during job removal/completion
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor
Version: 2.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: 2.1
Target Release: ---
Assignee: Will Benton
QA Contact: Lubos Trilety
URL:
Whiteboard:
Depends On:
Blocks: 743350
Reported: 2011-05-11 19:51 UTC by Matthew Farrellee
Modified: 2012-01-23 17:26 UTC (History)

Fixed In Version: condor-7.6.4-0.8
Doc Type: Bug Fix
Doc Text:
Even when job spool directories were disabled, the way that the schedd daemon attempts to clean them had a significant performance impact in certain environments, including those using a shared file system. This update improves the cleanup code so that schedd interacts only minimally in cases where no job spool directories exist, which reduces file system overhead when job spool directories are not enabled.
Clone Of:
Environment:
Last Closed: 2012-01-23 17:26:44 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2012:0045 0 normal SHIPPED_LIVE Red Hat Enterprise MRG Grid 2.1 bug fix and enhancement update 2012-01-23 22:22:58 UTC

Description Matthew Farrellee 2011-05-11 19:51:36 UTC
$ condor_version
$CondorVersion: 7.6.1 Apr 27 2011 BuildID: RH-7.6.1-0.4.el6 $

Submitting: echo -e 'cmd=/bin/true\nqueue' | condor_submit

Watching schedd: strace -e trace=file -p $(pidof condor_schedd)

The job does not use a spool directory, yet strace logs:

stat("/var/lib/condor/spool/1/0/cluster1.proc0.subproc0", 0xd46a70) = -1 ENOENT (No such file or directory)
rmdir("/var/lib/condor/spool/1/0/cluster1.proc0.subproc0") = -1 ENOENT (No such file or directory)
stat("/var/lib/condor/spool/1/0/cluster1.proc0.subproc0.tmp", 0xd55840) = -1 ENOENT (No such file or directory)
rmdir("/var/lib/condor/spool/1/0/cluster1.proc0.subproc0.tmp") = -1 ENOENT (No such file or directory)
stat("/var/lib/condor/spool/1/0/cluster1.proc0.subproc0.swap", 0xd3a230) = -1 ENOENT (No such file or directory)
rmdir("/var/lib/condor/spool/1/0/cluster1.proc0.subproc0.swap") = -1 ENOENT (No such file or directory)
rmdir("/var/lib/condor/spool/1/0")      = -1 ENOENT (No such file or directory)
open("/var/lib/condor/spool/history", O_RDWR|O_APPEND) = -1 ENOENT (No such file or directory)
open("/var/lib/condor/spool/history", O_RDWR|O_CREAT|O_EXCL|O_APPEND, 0644) = 13
unlink("/var/lib/condor/spool/1/cluster1.ickpt.subproc0") = -1 ENOENT (No such file or directory)
rmdir("/var/lib/condor/spool/1")        = -1 ENOENT (No such file or directory)

The stat() calls are unfortunate. The rmdir() and unlink() calls are worse: they are issued even though the targets are already known not to exist.
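
To illustrate the behavior being asked for, here is a minimal Python sketch of a stat-first cleanup. It is not the Condor implementation (the schedd is written in C++), and the function names remove_job_spool and job_spool_paths are hypothetical. The directory layout is inferred only from the paths in the trace above, and the sketch makes the simplifying assumption that the .tmp and .swap directories never exist without the main job directory.

```python
import os
import shutil


def job_spool_paths(spool, cluster, proc):
    """Build the per-job paths seen in the trace (layout inferred, not authoritative)."""
    proc_dir = os.path.join(spool, str(cluster), str(proc))
    base = os.path.join(proc_dir, "cluster%d.proc%d.subproc0" % (cluster, proc))
    return proc_dir, base


def remove_job_spool(spool, cluster, proc):
    """Remove a job's spool directories; return the paths actually removed.

    Hypothetical helper: a single existence check on the main job
    directory gates all further file system calls.
    """
    proc_dir, base = job_spool_paths(spool, cluster, proc)
    removed = []

    # One stat() decides whether there is anything to clean up.
    # Assumption of this sketch: no .tmp/.swap without the main directory.
    if not os.path.isdir(base):
        return removed

    for path in (base, base + ".tmp", base + ".swap"):
        if os.path.isdir(path):
            shutil.rmtree(path)
            removed.append(path)

    # Only try to remove the parent once something was actually deleted.
    try:
        os.rmdir(proc_dir)
        removed.append(proc_dir)
    except OSError:
        # Parent is not empty or already gone; leave it alone.
        pass
    return removed
```

For a job that never had a spool directory, this costs one stat() and no rmdir() or unlink() calls, which is the property the report is asking for.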

Comment 1 Will Benton 2011-10-14 22:06:26 UTC
Fixed upstream in commit 28a0d32b.

Comment 2 Will Benton 2011-10-15 03:29:11 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
C:  The way that the Condor schedd handles cleaning up job spool directories changed between version 7.4 and 7.6.
C:  Even if job spool directories are not enabled, the cleanup code in the schedd could have a significant performance impact in certain environments, including those using shared filesystems.
F:  The schedd now interacts with the filesystem minimally in cases where no job spool directories exist.
R:  As a result, filesystem overhead should be minimal if job spool directories are not enabled.

Comment 4 Lubos Trilety 2011-11-01 10:18:14 UTC
Successfully reproduced on:
$CondorVersion: 7.6.3 Jul 27 2011 BuildID: RH-7.6.3-0.3.el5 $
$CondorPlatform: X86_64-RedHat_5.6 $

# echo -e 'cmd=/bin/true\nqueue' | runuser condor_user -s /bin/bash -c condor_submit
Submitting job(s).
1 job(s) submitted to cluster 1.
[root@lt-rhel5_64-old ~]# strace -e trace=file -p $(pidof condor_schedd) 2>&1 | grep "/var/lib/condor/spool"
open("/var/lib/condor/spool/.schedd_classad.new", O_WRONLY|O_TRUNC) = -1 ENOENT (No such file or directory)
open("/var/lib/condor/spool/.schedd_classad.new", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0644) = 13
rename("/var/lib/condor/spool/.schedd_classad.new", "/var/lib/condor/spool/.schedd_classad") = 0
stat("/var/lib/condor/spool/1/0/cluster1.proc0.subproc0", 0x62eb510) = -1 ENOENT (No such file or directory)
rmdir("/var/lib/condor/spool/1/0/cluster1.proc0.subproc0") = -1 ENOENT (No such file or directory)
stat("/var/lib/condor/spool/1/0/cluster1.proc0.subproc0.tmp", 0x62eb510) = -1 ENOENT (No such file or directory)
rmdir("/var/lib/condor/spool/1/0/cluster1.proc0.subproc0.tmp") = -1 ENOENT (No such file or directory)
stat("/var/lib/condor/spool/1/0/cluster1.proc0.subproc0.swap", 0x62eb510) = -1 ENOENT (No such file or directory)
rmdir("/var/lib/condor/spool/1/0/cluster1.proc0.subproc0.swap") = -1 ENOENT (No such file or directory)
rmdir("/var/lib/condor/spool/1/0")      = -1 ENOENT (No such file or directory)
open("/var/lib/condor/spool/history", O_RDWR|O_APPEND) = -1 ENOENT (No such file or directory)
open("/var/lib/condor/spool/history", O_RDWR|O_CREAT|O_EXCL|O_APPEND, 0644) = 13
unlink("/var/lib/condor/spool/1/cluster1.ickpt.subproc0") = -1 ENOENT (No such file or directory)
rmdir("/var/lib/condor/spool/1")        = -1 ENOENT (No such file or directory)

Comment 5 Lubos Trilety 2011-11-01 10:23:56 UTC
Tested on:
$CondorVersion: 7.6.5 Oct 28 2011 BuildID: RH-7.6.5-0.4.el5 $
$CondorPlatform: I686-RedHat_5.7 $

$CondorVersion: 7.6.5 Oct 28 2011 BuildID: RH-7.6.5-0.4.el5 $
$CondorPlatform: X86_64-RedHat_5.7 $

$CondorVersion: 7.6.5 Oct 28 2011 BuildID: RH-7.6.5-0.4.el6 $
$CondorPlatform: I686-RedHat_6.1 $

$CondorVersion: 7.6.5 Oct 28 2011 BuildID: RH-7.6.5-0.4.el6 $
$CondorPlatform: X86_64-RedHat_6.1 $


# strace -e trace=file -p $(pidof condor_schedd) 2>&1 | grep "/var/lib/condor/spool"
open("/var/lib/condor/spool/.schedd_classad.new", O_WRONLY|O_TRUNC|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/var/lib/condor/spool/.schedd_classad.new", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0644) = 21
rename("/var/lib/condor/spool/.schedd_classad.new", "/var/lib/condor/spool/.schedd_classad") = 0
open("/var/lib/condor/spool/.schedd_classad.new", O_WRONLY|O_TRUNC|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/var/lib/condor/spool/.schedd_classad.new", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0644) = 23
rename("/var/lib/condor/spool/.schedd_classad.new", "/var/lib/condor/spool/.schedd_classad") = 0
stat64("/var/lib/condor/spool/1/0/cluster1.proc0.subproc0", 0x869e5e8) = -1 ENOENT (No such file or directory)
open("/var/lib/condor/spool/history", O_RDWR|O_APPEND|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/var/lib/condor/spool/history", O_RDWR|O_CREAT|O_EXCL|O_APPEND|O_LARGEFILE, 0644) = 22
stat64("/var/lib/condor/spool/1", 0x869e5e8) = -1 ENOENT (No such file or directory)

No rmdir() or unlink() calls on the spool directory.
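
The check above was done by reading the trace. As a rough sketch, the same check could be automated by scanning captured strace output for rmdir() or unlink() calls under the spool directory that failed with ENOENT. The function name failed_removals and the regular expression are assumptions made for illustration; they are not part of Condor or strace.

```python
import re

# Matches strace lines such as:
#   rmdir("/var/lib/condor/spool/1/0") = -1 ENOENT (No such file or directory)
# Groups: syscall name, first path argument, return value, optional errno name.
CALL_RE = re.compile(r'^(\w+)\("([^"]*)"[^)]*\)\s*=\s*(-?\d+)(?:\s+(\w+))?')


def failed_removals(lines, spool="/var/lib/condor/spool"):
    """Return (syscall, path) pairs for rmdir/unlink calls under spool that hit ENOENT."""
    hits = []
    for line in lines:
        m = CALL_RE.match(line.strip())
        if not m:
            continue
        syscall, path, _ret, err = m.groups()
        if (syscall in ("rmdir", "unlink")
                and path.startswith(spool + "/")
                and err == "ENOENT"):
            hits.append((syscall, path))
    return hits


if __name__ == "__main__":
    import sys
    # Usage: strace -e trace=file -p <pid> 2>&1 | python this_script.py
    for syscall, path in failed_removals(sys.stdin):
        print(syscall, path)
```

An empty result for a trace of a job without a spool directory corresponds to the observation above.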

>>> VERIFIED

Comment 6 Douglas Silas 2011-11-17 13:08:53 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,4 +1 @@
-C:  The way that the Condor schedd handles cleaning up job spool directories changed between version 7.4 and 7.6.
+Even when job spool directories were disabled, the way that the schedd daemon attempts to clean them had a significant performance impact in certain environments, including those using a shared file system. This update improves the cleanup code so that schedd interacts only minimally in cases where no job spool directories exist, which reduces file system overhead when job spool directories are not enabled.
-C:  Even if job spool directories are not enabled, the cleanup code in the schedd could have a significant performance impact in certain environments, including those using shared filesystems.
-F:  The schedd now interacts with the filesystem minimally in cases where no job spool directories exist.
-R:  As a result, filesystem overhead should be minimal if job spool directories are not enabled.

Comment 7 errata-xmlrpc 2012-01-23 17:26:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2012-0045.html

