
Bug 546736

Summary: Schedd performs unnecessary file operations on SPOOL, targeting mpp.X.Y files
Product: Red Hat Enterprise MRG
Reporter: Matthew Farrellee <matt>
Component: condor
Assignee: Matthew Farrellee <matt>
Status: CLOSED ERRATA
QA Contact: Luigi Toscano <ltoscano>
Severity: medium
Docs Contact:
Priority: medium
Version: 1.2
CC: fnadge, ltoscano
Target Milestone: 1.3
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, the condor_scheduler daemon attempted to access a SPOOL/mpp.ClusterId.ProcId file for every job when the job left the queue. With this update, traffic takes place only on the mpp file if it is specifically used.
Last Closed: 2010-10-14 16:12:51 UTC

Description Matthew Farrellee 2009-12-11 20:11:01 UTC
All relevant versions, including 7.4.1-0.7

The condor_schedd will attempt to access a SPOOL/mpp.ClusterId.ProcId file for every job when the job leaves the queue, either by rm or successful completion. The rm example is given below. There should only be traffic on the mpp file if it is specifically used, e.g. by providing -password to condor_submit or specifying +MyProxyPassword in a submit file.

Actual output:

$ echo "cmd=/bin/sleep\nargs=1h\nnotification=never\nqueue 3" | condor_submit
Submitting job(s)...
3 job(s) submitted to cluster 7.
$ condor_rm 7
Cluster 7 has been marked for removal.
$ echo "cmd=/bin/sleep\nargs=1h\nnotification=never\nqueue 3" | condor_submit -password downwithmpp
Submitting job(s)...
3 job(s) submitted to cluster 8.
$ condor_rm 8
Cluster 8 has been marked for removal.

# strace -p $(pidof condor_schedd) 2>&1 | grep spool | grep mpp
stat64("/var/lib/condor/spool/mpp.7.0", 0xbfbbd664) = -1 ENOENT (No such file or directory)
stat64("/var/lib/condor/spool/mpp.7.1", 0xbfbbd664) = -1 ENOENT (No such file or directory)
stat64("/var/lib/condor/spool/mpp.7.2", 0xbfbbd664) = -1 ENOENT (No such file or directory)
stat64("/var/lib/condor/spool/mpp.8.0", 0xbfbbd2c4) = -1 ENOENT (No such file or directory)
open("/var/lib/condor/spool/mpp.8.0", O_WRONLY|O_CREAT|O_EXCL|O_LARGEFILE, 0600) = 14
stat64("/var/lib/condor/spool/mpp.8.2", 0xbfbbd664) = -1 ENOENT (No such file or directory)
stat64("/var/lib/condor/spool/mpp.8.0", {st_mode=S_IFREG|0600, st_size=11, ...}) = 0
unlink("/var/lib/condor/spool/mpp.8.0") = 0


Expected output:

$ echo "cmd=/bin/sleep\nargs=1h\nshould_transfer_files=no\nnotification=never\nqueue 3" | condor_submit
Submitting job(s)...
3 job(s) submitted to cluster 9.
$ condor_rm 9
Cluster 9 has been marked for removal.
$ echo "cmd=/bin/sleep\nargs=1h\nshould_transfer_files=no\nnotification=never\nqueue 3" | condor_submit -password downwithmpp
Submitting job(s)...
3 job(s) submitted to cluster 10.
$ condor_rm 10
Cluster 10 has been marked for removal.

# strace -p $(pidof condor_schedd) 2>&1 | grep spool | grep mpp
stat64("/var/lib/condor/spool/mpp.10.0", 0xbff8a494) = -1 ENOENT (No such file or directory)
open("/var/lib/condor/spool/mpp.10.0", O_WRONLY|O_CREAT|O_EXCL|O_LARGEFILE, 0600) = 14
stat64("/var/lib/condor/spool/mpp.10.0", {st_mode=S_IFREG|0600, st_size=11, ...}) = 0
unlink("/var/lib/condor/spool/mpp.10.0") = 0
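
One way to compare the actual and expected traces is to group the mpp-related syscalls by cluster. The following is only a sketch: the function name mpp_syscalls_by_cluster and the regular expression are illustrative assumptions, and it expects lines in the same shape as the strace output above (no PID prefix).

```python
import re
from collections import defaultdict

# Captures the syscall name plus ClusterId and ProcId from a path ending in
# mpp.<ClusterId>.<ProcId>. The pattern is an assumption based on the trace format.
MPP_CALL = re.compile(r'^(\w+)\("[^"]*/mpp\.(\d+)\.(\d+)"')


def mpp_syscalls_by_cluster(strace_lines):
    """Return {cluster_id: [syscall, ...]} for lines that touch an mpp file."""
    calls = defaultdict(list)
    for line in strace_lines:
        match = MPP_CALL.match(line.strip())
        if match:
            syscall, cluster, _proc = match.groups()
            calls[int(cluster)].append(syscall)
    return dict(calls)


sample = [
    'stat64("/var/lib/condor/spool/mpp.7.0", 0xbfbbd664) = -1 ENOENT (No such file or directory)',
    'open("/var/lib/condor/spool/mpp.8.0", O_WRONLY|O_CREAT|O_EXCL|O_LARGEFILE, 0600) = 14',
    'unlink("/var/lib/condor/spool/mpp.8.0") = 0',
]
print(mpp_syscalls_by_cluster(sample))  # {7: ['stat64'], 8: ['open', 'unlink']}
```

With the fix in place, a cluster submitted without -password (7 in the sample) should not appear in the result at all.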

Comment 1 Matthew Farrellee 2009-12-15 20:55:56 UTC
Initial change for this is to have the Schedd set the attribute MyProxyPasswordExists to TRUE whenever the MyProxyPassword attribute is encountered and written to the mpp file. The deletion of the mpp file is guarded by a test on MyProxyPasswordExists.
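
The guard can be illustrated with a small model. This is a sketch of the idea only, not the schedd's actual code: the job ad is modelled as a plain dict, and the helper names mpp_path, store_password and cleanup_on_job_exit are hypothetical.

```python
import os
import tempfile


def mpp_path(spool, cluster, proc):
    """Build SPOOL/mpp.ClusterId.ProcId (hypothetical helper)."""
    return os.path.join(spool, f"mpp.{cluster}.{proc}")


def store_password(spool, ad, password):
    """Write the mpp file and record in the job ad that it exists."""
    path = mpp_path(spool, ad["ClusterId"], ad["ProcId"])
    # Exclusive create with mode 0600, as in the open() call in the trace.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(password)
    ad["MyProxyPasswordExists"] = True


def cleanup_on_job_exit(spool, ad):
    """Touch SPOOL only if the ad says an mpp file was written.

    Returns True if a file was removed.
    """
    if not ad.get("MyProxyPasswordExists", False):
        return False  # attribute absent: no stat, no unlink
    path = mpp_path(spool, ad["ClusterId"], ad["ProcId"])
    if os.path.exists(path):
        os.unlink(path)
        return True
    return False


with tempfile.TemporaryDirectory() as spool:
    plain = {"ClusterId": 9, "ProcId": 0}    # submitted without a password
    secret = {"ClusterId": 10, "ProcId": 0}  # submitted with a password
    store_password(spool, secret, "downwithmpp")
    print(cleanup_on_job_exit(spool, plain))   # False
    print(cleanup_on_job_exit(spool, secret))  # True
    print(os.listdir(spool))                   # []
```

In this model a job ad that lacks the attribute never causes a filesystem call, which is the behaviour the change aims for.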

Upgrade impact: Any existing job with an mpp file will have it left in the filesystem when the job leaves the queue. The mpp files are not protected from PREEN, so they will not be leaked forever. To avoid this problem, condor_qedit can be run after the upgrade to set MyProxyPasswordExists on all jobs. Doing so will result in extra attempts to stat the mpp file for jobs that do not have one, which is no worse than the current situation.

Comment 2 Matthew Farrellee 2009-12-15 20:58:12 UTC
It is not desirable to entirely avoid the upgrade issue by having the absence of MyProxyPasswordExists mean the file may exist. Doing so means that condor_submit (and all submitters) must set the attribute to avoid the unnecessary file operations on SPOOL.

Comment 3 Matthew Farrellee 2009-12-15 21:00:45 UTC
Upstream ticket: http://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=1061

Comment 4 Matthew Farrellee 2010-01-04 18:19:01 UTC
Fixed in 7.4.2-0.1

Comment 5 Luigi Toscano 2010-06-25 17:52:30 UTC
strace still shows calls to stat() when no password is specified.
condor-7.4.3-0.21, RHEL4.8/5.5, i386/x86_64.

Comment 6 Matthew Farrellee 2010-06-28 13:26:43 UTC
UPSTREAM-7.5.1-BZ546736-spool-mpp-files was mislabeled as 7.4.1 and never made it into the build. Definitely included in 7.4.4-0.3.

Comment 7 Luigi Toscano 2010-07-27 17:31:04 UTC
Access to mpp* files happens only when a password is specified.

Verified on condor-7.4.4-0.4, RHEL4.8/5.5, i386/x86_64.

Comment 8 Florian Nadge 2010-10-07 17:44:13 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Previously, the condor_scheduler daemon attempted to access a SPOOL/mpp.ClusterId.ProcId file for every job when the job left the queue. With this update, traffic takes place only on the mpp file if it is specifically used, by providing -password to condor_submit or specifying +MyProxyPassword in a submit file.

Comment 9 Florian Nadge 2010-10-07 17:44:35 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1 +1 @@
-Previously, the condor_scheduler daemon attempted to access a SPOOL/mpp.ClusterId.ProcId file for every job when the job left the queue. With this update, traffic takes place only on the mpp file if it is specifically used, by providing -password to condor_submit or specifying +MyProxyPassword in a submit file.
+Previously, the condor_scheduler daemon attempted to access a SPOOL/mpp.ClusterId.ProcId file for every job when the job left the queue. With this update, traffic takes place only on the mpp file if it is specifically used.

Comment 11 errata-xmlrpc 2010-10-14 16:12:51 UTC
An advisory has been issued which should help resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0773.html