Bug 546736
| Summary: | Schedd performs unnecessary file operations on SPOOL, targeting mpp.X.Y files | ||
|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | Matthew Farrellee <matt> |
| Component: | condor | Assignee: | Matthew Farrellee <matt> |
| Status: | CLOSED ERRATA | QA Contact: | Luigi Toscano <ltoscano> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 1.2 | CC: | fnadge, ltoscano |
| Target Milestone: | 1.3 | ||
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | Previously, the condor_scheduler daemon attempted to access a SPOOL/mpp.ClusterId.ProcId file for every job when the job left the queue. With this update, traffic takes place only on the mpp file if it is specifically used. | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2010-10-14 16:12:51 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
The initial change for this is to have the Schedd set the attribute MyProxyPasswordExists to TRUE whenever the MyProxyPassword attribute is encountered and written to the mpp file. The deletion of the mpp file is guarded by a test on MyProxyPasswordExists.

Upgrade impact: Any existing job with an mpp file will have it left in the filesystem when the job leaves the queue. The mpp files are not protected from PREEN, so they will not be leaked forever. To avoid this problem, condor_qedit can be run after the upgrade to set MyProxyPasswordExists on all jobs. This results in extra attempts to stat the mpp file for jobs that do not have one, which is no worse than the current situation.

It is not desirable to entirely avoid the upgrade issue by having the absence of MyProxyPasswordExists mean the file may exist. Doing so means that condor_submit (and all submitters) must set the attribute to avoid the unnecessary file operations on SPOOL.

Upstream ticket: http://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=1061

Fixed in 7.4.2-0.1

strace still shows calls to stat() when no password is specified. condor-7.4.3-0.21, RHEL4.8/5.5, i386/x86_64.

UPSTREAM-7.5.1-BZ546736-spool-mpp-files was mislabeled as 7.4.1 and never made it into the build. Definitely included in 7.4.4-0.3.

Access to mpp* files happens only when a password is specified. Verified on condor-7.4.4-0.4, RHEL4.8/5.5, i386/x86_64.
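The guard on MyProxyPasswordExists can be sketched as follows. This is a minimal Python model, not the Schedd's C++ code: the function names, the dictionary standing in for the job ad, and the choice to keep the password out of the ad are all assumptions made for illustration.

```python
import os


def mpp_path(spool, cluster, proc):
    """Path of the per-job password file, SPOOL/mpp.ClusterId.ProcId."""
    return os.path.join(spool, f"mpp.{cluster}.{proc}")


def set_attribute(spool, job, name, value):
    """Hypothetical stand-in for the Schedd's attribute handling."""
    if name != "MyProxyPassword":
        job[name] = value
        return
    # Write the password to the mpp file (owner-only, must not already exist).
    path = mpp_path(spool, job["ClusterId"], job["ProcId"])
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(value)
    # Assumption: only the flag is recorded in the ad, not the password.
    job["MyProxyPasswordExists"] = True


def on_job_leaves_queue(spool, job):
    """Delete the mpp file only when the flag is set.

    Returns True if SPOOL was touched, False if the guard skipped it.
    """
    if not job.get("MyProxyPasswordExists", False):
        return False  # guard fails: no file operation on SPOOL at all
    try:
        os.unlink(mpp_path(spool, job["ClusterId"], job["ProcId"]))
    except FileNotFoundError:
        pass  # file already removed by something else
    return True
```

In this model, a job queued before the upgrade has an mpp file but no flag, so on_job_leaves_queue skips it and the file stays behind. That is the upgrade impact described above, and setting the flag on such jobs restores the cleanup.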
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.
New Contents:
Previously, the condor_scheduler daemon attempted to access a SPOOL/mpp.ClusterId.ProcId file for every job when the job left the queue. With this update, traffic takes place only on the mpp file if it is specifically used, by providing -password to condor_submit or specifying +MyProxyPassword in a submit file.
Technical note updated. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.
Diffed Contents:
@@ -1 +1 @@
-Previously, the condor_scheduler daemon attempted to access a SPOOL/mpp.ClusterId.ProcId file for every job when the job left the queue. With this update, traffic takes place only on the mpp file if it is specifically used, by providing -password to condor_submit or specifying +MyProxyPassword in a submit file.
+Previously, the condor_scheduler daemon attempted to access a SPOOL/mpp.ClusterId.ProcId file for every job when the job left the queue. With this update, traffic takes place only on the mpp file if it is specifically used.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2010-0773.html
All relevant versions, including 7.4.1-0.7

The condor_schedd will attempt to access a SPOOL/mpp.ClusterId.ProcId file for every job when the job leaves the queue, either by rm or successful completion. The rm example is given below.

There should only be traffic on the mpp file if it is specifically used, e.g. by providing -password to condor_submit or specifying +MyProxyPassword in a submit file.

Actual output:

    $ echo "cmd=/bin/sleep\nargs=1h\nnotification=never\nqueue 3" | condor_submit
    Submitting job(s)...
    3 job(s) submitted to cluster 7.
    $ condor_rm 7
    Cluster 7 has been marked for removal.
    $ echo "cmd=/bin/sleep\nargs=1h\nnotification=never\nqueue 3" | condor_submit -password downwithmpp
    Submitting job(s)...
    3 job(s) submitted to cluster 8.
    $ condor_rm 8
    Cluster 8 has been marked for removal.

    # strace -p $(pidof condor_schedd) 2>&1 | grep spool | grep mpp
    stat64("/var/lib/condor/spool/mpp.7.0", 0xbfbbd664) = -1 ENOENT (No such file or directory)
    stat64("/var/lib/condor/spool/mpp.7.1", 0xbfbbd664) = -1 ENOENT (No such file or directory)
    stat64("/var/lib/condor/spool/mpp.7.2", 0xbfbbd664) = -1 ENOENT (No such file or directory)
    stat64("/var/lib/condor/spool/mpp.8.0", 0xbfbbd2c4) = -1 ENOENT (No such file or directory)
    open("/var/lib/condor/spool/mpp.8.0", O_WRONLY|O_CREAT|O_EXCL|O_LARGEFILE, 0600) = 14
    stat64("/var/lib/condor/spool/mpp.8.2", 0xbfbbd664) = -1 ENOENT (No such file or directory)
    stat64("/var/lib/condor/spool/mpp.8.0", {st_mode=S_IFREG|0600, st_size=11, ...}) = 0
    unlink("/var/lib/condor/spool/mpp.8.0") = 0

Expected output:

    $ echo "cmd=/bin/sleep\nargs=1h\nshould_transfer_files=no\nnotification=never\nqueue 3" | condor_submit
    Submitting job(s)...
    3 job(s) submitted to cluster 9.
    $ condor_rm 9
    Cluster 9 has been marked for removal.
    $ echo "cmd=/bin/sleep\nargs=1h\nshould_transfer_files=no\nnotification=never\nqueue 3" | condor_submit -password downwithmpp
    Submitting job(s)...
    3 job(s) submitted to cluster 10.
    $ condor_rm 10
    Cluster 10 has been marked for removal.

    # strace -p $(pidof condor_schedd) 2>&1 | grep spool | grep mpp
    stat64("/var/lib/condor/spool/mpp.10.0", 0xbff8a494) = -1 ENOENT (No such file or directory)
    open("/var/lib/condor/spool/mpp.10.0", O_WRONLY|O_CREAT|O_EXCL|O_LARGEFILE, 0600) = 14
    stat64("/var/lib/condor/spool/mpp.10.0", {st_mode=S_IFREG|0600, st_size=11, ...}) = 0
    unlink("/var/lib/condor/spool/mpp.10.0") = 0
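
When verifying a build, a captured trace can be checked mechanically against the expected output. The sketch below is a hypothetical helper, not part of Condor. It assumes strace lines in the format shown in this report (the path as the first quoted argument). As a heuristic of its own, it treats a stat that returns ENOENT for an mpp file that is never created in the same trace as an unnecessary probe.

```python
import re

# Matches e.g. stat64("/var/lib/condor/spool/mpp.7.0", ...) or open("...mpp.8.0", ...)
MPP_CALL = re.compile(
    r'^(?P<syscall>\w+)\("[^"]*/mpp\.(?P<cluster>\d+)\.(?P<proc>\d+)"'
)


def unnecessary_probes(lines):
    """Return sorted (cluster, proc) pairs that were probed but never created."""
    probed = set()   # stat calls that failed with ENOENT
    created = set()  # open calls that create the mpp file
    for line in lines:
        match = MPP_CALL.match(line.strip())
        if match is None:
            continue
        job = (int(match["cluster"]), int(match["proc"]))
        syscall = match["syscall"]
        if syscall.startswith("open") and "O_CREAT" in line:
            created.add(job)
        elif syscall.startswith("stat") and "ENOENT" in line:
            probed.add(job)
    # A failed stat followed by creation is the normal password path, so
    # only jobs that were probed and never created are reported.
    return sorted(probed - created)
```

Fed the actual output above, the helper reports the three jobs of cluster 7 and job 8.2. Fed the expected output, it reports nothing, because the only failed stat (mpp.10.0) is followed by the creation of that file.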