Bug 549432
| Summary: | Parallel Universe jobs require job spool directory |
|---|---|
| Product: | Red Hat Enterprise MRG |
| Component: | condor |
| Version: | 1.2 |
| Status: | CLOSED ERRATA |
| Severity: | medium |
| Priority: | medium |
| Target Milestone: | 1.3 |
| Hardware: | All |
| OS: | Linux |
| Reporter: | Jon Thomas <jthomas> |
| Assignee: | Will Benton <willb> |
| QA Contact: | Martin Kudlej <mkudlej> |
| CC: | iboverma, matt, mkudlej, pmackinn |
| Doc Type: | Bug Fix |
| Doc Text: | Previously, /usr/libexec/condor/sshd.sh's use of the condor_chirp daemon assumed the previous creation of a spool temp directory, which was not the desired behavior. With this update, 'sshd.sh' no longer requires a spool directory. |
| Last Closed: | 2010-10-14 16:15:34 UTC |
| Bug Blocks: | 537232 |

Description
Jon Thomas
2009-12-21 17:11:42 UTC
Created attachment 379657 [details]
stderr output

*** Bug 538436 has been marked as a duplicate of this bug. ***

Build in 7.4.3-0.21

How can I reproduce this bug? Could you please paste a test case or recipe to reproduce this issue here?

Event posted on 08-02-2010 10:03am EDT by tgummels
You should be able to reproduce with data I'll attach and the following command line:
condor_submit -spool test.sdf
This event sent from IssueTracker by tgummels
issue 371984

I've tested this on RHEL 5.5 x86_64 with condor-7.4.4-0.14.el5 and I've got the same error in stderr output as in comment 1.

1. install condor with default configuration

2. set up dedicated scheduler:
DedicatedScheduler = "DedicatedScheduler@--hostname--"
START = Scheduler =?= $(DedicatedScheduler)
SUSPEND = False
CONTINUE = True
PREEMPT = False
KILL = False
WANT_SUSPEND = False
WANT_VACATE = False
RANK = Scheduler =?= $(DedicatedScheduler)
MPI_CONDOR_RSH_PATH = $(LIBEXEC)
CONDOR_SSHD = /usr/sbin/sshd
CONDOR_SSH_KEYGEN = /usr/bin/ssh-keygen
STARTD_EXPRS = $(STARTD_EXPRS), DedicatedScheduler

3. submit a parallel job on one machine without transferring files (I've tried to simulate a shared file system by not transferring files):
$ cat mpi.sub
universe = parallel
executable = /home/xxx/openmpi/ompiscript
arguments = summpi
log = logfile.$(NODE)
output = outfile.$(NODE)
error = errfile.$(NODE)
machine_count = 1
getenv = true
environment = LD_LIBRARY_PATH=/usr/lib64/openmpi/1.4-gcc/
queue

$ condor_submit -spool mpi.sub

I've got ompiscript and other settings from https://bugzilla.redhat.com/show_bug.cgi?id=537232#c2

Am I doing anything wrong or is that bug still there?

Created attachment 447967 [details]
error file

Martin, I'm not able to reproduce your failure case here, either with -spool or without. One thing to check is to make sure all of the paths in ompiscript are set properly (and that the scripts and Makefile aren't accidentally pulling down tools from an mpich installation). Some other things to consider: Is the spool directory getting created, or not? What attributes are set in the job ad?

I don't know where the problem was, but now it works. I used a clean Condor installation.

Tested with condor-7.4.1-0.7 and it doesn't work.
Tested with condor-7.4.4-0.14 on RHEL 5.5/4.8 x i386/x86_64 and it works.
--> VERIFIED

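
The two checks suggested in the thread (whether the spool directory is created, and which attributes are set in the job ad) could be run roughly as in the shell sketch below. It is not taken from the bug report: `inspect_spooled_job` is a hypothetical helper name, and the `cluster<N>.*` naming of per-job spool directories is an assumption that may not hold for every Condor version. Only `condor_config_val SPOOL` and `condor_q -long` are standard Condor commands.

```shell
#!/bin/sh
# Sketch only: inspect_spooled_job is a hypothetical helper, not part of Condor.
inspect_spooled_job() {
    cluster="$1"   # cluster id as printed by condor_submit

    # Bail out politely when the Condor command-line tools are not installed.
    if ! command -v condor_config_val >/dev/null 2>&1; then
        echo "condor tools not found in PATH"
        return 0
    fi

    # Ask the local configuration where the spool directory lives.
    spool_dir=$(condor_config_val SPOOL 2>/dev/null)
    echo "SPOOL = ${spool_dir:-<unknown>}"

    # Assumption: per-job spool directories are named cluster<N>.<something>.
    if ls -d "$spool_dir"/cluster"$cluster".* >/dev/null 2>&1; then
        ls -ld "$spool_dir"/cluster"$cluster".*
    else
        echo "no spool directory found for cluster $cluster"
    fi

    # Print the complete job ad so its attributes can be compared.
    condor_q -long "$cluster" || echo "condor_q failed for cluster $cluster"
}
```

Calling `inspect_spooled_job 42` (where 42 stands in for the cluster id reported by `condor_submit -spool mpi.sub`) prints the spool location, any matching spool directory, and the job ad.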
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.
New Contents:
Previously, /usr/libexec/condor/sshd.sh's use of the condor_chirp daemon assumed the previous creation of a spool temp directory, which was not the desired behavior. With this update, 'sshd.sh' no longer requires a spool directory.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0773.html