Bug 608027
Summary:          cannot transfer files back to submit machine
Product:          Red Hat Enterprise MRG
Component:        condor
Version:          1.3
Hardware:         All
OS:               Windows
Status:           CLOSED DEFERRED
Severity:         medium
Priority:         high
Reporter:         Martin Kudlej <mkudlej>
Assignee:         Timothy St. Clair <tstclair>
QA Contact:       Martin Kudlej <mkudlej>
CC:               alyoung, matt, tstclair
Target Milestone: 1.3.2
Target Release:   ---
Fixed In Version: condor-7.4.5-0.2
Doc Type:         Bug Fix
Last Closed:      2011-02-15 15:23:05 UTC
Description
Martin Kudlej, 2010-06-25 12:47:23 UTC
Just for the record: I've submitted the test job from BZ607938 without "+LeaveJobInQueue = False".

Please enable D_SECURITY for the schedd (SCHEDD_DEBUG) and for condor_transfer_data (via TOOL_DEBUG), and check the SchedLog for information as to why the connection is being DENIED.

I've set SCHEDD_DEBUG = D_SECURITY on the CM and TOOL_DEBUG = D_ALL on both machines. The attachment contains the ls output from the cluster directory, the SchedLog, the condor_transfer_data output, the job description file, and the job executable.

Created attachment 427424 [details]
ls output, SchedLog, job description file, job executable, condor_transfer_data output
If I submit the same job from a Linux machine, I can transfer files back to the submit machine.

Set SEC_DEFAULT_AUTHENTICATION_METHODS = CLAIMTOBE in your condor_config.local on Windows. Also, set aside a third-party location such as C:\Temp (not windows\temp) for your files, and verify that the submitting user has write permission to that location. There are several issues with capturing stdout and stderr on Windows, so you may want to have your .bat file write its output to a .txt file and transfer that file ON_EXIT. Under these conditions all is well in my environment. Please attempt to reproduce the error with the settings listed above.

While traversing your logs I also noticed that you submitted as "Administrator", which is not a default account on your Linux box. You will want to create an account on your Windows machine with the same name/credentials as on your Linux schedd. When using CLAIMTOBE, the transfer will try to run as ClassAd[Owner] on your Linux box, and if that account doesn't exist it will FAIL to write to the spool directories due to permission problems.

One final note: I do see a failure to transfer job files while the run is still in progress (which I consider normal). Once the jobs have been marked "C" (completed), the transfer should succeed.

I've always tried to transfer files back after the jobs are completed. Both

SEC_DEFAULT_AUTHENTICATION_METHODS = CLAIMTOBE
SEC_CLIENT_AUTHENTICATION_METHODS = CLAIMTOBE

are set on Windows and also on the Linux CM+Scheduler machines.

Timothy wrote:
> You will want to create an account on your Windows machine with the same name/credentials as on your Linux schedd. When using CLAIMTOBE it will try to run as ClassAd[Owner] on your Linux box and if that account doesn't exist then it will FAIL to write to the spool directories due to permission problems.

The job starts, runs, and is able to write to the spool directory on the Linux CM. The ls output in the attachment shows a non-empty stdout for that job in the spool directory on Linux.
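The configuration Timothy suggests, together with his "write to a .txt and transfer it ON_EXIT" workaround, can be sketched as the two fragments below. The file names (job.bat, result.txt, job.sub) are illustrative, not taken from the bug; only the knob names and values come from the comments above.

```
# condor_config.local on the Windows submit machine (matching settings
# also go on the Linux CM+schedd) -- hypothetical fragment
SEC_DEFAULT_AUTHENTICATION_METHODS = CLAIMTOBE
SEC_CLIENT_AUTHENTICATION_METHODS  = CLAIMTOBE

# Debug levels requested earlier in this bug
SCHEDD_DEBUG = D_SECURITY
TOOL_DEBUG   = D_ALL
```

```
# job.sub -- illustrative submit description. The .bat file itself
# redirects its output into result.txt, which is brought back ON_EXIT
# instead of relying on stdout/stderr capture on Windows.
universe                = vanilla
executable              = job.bat
initialdir              = C:\Temp
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
transfer_output_files   = result.txt
queue
```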
I run all tools as Administrator, so there should not be any problem with rights on the Windows side. I've also tried C:\temp, shared with full rights for everyone, and it doesn't work. If I create a user Administrator on the Linux CM+Scheduler machine, it works. But I think that if authentication is disabled and the user Administrator can submit and run jobs on machines in the pool, then Administrator should also be able to transfer results from the CM+Scheduler back to the machine the job was submitted from, without needing an account on the CM. As far as I can see in the log file, it is simply a question of whether a user with the same name exists on the CM as on the remote submit machine. It does not depend on authentication:

DC_AUTHENTICATE: received DC_AUTHENTICATE from <:3273>
DC_AUTHENTICATE: added incoming session id :23402:1277886744:4571 to cache for 80 seconds (lease is 3620s, return address is unknown).
DC_AUTHENTICATE: Success.
PERMISSION GRANTED to unauthenticated user from host for command 489 (TRANSFER_DATA_WITH_PERMS), access level WRITE: reason: WRITE authorization policy allows access by anyone
HANDSHAKE: in handshake(my_methods = 'Claimtobe')
HANDSHAKE: handshake() - i am the server
HANDSHAKE: client sent (methods == 2)
HANDSHAKE: i picked (method == 2)
HANDSHAKE: client received (method == 2)
Authentication was a Success.
ZKM: setting default map to Administrator
ZKM: post-map: current user is 'Administrator'
ZKM: post-map: current domain is '(null)'
ZKM: post-map: current FQU is 'Administrator'
PERMISSION GRANTED to Administrator from host for queue management, access level WRITE: reason: WRITE authorization policy allows access by anyone
PERMISSION GRANTED to Administrator from host for queue management, access level WRITE: reason: WRITE authorization policy allows access by anyone
The submitting job ad as the FileTransferObject sees it ...
ReliSock: put_file: Failed to open file /var/lib/condor/spool/cluster330.proc0.subproc0/_condor_stderr, errno = 13.
DoUpload: (Condor error code 13, subcode 13) SCHEDD at 10.34.33.58 failed to send file(s) to <:3273>: error reading from /var/lib/condor/spool/cluster330.proc0.subproc0/_condor_stderr: (errno 13) Permission denied; TOOL failed to receive file(s) from <1:44461>
generalJobFilesWorkerThread(): failed to transfer files for job 330.0
ERROR - Staging of job files failed!

How are you submitting your jobs?

condor_submit -name mrg27.lab.bos.redhat.com -spool, and then, when the jobs are completed, condor_transfer_data.exe -name <schedd name> <clusterid>.

My data transfers are fine.

Upstream tracking ticket: https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=1478

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.
New Contents: Fixed upstream in stable series.

Deleted Technical Notes Contents.
Old Contents: Fixed upstream in stable series.
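Written out in full, the submit-and-fetch workflow described in this comment looks roughly as follows. This is a sketch against a live pool, not a runnable test: the schedd name is the one mentioned above, the cluster id 330 comes from the log excerpt, and job.sub is a hypothetical submit file.

```
:: On the Windows submit machine: spool the job's input sandbox to the
:: remote schedd instead of relying on a shared filesystem
condor_submit -name mrg27.lab.bos.redhat.com -spool job.sub

:: Wait until condor_q shows the job in the completed ("C") state, then
:: pull the spooled output back to the submit machine
condor_transfer_data -name mrg27.lab.bos.redhat.com 330
```

Transferring before the job reaches "C" fails by design, as noted earlier in the thread; the errno 13 above is a different failure, hit only after the job has completed.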