Bug 458889

Summary: Job Hooks leave starter in wrong privstate
Product: Red Hat Enterprise MRG
Component: grid
Version: 1.0
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Target Milestone: 1.1
Hardware: All
OS: Linux
Reporter: Robert Rati <rrati>
Assignee: Robert Rati <rrati>
QA Contact: Kim van der Riet <kim.vdriet>
CC: matt
Doc Type: Bug Fix
Last Closed: 2009-02-04 16:06:06 UTC

Description Robert Rati 2008-08-12 22:16:56 UTC
Description of problem:
It looks like after the job hooks are run, the starter ends up permanently in
the condor uid state and cannot access files created by a job.



Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:
>>> 8/8 09:03:57 in VanillaProc::StartJob()
>>> 8/8 09:03:57 in OsProc::StartJob()
>>> 8/8 09:03:57 IWD: /autohome/u100/rherban/condortest
>>> 8/8 09:03:57 passwd_cache: setgroups( rherban ) failed.
>>> 8/8 09:03:57 set_user_egid - ERROR: initgroups(rherban, 6751) failed, errno: Operation not permitted
>>> 8/8 09:03:57 Input file: /autohome/u100/rherban/condortest/in.blast
>>> 8/8 09:03:57 Failed to open '/autohome/u100/rherban/condortest/out.blast' as standard output: Permission denied (errno 13)
>>> 8/8 09:03:57 Failed to open '/autohome/u100/rherban/condortest/err.blast' as standard error: Permission denied (errno 13)
>>> 8/8 09:03:57 Failed to open some/all of the std files...
>>> 8/8 09:03:57 Aborting OsProc::StartJob.
>>> 8/8 09:03:57 Failed to start job, exiting

Expected results:
The job should execute under the correct permissions, and its files should be accessible after completion.

Additional info:

Comment 1 Robert Rati 2008-08-12 22:27:49 UTC
The problem is that before condor forks to create a process to run the hooks, it checks the executable/command in the specified priv mode. For the condor_final priv, it was changing the ruid to condor instead of changing only the euid to condor for the checks, so the starter ended up permanently as the condor user and thus wasn't able to access files that are not world readable.

A job like the one below should produce errors in StarterLog about being unable to access stdout and stderr:

Cmd = "/bin/date"
Out = "/home/rsquared/date.output"
Err = "/home/rsquared/date.err"
Iwd = "/home/rsquared"
Owner = "rsquared"
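
The ruid/euid distinction described above can be illustrated with a small toy model of POSIX credential rules. This is a sketch only, not Condor code: the `Creds` class, the helper functions, and the uid values 64 and 1000 are all hypothetical, and the model covers only the basic `seteuid`/`setuid` rules (a privileged process may set any id; an unprivileged one may only set its euid to its real or saved uid).

```python
from dataclasses import dataclass

ROOT = 0
CONDOR = 64       # hypothetical uid for the condor user
JOB_OWNER = 1000  # hypothetical uid for the job owner


@dataclass
class Creds:
    """Toy model of a process's real, effective, and saved uids."""
    ruid: int = ROOT
    euid: int = ROOT
    suid: int = ROOT

    def seteuid(self, uid: int) -> None:
        # Changes only the effective uid; the saved uid is untouched,
        # so a root-started process can switch back later.
        if self.euid == ROOT or uid in (self.ruid, self.suid):
            self.euid = uid
        else:
            raise PermissionError("seteuid: Operation not permitted")

    def setuid(self, uid: int) -> None:
        # When called with root privileges this sets all three ids,
        # which makes the change irreversible.
        if self.euid == ROOT:
            self.ruid = self.euid = self.suid = uid
        elif uid in (self.ruid, self.suid):
            self.euid = uid
        else:
            raise PermissionError("setuid: Operation not permitted")


def check_hook_temporarily() -> Creds:
    """Intended behaviour: only the euid becomes condor for the check."""
    creds = Creds()
    creds.seteuid(CONDOR)
    return creds


def check_hook_permanently() -> Creds:
    """Buggy behaviour as described: the ruid becomes condor as well."""
    creds = Creds()
    creds.setuid(CONDOR)
    return creds


def can_become_job_owner(creds: Creds) -> bool:
    """Try to regain root and then switch to the job owner."""
    try:
        creds.seteuid(ROOT)
        creds.seteuid(JOB_OWNER)
        return True
    except PermissionError:
        return False


if __name__ == "__main__":
    print("euid-only switch:", can_become_job_owner(check_hook_temporarily()))
    print("ruid switch:", can_become_job_owner(check_hook_permanently()))
```

In this model, only the euid-only path can later run as the job owner, which matches the "Operation not permitted" and "Permission denied" errors in the StarterLog excerpt.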

Comment 2 Matthew Farrellee 2008-08-12 22:34:38 UTC
Bug supposedly present in condor 7.0.4-4

Comment 5 errata-xmlrpc 2009-02-04 16:06:06 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0036.html