Bug 474725

Summary: Security session cache and account priv-switching
Product: Red Hat Enterprise MRG
Component: grid
Version: 1.0
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
Target Milestone: 1.1
Hardware: All
OS: Linux
Reporter: Matthew Farrellee <matt>
Assignee: Matthew Farrellee <matt>
QA Contact: Jeff Needle <jneedle>
CC: dan
Doc Type: Bug Fix
Last Closed: 2009-02-04 16:04:12 UTC

Description Matthew Farrellee 2008-12-05 00:22:15 UTC
The security session cache code does not take priv-switching into account. A daemon that switches between different PRIV_USERs, such as the JobRouter, can end up reusing a security session that was negotiated for one user while it is acting as another.

The cache needs to include euid in its key.
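
To illustrate the proposed change, here is a minimal sketch in Python (not Condor's actual C++ code). The class name SessionCache, the negotiate callback, and the key layout (peer, command, euid) are all hypothetical; the only point is that the effective uid becomes part of the lookup key, so a session negotiated under one identity is never handed back under another.

```python
from typing import Callable, Dict, Tuple

# Hypothetical key layout: (peer address, command, effective uid).
SessionKey = Tuple[str, str, int]


class SessionCache:
    """Toy session cache keyed by peer, command, and euid (illustrative only)."""

    def __init__(self, negotiate: Callable[[str, str, int], str]) -> None:
        self._negotiate = negotiate
        self._sessions: Dict[SessionKey, str] = {}

    def get_session(self, peer: str, command: str, euid: int) -> str:
        # Including euid in the key keeps sessions for different
        # identities separate, even when peer and command match.
        key = (peer, command, euid)
        if key not in self._sessions:
            self._sessions[key] = self._negotiate(peer, command, euid)
        return self._sessions[key]


def fake_negotiate(peer: str, command: str, euid: int) -> str:
    # Stand-in for a real authentication handshake.
    return f"session-for-uid-{euid}"


cache = SessionCache(fake_negotiate)
first = cache.get_session("schedd@host", "QMGMT", 1001)
second = cache.get_session("schedd@host", "QMGMT", 1002)
again = cache.get_session("schedd@host", "QMGMT", 1001)
```

In a real daemon the euid would be read at lookup time (for example with os.geteuid() in Python) rather than passed in by the caller; it is a parameter here only to keep the sketch easy to exercise.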

Comment 1 Dan Bradley 2008-12-08 22:49:14 UTC
Here is an example configuration snippet that I used to reproduce the problem with security sessions and JobRouter:


################## BEGIN #######################

SEC_DEFAULT_AUTHENTICATION = REQUIRED

# These settings become the default settings for all routes
JOB_ROUTER_DEFAULTS = \
 [ \
   requirements=target.WantJobRouter is True; \
   MaxIdleJobs = 10; \
   MaxJobs = 200; \
\
   delete_WantJobRouter = true; \
   set_requirements = true; \
   TargetUniverse = 12; \
 ]

# Now we define each of the routes to send jobs on
JOB_ROUTER_ENTRIES = \
  [ \
    name = "Site 1"; \
  ]


# Reminder: you must restart Condor for changes to DAEMON_LIST to take effect.
DAEMON_LIST = $(DAEMON_LIST) JOB_ROUTER

# For testing, set this to a small value to speed things up.
JOB_ROUTER_POLLING_PERIOD = 10


########################## END ############################


I ran a personal condor (master, collector, negotiator, schedd, job_router) as root.  I submitted jobs to it from two different accounts.  The submit file was just this:

############ BEGIN ##############
universe = vanilla
requirements = false
+WantJobRouter = true
notification = never
should_transfer_files = yes
when_to_transfer_output = on_exit
executable = /usr/bin/env
output = stdout
error = stderr
queue
############ END ##############

If you wait for the first user's job to be routed, then the second user's job will fail to be routed and will therefore hang around in the queue forever.  The problem is visible in the SchedLog:

12/8 15:18:24 OwnerCheck(user1) failed in SetAttribute for job 8.0

In this case, the JobRouter is failing when it tries to mark the submitted job as being managed by the job router.  It fails because it is using a security session that is mapped to user1 while trying to operate on a job owned by user2.
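
The failure mode can be sketched as a toy model in Python. This is not Condor's implementation: get_session, owner_check, and the peer-only cache key are hypothetical stand-ins, chosen to show how a session cached under the first identity is reused for the second and then fails the ownership test.

```python
# Toy model of the bug: the cache is keyed by peer only, so whichever
# identity negotiated first is returned for every later caller.
sessions_by_peer = {}


def get_session(peer, acting_user):
    """Return the cached authenticated identity for peer (buggy: ignores acting_user)."""
    return sessions_by_peer.setdefault(peer, acting_user)


def owner_check(authenticated_user, job_owner):
    """Hypothetical stand-in for the schedd's ownership test."""
    return authenticated_user == job_owner


# The router acts as user1 first, then as user2, against the same schedd.
session_a = get_session("schedd", "user1")
session_b = get_session("schedd", "user2")  # reuses user1's session

ok_first = owner_check(session_a, "user1")   # first user's job: allowed
ok_second = owner_check(session_b, "user2")  # second user's job: rejected
```

The second check fails because the schedd sees user1 as the authenticated identity, which matches the OwnerCheck(user1) message in the SchedLog line above.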

The problem does not happen unless read access requires authentication.  The reason is that the QMGMT command is registered as a read-level command, so an authenticated security session is only created if read-level access requires authentication.

Comment 2 Matthew Farrellee 2008-12-09 23:36:11 UTC
commit ff4f517a3e134ef0168f45f604e1e3b8a3d1e6be
Author: Dan Bradley <dan@>
Date:   Tue Dec 9 15:58:11 2008 -0600

    Changed FS authentication to authenticate as condor when possible.
    This is now consistent with other authentication methods.
    Also made the queue super user(s) able to set the owner attribute
    to any value from the list of users who have ever owned jobs in
    the current instance of the schedd.  This, in combination with
    the FS change, allows JobRouter to submit jobs as condor for
    all queue management operations rather than as individual
    users, thus avoiding the issue of correct session cache
    use when a daemon uses multiple identities to talk to the same
    service.

This will be part of 7.2.0-0.10
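
For readers following along, the owner rule described in the commit message can be sketched roughly as below. This is an illustration in Python, not the schedd's code: may_set_owner and its parameters are hypothetical names, and the rule is reduced to what the commit message states (a queue super user may set the owner to any user who has ever owned jobs in the current instance of the schedd).

```python
def may_set_owner(requester, new_owner, super_users, known_owners):
    """Return True if requester may set a job's owner attribute to new_owner.

    Illustrative only: a user may always name itself; a queue super user
    may name any user who has already owned jobs in this schedd instance.
    """
    if requester == new_owner:
        return True
    return requester in super_users and new_owner in known_owners


super_users = {"condor"}
known_owners = {"user1", "user2"}

# The router, authenticated as condor, acts on behalf of a known owner.
allowed_known = may_set_owner("condor", "user2", super_users, known_owners)
# A super user still cannot name a user the schedd has never seen.
allowed_unknown = may_set_owner("condor", "root", super_users, known_owners)
# An ordinary user cannot hand a job to someone else.
allowed_other = may_set_owner("user1", "user2", super_users, known_owners)
```

With this rule the JobRouter only needs one authenticated identity (condor) per schedd, so the session cache never has to distinguish between users.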

Comment 5 errata-xmlrpc 2009-02-04 16:04:12 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0036.html