
Bug 495685

Summary: HOSTALLOW_WRITE denied between HA Schedulers
Product: Red Hat Enterprise MRG
Reporter: Matthew Farrellee <matt>
Component: grid
Assignee: Robert Rati <rrati>
Status: CLOSED ERRATA
QA Contact: Martin Kudlej <mkudlej>
Severity: medium
Docs Contact:
Priority: medium
Version: 1.1.1
CC: lbrindle, mkudlej
Target Milestone: 1.2
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Grid bug fix
C: When the schedd and startd were being configured on the same system
C: Jobs to be run can match but will not ever start
F: A change was made to the way the schedd and startd are configured
R: Jobs will be matched and start as expected
The schedd and startd configuration files were being configured incorrectly when installed on the same system. This was causing jobs to match, but refuse to start. A change was made to the way the daemons were configured, and jobs now run as expected.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-12-03 09:19:10 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 527551

Description Matthew Farrellee 2009-04-14 12:18:16 UTC
# rpm -q condor-remote-configuration-server
condor-remote-configuration-server-1.0-14.el5

In the pool right now we have sched0 and sched1 as HA Schedulers and as Startds. However, HOSTALLOW_WRITE on one does not include the other. This means jobs to be run on one can match but cannot ever start...

From the StartLog on sched1:

4/14 08:09:05 PERMISSION DENIED to condor from host sched0 for command 442 (REQUEST_CLAIM), access level DAEMON: reason: cached result for DAEMON; see first case for the full reason

Please include a simple example to reproduce this.

Comment 1 Matthew Farrellee 2009-04-14 12:23:36 UTC
Maybe there's a simple typo in the code:

sched1# condor_config_val -dump | grep HOSTALLOW_WRITE_STARTD
HOSTALLOW_WRITE_STARTD = startd, $(HOSTALLOW_WRITE), $(FLOCK_FROM)

"startd" in this context has no meaning

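To make the effect of that setting concrete, here is a minimal Python sketch of how the macro references might expand. It is not Condor code: the expand() and host_list() helpers are hypothetical, the values of HOSTALLOW_WRITE and FLOCK_FROM are assumed for illustration (only the HOSTALLOW_WRITE_STARTD line comes from the dump above), and real Condor authorization also handles wildcards and name resolution, which this ignores.

```python
import re

def expand(name, config, seen=()):
    """Recursively substitute $(NAME) references in config[name]."""
    if name in seen:
        raise ValueError("circular reference: " + name)
    value = config.get(name, "")
    return re.sub(
        r"\$\((\w+)\)",
        lambda m: expand(m.group(1), config, seen + (name,)),
        value,
    )

def host_list(name, config):
    """Split the expanded value into a list of non-empty entries."""
    return [h.strip() for h in expand(name, config).split(",") if h.strip()]

# Assumed values on sched1; only HOSTALLOW_WRITE_STARTD is taken from the report.
config = {
    "HOSTALLOW_WRITE": "sched1",
    "FLOCK_FROM": "",
    "HOSTALLOW_WRITE_STARTD": "startd, $(HOSTALLOW_WRITE), $(FLOCK_FROM)",
}

allowed = host_list("HOSTALLOW_WRITE_STARTD", config)
print(allowed)              # ['startd', 'sched1']
print("sched0" in allowed)  # False
```

Under these assumptions the literal "startd" ends up as just another entry in the list, and sched0 is not in it, which would be consistent with the REQUEST_CLAIM denial seen in the StartLog.
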
Comment 2 Matthew Farrellee 2009-04-14 13:23:30 UTC
A workaround is to add config to the condor_config.overrides file:

 On sched0:
   HOSTALLOW_WRITE_STARTD = sched1, $(HOSTALLOW_WRITE_STARTD)

 On sched1:
   HOSTALLOW_WRITE_STARTD = sched0, $(HOSTALLOW_WRITE_STARTD)
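
The workaround relies on the self-reference picking up the value already defined before the overrides file is read. A rough Python sketch of that behaviour follows. It is not Condor code: the override() and hosts() helpers are hypothetical, and the already-expanded base value "startd, sched1" is an assumption for illustration.

```python
def override(config, name, value):
    """Return a copy of config where name is redefined and any
    $(name) self-reference is replaced by the previous value."""
    previous = config.get(name, "")
    updated = dict(config)
    updated[name] = value.replace("$(" + name + ")", previous)
    return updated

def hosts(value):
    """Split a comma-separated list into non-empty entries."""
    return [h.strip() for h in value.split(",") if h.strip()]

# Assumed, already-expanded value on sched1 before the override is applied.
base = {"HOSTALLOW_WRITE_STARTD": "startd, sched1"}

# The line suggested for sched1 in condor_config.overrides.
patched = override(
    base, "HOSTALLOW_WRITE_STARTD", "sched0, $(HOSTALLOW_WRITE_STARTD)"
)

print(hosts(patched["HOSTALLOW_WRITE_STARTD"]))  # ['sched0', 'startd', 'sched1']
```

With the override in place the other scheduler is prepended to the existing list rather than replacing it, so any hosts that were already allowed stay allowed.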

Comment 3 Robert Rati 2009-09-17 20:14:51 UTC
This was actually an issue with the schedd and startd being configured on the same system.  The configuration tool now prompts for the schedulers if a startd is configured on a system that also has a schedd on it.  As always, nodes that are part of an HA scheduler do not need to be entered, as they are determined automatically from the HA schedd name.

Fixed in:
condor-remote-configuration-1.0-17

Comment 4 Robert Rati 2009-09-17 20:15:25 UTC
All schedulers will need to be re-configured with this package for the changes to take effect.

Comment 6 Martin Kudlej 2009-10-23 12:21:35 UTC
I've tried it on condor-remote-configuration-server-1.0-14 on RHEL5.4 and
condor-remote-configuration-1.0-14 on RHEL4.8 (i386 x x86_64) and it doesn't work.
I've tried it on condor-remote-configuration(-server)-1.0-22 and it works -->VERIFIED

Comment 7 Irina Boverman 2009-10-29 14:29:46 UTC
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
please see bug summary.

Comment 8 Lana Brindley 2009-11-05 03:35:42 UTC
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1,8 @@
-please see bug summary.
+Grid bug fix
+
+C: When the schedd and startd were being configured on the same system
+C: Jobs to be run can match but will not ever start
+F: A change was made to the way the schedd and startd are configured
+R: Jobs will be matched and start as expected
+
+The schedd and startd configuration files were being configured incorrectly when installed on the same system. This was causing jobs to match, but refuse to start. A change was made to the way the daemons were configured, and jobs now run as expected.

Comment 10 errata-xmlrpc 2009-12-03 09:19:10 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-1633.html