Bug 486487
Summary: | Stale .schedd_address and .schedd_classad | ||
---|---|---|---|
Product: | Red Hat Enterprise MRG | Reporter: | Matthew Farrellee <matt> |
Component: | condor | Assignee: | Matthew Farrellee <matt> |
Status: | CLOSED ERRATA | QA Contact: | Martin Kudlej <mkudlej> |
Severity: | medium | Docs Contact: | |
Priority: | low | ||
Version: | 1.1 | CC: | lans.carstensen, lbrindle, mkudlej |
Target Milestone: | 1.2 | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Grid bug fix
C: Stale .schedd_address and .schedd_classad files were being left on a host when the schedd failed in a high availability cluster
C: When condor_q was run, it would fail to connect to the schedd, because it was checking the stale files first.
F: The files are now stored in SPOOL instead of LOG
R: Multiple machines in a pool can now read the files, and stale files no longer cause a problem.
Stale .schedd_address and .schedd_classad files were being left on a host when the schedd failed in a high availability cluster. This caused condor_q to fail to connect to the schedd. The files were moved from LOG to SPOOL, which allows multiple machines in a pool to read the files, and stale files no longer cause a problem.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2009-12-03 09:19:27 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 527551 |
Description
Matthew Farrellee
2009-02-19 23:21:05 UTC
Resolved for 7.3.1-0.4

commit c33afd1e6de7c57ef8d5252643d9f860b23890f8
Author: Matthew Farrellee <matt>
Date: Mon Apr 27 15:12:07 2009 -0500

As part of moving SCHEDD_ADDRESS_FILE and SCHEDD_DAEMON_AD_FILE, update VALID_SPOOL_FILES so PREEN doesn't wipe them out

commit 84afeb8fc5837d79aa1b513b8bde9f77a233b192
Author: Matthew Farrellee <matt>
Date: Mon Apr 27 10:47:45 2009 -0500

Moved SCHEDD_ADDRESS_FILE and SCHEDD_DAEMON_AD_FILE from LOG to SPOOL

These two files are dropped by the schedd and are used by local tools, and Quill, to locate the Schedd without contacting the Collector. Right now they default to -

SCHEDD_ADDRESS_FILE = $(LOG)/.schedd_address
SCHEDD_DAEMON_AD_FILE = $(LOG)/.schedd_classad

This is all well and good, except if you are in an HA setup. When you have fail-over of the schedd you'll get stale files on the failed schedd machine. From that point forward tools, e.g. condor_q/submit, on the failed schedd machine will not be able to contact the schedd. The tools consult the files and do not fall back to a collector lookup.

Solutions?
1) make the tools fall back to a collector lookup,
2) don't use the files at all if you are in an HA schedd setup,
3) put the files in SPOOL instead of LOG

(1) is work with little payoff at the moment
(2) works, but requires separate configuration when in HA mode
(3) avoids the work of (1), allows for a consistent config over (2), and may even benefit from letting multiple machines in a pool avoid the collector lookup

Downsides of (3)? Well, the file has been in $(LOG) for a long time, along with all other ADDRESS_FILEs, but no one should be relying on that!

*** Bug 497854 has been marked as a duplicate of this bug. ***

I've tried it on condor-7.2.2-9 on RHEL 5.4/4.8 and i386/x86_64 and it didn't work.

I've tried it on condor-7.4.1-0.1 and it works --> VERIFIED
I've used testing scenario (condor_q on node where condor_schedd crashed) described in Description.

Release note added.
If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
please see bug summary.

Release note updated.

Diffed Contents:
@@ -1 +1,9 @@
-please see bug summary.
+Grid bug fix
+
+C: Stale .schedd_address and .schedd_classad files were being left on a host when the schedd failed in a high availability cluster
+C: When condor_q was run, it would fail to connect to the schedd, because it was checking the stale files first.
+F: The log files are now stored in SPOOL instead of LOG
+R: Multiple machines in a pool can now read the files, and stale files no longer cause a problem.
+
+
+Stale .schedd_address and .schedd_classad files were being left on a host when the schedd failed in a high availability cluster. This caused condor_q to fail to connect to the schedd. The log files were moved from LOG to SPOOL, which allows multiple machines in a pool to read the files, and stale files no longer cause a problem.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-1633.html
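
For reference, a minimal sketch of what option (3) from the description might look like if written out explicitly in a local Condor configuration file. The two $(SPOOL) paths simply mirror the old $(LOG) defaults quoted in the description. The VALID_SPOOL_FILES line is an assumption for illustration only: the commits say the list was updated so PREEN leaves the files alone, but the exact entries are not shown in this report.

```
# Sketch only: relocate the schedd address/ad files from LOG to SPOOL,
# so that a shared SPOOL in an HA schedd setup always holds current files.
SCHEDD_ADDRESS_FILE   = $(SPOOL)/.schedd_address
SCHEDD_DAEMON_AD_FILE = $(SPOOL)/.schedd_classad

# Assumed form: keep condor_preen from removing the two files from SPOOL.
VALID_SPOOL_FILES = $(VALID_SPOOL_FILES), .schedd_address, .schedd_classad
```

On a build that already contains the fix these settings should be the defaults, so an explicit override like this would mainly be of interest on older versions such as the condor-7.2.2-9 package where the test above failed.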