Bug 496227
Summary: | Remote Config: HA Schedd lock period too long | ||
---|---|---|---|
Product: | Red Hat Enterprise MRG | Reporter: | Matthew Farrellee <matt> |
Component: | grid | Assignee: | Robert Rati <rrati> |
Status: | CLOSED ERRATA | QA Contact: | Martin Kudlej <mkudlej> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 1.1.1 | CC: | iboverma, lans.carstensen, lbrindle, mkudlej, tao |
Target Milestone: | 1.2 | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Grid bug fix
C: HA_LOCK_HOLD_TIME and HA_POLL_PERIOD do not have sensible default periods.
C: A number of failover problems, such as two schedd's running simultaneously, or inappropriate lock acquisition.
F: HA Schedd lock period has been shortened. HA_LOCK_HOLD_TIME now defaults to 300 seconds, and HA_POLL_PERIOD to 60 seconds (these parameters could be changed to lower values if faster fail-over is required)
R: Failover works more reliably
HA_LOCK_HOLD_TIME and HA_POLL_PERIOD had default values that could cause a range of problems with failover. HA Schedd lock period has been shortened. HA_LOCK_HOLD_TIME now defaults to 300 seconds, and HA_POLL_PERIOD to 60 seconds (these parameters could be changed to lower values if faster fail-over is required), and failover now works more reliably.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2009-12-03 09:16:12 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 527551 |
Description
Matthew Farrellee
2009-04-17 12:40:09 UTC
Fixed in: condor-remote-configuration-1.0-17

I've tried it on condor-remote-configuration-server-1.0-14 on RHEL5.4 and condor-remote-configuration-1.0-14 on RHEL4.8 (i386 and x86_64), and there weren't any variables named HA_LOCK_HOLD_TIME or HA_POLL_PERIOD. I've tried it on condor-remote-configuration(-server)-1.0-22 and there are HA_LOCK_HOLD_TIME = 300 and HA_POLL_PERIOD = 60. Is this enough to verify the bug, or could you describe a testing scenario here, please?

Initiate a failover (either by shutting down the Schedd, or killing it) and a new Schedd should start within 6 minutes.

Testing scenario: 1st machine RHEL 5.4 i386/x86_64, 2nd machine RHEL 4.8 x86_64/i386. HA SCHEDD configured via condor_configure_node from RHEL5.4. Shut down condor on the 1st machine, and within less than 6 minutes the schedd starts on the 2nd machine, so it works as we expected (condor-7.4.1-0.2).

Tested it with condor-7.2.2-0.9 and it doesn't work. Tested it with condor-7.4.1-0.2 and it works. --> VERIFIED

Release note added. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
HA Schedd lock period has been shorten: HA_LOCK_HOLD_TIME now defaults to 300 seconds, and HA_POLL_PERIOD to 60 seconds (these parameters could be changed to lower values if faster fail-over is required) (496227)

Release note updated. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.
Diffed Contents:
@@ -1 +1,8 @@
-HA Schedd lock period has been shorten: HA_LOCK_HOLD_TIME now defaults to 300 seconds, and HA_POLL_PERIOD to 60 seconds (these parameters could be changed to lower values if faster fail-over is required) (496227)
+Grid bug fix
+
+C: HA_LOCK_HOLD_TIME and HA_POLL_PERIOD do not have sensible default periods.
+C: A number of failover problems, such as two schedd's running simultaneously, or inappropriate lock acquisition.
+F: HA Schedd lock period has been shortened. HA_LOCK_HOLD_TIME now defaults to 300 seconds, and HA_POLL_PERIOD to 60 seconds (these parameters could be changed to lower values if faster fail-over is required)
+R: Failover works more reliably
+
+HA_LOCK_HOLD_TIME and HA_POLL_PERIOD had default values that could cause a range of problems with failover. HA Schedd lock period has been shortened. HA_LOCK_HOLD_TIME now defaults to 300 seconds, and HA_POLL_PERIOD to 60 seconds (these parameters could be changed to lower values if faster fail-over is required), and failover now works more reliably.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-1633.html
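
The 6-minute window used in the testing scenario in this report matches the sum of the two new defaults: HA_LOCK_HOLD_TIME (300 seconds) plus HA_POLL_PERIOD (60 seconds). The following is a minimal Python sketch of that arithmetic. It assumes the worst case is one full lock hold time plus one poll interval, which this report does not state explicitly, and the function name is hypothetical and not part of Condor.

```python
def worst_case_failover_seconds(lock_hold_time: int, poll_period: int) -> int:
    """Rough upper bound on failover delay, in seconds.

    Assumption (not stated in the bug report): a lock left behind by a dead
    Schedd becomes stale after lock_hold_time, and a standby notices the
    stale lock no later than one poll_period after that.
    """
    if lock_hold_time <= 0 or poll_period <= 0:
        raise ValueError("both periods must be positive")
    return lock_hold_time + poll_period


# Defaults reported for condor-remote-configuration(-server)-1.0-22.
DEFAULTS = {"HA_LOCK_HOLD_TIME": 300, "HA_POLL_PERIOD": 60}

window = worst_case_failover_seconds(
    DEFAULTS["HA_LOCK_HOLD_TIME"], DEFAULTS["HA_POLL_PERIOD"]
)
print(f"{window} s (~{window / 60:.0f} min)")  # prints "360 s (~6 min)"
```

Under the same assumption, lowering either parameter shortens the window proportionally, which is in line with the release note's remark that lower values give faster fail-over.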