Bug 499826
Summary: | master termination not stopping HA daemon acquisition | ||
---|---|---|---|
Product: | Red Hat Enterprise MRG | Reporter: | Matthew Farrellee <matt> |
Component: | condor | Assignee: | Matthew Farrellee <matt> |
Status: | CLOSED ERRATA | QA Contact: | Martin Kudlej <mkudlej> |
Severity: | medium | Docs Contact: | |
Priority: | low | ||
Version: | 1.1.1 | CC: | iboverma, lans.carstensen, lbrindle, mkudlej, pmackinn, tao, tross |
Target Milestone: | 1.2 | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Grid bug fix
C: The master is in charge of starting HA daemons. It uses a lock file shared between multiple masters to determine which one can run the HA daemon. When a master receives a signal to terminate, it does not stop trying to acquire an HA daemon lock and start the HA daemon.
C: A master may receive a termination signal (TERM, QUIT), exit most daemons below it, but then
acquire an HA lock and start an HA daemon. This prevents the master from exiting successfully, and blocks reception of the termination signal for
subsequent shutdown attempts.
F: Corrected a problem in which master termination did not stop HA daemon acquisition.
R: The code in condor_master now prevents daemons waiting on locks from
acquiring them while the condor_master is trying to shut down, for example from a TERM or
QUIT signal, or from condor_off.
A master was able to receive a termination signal and exit most of the daemons below it, but then acquire a High Availability (HA) lock and start an HA daemon. This prevented the master from exiting successfully. The master now stops HA daemon acquisition when it terminates, and prevents daemons waiting on locks from
acquiring them while the condor_master is trying to shut down.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2009-12-03 09:19:30 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 527551 |
Description
Matthew Farrellee
2009-05-08 13:16:55 UTC
Release note added. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Corrected problem with master termination not stopping HA daemon acquisition (499826)

I've tested it on RHEL 5.4/4.8 i386/x86_64 with condor-7.2.2-0.9 and it doesn't work as it should. I'm waiting for the fix for BZ528544.

You can work around BZ528544 by setting QMF_DELETE_ON_SHUTDOWN=FALSE in your config.

I've tested it on RHEL 5.4/4.8 i386/x86_64 with condor-7.4.1-0.2 with the workaround from comment number 5 and it works as expected. -->VERIFIED

So has QMF_DELETE_ON_SHUTDOWN been set to FALSE as default? If not, what was the fix?

LKB

Release note updated.

Diffed Contents:
@@ -1 +1,12 @@
+Grid bug fix
+
+C: The master is in charge of starting HA daemons. It uses a shared lock file between multiple masters to determine which can run the HA daemon. When a master receives a signal to terminate it does not stop the process of trying to acquire an HA daemon lock and start the HA daemon.
+C: a master may receive a termination signal (TERM, QUIT), exit most daemons below it, but then
+acquire a HA lock and start an HA daemon. This prevents the master from successfully exiting, and clock reception of the termination signal for
+subsequent shutdown attempts.
+F:
+R:
+
+NEED FURTHER INFO FOR RELNOTE
+
 Corrected problem with master termination not stopping HA daemon acquisition (499826)

(In reply to comment #7)
> So has QMF_DELETE_ON_SHUTDOWN been set to FALSE as default? If not, what was
> the fix?
>
> LKB

This is not related to that bug in any way.

The code in the condor_master now prevents daemons waiting on locks from acquiring them when the condor_master is trying to shut down, say by a TERM or QUIT signal or by condor_off.

Release note updated.

Diffed Contents:
@@ -9,4 +9,9 @@
 NEED FURTHER INFO FOR RELNOTE
-Corrected problem with master termination not stopping HA daemon acquisition (499826)
+Corrected problem with master termination not stopping HA daemon acquisition (499826)
+
+RELEASE NOTE:
+"The code in condor_master now prevents daemons waiting on locks from
+acquiring them while the condor_master is trying to shut down, foe example by a TERM or
+QUIT signal, or by condor_off."

Release note updated.

Diffed Contents:
@@ -13,5 +13,5 @@
 RELEASE NOTE:
 "The code in condor_master now prevents daemons waiting on locks from
-acquiring them while the condor_master is trying to shut down, foe example by a TERM or
+acquiring them while the condor_master is trying to shut down, for example from a TERM or
-QUIT signal, or by condor_off."
+QUIT signal, or from condor_off."

Release note updated.

Diffed Contents:
@@ -4,14 +4,10 @@
 C: a master may receive a termination signal (TERM, QUIT), exit most daemons below it, but then
 acquire a HA lock and start an HA daemon. This prevents the master from successfully exiting, and clock reception of the termination signal for
 subsequent shutdown attempts.
-F:
-R:
-
-NEED FURTHER INFO FOR RELNOTE
-
-Corrected problem with master termination not stopping HA daemon acquisition (499826)
-
-RELEASE NOTE:
-"The code in condor_master now prevents daemons waiting on locks from
+F: Corrected problem with master termination not stopping HA daemon acquisition
+R: The code in condor_master now prevents daemons waiting on locks from
 acquiring them while the condor_master is trying to shut down, for example from a TERM or
-QUIT signal, or from condor_off."
+QUIT signal, or from condor_off
+
+A master was able to receive a termination signal and exit most of the daemons below it, but then acquire a High Availability (HA) lock and start an HA daemon. This prevents the master from successfully exiting. The master now terminates the HA daemon acquisition successfully, and prevents daemons waiting on locks from
+acquiring them while the condor_master is trying to shut down.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-1633.html
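
For readers unfamiliar with the failure mode, the behaviour described in this report (a master that has begun shutting down must not acquire an HA lock and start the HA daemon) can be illustrated with a small sketch. This is not the condor_master code, which is not shown in this report. The class HALockWaiter and its methods are hypothetical names chosen for illustration, and an in-process threading.Lock stands in for the shared lock file.

```python
import threading


class HALockWaiter:
    """Hypothetical stand-in for a master waiting on an HA daemon lock."""

    def __init__(self, lock: threading.Lock):
        self._lock = lock  # stands in for the lock file shared by masters
        self._shutting_down = threading.Event()
        self.daemon_started = False

    def begin_shutdown(self) -> None:
        # Would be called on TERM/QUIT or condor_off in the scenario above.
        self._shutting_down.set()

    def try_acquire_and_start(self) -> bool:
        # Refuse to compete for the lock once shutdown has begun.
        if self._shutting_down.is_set():
            return False
        if not self._lock.acquire(blocking=False):
            return False  # another master holds the lock; keep waiting
        # Re-check: shutdown may have begun between the test and the acquire.
        if self._shutting_down.is_set():
            self._lock.release()
            return False
        self.daemon_started = True  # placeholder for starting the HA daemon
        return True
```

In this sketch, a waiter whose begin_shutdown() has been called never starts the daemon, even if the lock becomes free afterwards. Without the shutdown checks, the waiter would take the freed lock and start a new daemon in the middle of its own shutdown, which is the symptom this bug describes.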