Bug 806002
Summary: | Failed cluster rejoin after reboot might lead to later rejoin without being in fence domain | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Jaroslav Kortus <jkortus> |
Component: | cluster | Assignee: | Fabio Massimo Di Nitto <fdinitto> |
Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list> |
Severity: | low | Docs Contact: | |
Priority: | medium | ||
Version: | 6.3 | CC: | ccaulfie, cluster-maint, jpayne, lhh, rpeterso, teigland |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | cluster-3.0.12.1-30.el6 | Doc Type: | Bug Fix |
Doc Text: |
Cause: cman init script did not roll back changes in case of errors during startup.
Consequence: some daemons could be erroneously left running on a node.
Fix: cman init script now performs a full rollback when errors are encountered.
Result: no daemons are left running in case of errors.
|
Last Closed: | 2012-06-20 13:58:47 UTC | Type: | --- |
Description
Jaroslav Kortus
2012-03-22 16:19:22 UTC
(In reply to comment #0)

Good catch.

> Expected results:
> one of:
> 1. rollback of all previously taken actions (stopping qdiskd, killing cman,
> basically calling the stop function) if the quorum is not regained during the
> specified timeout
> 2. start all daemons as if there was a quorum and let them to handle it
> properly.
>
> I guess variant 1 should be preferred as it more or less copies the existing
> state without described negative consequences.

Yes, we should go for 1). #2 is doable, but it involves major surgery in the init script and adds a major change in behavior.

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause: cman init script did not roll back changes in case of errors during startup.
Consequence: some daemons could be erroneously left running on a node.
Fix: cman init script now performs a full rollback when errors are encountered.
Result: no daemons are left running in case of errors.

Verified in cman-3.0.12.1-31.el6:

[root@dash-03 ~]# rpm -q cman
cman-3.0.12.1-24.el6.x86_64
[root@dash-03 ~]# /etc/init.d/cman start
Starting cluster:
   Checking if cluster has been disabled at boot... [ OK ]
   Checking Network Manager... [ OK ]
   Global setup... [ OK ]
   Loading kernel modules... [ OK ]
   Mounting configfs... [ OK ]
   Starting cman... corosync died: Could not read cluster configuration Check cluster logs for details [FAILED]
[root@dash-03 ~]# /etc/init.d/cman status
corosync is stopped
[root@dash-03 ~]# ls new/
clusterlib-3.0.12.1-31.el6.x86_64.rpm
cman-3.0.12.1-31.el6.x86_64.rpm
[root@dash-03 ~]# yum localupdate new/c*
[root@dash-03 ~]# rpm -q cman
cman-3.0.12.1-31.el6.x86_64
[root@dash-03 ~]# /etc/init.d/cman start
Starting cluster:
   Checking if cluster has been disabled at boot... [ OK ]
   Checking Network Manager... [ OK ]
   Global setup... [ OK ]
   Loading kernel modules... [ OK ]
   Mounting configfs... [ OK ]
   Starting cman... corosync died: Could not read cluster configuration Check cluster logs for details [FAILED]
Stopping cluster:
   Leaving fence domain... [ OK ]
   Stopping gfs_controld... [ OK ]
   Stopping dlm_controld... [ OK ]
   Stopping fenced... [ OK ]
   Stopping cman... [ OK ]
   Unloading kernel modules... [ OK ]
   Unmounting configfs... [ OK ]

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0861.html
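
For readers unfamiliar with the pattern, the rollback behavior described in the technical note (on any startup error, call the full stop sequence) might be sketched as below. This is only an illustrative model in Python, not the actual cman init script, which is a shell script. All names here (`Step`, `running`, `daemon_start`, `daemon_stop`, `stop_cluster`, `start_cluster`) are hypothetical and exist only for this example.

```python
from typing import Callable, List, Set, Tuple

# Hypothetical model of one init-script step: a label plus a callable that
# returns True on success. None of these names come from the cman script.
Step = Tuple[str, Callable[[], bool]]

# Stand-in for "which daemons are currently running on this node".
running: Set[str] = set()


def daemon_start(name: str, ok: bool = True) -> Callable[[], bool]:
    """Build a fake start action; ok=False simulates a startup failure."""
    def action() -> bool:
        if ok:
            running.add(name)
        return ok
    return action


def daemon_stop(name: str) -> Callable[[], bool]:
    """Build a fake stop action; stopping a stopped daemon is harmless."""
    def action() -> bool:
        running.discard(name)
        return True
    return action


def stop_cluster(stop_steps: List[Step], log: List[str]) -> bool:
    """Run every stop step, continuing past failures (best-effort cleanup)."""
    log.append("Stopping cluster:")
    all_ok = True
    for label, action in stop_steps:
        ok = action()
        log.append(f"{label}... [{'OK' if ok else 'FAILED'}]")
        all_ok = all_ok and ok
    return all_ok


def start_cluster(start_steps: List[Step], stop_steps: List[Step],
                  log: List[str]) -> bool:
    """Run start steps in order; on the first failure, roll back fully."""
    log.append("Starting cluster:")
    for label, action in start_steps:
        ok = action()
        log.append(f"{label}... [{'OK' if ok else 'FAILED'}]")
        if not ok:
            # Rollback: run the whole stop sequence so that nothing
            # started earlier is left running.
            stop_cluster(stop_steps, log)
            return False
    return True
```

In this model, a failure in any start step leaves `running` empty afterwards, which mirrors the "Stopping cluster:" sequence that appears after `[FAILED]` in the verification transcript for the fixed package.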