Bug 750314
Summary: | fenced/dlm_controld: fix handling of startup partition merge
---|---|---|---
Product: | Red Hat Enterprise Linux 6 | Reporter: | David Teigland <teigland>
Component: | cluster | Assignee: | David Teigland <teigland>
Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list>
Severity: | high | Docs Contact: |
Priority: | high
Version: | 6.2 | CC: | ccaulfie, cluster-maint, djansa, jpayne, lhh, rpeterso, rsteiger, teigland
Target Milestone: | rc
Target Release: | ---
Hardware: | Unspecified
OS: | Unspecified
Whiteboard: |
Fixed In Version: | cluster-3.0.12.1-27.el6 | Doc Type: | Bug Fix
Doc Text: |
Cause: a cluster partition and merge during startup fencing was not detected correctly.
Consequence: dlm lockspace operations are stuck.
Fix: detect and handle this event correctly.
Result: dlm lockspace operations are not stuck.
|
Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2012-06-20 13:58:27 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: |
Bug Depends On: |
Bug Blocks: | 756082
Attachments: |
Description
David Teigland
2011-10-31 16:46:03 UTC
Created attachment 531008
fenced patch
see comment in patch

Created attachment 531009
dlm_controld patch
see comment in patch

The two patches make the sequence above work correctly.

Also verified that the sequence works as expected when the fence_ack_manual is done before the partition (i.e. one node needs to be reset).

Also verified that two other historically difficult partition+merge tests still work as expected:

test 1
------
- nodes 1,2,3,4
- all: no fencing configured
- all: service cman start
- all: dlm_tool join foo
- use iptables to create network partition 1 | 2,3,4
- wait for partition to be detected
- remove network partition resulting in merge 1,2,3,4
- 2,3,4: should kill corosync on node 1 automatically
- 1: reboot
- 1: service cman start
- 1: dlm_tool join foo

test 2
------
- nodes 1,2,3,4
- all: no fencing configured
- all: service cman start
- all: dlm_tool join foo
- use iptables to create network partition 1,2 | 3,4
- wait for partition to be detected
- remove network partition resulting in merge 1,2,3,4
- 1,2: reboot (or 3,4)
- 1,2: service cman start
- 1,2: dlm_tool join foo

pushed to RHEL6 branch

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause: a cluster partition and merge during startup fencing was not detected correctly.
Consequence: dlm lockspace operations are stuck.
Fix: detect and handle this event correctly.
Result: dlm lockspace operations are not stuck.

Verified in cluster-3.0.12.1-32.el6.x86_64

Cluster.conf:
<?xml version="1.0"?>
<cluster name="dash" config_version="1">
  <dlm log_debug="1"/>
  <clusternodes>
    <clusternode name="dash-01" nodeid="1"/>
    <clusternode name="dash-02" nodeid="2"/>
    <clusternode name="dash-03" nodeid="3"/>
  </clusternodes>
</cluster>

Steps in Description

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0861.html
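
For anyone reproducing the "use iptables to create network partition" step from test 1 and test 2, here is a minimal sketch of how the per-node rules could be generated. It is not taken from the patches or from the original test setup: the helper name `partition_commands`, the node addresses, and the choice to drop traffic with plain `INPUT -s <addr> -j DROP` rules are all assumptions made for illustration. The function only builds command strings; it does not run them.

```python
def partition_commands(groups, addresses, action="-A"):
    """Build iptables commands that cut each group of nodes off from the others.

    groups:    list of node-name lists, e.g. [["1"], ["2", "3", "4"]]
    addresses: mapping of node name -> IP address (hypothetical values)
    action:    "-A" appends the DROP rules (create the partition),
               "-D" deletes them again (remove the partition, causing a merge)

    Returns a dict mapping each node name to the commands to run on that node.
    """
    commands = {}
    for index, group in enumerate(groups):
        # Nodes in every other group are on the far side of the partition.
        others = [
            node
            for other_index, other_group in enumerate(groups)
            if other_index != index
            for node in other_group
        ]
        for node in group:
            commands[node] = [
                f"iptables {action} INPUT -s {addresses[peer]} -j DROP"
                for peer in others
            ]
    return commands


if __name__ == "__main__":
    # Hypothetical addresses for nodes 1-4.
    addrs = {"1": "10.0.0.1", "2": "10.0.0.2", "3": "10.0.0.3", "4": "10.0.0.4"}

    # Test 1 layout: 1 | 2,3,4
    for node, cmds in partition_commands([["1"], ["2", "3", "4"]], addrs).items():
        for cmd in cmds:
            print(f"node {node}: {cmd}")
```

Calling the same function with `action="-D"` yields the matching delete commands for the "remove network partition" step. Because each side drops incoming packets from the other, traffic is blocked in both directions without any OUTPUT rules.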