Bug 1399757
Summary: | Ganesha services are not stopped when pacemaker quorum is lost | |||
---|---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Arthy Loganathan <aloganat> | |
Component: | nfs-ganesha | Assignee: | Kaleb KEITHLEY <kkeithle> | |
Status: | CLOSED ERRATA | QA Contact: | Arthy Loganathan <aloganat> | |
Severity: | high | Docs Contact: | ||
Priority: | unspecified | |||
Version: | rhgs-3.2 | CC: | amukherj, dang, ffilz, jthottan, mbenjamin, rhinduja, rhs-bugs, skoduri, storage-qa-internal | |
Target Milestone: | --- | |||
Target Release: | RHGS 3.2.0 | |||
Hardware: | x86_64 | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | glusterfs-3.8.4-7 | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1400237 | Environment: | |
Last Closed: | 2017-03-23 05:52:40 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1351528, 1400237, 1400572, 1400573 |
Description
Arthy Loganathan
2016-11-29 16:37:53 UTC
A few more observations:

Initially, when quorum is lost, pcs status shows:

[root@dhcp46-111 ~]# pcs status
Cluster name: ganesha-ha-360
Stack: corosync
Current DC: dhcp46-111.lab.eng.blr.redhat.com (version 1.1.15-11.el7_3.2-e174ec8) - partition WITHOUT quorum
Last updated: Wed Nov 30 16:09:13 2016    Last change: Wed Nov 30 14:46:54 2016 by root via cibadmin on dhcp46-111.lab.eng.blr.redhat.com

4 nodes and 24 resources configured

Online: [ dhcp46-111.lab.eng.blr.redhat.com ]
OFFLINE: [ dhcp46-115.lab.eng.blr.redhat.com dhcp46-124.lab.eng.blr.redhat.com dhcp46-139.lab.eng.blr.redhat.com ]

Full list of resources:

 Clone Set: nfs_setup-clone [nfs_setup]
     Stopped: [ dhcp46-111.lab.eng.blr.redhat.com dhcp46-115.lab.eng.blr.redhat.com dhcp46-124.lab.eng.blr.redhat.com dhcp46-139.lab.eng.blr.redhat.com ]
 Clone Set: nfs-mon-clone [nfs-mon]
     Stopped: [ dhcp46-111.lab.eng.blr.redhat.com dhcp46-115.lab.eng.blr.redhat.com dhcp46-124.lab.eng.blr.redhat.com dhcp46-139.lab.eng.blr.redhat.com ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ dhcp46-111.lab.eng.blr.redhat.com ]
     Stopped: [ dhcp46-115.lab.eng.blr.redhat.com dhcp46-124.lab.eng.blr.redhat.com dhcp46-139.lab.eng.blr.redhat.com ]
 Resource Group: dhcp46-111.lab.eng.blr.redhat.com-group
     dhcp46-111.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Started dhcp46-111.lab.eng.blr.redhat.com
     dhcp46-111.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp46-111.lab.eng.blr.redhat.com
     dhcp46-111.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): FAILED dhcp46-111.lab.eng.blr.redhat.com (blocked)
 Resource Group: dhcp46-115.lab.eng.blr.redhat.com-group
     dhcp46-115.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Started dhcp46-111.lab.eng.blr.redhat.com
     dhcp46-115.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp46-111.lab.eng.blr.redhat.com
     dhcp46-115.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): FAILED dhcp46-111.lab.eng.blr.redhat.com (blocked)
 Resource Group: dhcp46-139.lab.eng.blr.redhat.com-group
     dhcp46-139.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Started dhcp46-111.lab.eng.blr.redhat.com
     dhcp46-139.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp46-111.lab.eng.blr.redhat.com
     dhcp46-139.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): FAILED dhcp46-111.lab.eng.blr.redhat.com (blocked)
 Resource Group: dhcp46-124.lab.eng.blr.redhat.com-group
     dhcp46-124.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Stopped
     dhcp46-124.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Stopped
     dhcp46-124.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): Stopped

But sometimes, after ~2 hours, the services of some nodes go to the Stopped state:

Online: [ dhcp46-42.lab.eng.blr.redhat.com dhcp47-167.lab.eng.blr.redhat.com ]
OFFLINE: [ dhcp46-101.lab.eng.blr.redhat.com dhcp47-155.lab.eng.blr.redhat.com ]

Full list of resources:

 Clone Set: nfs_setup-clone [nfs_setup]
     Stopped: [ dhcp46-101.lab.eng.blr.redhat.com dhcp46-42.lab.eng.blr.redhat.com dhcp47-155.lab.eng.blr.redhat.com dhcp47-167.lab.eng.blr.redhat.com ]
 Clone Set: nfs-mon-clone [nfs-mon]
     Stopped: [ dhcp46-101.lab.eng.blr.redhat.com dhcp46-42.lab.eng.blr.redhat.com dhcp47-155.lab.eng.blr.redhat.com dhcp47-167.lab.eng.blr.redhat.com ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ dhcp46-42.lab.eng.blr.redhat.com dhcp47-167.lab.eng.blr.redhat.com ]
     Stopped: [ dhcp46-101.lab.eng.blr.redhat.com dhcp47-155.lab.eng.blr.redhat.com ]
 Resource Group: dhcp46-42.lab.eng.blr.redhat.com-group
     dhcp46-42.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Stopped
     dhcp46-42.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Stopped
     dhcp46-42.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): Stopped
 Resource Group: dhcp46-101.lab.eng.blr.redhat.com-group
     dhcp46-101.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Started dhcp46-42.lab.eng.blr.redhat.com
     dhcp46-101.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp46-42.lab.eng.blr.redhat.com
     dhcp46-101.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): FAILED dhcp46-42.lab.eng.blr.redhat.com (blocked)
 Resource Group: dhcp47-155.lab.eng.blr.redhat.com-group
     dhcp47-155.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Stopped
     dhcp47-155.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Stopped
     dhcp47-155.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): Stopped
 Resource Group: dhcp47-167.lab.eng.blr.redhat.com-group
     dhcp47-167.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Stopped
     dhcp47-167.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Stopped
     dhcp47-167.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): Stopped

Also, I/O continues on the mount point even after quorum is lost.

Upstream mainline patch http://review.gluster.org/#/c/15981/ posted for review.

upstream mainline : http://review.gluster.org/#/c/15981/
upstream 3.9 : http://review.gluster.org/15991
upstream 3.8 : http://review.gluster.org/15992
downstream : https://code.engineering.redhat.com/gerrit/#/c/91896/

I saw this issue a few times, very rarely, after the fix, but it is not seen with the latest build.

Verified the fix in build:
nfs-ganesha-gluster-2.4.1-4.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-11.el7rhgs.x86_64
nfs-ganesha-2.4.1-4.el7rhgs.x86_64

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html
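The condition reported above (the "partition WITHOUT quorum" marker in the pcs status output while some resources are still Started or FAILED) can be checked mechanically when re-verifying. The following is only a minimal sketch: the function names and the sample text are hypothetical, and it does plain text matching on output shaped like the excerpts in this report, not any pcs or Pacemaker API.

```python
import re

# Matches primitive resource lines such as:
#   <name> (ocf::heartbeat:portblock): Started <node>
# Assumption: the line layout is the one shown in this bug report.
RESOURCE_LINE = re.compile(r"^\s*(\S+)\s+\((ocf::[^)]+)\):\s+(\S+)")


def quorum_lost(pcs_status: str) -> bool:
    """Return True if the status text carries the quorum-loss marker."""
    return "partition WITHOUT quorum" in pcs_status


def resources_not_stopped(pcs_status: str) -> list:
    """Return (resource name, state) pairs for resources not in Stopped state."""
    found = []
    for line in pcs_status.splitlines():
        match = RESOURCE_LINE.match(line)
        if match and match.group(3) != "Stopped":
            found.append((match.group(1), match.group(3)))
    return found


# Shortened sample built from lines quoted in this report.
SAMPLE = """\
Current DC: dhcp46-111.lab.eng.blr.redhat.com (version 1.1.15-11.el7_3.2-e174ec8) - partition WITHOUT quorum
 Resource Group: dhcp46-111.lab.eng.blr.redhat.com-group
     dhcp46-111.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Started dhcp46-111.lab.eng.blr.redhat.com
     dhcp46-111.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): FAILED dhcp46-111.lab.eng.blr.redhat.com (blocked)
 Resource Group: dhcp46-124.lab.eng.blr.redhat.com-group
     dhcp46-124.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Stopped
"""

if __name__ == "__main__":
    if quorum_lost(SAMPLE):
        for name, state in resources_not_stopped(SAMPLE):
            print(f"{name}: {state} although quorum is lost")
```

On a live node, the captured text of pcs status would be passed in place of SAMPLE. An empty result from resources_not_stopped while quorum is lost would match the expected behaviour described in the summary.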