Bug 1399757
| Summary: | Ganesha services are not stopped when pacemaker quorum is lost | |||
|---|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Arthy Loganathan <aloganat> | |
| Component: | nfs-ganesha | Assignee: | Kaleb KEITHLEY <kkeithle> | |
| Status: | CLOSED ERRATA | QA Contact: | Arthy Loganathan <aloganat> | |
| Severity: | high | Docs Contact: | ||
| Priority: | unspecified | |||
| Version: | rhgs-3.2 | CC: | amukherj, dang, ffilz, jthottan, mbenjamin, rhinduja, rhs-bugs, skoduri, storage-qa-internal | |
| Target Milestone: | --- | |||
| Target Release: | RHGS 3.2.0 | |||
| Hardware: | x86_64 | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | glusterfs-3.8.4-7 | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1400237 (view as bug list) | Environment: | ||
| Last Closed: | 2017-03-23 05:52:40 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1351528, 1400237, 1400572, 1400573 | |||
Description
Arthy Loganathan
2016-11-29 16:37:53 UTC
Few more observations:
Initially, when the quorum is lost, pcs status shows:
[root@dhcp46-111 ~]# pcs status
Cluster name: ganesha-ha-360
Stack: corosync
Current DC: dhcp46-111.lab.eng.blr.redhat.com (version 1.1.15-11.el7_3.2-e174ec8) - partition WITHOUT quorum
Last updated: Wed Nov 30 16:09:13 2016 Last change: Wed Nov 30 14:46:54 2016 by root via cibadmin on dhcp46-111.lab.eng.blr.redhat.com
4 nodes and 24 resources configured
Online: [ dhcp46-111.lab.eng.blr.redhat.com ]
OFFLINE: [ dhcp46-115.lab.eng.blr.redhat.com dhcp46-124.lab.eng.blr.redhat.com dhcp46-139.lab.eng.blr.redhat.com ]
Full list of resources:
Clone Set: nfs_setup-clone [nfs_setup]
Stopped: [ dhcp46-111.lab.eng.blr.redhat.com dhcp46-115.lab.eng.blr.redhat.com dhcp46-124.lab.eng.blr.redhat.com dhcp46-139.lab.eng.blr.redhat.com ]
Clone Set: nfs-mon-clone [nfs-mon]
Stopped: [ dhcp46-111.lab.eng.blr.redhat.com dhcp46-115.lab.eng.blr.redhat.com dhcp46-124.lab.eng.blr.redhat.com dhcp46-139.lab.eng.blr.redhat.com ]
Clone Set: nfs-grace-clone [nfs-grace]
Started: [ dhcp46-111.lab.eng.blr.redhat.com ]
Stopped: [ dhcp46-115.lab.eng.blr.redhat.com dhcp46-124.lab.eng.blr.redhat.com dhcp46-139.lab.eng.blr.redhat.com ]
Resource Group: dhcp46-111.lab.eng.blr.redhat.com-group
dhcp46-111.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Started dhcp46-111.lab.eng.blr.redhat.com
dhcp46-111.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp46-111.lab.eng.blr.redhat.com
dhcp46-111.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): FAILED dhcp46-111.lab.eng.blr.redhat.com (blocked)
Resource Group: dhcp46-115.lab.eng.blr.redhat.com-group
dhcp46-115.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Started dhcp46-111.lab.eng.blr.redhat.com
dhcp46-115.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp46-111.lab.eng.blr.redhat.com
dhcp46-115.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): FAILED dhcp46-111.lab.eng.blr.redhat.com (blocked)
Resource Group: dhcp46-139.lab.eng.blr.redhat.com-group
dhcp46-139.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Started dhcp46-111.lab.eng.blr.redhat.com
dhcp46-139.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp46-111.lab.eng.blr.redhat.com
dhcp46-139.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): FAILED dhcp46-111.lab.eng.blr.redhat.com (blocked)
Resource Group: dhcp46-124.lab.eng.blr.redhat.com-group
dhcp46-124.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Stopped
dhcp46-124.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Stopped
dhcp46-124.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): Stopped
But sometimes, after about 2 hours, the services on some of the nodes go to the Stopped state:
Online: [ dhcp46-42.lab.eng.blr.redhat.com dhcp47-167.lab.eng.blr.redhat.com ]
OFFLINE: [ dhcp46-101.lab.eng.blr.redhat.com dhcp47-155.lab.eng.blr.redhat.com ]
Full list of resources:
Clone Set: nfs_setup-clone [nfs_setup]
Stopped: [ dhcp46-101.lab.eng.blr.redhat.com dhcp46-42.lab.eng.blr.redhat.com dhcp47-155.lab.eng.blr.redhat.com dhcp47-167.lab.eng.blr.redhat.com ]
Clone Set: nfs-mon-clone [nfs-mon]
Stopped: [ dhcp46-101.lab.eng.blr.redhat.com dhcp46-42.lab.eng.blr.redhat.com dhcp47-155.lab.eng.blr.redhat.com dhcp47-167.lab.eng.blr.redhat.com ]
Clone Set: nfs-grace-clone [nfs-grace]
Started: [ dhcp46-42.lab.eng.blr.redhat.com dhcp47-167.lab.eng.blr.redhat.com ]
Stopped: [ dhcp46-101.lab.eng.blr.redhat.com dhcp47-155.lab.eng.blr.redhat.com ]
Resource Group: dhcp46-42.lab.eng.blr.redhat.com-group
dhcp46-42.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Stopped
dhcp46-42.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Stopped
dhcp46-42.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): Stopped
Resource Group: dhcp46-101.lab.eng.blr.redhat.com-group
dhcp46-101.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Started dhcp46-42.lab.eng.blr.redhat.com
dhcp46-101.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp46-42.lab.eng.blr.redhat.com
dhcp46-101.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): FAILED dhcp46-42.lab.eng.blr.redhat.com (blocked)
Resource Group: dhcp47-155.lab.eng.blr.redhat.com-group
dhcp47-155.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Stopped
dhcp47-155.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Stopped
dhcp47-155.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): Stopped
Resource Group: dhcp47-167.lab.eng.blr.redhat.com-group
dhcp47-167.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Stopped
dhcp47-167.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Stopped
dhcp47-167.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): Stopped
Also, I/O continues on the mount point even after quorum is lost.
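A quick way to spot this condition from saved pcs status output is to check the "Current DC" line for "partition WITHOUT quorum" and then list the resources that are still Started or FAILED. The following is only a minimal sketch for triage, not part of the fix; the function names has_quorum and active_resources are hypothetical, and the regular expression assumes the resource line layout shown in the output above.

```python
import re

# Matches primitive resource lines such as:
#   <host>-nfs_block (ocf::heartbeat:portblock): Started <host>
# The layout is assumed from the pcs status output pasted in this report.
RESOURCE_RE = re.compile(
    r"^\s*(?P<name>\S+)\s+\((?P<agent>[^)]+)\):\s+"
    r"(?P<state>Started|Stopped|FAILED)\b"
)


def has_quorum(pcs_status):
    """Return False if the 'Current DC' line reports a partition WITHOUT quorum."""
    for line in pcs_status.splitlines():
        if line.strip().startswith("Current DC:"):
            return "partition WITHOUT quorum" not in line
    raise ValueError("no 'Current DC:' line found in pcs status output")


def active_resources(pcs_status):
    """Return (name, state) pairs for primitives that are not Stopped."""
    found = []
    for line in pcs_status.splitlines():
        match = RESOURCE_RE.match(line)
        if match and match.group("state") != "Stopped":
            found.append((match.group("name"), match.group("state")))
    return found


# A few lines copied from the first pcs status output in this report.
SAMPLE = """\
Current DC: dhcp46-111.lab.eng.blr.redhat.com (version 1.1.15-11.el7_3.2-e174ec8) - partition WITHOUT quorum
 dhcp46-111.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Started dhcp46-111.lab.eng.blr.redhat.com
 dhcp46-111.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): FAILED dhcp46-111.lab.eng.blr.redhat.com (blocked)
 dhcp46-124.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Stopped
"""

if __name__ == "__main__":
    if not has_quorum(SAMPLE):
        # Anything listed here is still active although quorum is lost.
        for name, state in active_resources(SAMPLE):
            print(f"{state}: {name}")
```

Run against the first output above, this would flag the nfs_block and cluster_ip-1 resources as Started and the nfs_unblock resources as FAILED, which is the behavior this bug describes.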
Upstream mainline patch http://review.gluster.org/#/c/15981/ posted for review.
upstream mainline : http://review.gluster.org/#/c/15981/
upstream 3.9 : http://review.gluster.org/15991
upstream 3.8 : http://review.gluster.org/15992
downstream : https://code.engineering.redhat.com/gerrit/#/c/91896/
I saw this issue a few times, very rarely, after the fix, but it is not seen with the latest build.
Verified the fix in build:
nfs-ganesha-gluster-2.4.1-4.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-11.el7rhgs.x86_64
nfs-ganesha-2.4.1-4.el7rhgs.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://rhn.redhat.com/errata/RHSA-2017-0486.html