PCS cluster IP resources may enter the FAILED state during failover or failback of a VIP in an NFS-Ganesha HA cluster. As a result, the VIP becomes inaccessible, which results in mount failures or a system freeze.
Workaround: Clean up the resource that failed by using the following command:
pcs resource cleanup <resource-id>
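To find the resource ID to pass to the cleanup command, the failed entries can be picked out of the `pcs status` output. The following is only a minimal sketch: the function names `failed_resources` and `cleanup_commands` are hypothetical, and it assumes the one-line-per-resource layout `<resource-id> (<agent>): <state> ...` seen in the output attached to this report. Other pcs versions may format the status differently. It only prints the cleanup commands; it does not run them.

```shell
#!/bin/sh
# Hypothetical helpers; not part of pcs.

# Print the IDs of resources reported as FAILED.
# Reads `pcs status` text on stdin and assumes each resource line looks like:
#   <resource-id> (<agent>): <state> [node] [(blocked)]
failed_resources() {
    awk '$2 ~ /^\(.*\):$/ && $3 == "FAILED" { print $1 }'
}

# Print (but do not execute) one cleanup command per failed resource.
cleanup_commands() {
    failed_resources | while read -r id; do
        printf 'pcs resource cleanup %s\n' "$id"
    done
}
```

With these functions loaded, `pcs status | cleanup_commands` would list the commands to review before running any of them.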
Description Arthy Loganathan
2017-07-12 09:52:14 UTC
Description of problem:
Ganesha cluster resources go to the failed state during failback while I/O is running.
Version-Release number of selected component (if applicable):
nfs-ganesha-gluster-2.4.1-11.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-18.5.el7rhgs.x86_64
nfs-ganesha-2.4.1-11.el7rhgs.x86_64
RHEL version:
RHEL 7.4 RC1.2 build
How reproducible:
Seen once
Steps to Reproduce:
1. Create a four-node Ganesha cluster. Create a volume and export it.
2. Kill Ganesha on two nodes.
3. While I/O is running, perform a failback.
Actual results:
Ganesha cluster resources go to the failed state during failback.
Expected results:
Failback should succeed. All cluster resources should be in Started state.
Additional info:
[root@dhcp46-115 ~]# pcs status
Cluster name: G1499810001.0
Stack: corosync
Current DC: dhcp46-131.lab.eng.blr.redhat.com (version 1.1.16-12.el7-94ff4df) - partition with quorum
Last updated: Wed Jul 12 12:28:42 2017
Last change: Wed Jul 12 04:02:18 2017 by root via crm_attribute on dhcp46-124.lab.eng.blr.redhat.com
4 nodes configured
24 resources configured
Online: [ dhcp46-115.lab.eng.blr.redhat.com dhcp46-124.lab.eng.blr.redhat.com dhcp46-131.lab.eng.blr.redhat.com dhcp46-139.lab.eng.blr.redhat.com ]
Full list of resources:
Clone Set: nfs_setup-clone [nfs_setup]
Started: [ dhcp46-115.lab.eng.blr.redhat.com dhcp46-124.lab.eng.blr.redhat.com dhcp46-131.lab.eng.blr.redhat.com dhcp46-139.lab.eng.blr.redhat.com ]
Clone Set: nfs-mon-clone [nfs-mon]
Started: [ dhcp46-115.lab.eng.blr.redhat.com dhcp46-124.lab.eng.blr.redhat.com dhcp46-131.lab.eng.blr.redhat.com dhcp46-139.lab.eng.blr.redhat.com ]
Clone Set: nfs-grace-clone [nfs-grace]
Started: [ dhcp46-115.lab.eng.blr.redhat.com dhcp46-124.lab.eng.blr.redhat.com dhcp46-131.lab.eng.blr.redhat.com dhcp46-139.lab.eng.blr.redhat.com ]
Resource Group: dhcp46-115.lab.eng.blr.redhat.com-group
dhcp46-115.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Started dhcp46-115.lab.eng.blr.redhat.com
dhcp46-115.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): FAILED dhcp46-115.lab.eng.blr.redhat.com (blocked)
dhcp46-115.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): Stopped
Resource Group: dhcp46-131.lab.eng.blr.redhat.com-group
dhcp46-131.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Started dhcp46-131.lab.eng.blr.redhat.com
dhcp46-131.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp46-131.lab.eng.blr.redhat.com
dhcp46-131.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): Started dhcp46-131.lab.eng.blr.redhat.com
Resource Group: dhcp46-139.lab.eng.blr.redhat.com-group
dhcp46-139.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Started dhcp46-139.lab.eng.blr.redhat.com
dhcp46-139.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp46-139.lab.eng.blr.redhat.com
dhcp46-139.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): Started dhcp46-139.lab.eng.blr.redhat.com
Resource Group: dhcp46-124.lab.eng.blr.redhat.com-group
dhcp46-124.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Started dhcp46-124.lab.eng.blr.redhat.com
dhcp46-124.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp46-124.lab.eng.blr.redhat.com
dhcp46-124.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): Started dhcp46-124.lab.eng.blr.redhat.com
A similar bug already exists, but there the issue was hit in a different scenario:
https://bugzilla.redhat.com/show_bug.cgi?id=1399753