Bug 1329808

Summary: pcs status shows nodes in stopped state on a rebooted node until iptables are flushed on RHEL 6 platform.
Product: Red Hat Gluster Storage
Reporter: Shashank Raj <sraj>
Component: nfs-ganesha
Assignee: Kaleb KEITHLEY <kkeithle>
Status: CLOSED WONTFIX
QA Contact: storage-qa-internal <storage-qa-internal>
Severity: high
Docs Contact:
Priority: unspecified
Version: rhgs-3.1
CC: jthottan, kkeithle, msaini, mzywusko, ndevos, nlevinki, sanandpa, skoduri
Target Milestone: ---
Keywords: Triaged, ZStream
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-05-03 12:06:30 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Shashank Raj 2016-04-23 11:44:47 UTC
Description of problem:

pcs status shows nodes in stopped state on a rebooted node until iptables are flushed.

Version-Release number of selected component (if applicable):

[root@dhcp42-121 ~]# rpm -qa|grep ganesha
nfs-ganesha-gluster-2.3.1-4.el6rhs.x86_64
glusterfs-ganesha-3.7.9-2.el6rhs.x86_64
nfs-ganesha-2.3.1-4.el6rhs.x86_64

[root@dhcp42-121 ~]# rpm -qa|grep pcs
pcsc-lite-libs-1.5.2-15.el6.x86_64
pcs-0.9.139-9.el6.x86_64

How reproducible:

Always

Steps to Reproduce:

1. Create a 4-node cluster and configure nfs-ganesha on it.
2. Check pcs status on all the nodes and confirm it reports the correct status for each node.
3. Reboot any one node in the cluster.
4. When the node comes back up, start the pcsd, pacemaker, and nfs-ganesha services.
5. Observe that pcs status on the rebooted node shows the nodes in the Stopped state:

[root@dhcp42-121 ~]# pcs status
Cluster name: G1461325499.42
Last updated: Sat Apr 23 22:37:09 2016
Last change: Fri Apr 22 22:45:44 2016
Stack: cman
Current DC: dhcp42-121.lab.eng.blr.redhat.com - partition WITHOUT quorum
Version: 1.1.11-97629de
4 Nodes configured
16 Resources configured


Online: [ dhcp42-121.lab.eng.blr.redhat.com ]
OFFLINE: [ dhcp42-119.lab.eng.blr.redhat.com dhcp42-128.lab.eng.blr.redhat.com dhcp43-59.lab.eng.blr.redhat.com ]

Full list of resources:

 Clone Set: nfs_setup-clone [nfs_setup]
     Stopped: [ dhcp42-119.lab.eng.blr.redhat.com dhcp42-121.lab.eng.blr.redhat.com dhcp42-128.lab.eng.blr.redhat.com dhcp43-59.lab.eng.blr.redhat.com ]
 Clone Set: nfs-mon-clone [nfs-mon]
     Stopped: [ dhcp42-119.lab.eng.blr.redhat.com dhcp42-121.lab.eng.blr.redhat.com dhcp42-128.lab.eng.blr.redhat.com dhcp43-59.lab.eng.blr.redhat.com ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Stopped: [ dhcp42-119.lab.eng.blr.redhat.com dhcp42-121.lab.eng.blr.redhat.com dhcp42-128.lab.eng.blr.redhat.com dhcp43-59.lab.eng.blr.redhat.com ]
 dhcp42-121.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr):        Stopped 
 dhcp42-128.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr):        Stopped 
 dhcp42-119.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr):        Stopped 
 dhcp43-59.lab.eng.blr.redhat.com-cluster_ip-1  (ocf::heartbeat:IPaddr):        Stopped 

6. On the other nodes, pcs status shows the rebooted node as OFFLINE and its resources as failed over:

[root@dhcp42-128 ~]# pcs status
Cluster name: G1461325499.42
Last updated: Sat Apr 23 22:31:36 2016
Last change: Fri Apr 22 22:45:44 2016
Stack: cman
Current DC: dhcp42-128.lab.eng.blr.redhat.com - partition with quorum
Version: 1.1.11-97629de
4 Nodes configured
16 Resources configured


Online: [ dhcp42-119.lab.eng.blr.redhat.com dhcp42-128.lab.eng.blr.redhat.com dhcp43-59.lab.eng.blr.redhat.com ]
OFFLINE: [ dhcp42-121.lab.eng.blr.redhat.com ]

Full list of resources:

 Clone Set: nfs_setup-clone [nfs_setup]
     Started: [ dhcp42-119.lab.eng.blr.redhat.com dhcp42-128.lab.eng.blr.redhat.com dhcp43-59.lab.eng.blr.redhat.com ]
     Stopped: [ dhcp42-121.lab.eng.blr.redhat.com ]
 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ dhcp42-119.lab.eng.blr.redhat.com dhcp42-128.lab.eng.blr.redhat.com dhcp43-59.lab.eng.blr.redhat.com ]
     Stopped: [ dhcp42-121.lab.eng.blr.redhat.com ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ dhcp42-119.lab.eng.blr.redhat.com dhcp42-128.lab.eng.blr.redhat.com dhcp43-59.lab.eng.blr.redhat.com ]
     Stopped: [ dhcp42-121.lab.eng.blr.redhat.com ]
 dhcp42-121.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr):        Started dhcp43-59.lab.eng.blr.redhat.com
 dhcp42-128.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr):        Started dhcp42-128.lab.eng.blr.redhat.com
 dhcp42-119.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr):        Started dhcp42-119.lab.eng.blr.redhat.com
 dhcp43-59.lab.eng.blr.redhat.com-cluster_ip-1  (ocf::heartbeat:IPaddr):        Started dhcp43-59.lab.eng.blr.redhat.com


7. Flush the iptables rules on the rebooted node and observe that pcs status now shows the correct state for all the nodes:

[root@dhcp42-121 ~]# iptables -F
[root@dhcp42-121 ~]# pcs status
Cluster name: G1461325499.42
Last updated: Sat Apr 23 22:41:07 2016
Last change: Fri Apr 22 22:45:44 2016
Stack: cman
Current DC: dhcp42-128.lab.eng.blr.redhat.com - partition with quorum
Version: 1.1.11-97629de
4 Nodes configured
16 Resources configured


Online: [ dhcp42-119.lab.eng.blr.redhat.com dhcp42-121.lab.eng.blr.redhat.com dhcp42-128.lab.eng.blr.redhat.com dhcp43-59.lab.eng.blr.redhat.com ]

Full list of resources:

 Clone Set: nfs_setup-clone [nfs_setup]
     Started: [ dhcp42-119.lab.eng.blr.redhat.com dhcp42-121.lab.eng.blr.redhat.com dhcp42-128.lab.eng.blr.redhat.com dhcp43-59.lab.eng.blr.redhat.com ]
 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ dhcp42-119.lab.eng.blr.redhat.com dhcp42-121.lab.eng.blr.redhat.com dhcp42-128.lab.eng.blr.redhat.com dhcp43-59.lab.eng.blr.redhat.com ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ dhcp42-119.lab.eng.blr.redhat.com dhcp42-121.lab.eng.blr.redhat.com dhcp42-128.lab.eng.blr.redhat.com dhcp43-59.lab.eng.blr.redhat.com ]
 dhcp42-121.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr):        Started dhcp42-121.lab.eng.blr.redhat.com 
 dhcp42-128.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr):        Started dhcp42-128.lab.eng.blr.redhat.com 
 dhcp42-119.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr):        Started dhcp42-119.lab.eng.blr.redhat.com 
 dhcp43-59.lab.eng.blr.redhat.com-cluster_ip-1  (ocf::heartbeat:IPaddr):        Started dhcp43-59.lab.eng.blr.redhat.com 



[root@dhcp42-128 ~]# pcs status
Cluster name: G1461325499.42
Last updated: Sat Apr 23 22:42:34 2016
Last change: Fri Apr 22 22:45:44 2016
Stack: cman
Current DC: dhcp42-128.lab.eng.blr.redhat.com - partition with quorum
Version: 1.1.11-97629de
4 Nodes configured
16 Resources configured


Online: [ dhcp42-119.lab.eng.blr.redhat.com dhcp42-121.lab.eng.blr.redhat.com dhcp42-128.lab.eng.blr.redhat.com dhcp43-59.lab.eng.blr.redhat.com ]

Full list of resources:

 Clone Set: nfs_setup-clone [nfs_setup]
     Started: [ dhcp42-119.lab.eng.blr.redhat.com dhcp42-121.lab.eng.blr.redhat.com dhcp42-128.lab.eng.blr.redhat.com dhcp43-59.lab.eng.blr.redhat.com ]
 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ dhcp42-119.lab.eng.blr.redhat.com dhcp42-121.lab.eng.blr.redhat.com dhcp42-128.lab.eng.blr.redhat.com dhcp43-59.lab.eng.blr.redhat.com ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ dhcp42-119.lab.eng.blr.redhat.com dhcp42-121.lab.eng.blr.redhat.com dhcp42-128.lab.eng.blr.redhat.com dhcp43-59.lab.eng.blr.redhat.com ]
 dhcp42-121.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr):        Started dhcp42-121.lab.eng.blr.redhat.com 
 dhcp42-128.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr):        Started dhcp42-128.lab.eng.blr.redhat.com 
 dhcp42-119.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr):        Started dhcp42-119.lab.eng.blr.redhat.com 
 dhcp43-59.lab.eng.blr.redhat.com-cluster_ip-1  (ocf::heartbeat:IPaddr):        Started dhcp43-59.lab.eng.blr.redhat.com 


Actual results:

pcs status shows nodes in the stopped state on a rebooted node until iptables rules are flushed, on the RHEL 6 platform.

Expected results:

pcs status should show the correct state for all nodes after a reboot; the default iptables rules should not block cluster communication.
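As a workaround less drastic than flushing all rules with `iptables -F`, the ports used by the cluster stack can be opened individually. A hedged sketch follows; the port numbers are the standard pcsd/pacemaker/corosync defaults and are an assumption here, not something confirmed in this bug, so verify them against the local cluster configuration before applying.

```shell
# Sketch: open only the cluster-stack ports instead of flushing all rules.
# Port numbers are the usual defaults (an assumption -- verify locally).
iptables -I INPUT -p tcp --dport 2224 -j ACCEPT        # pcsd
iptables -I INPUT -p tcp --dport 3121 -j ACCEPT        # pacemaker_remote
iptables -I INPUT -p udp --dport 5404:5405 -j ACCEPT   # corosync
iptables -I INPUT -p tcp --dport 21064 -j ACCEPT       # dlm
service iptables save   # persist the rules across reboots on RHEL 6
```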

Additional info:

Comment 2 Shashank Raj 2016-09-29 13:43:36 UTC
Kaleb,

Can we confirm this behavior?

Comment 3 Soumya Koduri 2016-10-14 10:25:27 UTC
Could you please re-test with the latest available builds and check whether the issue still exists?

Comment 4 Shashank Raj 2016-10-19 11:34:50 UTC
This issue is readily reproducible with the latest 3.2 builds; the steps to reproduce are the same as those in the bug description.

Comment 6 Kaleb KEITHLEY 2016-11-08 17:02:50 UTC
I believe we need to use the (new) portblock RA that's in resource-agents-2.9.5.

This is in the upstream master and 3.9 branches; it would need to be backported to 3.8 and rhgs-3.2.
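For illustration, adding a portblock resource with pcs might look like the sketch below. The resource name, port, IP address, and group are hypothetical placeholders, not values from this bug; the `protocol`, `portno`, `action`, and `ip` parameters are the ones the ocf:heartbeat:portblock agent accepts.

```shell
# Hypothetical sketch: an unblock-mode portblock resource for the NFS port.
# portblock in unblock mode sends TCP resets on the given ip/port so that
# clients reconnect quickly after a failover. All names/values below are
# illustrative assumptions.
pcs resource create nfs_block ocf:heartbeat:portblock \
    protocol=tcp portno=2049 action=unblock ip=10.70.42.121 \
    --group nfs-group
```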