Description of problem:
After a ganesha node reboot, the portblock process goes to FAILED state. In a four-node cluster, if one of the nodes is rebooted or shut down, the portblock process on any of the nodes (not one particular node) goes to FAILED state. Even after the rebooted/shut-down node is brought back up, failback does not happen while the portblock process is in FAILED state.

Version-Release number of selected component (if applicable):
nfs-ganesha-2.4.1-1.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-5.el7rhgs.x86_64

How reproducible:
Consistent

Steps to Reproduce:
1. Create a 4-node ganesha cluster.
2. Reboot one of the nodes.
3. Check pcs status.

Actual results:
The portblock process goes to FAILED state in pcs status.

Expected results:
All the processes should be up and running.

Additional info:

[root@dhcp46-139 ~]# pcs status
Cluster name: ganesha-ha-360
Stack: corosync
Current DC: dhcp46-124.lab.eng.blr.redhat.com (version 1.1.15-11.el7_3.2-e174ec8) - partition with quorum
Last updated: Thu Nov 24 13:12:49 2016    Last change: Thu Nov 24 12:32:19 2016 by root via cibadmin on dhcp46-111.lab.eng.blr.redhat.com

4 nodes and 24 resources configured

Online: [ dhcp46-115.lab.eng.blr.redhat.com dhcp46-124.lab.eng.blr.redhat.com dhcp46-139.lab.eng.blr.redhat.com ]
OFFLINE: [ dhcp46-111.lab.eng.blr.redhat.com ]

Full list of resources:

 Clone Set: nfs_setup-clone [nfs_setup]
     Started: [ dhcp46-115.lab.eng.blr.redhat.com dhcp46-124.lab.eng.blr.redhat.com dhcp46-139.lab.eng.blr.redhat.com ]
     Stopped: [ dhcp46-111.lab.eng.blr.redhat.com ]
 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ dhcp46-115.lab.eng.blr.redhat.com dhcp46-124.lab.eng.blr.redhat.com dhcp46-139.lab.eng.blr.redhat.com ]
     Stopped: [ dhcp46-111.lab.eng.blr.redhat.com ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ dhcp46-115.lab.eng.blr.redhat.com dhcp46-124.lab.eng.blr.redhat.com dhcp46-139.lab.eng.blr.redhat.com ]
     Stopped: [ dhcp46-111.lab.eng.blr.redhat.com ]
 Resource Group: dhcp46-111.lab.eng.blr.redhat.com-group
     dhcp46-111.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Started dhcp46-124.lab.eng.blr.redhat.com
     dhcp46-111.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp46-124.lab.eng.blr.redhat.com
     dhcp46-111.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): FAILED dhcp46-124.lab.eng.blr.redhat.com (blocked)
 Resource Group: dhcp46-115.lab.eng.blr.redhat.com-group
     dhcp46-115.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Started dhcp46-115.lab.eng.blr.redhat.com
     dhcp46-115.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp46-115.lab.eng.blr.redhat.com
     dhcp46-115.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): Started dhcp46-115.lab.eng.blr.redhat.com
 Resource Group: dhcp46-139.lab.eng.blr.redhat.com-group
     dhcp46-139.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Started dhcp46-139.lab.eng.blr.redhat.com
     dhcp46-139.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp46-139.lab.eng.blr.redhat.com
     dhcp46-139.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): Started dhcp46-139.lab.eng.blr.redhat.com
 Resource Group: dhcp46-124.lab.eng.blr.redhat.com-group
     dhcp46-124.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Started dhcp46-124.lab.eng.blr.redhat.com
     dhcp46-124.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp46-124.lab.eng.blr.redhat.com
     dhcp46-124.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): Started dhcp46-124.lab.eng.blr.redhat.com

Failed Actions:
* dhcp46-111.lab.eng.blr.redhat.com-nfs_unblock_stop_0 on dhcp46-124.lab.eng.blr.redhat.com 'unknown error' (1): call=83, status=Timed Out, exitreason='none',
    last-rc-change='Thu Nov 24 13:09:40 2016', queued=0ms, exec=20004ms
* dhcp46-124.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000 on dhcp46-124.lab.eng.blr.redhat.com 'unknown error' (1): call=73, status=Timed Out, exitreason='none',
    last-rc-change='Thu Nov 24 13:09:40 2016', queued=0ms, exec=0ms
* dhcp46-115.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000 on dhcp46-115.lab.eng.blr.redhat.com 'unknown error' (1): call=73, status=Timed Out, exitreason='none',
    last-rc-change='Thu Nov 24 13:09:40 2016', queued=0ms, exec=0ms
* dhcp46-139.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000 on dhcp46-139.lab.eng.blr.redhat.com 'unknown error' (1): call=71, status=Timed Out, exitreason='none',
    last-rc-change='Thu Nov 24 13:09:41 2016', queued=0ms, exec=0ms

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
[root@dhcp46-139 ~]#

ganesha log snippet:
---------------------
Nov 24 13:11:13 dhcp46-124 lrmd[25436]: notice: dhcp46-124.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:15272:stderr [ 0+0 records in ]
Nov 24 13:11:13 dhcp46-124 lrmd[25436]: notice: dhcp46-124.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:15272:stderr [ 0+0 records out ]
Nov 24 13:11:13 dhcp46-124 lrmd[25436]: notice: dhcp46-124.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:15272:stderr [ 0 bytes (0 B) copied, 0.0739975 s, 0.0 kB/s ]
Nov 24 13:11:23 dhcp46-124 lrmd[25436]: notice: dhcp46-124.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:15428:stderr [ 0+0 records in ]
Nov 24 13:11:23 dhcp46-124 lrmd[25436]: notice: dhcp46-124.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:15428:stderr [ 0+0 records out ]
Nov 24 13:11:23 dhcp46-124 lrmd[25436]: notice: dhcp46-124.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:15428:stderr [ 0 bytes (0 B) copied, 0.0539065 s, 0.0 kB/s ]
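
For triage, the Failed Actions entries in the pcs status output above can be pulled apart to see which operations timed out and how long each ran. The following is only a rough sketch: the regular expression and the helper names (parse_failed_actions, timed_out) are assumptions made for illustration and are not part of pcs or pacemaker. The sample lines are copied from the report, with each entry joined onto a single line.

```python
import re

# Two entries copied from the "Failed Actions" section of the report,
# each joined onto one line for easier matching.
FAILED_ACTIONS = """\
* dhcp46-111.lab.eng.blr.redhat.com-nfs_unblock_stop_0 on dhcp46-124.lab.eng.blr.redhat.com 'unknown error' (1): call=83, status=Timed Out, exitreason='none', last-rc-change='Thu Nov 24 13:09:40 2016', queued=0ms, exec=20004ms
* dhcp46-124.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000 on dhcp46-124.lab.eng.blr.redhat.com 'unknown error' (1): call=73, status=Timed Out, exitreason='none', last-rc-change='Thu Nov 24 13:09:40 2016', queued=0ms, exec=0ms
"""

# Hypothetical pattern: action id, node, status and exec time in ms.
PATTERN = re.compile(
    r"^\*\s+(?P<action>\S+) on (?P<node>\S+) "
    r".*?status=(?P<status>[^,]+),.*?exec=(?P<exec_ms>\d+)ms"
)


def parse_failed_actions(text):
    """Return one dict per failed-action line that matches PATTERN."""
    actions = []
    for line in text.splitlines():
        match = PATTERN.match(line.strip())
        if match:
            entry = match.groupdict()
            entry["exec_ms"] = int(entry["exec_ms"])
            actions.append(entry)
    return actions


def timed_out(actions):
    """Keep only the actions whose status is 'Timed Out'."""
    return [a for a in actions if a["status"] == "Timed Out"]


if __name__ == "__main__":
    for a in timed_out(parse_failed_actions(FAILED_ACTIONS)):
        print(f"{a['action']} on {a['node']}: ran {a['exec_ms']} ms")
```

In the sample, the nfs_unblock stop operation ran for 20004 ms before being marked Timed Out, which suggests it hit an operation timeout of about 20 seconds.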
I tried rebooting the nodes (a different node each time) on which the shared_storage bricks are present. The issue did not occur in any of the 5 attempts on the 10.70.46.42 cluster setup.
Thanks Oyvind and Arthy. Posted a fix upstream to increase the timeout of the unblock RA to 60s when the resource is created. http://review.gluster.org/15947
upstream mainline : http://review.gluster.org/15947
upstream 3.9 : http://review.gluster.org/15994
downstream patch : https://code.engineering.redhat.com/gerrit/#/c/91871/
The portblock resource agent comes back to Started state after a node reboot/shutdown. Verified the fix in builds:
glusterfs-ganesha-3.8.4-7.el7rhgs.x86_64
nfs-ganesha-2.4.1-2.el7rhgs.x86_64
nfs-ganesha-gluster-2.4.1-2.el7rhgs.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2017-0486.html