Description of problem: After ganesha node reboot, portblock process goes to FAILED state. In a four node cluster, if one of the node gets rebooted/shutdown, portblock process of any of the nodes(not particular node) are in FAILED state. Even if the shutdown/rebooted node is brought up, failback is not happening if the portblock process is in FAILED state. Version-Release number of selected component (if applicable): nfs-ganesha-2.4.1-1.el7rhgs.x86_64 glusterfs-ganesha-3.8.4-5.el7rhgs.x86_64 How reproducible: Consistent Steps to Reproduce: 1. Create 4 node ganesha cluster. 2. Reboot one of the node 3. Check pcs status Actual results: portblock process goes to FAILED state in pcs status. Expected results: All the process should be up and running. Additional info: [root@dhcp46-139 ~]# pcs status Cluster name: ganesha-ha-360 Stack: corosync Current DC: dhcp46-124.lab.eng.blr.redhat.com (version 1.1.15-11.el7_3.2-e174ec8) - partition with quorum Last updated: Thu Nov 24 13:12:49 2016 Last change: Thu Nov 24 12:32:19 2016 by root via cibadmin on dhcp46-111.lab.eng.blr.redhat.com 4 nodes and 24 resources configured Online: [ dhcp46-115.lab.eng.blr.redhat.com dhcp46-124.lab.eng.blr.redhat.com dhcp46-139.lab.eng.blr.redhat.com ] OFFLINE: [ dhcp46-111.lab.eng.blr.redhat.com ] Full list of resources: Clone Set: nfs_setup-clone [nfs_setup] Started: [ dhcp46-115.lab.eng.blr.redhat.com dhcp46-124.lab.eng.blr.redhat.com dhcp46-139.lab.eng.blr.redhat.com ] Stopped: [ dhcp46-111.lab.eng.blr.redhat.com ] Clone Set: nfs-mon-clone [nfs-mon] Started: [ dhcp46-115.lab.eng.blr.redhat.com dhcp46-124.lab.eng.blr.redhat.com dhcp46-139.lab.eng.blr.redhat.com ] Stopped: [ dhcp46-111.lab.eng.blr.redhat.com ] Clone Set: nfs-grace-clone [nfs-grace] Started: [ dhcp46-115.lab.eng.blr.redhat.com dhcp46-124.lab.eng.blr.redhat.com dhcp46-139.lab.eng.blr.redhat.com ] Stopped: [ dhcp46-111.lab.eng.blr.redhat.com ] Resource Group: dhcp46-111.lab.eng.blr.redhat.com-group dhcp46-111.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Started dhcp46-124.lab.eng.blr.redhat.com dhcp46-111.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp46-124.lab.eng.blr.redhat.com dhcp46-111.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): FAILED dhcp46-124.lab.eng.blr.redhat.com (blocked) Resource Group: dhcp46-115.lab.eng.blr.redhat.com-group dhcp46-115.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Started dhcp46-115.lab.eng.blr.redhat.com dhcp46-115.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp46-115.lab.eng.blr.redhat.com dhcp46-115.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): Started dhcp46-115.lab.eng.blr.redhat.com Resource Group: dhcp46-139.lab.eng.blr.redhat.com-group dhcp46-139.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Started dhcp46-139.lab.eng.blr.redhat.com dhcp46-139.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp46-139.lab.eng.blr.redhat.com dhcp46-139.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): Started dhcp46-139.lab.eng.blr.redhat.com Resource Group: dhcp46-124.lab.eng.blr.redhat.com-group dhcp46-124.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Started dhcp46-124.lab.eng.blr.redhat.com dhcp46-124.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp46-124.lab.eng.blr.redhat.com dhcp46-124.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): Started dhcp46-124.lab.eng.blr.redhat.com Failed Actions: * dhcp46-111.lab.eng.blr.redhat.com-nfs_unblock_stop_0 on dhcp46-124.lab.eng.blr.redhat.com 'unknown error' (1): call=83, status=Timed Out, exitreason='none', last-rc-change='Thu Nov 24 13:09:40 2016', queued=0ms, exec=20004ms * dhcp46-124.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000 on dhcp46-124.lab.eng.blr.redhat.com 'unknown error' (1): call=73, status=Timed Out, exitreason='none', last-rc-change='Thu Nov 24 13:09:40 2016', queued=0ms, exec=0ms * dhcp46-115.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000 on dhcp46-115.lab.eng.blr.redhat.com 'unknown error' (1): call=73, status=Timed Out, exitreason='none', last-rc-change='Thu Nov 24 13:09:40 2016', queued=0ms, exec=0ms * dhcp46-139.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000 on dhcp46-139.lab.eng.blr.redhat.com 'unknown error' (1): call=71, status=Timed Out, exitreason='none', last-rc-change='Thu Nov 24 13:09:41 2016', queued=0ms, exec=0ms Daemon Status: corosync: active/disabled pacemaker: active/disabled pcsd: active/enabled [root@dhcp46-139 ~]# ganesha log snippet: --------------------- Nov 24 13:11:13 dhcp46-124 lrmd[25436]: notice: dhcp46-124.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:15272:stderr [ 0+0 records in ] Nov 24 13:11:13 dhcp46-124 lrmd[25436]: notice: dhcp46-124.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:15272:stderr [ 0+0 records out ] Nov 24 13:11:13 dhcp46-124 lrmd[25436]: notice: dhcp46-124.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:15272:stderr [ 0 bytes (0 B) copied, 0.0739975 s, 0.0 kB/s ] Nov 24 13:11:23 dhcp46-124 lrmd[25436]: notice: dhcp46-124.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:15428:stderr [ 0+0 records in ] Nov 24 13:11:23 dhcp46-124 lrmd[25436]: notice: dhcp46-124.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:15428:stderr [ 0+0 records out ] Nov 24 13:11:23 dhcp46-124 lrmd[25436]: notice: dhcp46-124.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:15428:stderr [ 0 bytes (0 B) copied, 0.0539065 s, 0.0 kB/s ]
REVIEW: http://review.gluster.org/15947 (common-HA: Increase timeout for portblock RA of action=unblock) posted (#1) for review on master by soumya k (skoduri)
REVIEW: http://review.gluster.org/15947 (common-HA: Increase timeout for portblock RA of action=unblock) posted (#2) for review on master by soumya k (skoduri)
REVIEW: http://review.gluster.org/15947 (common-HA: Increase timeout for portblock RA of action=unblock) posted (#3) for review on master by soumya k (skoduri)
COMMIT: http://review.gluster.org/15947 committed in master by Kaleb KEITHLEY (kkeithle) ------ commit 1b2b5be970f78cc32069516fa347d9943dc17d3e Author: Soumya Koduri <skoduri> Date: Mon Nov 28 17:56:35 2016 +0530 common-HA: Increase timeout for portblock RA of action=unblock Portblock RA of action type unblock stores the information about the client/server IPs connection in tickle_dir folder created in the shared storage. In case of node shutdown/reboot there could be cases wherein shared_storage may become unavailable for sometime. Hence increase the timeout to avoid that resource agent going into FAILED state. Change-Id: I4f98f819895cb164c3a82ba8084c7c11610f35ff BUG: 1399154 Signed-off-by: Soumya Koduri <skoduri> Reviewed-on: http://review.gluster.org/15947 Smoke: Gluster Build System <jenkins.org> CentOS-regression: Gluster Build System <jenkins.org> Reviewed-by: Kaleb KEITHLEY <kkeithle> Reviewed-by: Niels de Vos <ndevos> NetBSD-regression: NetBSD Build System <jenkins.org> Reviewed-by: jiffin tony Thottan <jthottan>
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.10.0, please open a new bug report. glusterfs-3.10.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution. [1] http://lists.gluster.org/pipermail/gluster-users/2017-February/030119.html [2] https://www.gluster.org/pipermail/gluster-users/