Description Arthy Loganathan
2016-11-24 11:09:02 UTC
Description of problem:
After a ganesha node reboot, the portblock resource goes to FAILED state.
In a four-node cluster, if one of the nodes is rebooted or shut down, the portblock resource of any node (not one particular node) can end up in FAILED state.
Even after the shut-down/rebooted node is brought back up, failback does not happen while the portblock resource is in FAILED state.
Version-Release number of selected component (if applicable):
nfs-ganesha-2.4.1-1.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-5.el7rhgs.x86_64
How reproducible:
Consistent
Steps to Reproduce:
1. Create a 4-node ganesha cluster.
2. Reboot one of the nodes.
3. Check pcs status.
Actual results:
The portblock resource goes to FAILED state in pcs status.
Expected results:
All resources should be up and running (Started).
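Step 3 and the expected result come down to confirming that no resource line in the pcs status output reports FAILED. A minimal Python sketch, assuming the output has already been captured as text and that resource lines follow the "<id> (<agent>): <state> <node>" layout shown under Additional info; the helper name failed_resources is hypothetical and not part of pcs:

```python
import re

# Assumed layout of a resource line, e.g.
# "<id> (ocf::heartbeat:portblock): FAILED <node> (blocked)"
RESOURCE_LINE = re.compile(
    r"^\s*(?P<rsc>\S+)\s+\((?P<agent>[^)]+)\):\s+(?P<state>\w+)\s+(?P<node>\S+)"
)


def failed_resources(status_text):
    """Return (resource, agent, node) tuples for lines whose state is FAILED."""
    failed = []
    for line in status_text.splitlines():
        m = RESOURCE_LINE.match(line)
        if m and m.group("state") == "FAILED":
            failed.append((m.group("rsc"), m.group("agent"), m.group("node")))
    return failed


# Two resource lines taken from the pcs status output in this report.
sample = """\
 dhcp46-111.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Started dhcp46-124.lab.eng.blr.redhat.com
 dhcp46-111.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): FAILED dhcp46-124.lab.eng.blr.redhat.com (blocked)
"""

for rsc, agent, node in failed_resources(sample):
    print(rsc, agent, node)
```

On a cluster node the text could come from subprocess.run(["pcs", "status"], capture_output=True, text=True).stdout (Python 3.7+); an empty list corresponds to the expected result above.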
Additional info:
[root@dhcp46-139 ~]# pcs status
Cluster name: ganesha-ha-360
Stack: corosync
Current DC: dhcp46-124.lab.eng.blr.redhat.com (version 1.1.15-11.el7_3.2-e174ec8) - partition with quorum
Last updated: Thu Nov 24 13:12:49 2016 Last change: Thu Nov 24 12:32:19 2016 by root via cibadmin on dhcp46-111.lab.eng.blr.redhat.com
4 nodes and 24 resources configured
Online: [ dhcp46-115.lab.eng.blr.redhat.com dhcp46-124.lab.eng.blr.redhat.com dhcp46-139.lab.eng.blr.redhat.com ]
OFFLINE: [ dhcp46-111.lab.eng.blr.redhat.com ]
Full list of resources:
 Clone Set: nfs_setup-clone [nfs_setup]
     Started: [ dhcp46-115.lab.eng.blr.redhat.com dhcp46-124.lab.eng.blr.redhat.com dhcp46-139.lab.eng.blr.redhat.com ]
     Stopped: [ dhcp46-111.lab.eng.blr.redhat.com ]
 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ dhcp46-115.lab.eng.blr.redhat.com dhcp46-124.lab.eng.blr.redhat.com dhcp46-139.lab.eng.blr.redhat.com ]
     Stopped: [ dhcp46-111.lab.eng.blr.redhat.com ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ dhcp46-115.lab.eng.blr.redhat.com dhcp46-124.lab.eng.blr.redhat.com dhcp46-139.lab.eng.blr.redhat.com ]
     Stopped: [ dhcp46-111.lab.eng.blr.redhat.com ]
 Resource Group: dhcp46-111.lab.eng.blr.redhat.com-group
     dhcp46-111.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Started dhcp46-124.lab.eng.blr.redhat.com
     dhcp46-111.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp46-124.lab.eng.blr.redhat.com
     dhcp46-111.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): FAILED dhcp46-124.lab.eng.blr.redhat.com (blocked)
 Resource Group: dhcp46-115.lab.eng.blr.redhat.com-group
     dhcp46-115.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Started dhcp46-115.lab.eng.blr.redhat.com
     dhcp46-115.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp46-115.lab.eng.blr.redhat.com
     dhcp46-115.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): Started dhcp46-115.lab.eng.blr.redhat.com
 Resource Group: dhcp46-139.lab.eng.blr.redhat.com-group
     dhcp46-139.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Started dhcp46-139.lab.eng.blr.redhat.com
     dhcp46-139.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp46-139.lab.eng.blr.redhat.com
     dhcp46-139.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): Started dhcp46-139.lab.eng.blr.redhat.com
 Resource Group: dhcp46-124.lab.eng.blr.redhat.com-group
     dhcp46-124.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Started dhcp46-124.lab.eng.blr.redhat.com
     dhcp46-124.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp46-124.lab.eng.blr.redhat.com
     dhcp46-124.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): Started dhcp46-124.lab.eng.blr.redhat.com
Failed Actions:
* dhcp46-111.lab.eng.blr.redhat.com-nfs_unblock_stop_0 on dhcp46-124.lab.eng.blr.redhat.com 'unknown error' (1): call=83, status=Timed Out, exitreason='none',
    last-rc-change='Thu Nov 24 13:09:40 2016', queued=0ms, exec=20004ms
* dhcp46-124.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000 on dhcp46-124.lab.eng.blr.redhat.com 'unknown error' (1): call=73, status=Timed Out, exitreason='none',
    last-rc-change='Thu Nov 24 13:09:40 2016', queued=0ms, exec=0ms
* dhcp46-115.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000 on dhcp46-115.lab.eng.blr.redhat.com 'unknown error' (1): call=73, status=Timed Out, exitreason='none',
    last-rc-change='Thu Nov 24 13:09:40 2016', queued=0ms, exec=0ms
* dhcp46-139.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000 on dhcp46-139.lab.eng.blr.redhat.com 'unknown error' (1): call=71, status=Timed Out, exitreason='none',
    last-rc-change='Thu Nov 24 13:09:41 2016', queued=0ms, exec=0ms
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
[root@dhcp46-139 ~]#
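The Failed Actions entries name each operation by its Pacemaker operation key, <resource>_<action>_<interval in ms>: ...-nfs_unblock_monitor_10000 is the recurring 10-second monitor, and ...-nfs_unblock_stop_0 is a non-recurring stop (interval 0). Because the resource IDs here contain underscores themselves (nfs_unblock), a plain split on "_" does not recover the parts. A minimal Python sketch that splits such keys, assuming the action names in the regex (an illustrative subset, not an exhaustive list); parse_op_key is a hypothetical helper:

```python
import re

# Operation key layout: <resource>_<action>_<interval_ms>.
# Resource IDs may contain underscores, so anchor on a known action name
# followed by a trailing numeric interval. The action list is an assumed subset.
OP_KEY = re.compile(
    r"^(?P<rsc>.+)_(?P<action>start|stop|monitor|promote|demote|notify|"
    r"migrate_to|migrate_from)_(?P<interval_ms>\d+)$"
)


def parse_op_key(key):
    """Split an operation key into (resource, action, interval in ms)."""
    m = OP_KEY.match(key)
    if m is None:
        raise ValueError(f"not an operation key: {key!r}")
    return m.group("rsc"), m.group("action"), int(m.group("interval_ms"))


# Keys taken from the Failed Actions section above.
for key in (
    "dhcp46-111.lab.eng.blr.redhat.com-nfs_unblock_stop_0",
    "dhcp46-124.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000",
):
    rsc, action, interval_ms = parse_op_key(key)
    if interval_ms:
        print(f"{rsc}: {action} every {interval_ms // 1000} s")
    else:
        print(f"{rsc}: {action} (non-recurring)")
```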
Log snippet (lrmd on dhcp46-124):
---------------------
Nov 24 13:11:13 dhcp46-124 lrmd[25436]: notice: dhcp46-124.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:15272:stderr [ 0+0 records in ]
Nov 24 13:11:13 dhcp46-124 lrmd[25436]: notice: dhcp46-124.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:15272:stderr [ 0+0 records out ]
Nov 24 13:11:13 dhcp46-124 lrmd[25436]: notice: dhcp46-124.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:15272:stderr [ 0 bytes (0 B) copied, 0.0739975 s, 0.0 kB/s ]
Nov 24 13:11:23 dhcp46-124 lrmd[25436]: notice: dhcp46-124.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:15428:stderr [ 0+0 records in ]
Nov 24 13:11:23 dhcp46-124 lrmd[25436]: notice: dhcp46-124.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:15428:stderr [ 0+0 records out ]
Nov 24 13:11:23 dhcp46-124 lrmd[25436]: notice: dhcp46-124.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:15428:stderr [ 0 bytes (0 B) copied, 0.0539065 s, 0.0 kB/s ]
Comment 11 Arthy Loganathan
2016-11-28 10:30:41 UTC
I tried rebooting the nodes that host the shared_storage bricks (a different node each time) and did not see the issue in any of 5 attempts on the 10.70.46.42 cluster setup.
Comment 17 Arthy Loganathan
2016-12-08 07:01:24 UTC
The portblock resource agent comes back to Started state after a node reboot/shutdown.
Verified the fix in builds:
glusterfs-ganesha-3.8.4-7.el7rhgs.x86_64
nfs-ganesha-2.4.1-2.el7rhgs.x86_64
nfs-ganesha-gluster-2.4.1-2.el7rhgs.x86_64
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://rhn.redhat.com/errata/RHSA-2017-0486.html