| Summary: | After ganesha node reboot/shutdown, portblock process goes to FAILED state | |||
|---|---|---|---|---|
| Product: | Red Hat Gluster Storage | Reporter: | Arthy Loganathan <aloganat> | |
| Component: | nfs-ganesha | Assignee: | Soumya Koduri <skoduri> | |
| Status: | CLOSED ERRATA | QA Contact: | Arthy Loganathan <aloganat> | |
| Severity: | high | Docs Contact: | ||
| Priority: | unspecified | |||
| Version: | rhgs-3.2 | CC: | aloganat, amukherj, jthottan, oalbrigt, rhs-bugs, rnalakka, sbhaloth, skoduri, storage-qa-internal | |
| Target Milestone: | --- | |||
| Target Release: | RHGS 3.2.0 | |||
| Hardware: | x86_64 | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | glusterfs-3.8.4-7 | Doc Type: | Known Issue | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1399154 | Environment: | ||
| Last Closed: | 2017-03-23 05:50:51 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Bug Depends On: | 1399154, 1400546 | |||
| Bug Blocks: | 1351528 | |||
I tried rebooting the nodes (a different node each time) that host the shared_storage bricks. The issue did not reproduce in any of 5 attempts (5/5) on the 10.70.46.42 cluster setup.

Thanks Oyvind and Arthy. Posted a fix upstream to increase the timeout of the unblock RA to 60s during creation: http://review.gluster.org/15947

upstream mainline : http://review.gluster.org/15947
upstream 3.9 : http://review.gluster.org/15994
downstream patch : https://code.engineering.redhat.com/gerrit/#/c/91871/

The portblock resource agent comes back to the Started state after a node reboot/shutdown. Verified the fix in build:
glusterfs-ganesha-3.8.4-7.el7rhgs.x86_64
nfs-ganesha-2.4.1-2.el7rhgs.x86_64
nfs-ganesha-gluster-2.4.1-2.el7rhgs.x86_64

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://rhn.redhat.com/errata/RHSA-2017-0486.html
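The fix described above raises the unblock RA timeout to 60s when the resource is created. For a cluster that already exists, a comparable change could be sketched as below. This is a minimal sketch under assumptions: the helper name `unblock_timeout_commands` is hypothetical, applying the timeout to the `stop` and `monitor` operations is inferred from the failed actions in this report rather than taken from the patch, and the commands are only built, not executed.

```python
import shlex


def unblock_timeout_commands(nodes, timeout="60s", monitor_interval="10s"):
    """Build (but do not run) pcs commands that raise the nfs_unblock timeouts.

    Hypothetical helper. The resource naming "<node>-nfs_unblock" follows the
    pcs status output in this report; which operations the real patch touches
    is an assumption.
    """
    commands = []
    for node in nodes:
        resource_id = f"{node}-nfs_unblock"
        # stop runs with interval 0 (reported as nfs_unblock_stop_0)
        commands.append([
            "pcs", "resource", "update", resource_id,
            "op", "stop", "interval=0s", f"timeout={timeout}",
        ])
        # monitor runs every 10s (reported as nfs_unblock_monitor_10000)
        commands.append([
            "pcs", "resource", "update", resource_id,
            "op", "monitor", f"interval={monitor_interval}", f"timeout={timeout}",
        ])
    return commands


if __name__ == "__main__":
    for argv in unblock_timeout_commands(["dhcp46-124.lab.eng.blr.redhat.com"]):
        # Print a copy-pasteable command line for review before running it.
        print(shlex.join(argv))
```

Printing the commands instead of running them keeps the sketch safe to try on any machine; review the output against the cluster configuration before running anything on a cluster node.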
Description of problem:
After a ganesha node reboot, the portblock process goes to FAILED state. In a four-node cluster, if one of the nodes is rebooted or shut down, the portblock process on any of the nodes (not one particular node) goes to FAILED state. Even after the rebooted or shut-down node is brought back up, failback does not happen while the portblock process is in FAILED state.

Version-Release number of selected component (if applicable):
nfs-ganesha-2.4.1-1.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-5.el7rhgs.x86_64

How reproducible:
Consistent

Steps to Reproduce:
1. Create a 4-node ganesha cluster.
2. Reboot one of the nodes.
3. Check pcs status.

Actual results:
The portblock process goes to FAILED state in pcs status.

Expected results:
All the processes should be up and running.

Additional info:

[root@dhcp46-139 ~]# pcs status
Cluster name: ganesha-ha-360
Stack: corosync
Current DC: dhcp46-124.lab.eng.blr.redhat.com (version 1.1.15-11.el7_3.2-e174ec8) - partition with quorum
Last updated: Thu Nov 24 13:12:49 2016
Last change: Thu Nov 24 12:32:19 2016 by root via cibadmin on dhcp46-111.lab.eng.blr.redhat.com

4 nodes and 24 resources configured

Online: [ dhcp46-115.lab.eng.blr.redhat.com dhcp46-124.lab.eng.blr.redhat.com dhcp46-139.lab.eng.blr.redhat.com ]
OFFLINE: [ dhcp46-111.lab.eng.blr.redhat.com ]

Full list of resources:

 Clone Set: nfs_setup-clone [nfs_setup]
     Started: [ dhcp46-115.lab.eng.blr.redhat.com dhcp46-124.lab.eng.blr.redhat.com dhcp46-139.lab.eng.blr.redhat.com ]
     Stopped: [ dhcp46-111.lab.eng.blr.redhat.com ]
 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ dhcp46-115.lab.eng.blr.redhat.com dhcp46-124.lab.eng.blr.redhat.com dhcp46-139.lab.eng.blr.redhat.com ]
     Stopped: [ dhcp46-111.lab.eng.blr.redhat.com ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ dhcp46-115.lab.eng.blr.redhat.com dhcp46-124.lab.eng.blr.redhat.com dhcp46-139.lab.eng.blr.redhat.com ]
     Stopped: [ dhcp46-111.lab.eng.blr.redhat.com ]
 Resource Group: dhcp46-111.lab.eng.blr.redhat.com-group
     dhcp46-111.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Started dhcp46-124.lab.eng.blr.redhat.com
     dhcp46-111.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp46-124.lab.eng.blr.redhat.com
     dhcp46-111.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): FAILED dhcp46-124.lab.eng.blr.redhat.com (blocked)
 Resource Group: dhcp46-115.lab.eng.blr.redhat.com-group
     dhcp46-115.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Started dhcp46-115.lab.eng.blr.redhat.com
     dhcp46-115.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp46-115.lab.eng.blr.redhat.com
     dhcp46-115.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): Started dhcp46-115.lab.eng.blr.redhat.com
 Resource Group: dhcp46-139.lab.eng.blr.redhat.com-group
     dhcp46-139.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Started dhcp46-139.lab.eng.blr.redhat.com
     dhcp46-139.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp46-139.lab.eng.blr.redhat.com
     dhcp46-139.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): Started dhcp46-139.lab.eng.blr.redhat.com
 Resource Group: dhcp46-124.lab.eng.blr.redhat.com-group
     dhcp46-124.lab.eng.blr.redhat.com-nfs_block (ocf::heartbeat:portblock): Started dhcp46-124.lab.eng.blr.redhat.com
     dhcp46-124.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp46-124.lab.eng.blr.redhat.com
     dhcp46-124.lab.eng.blr.redhat.com-nfs_unblock (ocf::heartbeat:portblock): Started dhcp46-124.lab.eng.blr.redhat.com

Failed Actions:
* dhcp46-111.lab.eng.blr.redhat.com-nfs_unblock_stop_0 on dhcp46-124.lab.eng.blr.redhat.com 'unknown error' (1): call=83, status=Timed Out, exitreason='none',
    last-rc-change='Thu Nov 24 13:09:40 2016', queued=0ms, exec=20004ms
* dhcp46-124.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000 on dhcp46-124.lab.eng.blr.redhat.com 'unknown error' (1): call=73, status=Timed Out, exitreason='none',
    last-rc-change='Thu Nov 24 13:09:40 2016', queued=0ms, exec=0ms
* dhcp46-115.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000 on dhcp46-115.lab.eng.blr.redhat.com 'unknown error' (1): call=73, status=Timed Out, exitreason='none',
    last-rc-change='Thu Nov 24 13:09:40 2016', queued=0ms, exec=0ms
* dhcp46-139.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000 on dhcp46-139.lab.eng.blr.redhat.com 'unknown error' (1): call=71, status=Timed Out, exitreason='none',
    last-rc-change='Thu Nov 24 13:09:41 2016', queued=0ms, exec=0ms

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
[root@dhcp46-139 ~]#

ganesha log snippet:
---------------------
Nov 24 13:11:13 dhcp46-124 lrmd[25436]: notice: dhcp46-124.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:15272:stderr [ 0+0 records in ]
Nov 24 13:11:13 dhcp46-124 lrmd[25436]: notice: dhcp46-124.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:15272:stderr [ 0+0 records out ]
Nov 24 13:11:13 dhcp46-124 lrmd[25436]: notice: dhcp46-124.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:15272:stderr [ 0 bytes (0 B) copied, 0.0739975 s, 0.0 kB/s ]
Nov 24 13:11:23 dhcp46-124 lrmd[25436]: notice: dhcp46-124.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:15428:stderr [ 0+0 records in ]
Nov 24 13:11:23 dhcp46-124 lrmd[25436]: notice: dhcp46-124.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:15428:stderr [ 0+0 records out ]
Nov 24 13:11:23 dhcp46-124 lrmd[25436]: notice: dhcp46-124.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:15428:stderr [ 0 bytes (0 B) copied, 0.0539065 s, 0.0 kB/s ]
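When triaging output like the pcs status above, it can help to pull the "Failed Actions" entries into structured records. The following is a rough sketch, not part of the original report: the function name `parse_failed_actions` and the regular expression are illustrative assumptions that match only the entry layout shown here, and the sample text is abbreviated from this report.

```python
import re

# Matches one "Failed Actions" entry in the layout shown in this report.
# The pattern is an assumption based on that layout, not an official format.
FAILED_ACTION = re.compile(
    r"\*\s+(?P<resource>\S+?)_(?P<op>start|stop|monitor)_(?P<interval>\d+)"
    r"\s+on\s+(?P<node>\S+)"
    r".*?status=(?P<status>[^,]+),"
    r".*?exec=(?P<exec_ms>\d+)ms",
    re.DOTALL,
)

# Two entries copied from the pcs status output in this report.
SAMPLE = """Failed Actions:
* dhcp46-111.lab.eng.blr.redhat.com-nfs_unblock_stop_0 on dhcp46-124.lab.eng.blr.redhat.com 'unknown error' (1): call=83, status=Timed Out, exitreason='none',
    last-rc-change='Thu Nov 24 13:09:40 2016', queued=0ms, exec=20004ms
* dhcp46-124.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000 on dhcp46-124.lab.eng.blr.redhat.com 'unknown error' (1): call=73, status=Timed Out, exitreason='none',
    last-rc-change='Thu Nov 24 13:09:40 2016', queued=0ms, exec=0ms
"""


def parse_failed_actions(text):
    """Return one dict per failed action found in pcs status text."""
    actions = []
    for match in FAILED_ACTION.finditer(text):
        entry = match.groupdict()
        entry["interval"] = int(entry["interval"])  # operation interval in ms
        entry["exec_ms"] = int(entry["exec_ms"])    # time spent executing
        actions.append(entry)
    return actions


if __name__ == "__main__":
    for action in parse_failed_actions(SAMPLE):
        print(action["resource"], action["op"], action["node"],
              action["status"], action["exec_ms"])
```

In the sample, the `stop` action ran for 20004 ms before it was marked Timed Out, which suggests that a timeout of roughly 20 seconds was in effect for that operation. That reading is an inference from the exec time, not something the report states.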