Description of problem:
-----------------------

4-node Ganesha cluster. 4 clients, 1:1 mount. Mount vers=3.

Ran iozone seq writes on a fresh setup. Test hangs forever. iostat showed absolutely no signs of running I/O on the servers.

Version-Release number of selected component (if applicable):
--------------------------------------------------------------

glusterfs-ganesha-3.8.4-7.el6rhs.x86_64
nfs-ganesha-2.4.1-2.el6rhs.x86_64

How reproducible:
----------------

2/2 on freshly installed setups.

Steps to Reproduce:
-------------------

1. Create a 4-node Ganesha cluster (RHEL 6.8). Mount a 2*2 volume on 4 RHEL 7.3 clients via v3.
2. Run iozone seq writes.

Actual results:
---------------

iozone threads hang after a few minutes.

Expected results:
-----------------

No hangs.

Additional info:
----------------

Server OS : RHEL 6.8
Client OS : RHEL 7.3

*Vol Config* :

Volume Name: testvol
Type: Distributed-Replicate
Volume ID: c43082bb-e807-46b8-8e07-c8eae54eec21
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gqas013.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick0
Brick2: gqas011.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick1
Brick3: gqas005.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick2
Brick4: gqas006.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick3
Options Reconfigured:
ganesha.enable: on
features.cache-invalidation: on
nfs.disable: on
performance.readdir-ahead: on
performance.stat-prefetch: off
server.allow-insecure: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable
[root@gqas005 /]#
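The report does not give the exact mount or iozone command lines, so the following is only a rough sketch of what the reproduction steps might look like on one client. The server name, the export path `/<volume>`, the mount point, and the thread count, file size, and record size are all assumptions chosen for illustration; only `vers=3`, the volume name `testvol`, and the sequential-write workload come from the report. The helper `build_repro_commands` is hypothetical and only builds the commands, it does not run them.

```python
import shlex


def build_repro_commands(server, volume="testvol", mountpoint="/mnt/testvol",
                         threads=4, file_size="1g", record_size="64k"):
    """Build (but do not run) a mount command and an iozone command.

    All defaults except the volume name are illustrative assumptions,
    not values taken from the bug report.
    """
    # NFSv3 mount, matching "Mount vers=3" in the report.
    # The export path "/<volume>" is an assumption about the Ganesha export.
    mount_cmd = ["mount", "-t", "nfs", "-o", "vers=3",
                 f"{server}:/{volume}", mountpoint]

    # One target file per iozone thread (throughput mode needs one name each).
    files = [f"{mountpoint}/iozone.{i}" for i in range(threads)]

    # -i 0 selects the sequential write/rewrite test,
    # -t runs in throughput mode with the given number of threads,
    # -s and -r set file size and record size, -F names the per-thread files.
    iozone_cmd = ["iozone", "-i", "0", "-t", str(threads),
                  "-s", file_size, "-r", record_size, "-F", *files]
    return mount_cmd, iozone_cmd


# "server.example.com" is a placeholder, not a host from the report.
mount_cmd, iozone_cmd = build_repro_commands("server.example.com")
print(shlex.join(mount_cmd))
print(shlex.join(iozone_cmd))
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere; on a real client they would be run as root, one client per server for the 1:1 layout described above.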
A slight correction to the description: "How reproducible" should read 2/3, not 2/2.
Ganesha was alive and running at all times.

pcs status after running (during hang) :

[root@gqas005 /]# pcs status
Cluster name: G1474623742.03
Last updated: Sun Dec 11 05:15:36 2016
Last change: Sun Dec 11 04:22:38 2016 by root via crm_attribute on gqas011.sbu.lab.eng.bos.redhat.com
Stack: cman
Current DC: gqas005.sbu.lab.eng.bos.redhat.com (version 1.1.14-8.el6-70404b0) - partition WITHOUT quorum
4 nodes and 24 resources configured

Online: [ gqas005.sbu.lab.eng.bos.redhat.com ]
OFFLINE: [ gqas006.sbu.lab.eng.bos.redhat.com gqas011.sbu.lab.eng.bos.redhat.com gqas013.sbu.lab.eng.bos.redhat.com ]

Full list of resources:

 Clone Set: nfs_setup-clone [nfs_setup]
     Stopped: [ gqas005.sbu.lab.eng.bos.redhat.com gqas006.sbu.lab.eng.bos.redhat.com gqas011.sbu.lab.eng.bos.redhat.com gqas013.sbu.lab.eng.bos.redhat.com ]
 Clone Set: nfs-mon-clone [nfs-mon]
     Stopped: [ gqas005.sbu.lab.eng.bos.redhat.com gqas006.sbu.lab.eng.bos.redhat.com gqas011.sbu.lab.eng.bos.redhat.com gqas013.sbu.lab.eng.bos.redhat.com ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Stopped: [ gqas005.sbu.lab.eng.bos.redhat.com gqas006.sbu.lab.eng.bos.redhat.com gqas011.sbu.lab.eng.bos.redhat.com gqas013.sbu.lab.eng.bos.redhat.com ]
 Resource Group: gqas013.sbu.lab.eng.bos.redhat.com-group
     gqas013.sbu.lab.eng.bos.redhat.com-nfs_block    (ocf::heartbeat:portblock):    Stopped
     gqas013.sbu.lab.eng.bos.redhat.com-cluster_ip-1    (ocf::heartbeat:IPaddr):    Stopped
     gqas013.sbu.lab.eng.bos.redhat.com-nfs_unblock    (ocf::heartbeat:portblock):    Stopped
 Resource Group: gqas005.sbu.lab.eng.bos.redhat.com-group
     gqas005.sbu.lab.eng.bos.redhat.com-nfs_block    (ocf::heartbeat:portblock):    Stopped
     gqas005.sbu.lab.eng.bos.redhat.com-cluster_ip-1    (ocf::heartbeat:IPaddr):    Stopped
     gqas005.sbu.lab.eng.bos.redhat.com-nfs_unblock    (ocf::heartbeat:portblock):    Stopped
 Resource Group: gqas006.sbu.lab.eng.bos.redhat.com-group
     gqas006.sbu.lab.eng.bos.redhat.com-nfs_block    (ocf::heartbeat:portblock):    Stopped
     gqas006.sbu.lab.eng.bos.redhat.com-cluster_ip-1    (ocf::heartbeat:IPaddr):    Stopped
     gqas006.sbu.lab.eng.bos.redhat.com-nfs_unblock    (ocf::heartbeat:portblock):    Stopped
 Resource Group: gqas011.sbu.lab.eng.bos.redhat.com-group
     gqas011.sbu.lab.eng.bos.redhat.com-nfs_block    (ocf::heartbeat:portblock):    Stopped
     gqas011.sbu.lab.eng.bos.redhat.com-cluster_ip-1    (ocf::heartbeat:IPaddr):    Stopped
     gqas011.sbu.lab.eng.bos.redhat.com-nfs_unblock    (ocf::heartbeat:portblock):    Stopped

Failed Actions:
* nfs-mon_monitor_10000 on gqas005.sbu.lab.eng.bos.redhat.com 'unknown error' (1): call=12, status=Timed Out, exitreason='none',
    last-rc-change='Sun Dec 11 04:24:18 2016', queued=0ms, exec=0ms

PCSD Status:
  gqas013.sbu.lab.eng.bos.redhat.com: Online
  gqas005.sbu.lab.eng.bos.redhat.com: Online
  gqas006.sbu.lab.eng.bos.redhat.com: Online
  gqas011.sbu.lab.eng.bos.redhat.com: Online

[root@gqas005 /]#
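The pcs status output above shows the partition without quorum and three of the four nodes offline. For anyone triaging a similar hang, a small parser along the following lines could flag that state from captured `pcs status` text. This is a minimal sketch, not part of any fix: the function name `summarize_pcs_status` is hypothetical, and it assumes the quorum line reads "partition with quorum" when quorum is held and that the `Online:` / `OFFLINE:` node lists each sit on a single line, as in the output from this report.

```python
import re

# Excerpt of the `pcs status` output captured in this report.
SAMPLE = """\
Stack: cman
Current DC: gqas005.sbu.lab.eng.bos.redhat.com (version 1.1.14-8.el6-70404b0) - partition WITHOUT quorum
4 nodes and 24 resources configured

Online: [ gqas005.sbu.lab.eng.bos.redhat.com ]
OFFLINE: [ gqas006.sbu.lab.eng.bos.redhat.com gqas011.sbu.lab.eng.bos.redhat.com gqas013.sbu.lab.eng.bos.redhat.com ]
"""


def summarize_pcs_status(text):
    """Return quorum state and online/offline node lists from `pcs status` text."""

    def nodes(label):
        # Match e.g. "Online: [ host1 host2 ]" at the start of a line.
        match = re.search(rf"^{label}:\s*\[(.*?)\]", text, re.MULTILINE)
        return match.group(1).split() if match else []

    return {
        # Case-sensitive on purpose: "partition WITHOUT quorum" must not match.
        "has_quorum": "partition with quorum" in text,
        "online": nodes("Online"),
        "offline": nodes("OFFLINE"),
    }


summary = summarize_pcs_status(SAMPLE)
print("quorum:", summary["has_quorum"])
print("online nodes:", len(summary["online"]))
print("offline nodes:", len(summary["offline"]))
```

The check is deliberately string-based, since the report only provides the human-readable status text.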
Upstream mainline patch http://review.gluster.org/16122 has been posted for review.
upstream mainline : http://review.gluster.org/16122
release-3.9 : http://review.gluster.org/16139
release-3.8 : http://review.gluster.org/16140
downstream patch : https://code.engineering.redhat.com/gerrit/#/c/93080
Verified on glusterfs-3.8.4-11 and Ganesha 2.4.1-4. Could not reproduce the reported issue.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2017-0484.html