Bug 1261765

Summary: NFS Ganesha export lost during IO on EC volume
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Bhaskarakiran <byarlaga>
Component: nfs-ganesha
Assignee: Soumya Koduri <skoduri>
Status: CLOSED ERRATA
QA Contact: Neha <nerawat>
Severity: unspecified
Priority: unspecified
Version: rhgs-3.1
Target Release: RHGS 3.1.2
Hardware: Unspecified
OS: Unspecified
Keywords: ZStream
Doc Type: Bug Fix
Type: Bug
Last Closed: 2016-03-01 05:35:18 UTC
CC: akhakhar, byarlaga, jthottan, kkeithle, mzywusko, ndevos, nlevinki, sankarshan, sashinde, skoduri
Bug Depends On: 1262680    
Bug Blocks: 1260783    

Description Bhaskarakiran 2015-09-10 06:48:28 UTC
Description of problem:
======================

Created an 8+4 EC (disperse) volume, mounted it on the client over NFSv4, and started IO (mkdir, dd, Linux untars). After some time, ran a recursive listing with 'ls -ltR' and saw 'Remote I/O error' and 'Stale file handle' errors. showmount on the server no longer shows the exported volume:

[root@dhcp37-137 ~]# showmount -e localhost
Export list for localhost:
[root@dhcp37-137 ~]# 
[root@dhcp37-137 ~]# gluster v info ecvol
 
Volume Name: ecvol
Type: Disperse
Volume ID: 6066df12-794a-4106-9e6e-242c8e759397
Status: Started
Number of Bricks: 1 x (8 + 4) = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.37.137:/rhs/brick1/brick0/ecvol1
Brick2: 10.70.37.150:/rhs/brick1/brick0/ecvol2
Brick3: 10.70.37.56:/rhs/brick1/brick0/ecvol3
Brick4: 10.70.37.100:/rhs/brick1/brick0/ecvol4
Brick5: 10.70.37.137:/rhs/brick1/brick1/ecvol5
Brick6: 10.70.37.150:/rhs/brick1/brick1/ecvol6
Brick7: 10.70.37.56:/rhs/brick1/brick1/ecvol7
Brick8: 10.70.37.100:/rhs/brick1/brick1/ecvol8
Brick9: 10.70.37.137:/rhs/brick1/brick2/ecvol9
Brick10: 10.70.37.150:/rhs/brick1/brick2/ecvol10
Brick11: 10.70.37.56:/rhs/brick1/brick2/ecvol11
Brick12: 10.70.37.100:/rhs/brick1/brick2/ecvol12
Options Reconfigured:
ganesha.enable: on
features.cache-invalidation: on
server.event-threads: 2
client.event-threads: 2
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
features.uss: off
nfs.disable: on
performance.readdir-ahead: on
cluster.enable-shared-storage: enable
nfs-ganesha: enable
[root@dhcp37-137 ~]# 
[root@dhcp37-137 ~]# 
[root@dhcp37-137 ~]# pcs status
Cluster name: G1441293920.59
Last updated: Thu Sep 10 01:25:11 2015		Last change: Wed Sep  9 08:31:13 2015 by root via cibadmin on dhcp37-137.lab.eng.blr.redhat.com
Stack: corosync
Current DC: dhcp37-100.lab.eng.blr.redhat.com (version 1.1.13-a14efad) - partition with quorum
4 nodes and 16 resources configured

Online: [ dhcp37-100.lab.eng.blr.redhat.com dhcp37-137.lab.eng.blr.redhat.com dhcp37-150.lab.eng.blr.redhat.com dhcp37-56.lab.eng.blr.redhat.com ]

Full list of resources:

 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ dhcp37-100.lab.eng.blr.redhat.com dhcp37-137.lab.eng.blr.redhat.com dhcp37-150.lab.eng.blr.redhat.com dhcp37-56.lab.eng.blr.redhat.com ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ dhcp37-100.lab.eng.blr.redhat.com dhcp37-137.lab.eng.blr.redhat.com dhcp37-150.lab.eng.blr.redhat.com dhcp37-56.lab.eng.blr.redhat.com ]
 dhcp37-137.lab.eng.blr.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started dhcp37-137.lab.eng.blr.redhat.com
 dhcp37-137.lab.eng.blr.redhat.com-trigger_ip-1	(ocf::heartbeat:Dummy):	Started dhcp37-137.lab.eng.blr.redhat.com
 dhcp37-56.lab.eng.blr.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started dhcp37-56.lab.eng.blr.redhat.com
 dhcp37-56.lab.eng.blr.redhat.com-trigger_ip-1	(ocf::heartbeat:Dummy):	Started dhcp37-56.lab.eng.blr.redhat.com
 dhcp37-100.lab.eng.blr.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started dhcp37-100.lab.eng.blr.redhat.com
 dhcp37-100.lab.eng.blr.redhat.com-trigger_ip-1	(ocf::heartbeat:Dummy):	Started dhcp37-100.lab.eng.blr.redhat.com
 dhcp37-150.lab.eng.blr.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started dhcp37-150.lab.eng.blr.redhat.com
 dhcp37-150.lab.eng.blr.redhat.com-trigger_ip-1	(ocf::heartbeat:Dummy):	Started dhcp37-150.lab.eng.blr.redhat.com

PCSD Status:
  dhcp37-137.lab.eng.blr.redhat.com: Online
  dhcp37-56.lab.eng.blr.redhat.com: Online
  dhcp37-100.lab.eng.blr.redhat.com: Online
  dhcp37-150.lab.eng.blr.redhat.com: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/disabled
[root@dhcp37-137 ~]# 
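
For context, the geometry and options shown above correspond to the standard RHGS 3.1 NFS-Ganesha setup. A sketch of the commands that would produce this configuration (volume name, brick paths, and option names are taken from the output above; the elided bricks follow the same host/path pattern):

# gluster volume create ecvol disperse 12 redundancy 4 \
    10.70.37.137:/rhs/brick1/brick0/ecvol1 10.70.37.150:/rhs/brick1/brick0/ecvol2 \
    ... 10.70.37.100:/rhs/brick1/brick2/ecvol12
# gluster volume set all cluster.enable-shared-storage enable
# gluster nfs-ganesha enable
# gluster volume set ecvol ganesha.enable on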


Version-Release number of selected component (if applicable):
=============================================================

3.7.1-14

[root@dhcp37-137 ~]# gluster --version
glusterfs 3.7.1 built on Aug 31 2015 23:59:02
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.


How reproducible:
=================
100%

Steps to Reproduce:

As in the description; a condensed sketch of the sequence follows below.
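
A condensed sketch, assuming a hypothetical client mount point /mnt/ecvol, a cluster VIP <VIP> as the server address, and a kernel tarball for the untar step:

# mount -t nfs -o vers=4 <VIP>:/ecvol /mnt/ecvol
# cd /mnt/ecvol
# mkdir -p dir{1..10}
# dd if=/dev/zero of=dir1/file1 bs=1M count=1024
# tar xf /path/to/linux.tar.xz -C dir2
# ls -ltR .          <-- returns 'Remote I/O error' / 'Stale file handle'

Then, on the server:

# showmount -e localhost          <-- export list comes back empty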

Actual results:
===============
The NFS-Ganesha export is lost: showmount returns an empty export list even though the volume is still started.


Expected results:
=================
The volume remains exported and IO completes without errors.

Additional info:
================

Comment 3 Soumya Koduri 2015-09-11 10:06:10 UTC
While collecting sosreports, please also copy '/var/log/ganesha.log' and '/var/log/ganesha-gfapi.log' into that folder.
Could you also post the results you got while running these tests individually?
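
For example (the sosreport directory name is a placeholder):

# cp /var/log/ganesha.log /var/log/ganesha-gfapi.log /var/tmp/sosreport-<hostname>-<date>/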

Comment 4 Soumya Koduri 2015-09-11 10:51:40 UTC
Along with the individual test results, please also post results from the same tests on a distributed-replicate volume. Thanks!

Comment 8 Soumya Koduri 2015-10-12 11:52:01 UTC
QE reported that while reproducing this issue they hit bug 1262680.
Requesting QE to reproduce the issue once the fix for bug 1262680 is available.

Comment 9 Neha 2015-11-24 07:29:49 UTC
Still seeing 'Remote I/O error' and 'Stale file handle' errors.

But there is no issue with the NFS-Ganesha export itself. Tried it multiple times, so moving this to VERIFIED; will reopen if it is hit again.

Comment 11 errata-xmlrpc 2016-03-01 05:35:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0193.html