Bug 1261765 - NFS Ganesha export lost during IO on EC volume
Summary: NFS Ganesha export lost during IO on EC volume
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: nfs-ganesha
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: RHGS 3.1.2
Assignee: Soumya Koduri
QA Contact: Neha
URL:
Whiteboard:
Depends On: 1262680
Blocks: 1260783
 
Reported: 2015-09-10 06:48 UTC by Bhaskarakiran
Modified: 2016-08-15 02:09 UTC (History)
CC List: 10 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-03-01 05:35:18 UTC
Embargoed:


Attachments


Links:
  Red Hat Product Errata RHBA-2016:0193 (priority: normal, status: SHIPPED_LIVE): Red Hat Gluster Storage 3.1 update 2, last updated 2016-03-01 10:20:36 UTC

Description Bhaskarakiran 2015-09-10 06:48:28 UTC
Description of problem:
======================

Created an 8+4 EC (disperse) volume, mounted it over NFSv4 on the client, and started IO (mkdir, dd, Linux kernel untars). After some time, a recursive listing with 'ls -ltR' returned 'Remote I/O error' and 'Stale file handle' errors, and showmount on the server no longer shows the exported volume:

[root@dhcp37-137 ~]# showmount -e localhost
Export list for localhost:
[root@dhcp37-137 ~]# 
[root@dhcp37-137 ~]# gluster v info ecvol
 
Volume Name: ecvol
Type: Disperse
Volume ID: 6066df12-794a-4106-9e6e-242c8e759397
Status: Started
Number of Bricks: 1 x (8 + 4) = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.37.137:/rhs/brick1/brick0/ecvol1
Brick2: 10.70.37.150:/rhs/brick1/brick0/ecvol2
Brick3: 10.70.37.56:/rhs/brick1/brick0/ecvol3
Brick4: 10.70.37.100:/rhs/brick1/brick0/ecvol4
Brick5: 10.70.37.137:/rhs/brick1/brick1/ecvol5
Brick6: 10.70.37.150:/rhs/brick1/brick1/ecvol6
Brick7: 10.70.37.56:/rhs/brick1/brick1/ecvol7
Brick8: 10.70.37.100:/rhs/brick1/brick1/ecvol8
Brick9: 10.70.37.137:/rhs/brick1/brick2/ecvol9
Brick10: 10.70.37.150:/rhs/brick1/brick2/ecvol10
Brick11: 10.70.37.56:/rhs/brick1/brick2/ecvol11
Brick12: 10.70.37.100:/rhs/brick1/brick2/ecvol12
Options Reconfigured:
ganesha.enable: on
features.cache-invalidation: on
server.event-threads: 2
client.event-threads: 2
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
features.uss: off
nfs.disable: on
performance.readdir-ahead: on
cluster.enable-shared-storage: enable
nfs-ganesha: enable
[root@dhcp37-137 ~]# 
[root@dhcp37-137 ~]# 
[root@dhcp37-137 ~]# pcs status
Cluster name: G1441293920.59
Last updated: Thu Sep 10 01:25:11 2015		Last change: Wed Sep  9 08:31:13 2015 by root via cibadmin on dhcp37-137.lab.eng.blr.redhat.com
Stack: corosync
Current DC: dhcp37-100.lab.eng.blr.redhat.com (version 1.1.13-a14efad) - partition with quorum
4 nodes and 16 resources configured

Online: [ dhcp37-100.lab.eng.blr.redhat.com dhcp37-137.lab.eng.blr.redhat.com dhcp37-150.lab.eng.blr.redhat.com dhcp37-56.lab.eng.blr.redhat.com ]

Full list of resources:

 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ dhcp37-100.lab.eng.blr.redhat.com dhcp37-137.lab.eng.blr.redhat.com dhcp37-150.lab.eng.blr.redhat.com dhcp37-56.lab.eng.blr.redhat.com ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ dhcp37-100.lab.eng.blr.redhat.com dhcp37-137.lab.eng.blr.redhat.com dhcp37-150.lab.eng.blr.redhat.com dhcp37-56.lab.eng.blr.redhat.com ]
 dhcp37-137.lab.eng.blr.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started dhcp37-137.lab.eng.blr.redhat.com
 dhcp37-137.lab.eng.blr.redhat.com-trigger_ip-1	(ocf::heartbeat:Dummy):	Started dhcp37-137.lab.eng.blr.redhat.com
 dhcp37-56.lab.eng.blr.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started dhcp37-56.lab.eng.blr.redhat.com
 dhcp37-56.lab.eng.blr.redhat.com-trigger_ip-1	(ocf::heartbeat:Dummy):	Started dhcp37-56.lab.eng.blr.redhat.com
 dhcp37-100.lab.eng.blr.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started dhcp37-100.lab.eng.blr.redhat.com
 dhcp37-100.lab.eng.blr.redhat.com-trigger_ip-1	(ocf::heartbeat:Dummy):	Started dhcp37-100.lab.eng.blr.redhat.com
 dhcp37-150.lab.eng.blr.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started dhcp37-150.lab.eng.blr.redhat.com
 dhcp37-150.lab.eng.blr.redhat.com-trigger_ip-1	(ocf::heartbeat:Dummy):	Started dhcp37-150.lab.eng.blr.redhat.com

PCSD Status:
  dhcp37-137.lab.eng.blr.redhat.com: Online
  dhcp37-56.lab.eng.blr.redhat.com: Online
  dhcp37-100.lab.eng.blr.redhat.com: Online
  dhcp37-150.lab.eng.blr.redhat.com: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/disabled
[root@dhcp37-137 ~]# 


Version-Release number of selected component (if applicable):
=============================================================

3.7.1-14

[root@dhcp37-137 ~]# gluster --version
glusterfs 3.7.1 built on Aug 31 2015 23:59:02
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.


How reproducible:
=================
100%

Steps to Reproduce:
===================

As in the description; a minimal command sketch follows.
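
A rough sketch of the reproduction flow, assuming four servers named server1..server4, brick paths under /rhs/brick1, an already-configured ganesha HA cluster, and a cluster VIP reachable from the client (all of these names are placeholders, not taken from this report):

# On one of the gluster servers: create, start, and export an 8+4 disperse volume.
gluster volume create ecvol disperse 12 redundancy 4 \
    server{1..4}:/rhs/brick1/brick{0..2}/ecvol force
gluster volume start ecvol
gluster volume set ecvol ganesha.enable on

# On the NFS client: mount over NFSv4 and drive mixed IO.
mount -t nfs -o vers=4 VIP:/ecvol /mnt/ecvol
mkdir -p /mnt/ecvol/dir1
dd if=/dev/zero of=/mnt/ecvol/dir1/file1 bs=1M count=1024
tar xf /root/linux.tar.xz -C /mnt/ecvol   # kernel untar as part of the IO mix
ls -ltR /mnt/ecvol                        # eventually fails with Remote I/O error / Stale file handle

# Back on the server: the export list is expected to still show the volume.
showmount -e localhost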

Actual results:
===============
The NFS Ganesha export is lost; showmount no longer lists the volume.


Expected results:
=================
The volume should remain exported throughout the IO, and the recursive listing should complete without errors.

Additional info:
================

Comment 3 Soumya Koduri 2015-09-11 10:06:10 UTC
While collecting sosreports, please copy '/var/log/ganesha.log' and '/var/log/ganesha-gfapi.log' into that folder as well.
Also, could you post the results you got while running these tests individually?
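
For reference, a rough sketch of collecting those logs alongside the sosreport output (the extras directory name is hypothetical, not prescribed by this comment):

sosreport --batch
mkdir -p /var/tmp/sosreport-extras
cp /var/log/ganesha.log /var/log/ganesha-gfapi.log /var/tmp/sosreport-extras/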

Comment 4 Soumya Koduri 2015-09-11 10:51:40 UTC
Along with the individual test results, please post the test results from a distributed-replicated volume as well. Thanks!
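
For that run, a hedged sketch of creating a comparable 2x2 distributed-replicated volume and exporting it the same way (hostnames and brick paths are placeholders):

gluster volume create drvol replica 2 \
    server1:/rhs/brick1/drvol server2:/rhs/brick1/drvol \
    server3:/rhs/brick1/drvol server4:/rhs/brick1/drvol
gluster volume start drvol
gluster volume set drvol ganesha.enable on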

Comment 8 Soumya Koduri 2015-10-12 11:52:01 UTC
QE reported that while reproducing this issue, they hit bug 1262680.
Request QE to reproduce the issue after the fix for bug 1262680 is available.

Comment 9 Neha 2015-11-24 07:29:49 UTC
Still seeing 'Remote I/O error' and 'Stale file handle' errors.

However, no longer seeing any issue with the NFS Ganesha export. Tried it multiple times, so moving this to Verified; will re-open if we hit it again.

Comment 11 errata-xmlrpc 2016-03-01 05:35:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0193.html

