Bug 1396776

Summary: [Ganesha] : "Stale File Handle" while running rm from multiple clients.
Product: Red Hat Gluster Storage
Reporter: Ambarish <asoman>
Component: nfs-ganesha
Assignee: Kaleb KEITHLEY <kkeithle>
Status: CLOSED NOTABUG
QA Contact: Ambarish <asoman>
Severity: high
Docs Contact:
Priority: unspecified
Version: rhgs-3.2
CC: asoman, bturner, jthottan, kkeithle, rhinduja, rhs-bugs, skoduri, storage-qa-internal
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-22 14:46:57 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Ambarish 2016-11-20 05:31:27 UTC
Description of problem:
----------------------

4-node Ganesha cluster. Mounted a 2*2 volume on 4 clients via NFSv3 and created a huge data set. Then ran rm -rf <mountpoint>/* from all the clients.

*Observation* :

On the application side :
------------------------

rm: cannot remove ‘/gluster-mount/linux-4.8.9/Documentation/devicetree/bindings/extcon/extcon-arizona.txt’: Stale file handle
rm: cannot remove ‘/gluster-mount/linux-4.8.9/Documentation/devicetree/bindings/extcon/extcon-rt8973a.txt’: Stale file handle
rm: cannot remove ‘/gluster-mount/linux-4.8.9/Documentation/cgroup-v1/cgroups.txt’: Stale file handle
rm: cannot remove ‘/gluster-mount/linux-4.8.9/Documentation/cgroup-v1/net_prio.txt’: Stale file handle
rm: cannot remove ‘/gluster-mount/linux-4.8.9/Documentation/cgroup-v1/hugetlb.txt’: Stale file handle
rm: cannot remove ‘/gluster-mount/linux-4.8.9/Documentation/cgroup-v1/freezer-subsystem.txt’: Stale file handle
rm: cannot remove ‘/gluster-mount/linux-4.8.9/Documentation/cgroup-v1/pids.txt’: Stale file handle
rm: cannot remove ‘/gluster-mount/linux-4.8.9/Documentation/cgroup-v1/cpusets.txt’: Stale file handle
rm: cannot remove ‘/gluster-mount/linux-4.8.9/Documentation/cgroup-v1/memory.txt’: Stale file handle

In Ganesha-gfapi logs
----------------------

[2016-11-20 03:23:50.160966] W [MSGID: 109065] [dht-common.c:7826:dht_rmdir_lock_cbk] 0-testvol-dht: acquiring inodelk failed rmdir for /file_dstdir/gqac015.sbu.lab.eng.bos.redhat.com/thrd_04/d_004/d_007) [Stale file handle]
[2016-11-20 03:23:50.162594] W [MSGID: 104011] [glfs-handleops.c:1310:pub_glfs_h_create_from_handle] 0-meta-autoload: inode refresh of ac7131d7-0bb7-440f-9000-289b631ac66d failed: Stale file handle [Stale file handle]
[2016-11-20 03:23:50.160966] W [MSGID: 109065] [dht-common.c:7826:dht_rmdir_lock_cbk] 0-testvol-dht: acquiring inodelk failed rmdir for /file_dstdir/gqac015.sbu.lab.eng.bos.redhat.com/thrd_04/d_004/d_007) [Stale file handle]
[2016-11-20 03:23:50.162594] W [MSGID: 104011] [glfs-handleops.c:1310:pub_glfs_h_create_from_handle] 0-meta-autoload: inode refresh of ac7131d7-0bb7-440f-9000-289b631ac66d failed: Stale file handle [Stale file handle] 


The logs are literally flooded with these messages, but I did not see anywhere near that many errors on the client side. This can be misleading and can cause FUD in production.
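The race underlying these errors can be shown in isolation. Below is a minimal sketch, with the assumption that a plain local filesystem stands in for the NFS mount: when two removers race on the same path, the loser's cached reference no longer resolves. Locally that surfaces as ENOENT; over NFSv3, a cached-but-dead file handle is reported as ESTALE ("Stale file handle") instead.

```python
import errno
import os
import tempfile

# Set up a file that two "clients" will both try to remove.
d = tempfile.mkdtemp()
path = os.path.join(d, "f.txt")
open(path, "w").close()

os.unlink(path)            # "client 1" wins the race and removes the file
code = None
try:
    os.unlink(path)        # "client 2" retries through its now-stale reference
except OSError as e:
    code = e.errno
print(errno.errorcode[code])  # ENOENT locally; the NFS analogue is ESTALE
os.rmdir(d)
```

The same pattern applies server-side: by the time one Ganesha head refreshes an inode for an rmdir, another client may already have removed it, hence the pub_glfs_h_create_from_handle warnings above.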

Version-Release number of selected component (if applicable):
-------------------------------------------------------------

glusterfs-ganesha-3.8.4-5.el7rhgs.x86_64
nfs-ganesha-2.4.1-1.el7rhgs.x86_64

How reproducible:
-----------------

2/2

Steps to Reproduce:
-------------------

1. Create a 2*2 volume, mount it via NFSv3 on multiple clients.

2. Create a huge data set with deep directories

3. Run rm -rf from various clients.
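The steps above can be sketched as shell commands. This is a non-runnable cluster fragment, not a script: hostnames and brick paths are taken from the volume info below, while the `<ganesha-vip>` mount address is a placeholder, and a live 4-node Ganesha cluster is assumed.

```shell
# 1. 2x2 distributed-replicate volume (bricks as in this report):
gluster volume create testvol replica 2 \
    gqas013.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick0 \
    gqas005.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick1 \
    gqas006.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick2 \
    gqas011.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick3
gluster volume start testvol

# On each client, mount via NFSv3 (<ganesha-vip> is a placeholder):
mount -t nfs -o vers=3 <ganesha-vip>:/testvol /gluster-mount

# 2. Deep data set, e.g. an unpacked kernel tree (matches the paths above):
tar xf linux-4.8.9.tar.xz -C /gluster-mount

# 3. Concurrently, from every client:
rm -rf /gluster-mount/*
```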

Actual results:
-----------------

A. Stale File handle on the application side.
B. Logs contain more error/warning entries than are seen on the application side, which is a bit misleading IMO.

Expected results:
-----------------

Successful rm -rf on the client side.

Additional info:
----------------

*Vol Conf* :

Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 865c5329-7fa5-4a10-888b-671902b0bca6
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gqas013.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick0
Brick2: gqas005.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick1
Brick3: gqas006.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick2
Brick4: gqas011.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick3
Options Reconfigured:
ganesha.enable: on
features.cache-invalidation: on
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
performance.stat-prefetch: off
server.allow-insecure: on
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable