Bug 1396776 - [Ganesha] : "Stale File Handle" while running rm from multiple clients.
Summary: [Ganesha] : "Stale File Handle" while running rm from multiple clients.
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: nfs-ganesha
Version: rhgs-3.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Kaleb KEITHLEY
QA Contact: Ambarish
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-11-20 05:31 UTC by Ambarish
Modified: 2016-11-22 14:46 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-22 14:46:57 UTC
Embargoed:



Description Ambarish 2016-11-20 05:31:27 UTC
Description of problem:
----------------------

4-node Ganesha cluster. Mounted a 2*2 volume on 4 clients via NFSv3 and created a huge data set. Then ran rm -rf <mountpoint>/* from all the clients.
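
For reference, the commands on each client were of roughly this form (the mount point /gluster-mount is taken from the errors below; the server address and mount options are assumptions, since the Ganesha VIP used is not recorded here):

# On each of the 4 clients (server address/VIP and mount options are assumptions):
mount -t nfs -o vers=3 <ganesha-vip>:/testvol /gluster-mount

# Then, from all 4 clients in parallel:
rm -rf /gluster-mount/*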

*Observation* :

On the application side :
------------------------

rm: cannot remove ‘/gluster-mount/linux-4.8.9/Documentation/devicetree/bindings/extcon/extcon-arizona.txt’: Stale file handle
rm: cannot remove ‘/gluster-mount/linux-4.8.9/Documentation/devicetree/bindings/extcon/extcon-rt8973a.txt’: Stale file handle
rm: cannot remove ‘/gluster-mount/linux-4.8.9/Documentation/cgroup-v1/cgroups.txt’: Stale file handle
rm: cannot remove ‘/gluster-mount/linux-4.8.9/Documentation/cgroup-v1/net_prio.txt’: Stale file handle
rm: cannot remove ‘/gluster-mount/linux-4.8.9/Documentation/cgroup-v1/hugetlb.txt’: Stale file handle
rm: cannot remove ‘/gluster-mount/linux-4.8.9/Documentation/cgroup-v1/freezer-subsystem.txt’: Stale file handle
rm: cannot remove ‘/gluster-mount/linux-4.8.9/Documentation/cgroup-v1/pids.txt’: Stale file handle
rm: cannot remove ‘/gluster-mount/linux-4.8.9/Documentation/cgroup-v1/cpusets.txt’: Stale file handle
rm: cannot remove ‘/gluster-mount/linux-4.8.9/Documentation/cgroup-v1/memory.txt’: Stale file handle

In Ganesha-gfapi logs
----------------------

[2016-11-20 03:23:50.160966] W [MSGID: 109065] [dht-common.c:7826:dht_rmdir_lock_cbk] 0-testvol-dht: acquiring inodelk failed rmdir for /file_dstdir/gqac015.sbu.lab.eng.bos.redhat.com/thrd_04/d_004/d_007) [Stale file handle]
[2016-11-20 03:23:50.162594] W [MSGID: 104011] [glfs-handleops.c:1310:pub_glfs_h_create_from_handle] 0-meta-autoload: inode refresh of ac7131d7-0bb7-440f-9000-289b631ac66d failed: Stale file handle [Stale file handle]
[2016-11-20 03:23:50.160966] W [MSGID: 109065] [dht-common.c:7826:dht_rmdir_lock_cbk] 0-testvol-dht: acquiring inodelk failed rmdir for /file_dstdir/gqac015.sbu.lab.eng.bos.redhat.com/thrd_04/d_004/d_007) [Stale file handle]
[2016-11-20 03:23:50.162594] W [MSGID: 104011] [glfs-handleops.c:1310:pub_glfs_h_create_from_handle] 0-meta-autoload: inode refresh of ac7131d7-0bb7-440f-9000-289b631ac66d failed: Stale file handle [Stale file handle] 


The logs are literally flooded with these messages, but I did not see anywhere near that many errors on the client side. This can be misleading and can cause FUD in production.
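
One way to quantify the flooding is to count the warnings per message ID in the gfapi log on each Ganesha node and compare that with the number of errors rm actually reported on a client. A rough sketch (the gfapi log path is an assumption and may differ per setup):

# On each Ganesha node -- count the two warning types (log path is an assumption):
grep -c 'MSGID: 109065' /var/log/ganesha/ganesha-gfapi.log
grep -c 'MSGID: 104011' /var/log/ganesha/ganesha-gfapi.log

# On a client -- capture rm's stderr and count the ESTALE errors it reported:
rm -rf /gluster-mount/* 2>/tmp/rm-errors.txt
grep -c 'Stale file handle' /tmp/rm-errors.txt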

Version-Release number of selected component (if applicable):
-------------------------------------------------------------

glusterfs-ganesha-3.8.4-5.el7rhgs.x86_64
nfs-ganesha-2.4.1-1.el7rhgs.x86_64

How reproducible:
-----------------

2/2

Steps to Reproduce:
-------------------

1. Create a 2*2 volume and mount it via NFSv3 on multiple clients.

2. Create a huge data set with deep directories (see the sketch after this list).

3. Run rm -rf from various clients.
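
The errors above show a linux-4.8.9 source tree under the mount point, so a data set of comparable depth can be produced by unpacking a kernel tarball; the exact data set and tooling used in the original run are not recorded here, so the following is only an illustrative sketch:

# Illustrative only -- one way to create a deep directory tree on the mount
# (the original data set/tooling is not recorded; the kernel URL is an assumption):
cd /gluster-mount
curl -O https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.8.9.tar.xz
tar -xf linux-4.8.9.tar.xz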

Actual results:
-----------------

A. Stale file handle errors on the application side.
B. Logs contain more error/warning entries than are seen on the application side, which is a bit misleading IMO.

Expected results:
-----------------

rm -rf completes successfully on the client side.

Additional info:
----------------

*Vol Conf* :

Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 865c5329-7fa5-4a10-888b-671902b0bca6
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gqas013.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick0
Brick2: gqas005.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick1
Brick3: gqas006.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick2
Brick4: gqas011.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick3
Options Reconfigured:
ganesha.enable: on
features.cache-invalidation: on
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
performance.stat-prefetch: off
server.allow-insecure: on
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable
[root@gqas013 ~]#
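
For completeness, a volume with this layout and these options could be recreated with gluster CLI commands along the following lines. This is only a sketch: the brick paths come from the output above, while the command ordering and the Ganesha HA cluster setup itself are assumptions.

# Sketch only -- ordering and the Ganesha cluster setup (ganesha-ha.conf etc.) are assumptions.
gluster volume create testvol replica 2 \
    gqas013.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick0 \
    gqas005.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick1 \
    gqas006.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick2 \
    gqas011.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick3
gluster volume start testvol
gluster volume set testvol features.cache-invalidation on
gluster volume set testvol performance.stat-prefetch off
# (the remaining volume options shown above are applied the same way with "gluster volume set")
gluster volume set all cluster.enable-shared-storage enable
gluster nfs-ganesha enable
gluster volume set testvol ganesha.enable on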

