Bug 761743 (GLUSTER-11)

Summary: unfs3 returns stale NFS handle
Product: [Community] GlusterFS
Reporter: Shehjar Tikoo <shehjart>
Component: libglusterfsclient
Assignee: Shehjar Tikoo <shehjart>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: medium
Docs Contact:
Priority: urgent
Version: mainline
CC: gluster-bugs
Target Milestone: ---
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: ---
Regression: RTP
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Attachments:
Description                Flags
booster vol file           none
bricks vol file            none

Description Shehjar Tikoo 2009-06-12 07:33:35 UTC
Created attachment 2 [details]
Test Attachment

Comment 1 Shehjar Tikoo 2009-06-12 07:37:52 UTC
The generic problem here is the absence of an inode_lookup call for the inodes that are created or looked up in libglusterfsclient. As a result, the inode for the touched file is purged, and subsequent lookups are served from a different subvolume by the replicate translator.

A simple solution is to un-comment the inode_lookup calls in libglusterfsclient and also add more of them in the relevant places.
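For illustration, a minimal sketch of the kind of change described above, assuming the libglusterfs inode API (inode_link()/inode_lookup()); the surrounding helper name and header are hypothetical, not the actual libglusterfsclient code:

#include <sys/stat.h>
#include "inode.h"   /* libglusterfs internal header (assumption) */

/* Hypothetical helper: after a successful lookup, link the inode and
 * raise its nlookup count so it stays in the active table instead of
 * being purged from the LRU list. Only inode_link()/inode_lookup()
 * are the real libglusterfs calls; everything else is illustrative. */
static int
link_and_pin_inode (inode_t *parent, const char *name,
                    inode_t *inode, struct stat *stbuf)
{
        inode_t *linked = NULL;

        linked = inode_link (inode, parent, name, stbuf);
        if (!linked)
                return -1;

        /* Without this, the inode can be purged and a later lookup may
         * be served afresh by replicate with a different inode number. */
        inode_lookup (linked);

        return 0;
}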

Patch coming soon.

Comment 2 Shehjar Tikoo 2009-06-12 10:32:44 UTC
On an unfs3-exported mount point, the NFSv3 client receives an ESTALE on the following operations:

[root@client02 shehjart]# mount client03:/testpath -o wsize=65536 mount
[root@client02 shehjart]# ls mount
[root@client02 shehjart]# touch mount/test
touch: setting times of `mount/test': Stale NFS file handle

The unfs3 exports file contains:
/testpath 192.168.101.0/24(rw,no_root_squash)

Here /testpath is the mount path specified in the accompanying booster FSTAB file.

This file looks like:
/data/shehjart/dist-repl.vol /testpath glusterfs subvolume=repl1,logfile=/data/shehjart/booster.log,loglevel=DEBUG,attr_timeout=0

The "dist-repl.vol" is attached.

On the 2 bricks being used for subvolume repl1, the attached "posix-locks-iot-srv.vol" is used.

The unfs3 version being used is unfs3-0.9.23booster0.1.

unfs3 is started using the commands:

[root@client03 shehjart]# export GLUSTERFS_BOOSTER_FSTAB=$(pwd)/booster.fstab
[root@client03 shehjart]# LD_PRELOAD=/data/shehjart/glusterfsd/lib/glusterfs/glusterfs-booster.so /data/shehjart/unfsd/sbin/unfsd -e /data/shehjart/exports -d
UNFS3 unfsd 0.9.23 (C) 2009, Pascal Schmidt <unfs3-server>
/testpath/: ip 192.168.101.0 mask 255.255.255.0 options 5


The unfs3 source was changed to add a few printfs to instrument the fh cache code. These print statements produce the output below, which clearly shows the problem.

ADDING: 2065, 1, /testpath/
Fstat done from create
ADDING: 2065, 854163463, /testpath//test #File added to cache on touch.
LOOKUP: 2065, 854163463                  #File looked up on a subsequent NFS op
dev,ino relation does not hold, 2065, 1663107078 #Subsequent lstat on the same file returns a different inode number. Returns ESTALE here.
ADDING: 2065, 1663107078, /testpath//test #On seeing a different inode number, unfs3 tries to add new inode number to fh cache.
ADDING: 2065, 427081731, /testpath//test #On an ls, unfs3 sees yet another inode number for the same file.
LOOKUP: 2065, 427081731
dev,ino relation does not hold, 2065, 1663107078
ADDING: 2065, 831553539, /testpath//test
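
For illustration, a minimal sketch of the kind of consistency check behind the "dev,ino relation does not hold" messages above. This is not the actual unfs3 fh cache code, just an assumed equivalent: the server remembers the (device, inode) pair seen when a file handle was created, a later lstat of the same path must return the same pair, and a mismatch is what ends up as ESTALE at the client.

#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical fh cache entry and validity check; illustrative only. */
struct fh_entry {
        dev_t dev;           /* device number recorded at handle creation */
        ino_t ino;           /* inode number recorded at handle creation */
        char  path[4096];    /* path the handle resolves to */
};

static int
fh_entry_still_valid (const struct fh_entry *fh)
{
        struct stat st;

        if (lstat (fh->path, &st) < 0)
                return 0;                       /* treat as stale */

        if (st.st_dev != fh->dev || st.st_ino != fh->ino) {
                /* replicate handed back a different inode number */
                printf ("dev,ino relation does not hold, %lu, %lu\n",
                        (unsigned long) st.st_dev,
                        (unsigned long) st.st_ino);
                return 0;                       /* server replies ESTALE */
        }

        return 1;
}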

Comment 3 Shehjar Tikoo 2009-06-15 10:09:26 UTC
Link to approved patch: http://patches.gluster.com/patch/562/

Comment 4 Shehjar Tikoo 2009-07-23 15:54:17 UTC
We've known that the real reason we needed the inodes to stick around (by doing an inode_lookup in libglusterfsclient) was that replicate returned different inode numbers on a revalidate or a fresh lookup. With that problem now fixed by a patch submitted for bug 761848, I have a feeling these inode_lookups could be behind the increasing memory usage of unfsd that I've been observing for the last few hours.
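
As an illustration of the suspected leak, a minimal sketch assuming the libglusterfs inode API: inode_lookup() raises an inode's nlookup count and keeps it resident, and unless it is later balanced by inode_forget() the inode is never moved to the LRU list and freed, so unfsd memory grows with every file it touches. The helper names and header are hypothetical.

#include "inode.h"   /* libglusterfs internal header (assumption) */

/* Hypothetical helpers; only inode_lookup()/inode_forget() are the
 * real libglusterfs calls. */
static void
pin_inode (inode_t *inode)
{
        inode_lookup (inode);    /* nlookup++, inode stays resident */
}

static void
unpin_inode (inode_t *inode)
{
        /* Without calls like this to balance pin_inode(), pinned
         * inodes accumulate and memory usage keeps growing. */
        inode_forget (inode, 1);
}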

I am re-opening this till verified otherwise.

Comment 5 Shehjar Tikoo 2009-07-29 10:36:36 UTC
I'll be doing memory leak tests separately; that deserves its own comprehensive bug report. This is being closed.