Created attachment 2
Test Attachment
The generic problem here is the absence of inode_lookup calls for the inodes that are created or looked up in libglusterfsclient. As a result, the inode for the touch'ed file is purged, and subsequent lookups are served from a different subvolume by the replicate translator. A simple solution is to un-comment the inode_lookup calls in libglusterfsclient and add more of them in the relevant places. Patch coming soon.
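A minimal sketch of the kind of change intended, using the libglusterfs inode-table API (the helper name and surrounding context are illustrative, not the actual patch):

/* Sketch only: after libglusterfsclient resolves or creates an inode,
 * mark it as looked-up so the inode table does not purge it between
 * operations. libgf_keep_inode_alive() is a hypothetical helper. */
#include "inode.h"   /* libglusterfs inode table API */

static void
libgf_keep_inode_alive (inode_t *inode)
{
        /* inode_lookup() records a lookup on the inode, keeping it in
         * the active list. Without it the inode can be purged, the next
         * lookup is treated as a fresh one, and replicate may report a
         * different inode number for the same file. */
        if (inode)
                inode_lookup (inode);
}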
On a unfs3-exported mount point, the NFSv3 client receives ESTALE on the following operations:

[root@client02 shehjart]# mount client03:/testpath -o wsize=65536 mount
[root@client02 shehjart]# ls mount
[root@client02 shehjart]# touch mount/test
touch: setting times of `mount/test': Stale NFS file handle

The unfs3 exports file contains:

/testpath 192.168.101.0/24(rw,no_root_squash)

where /testpath is the mount path specified in the accompanying FSTAB file. That file looks like:

/data/shehjart/dist-repl.vol /testpath glusterfs subvolume=repl1,logfile=/data/shehjart/booster.log,loglevel=DEBUG,attr_timeout=0

The "dist-repl.vol" is attached. On the 2 bricks being used for subvolume repl1, the attached "posix-locks-iot-srv.vol" is used.

The unfs3 version being used is unfs3-0.9.23booster0.1. unfs3 is started using the commands:

[root@client03 shehjart]# export GLUSTERFS_BOOSTER_FSTAB=$(pwd)/booster.fstab
[root@client03 shehjart]# LD_PRELOAD=/data/shehjart/glusterfsd/lib/glusterfs/glusterfs-booster.so /data/shehjart/unfsd/sbin/unfsd -e /data/shehjart/exports -d
UNFS3 unfsd 0.9.23 (C) 2009, Pascal Schmidt <unfs3-server>
/testpath/: ip 192.168.101.0 mask 255.255.255.0 options 5

The unfs3 source was changed to add a few printfs to instrument the fh cache code. These print statements produce the output below, which clearly shows the problem:

ADDING: 2065, 1, /testpath/
Fstat done from create
ADDING: 2065, 854163463, /testpath//test           # File added to cache on touch.
LOOKUP: 2065, 854163463                            # File looked up on a subsequent NFS op.
dev,ino relation does not hold, 2065, 1663107078   # A subsequent lstat on the same file returns a different inode number. Returns ESTALE here.
ADDING: 2065, 1663107078, /testpath//test          # On seeing a different inode number, unfs3 tries to add the new inode number to the fh cache.
ADDING: 2065, 427081731, /testpath//test           # On an ls, unfs3 sees yet another inode number for the same file.
LOOKUP: 2065, 427081731
dev,ino relation does not hold, 2065, 1663107078
ADDING: 2065, 831553539, /testpath//test
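For context, the "dev,ino relation does not hold" / ESTALE behaviour comes from the kind of consistency check sketched below. The structure and function names are hypothetical; the real unfs3 fh-cache code differs, but the idea is the same: a cached (dev, ino) pair is compared against a fresh lstat of the path, and a changed inode number makes the handle stale.

#include <sys/stat.h>
#include <errno.h>

/* Hypothetical sketch of an fh-cache validity check. */
struct fh_cache_entry {
        dev_t dev;
        ino_t ino;
        char  path[4096];
};

static int
fh_cache_validate (const struct fh_cache_entry *ent)
{
        struct stat sb;

        if (lstat (ent->path, &sb) != 0)
                return -errno;

        /* If the filesystem reports a different inode number for the
         * same path, the cached handle no longer matches the file. */
        if (sb.st_dev != ent->dev || sb.st_ino != ent->ino)
                return -ESTALE;

        return 0;
}

With replicate handing back a new inode number on each fresh lookup, this check fails even though the file itself has not changed, which is exactly what the instrumented output above shows.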
Link to approved patch: http://patches.gluster.com/patch/562/
We've known that the real reason we needed the inodes to stick around (by doing an inode_lookup in libglusterfsclient) was replicate returning different inode numbers on a revalidate or a fresh lookup. With that problem now fixed by a patch submitted for bug 761848, I have a feeling these inode_lookups could be behind the increasing memory usage of unfsd that I've been observing for the last few hours. I am re-opening this until verified otherwise.
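If the pinned inodes are indeed the cause, each inode_lookup would eventually need to be balanced when the handle is released. A sketch under that assumption, using the libglusterfs inode-table API (the helper name is hypothetical):

/* Sketch only, assuming the growth comes from inodes pinned by
 * inode_lookup(): drop the lookup count and the reference when
 * libglusterfsclient is done with the handle, so the inode table can
 * move the inode to the LRU list and eventually purge it. */
#include "inode.h"   /* libglusterfs inode table API */

static void
libgf_release_inode (inode_t *inode)
{
        if (!inode)
                return;

        inode_forget (inode, 1);   /* forget one recorded lookup */
        inode_unref (inode);       /* drop our reference */
}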
I'll be doing the memory leak tests separately; that deserves a comprehensive bug report of its own. This bug is being closed.