Description of problem:
A distributed volume was mounted using FUSE and NFS. From the NFS mount, directories were created in a loop, along with files inside each directory (using touch). Creation of a few files failed with an error, e.g.

touch: cannot touch `56/f43': No such file or directory

Issue 1: no xattr set for the directory

Verified on the sub-volumes: all sub-volumes were up, but the xattr trusted.glusterfs.dht was not created for the directory on any sub-volume.

[root@7-VM3 ~]# getfattr -d -m . -e hex /rhs/brick4/d*/56
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick4/d1/56
trusted.gfid=0x0e2543045dbe4de1b476d9c69bea6d2b

# file: rhs/brick4/d2/56
trusted.gfid=0x0e2543045dbe4de1b476d9c69bea6d2b

# file: rhs/brick4/d3/56
trusted.gfid=0x0e2543045dbe4de1b476d9c69bea6d2b

# file: rhs/brick4/d4/56
trusted.gfid=0x0e2543045dbe4de1b476d9c69bea6d2b

The mount log does not have any error for this, and the brick log has the following at 'INFO' level, not 'ERROR':

[2013-11-14 06:02:43.355968] I [server-rpc-fops.c:898:_gf_server_log_setxattr_failure] 0-dht-server: 116043: SETXATTR (null) (acc3f3cd-a42c-411d-9584-e893500a6b8c) ==> trusted.glusterfs.dht
[2013-11-14 06:02:43.356012] I [server-rpc-fops.c:924:server_setxattr_cbk] 0-dht-server: No such file or directory
[2013-11-14 06:02:43.791001] I [server-rpc-fops.c:898:_gf_server_log_setxattr_failure] 0-dht-server: 116246: SETXATTR (null) (acbd613c-90fd-465f-8d68-f3148204d18b) ==> trusted.glusterfs.dht
[2013-11-14 06:02:43.791044] I [server-rpc-fops.c:924:server_setxattr_cbk] 0-dht-server: No such file or directory

Issue 2: file creation failures

Checked the NFS log to find the reason for the file creation failures and found:

[2013-11-14 06:02:44.036466] E [fd.c:536:fd_unref] (-->/usr/lib64/glusterfs/3.4.0.44rhs/xlator/cluster/distribute.so(dht_create+0x37b) [0x7f5a1a1ab3bb] (-->/usr/lib64/glusterfs/3.4.0.44rhs/xlator/debug/io-stats.so(io_stats_create_cbk+0x260) [0x7f5a19d722f0] (-->/usr/lib64/glusterfs/3.4.0.44rhs/xlator/nfs/server.so(nfs_fop_create_cbk+0x99) [0x7f5a19b28379]))) 0-fd: fd is NULL
[2013-11-14 06:02:44.036488] W [nfs3.c:2373:nfs3svc_create_cbk] 0-nfs: 726db823: <gfid:0e254304-5dbe-4de1-b476-d9c69bea6d2b>/f47 => -1 (No such file or directory)
[2013-11-14 06:02:44.036527] W [nfs3-helpers.c:3464:nfs3_log_newfh_res] 0-nfs-nfsv3: XID: 726db823, CREATE: NFS: 2(No such file or directory), POSIX: 2(No such file or directory), FH: exportid 00000000-0000-0000-0000-000000000000, gfid 00000000-0000-0000-0000-000000000000
[2013-11-14 06:02:44.036806] W [dht-layout.c:179:dht_layout_search] 0-dht-dht: no subvolume for hash (value) = 3236861000
[2013-11-14 06:02:44.038268] W [dht-layout.c:179:dht_layout_search] 0-dht-dht: no subvolume for hash (value) = 946349490
[2013-11-14 06:02:44.039651] W [dht-layout.c:179:dht_layout_search] 0-dht-dht: no subvolume for hash (value) = 946349490
[2013-11-14 06:02:44.039711] E [fd.c:536:fd_unref] (-->/usr/lib64/glusterfs/3.4.0.44rhs/xlator/cluster/distribute.so(dht_create+0x37b) [0x7f5a1a1ab3bb] (-->/usr/lib64/glusterfs/3.4.0.44rhs/xlator/debug/io-stats.so(io_stats_create_cbk+0x260) [0x7f5a19d722f0] (-->/usr/lib64/glusterfs/3.4.0.44rhs/xlator/nfs/server.so(nfs_fop_create_cbk+0x99) [0x7f5a19b28379]))) 0-fd: fd is NULL
[2013-11-14 06:02:44.039725] W [nfs3.c:2373:nfs3svc_create_cbk] 0-nfs: 756db823: <gfid:0e254304-5dbe-4de1-b476-d9c69bea6d2b>/f48 => -1 (No such file or directory)
[2013-11-14 06:02:44.039746] W [nfs3-helpers.c:3464:nfs3_log_newfh_res] 0-nfs-nfsv3: XID: 756db823, CREATE: NFS: 2(No such file or directory), POSIX: 2(No such file or directory), FH: exportid 00000000-0000-0000-0000-000000000000, gfid 00000000-0000-0000-0000-000000000000
[2013-11-14 06:02:44.040045] W [dht-layout.c:179:dht_layout_search] 0-dht-dht: no subvolume for hash (value) = 946349490
[2013-11-14 06:02:44.042513] W [dht-layout.c:179:dht_layout_search] 0-dht-dht: no subvolume for hash (value) = 2094207367
[2013-11-14 06:02:44.049313] W [dht-layout.c:179:dht_layout_search] 0-dht-dht: no subvolume for hash (value) = 2094207367
[2013-11-14 06:02:44.049375] E [fd.c:536:fd_unref] (-->/usr/lib64/glusterfs/3.4.0.44rhs/xlator/cluster/distribute.so(dht_create+0x37b) [0x7f5a1a1ab3bb] (-->/usr/lib64/glusterfs/3.4.0.44rhs/xlator/debug/io-stats.so(io_stats_create_cbk+0x260) [0x7f5a19d722f0] (-->/usr/lib64/glusterfs/3.4.0.44rhs/xlator/nfs/server.so(nfs_fop_create_cbk+0x99) [0x7f5a19b28379]))) 0-fd: fd is NULL

Version-Release number of selected component (if applicable):
3.4.0.44rhs-1.el6rhs.x86_64

How reproducible:
Found more than once, but the exact steps to reproduce are not known.

Steps to Reproduce:
1. While creating files and directories in a loop from the NFS mount, got this error.

Actual results:

Expected results:

Additional info:
A lookup on the parent directory, or on the directory itself, healed the xattr (after a lookup from the mount point, the xattr was created).
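The per-brick check used for Issue 1 could be scripted roughly as follows. This is only a sketch and not part of the original report: the function name dirs_missing_layout is hypothetical, and it assumes getfattr is installed and that the script runs as root on the brick host (trusted.* xattrs are not readable otherwise).

```shell
#!/bin/sh
# Hypothetical helper: print every given directory that has no
# trusted.glusterfs.dht xattr, i.e. no DHT layout on that brick.
# Caveat: any getfattr failure (missing binary, insufficient privileges,
# nonexistent path) is also reported as "missing".
dirs_missing_layout() {
    for dir in "$@"; do
        if ! getfattr --absolute-names -n trusted.glusterfs.dht -e hex "$dir" >/dev/null 2>&1; then
            printf '%s\n' "$dir"
        fi
    done
}
```

In the state described above, calling dirs_missing_layout /rhs/brick4/d*/56 on the brick host would be expected to list all four brick directories.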
Patch at: https://code.engineering.redhat.com/gerrit/#/c/27158/
The exact steps to reproduce are now known, so updating the bug accordingly.

Steps to Reproduce:
===================
1. Create a distributed volume and mount it on multiple clients (NFS).
2. From one client, start creating directories.
3. From another mount point, create files inside those directories, making sure the file creation request is sent after the parent directory has been created on all (or more than one) sub-volumes but before the hash layout (i.e. trusted.glusterfs.dht) is assigned. This is a race condition, so it can be hit by putting the creation in a loop, or by adding a breakpoint before the layout is created and sending the create from another mount point.
4. File creation fails with an error because the layout is not present.
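The loop variant of steps 2 and 3 might look something like the sketch below. The function names make_dirs and touch_when_visible are hypothetical, as are the numeric directory names and the file name f; each function is meant to run on a different client against its own mount of the same volume. Two assumptions to keep in mind: sleep with a fractional argument is not POSIX (it works with GNU coreutils), and the polling itself issues lookups on the directory, which according to the original report can heal the layout, so the race window may need many iterations to hit.

```shell
#!/bin/sh
# Client A (hypothetical): create directories 1..N under ROOT.
make_dirs() {
    root=$1; n=$2; i=1
    while [ "$i" -le "$n" ]; do
        mkdir "$root/$i" || return 1
        i=$((i + 1))
    done
}

# Client B (hypothetical): as soon as directory i becomes visible, try to
# create a file in it. Prints the path of every file whose creation failed.
touch_when_visible() {
    root=$1; n=$2; i=1
    while [ "$i" -le "$n" ]; do
        tries=0
        until [ -d "$root/$i" ]; do
            tries=$((tries + 1))
            [ "$tries" -gt 1000 ] && return 1   # give up after roughly 10 seconds
            sleep 0.01                          # fractional sleep: GNU extension
        done
        touch "$root/$i/f" 2>/dev/null || printf '%s\n' "$root/$i/f"
        i=$((i + 1))
    done
}
```

For example, one client would run make_dirs on its mount point with a count of 100 while the other runs touch_when_visible on its own mount point with the same count; any path printed by the second client is a candidate hit for this bug.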
Verified with 3.6.0.19-1.el6rhs.x86_64. There is still some race condition, but I was unable to pin it down: tried 5-7 times and got this error only once.

Steps followed:
1. Create a distributed volume and mount it on multiple clients (NFS and FUSE).
2. From one client (FUSE), start creating a directory:
   mkdir dir2
3. From another mount point (NFS), create files inside that directory, making sure the file creation request is sent after the parent directory has been created on all (or more than one) sub-volumes but before the hash layout (i.e. trusted.glusterfs.dht) is assigned.

[root@OVM3 nfs]# touch dir2/f1
touch: cannot touch `dir2/f1': Stale file handle
[root@OVM3 nfs]# touch dir2/a
touch: cannot touch `dir2/a': No such file or directory

Hence moving the bug back to ASSIGNED.

Log snippet:

[2014-06-27 07:41:06.643011] W [nfs3-helpers.c:3470:nfs3_log_newfh_res] 0-nfs-nfsv3: XID: 285db38b, LOOKUP: NFS: 70(Invalid file handle), POSIX: 116(Stale file handle), FH: exportid 00000000-0000-0000-0000-000000000000, gfid 00000000-0000-0000-0000-000000000000
[2014-06-27 07:41:06.644619] W [dht-layout.c:180:dht_layout_search] 0-snap-dht: no subvolume for hash (value) = 3551819610
[2014-06-27 07:41:06.645413] W [nfs3.c:1230:nfs3svc_lookup_cbk] 0-nfs: 2b5db38b: <gfid:c0a48017-ec23-4c93-b6bd-311a8a814ae8>/f1 => -1 (Stale file handle)
[2014-06-27 07:41:06.645450] W [nfs3-helpers.c:3470:nfs3_log_newfh_res] 0-nfs-nfsv3: XID: 2b5db38b, LOOKUP: NFS: 70(Invalid file handle), POSIX: 116(Stale file handle), FH: exportid 00000000-0000-0000-0000-000000000000, gfid 00000000-0000-0000-0000-000000000000
[2014-06-27 07:41:20.424140] W [dht-layout.c:180:dht_layout_search] 0-snap-dht: no subvolume for hash (value) = 974644454
[2014-06-27 07:41:20.425465] W [dht-layout.c:180:dht_layout_search] 0-snap-dht: no subvolume for hash (value) = 974644454
[2014-06-27 07:41:20.425747] E [fd.c:536:fd_unref] (-->/usr/lib64/glusterfs/3.6.0.19/xlator/cluster/distribute.so(dht_create+0x393) [0x7fc53757e203] (-->/usr/lib64/glusterfs/3.6.0.19/xlator/debug/io-stats.so(io_stats_create_cbk+0x27d) [0x7fc53713e9ed] (-->/usr/lib64/glusterfs/3.6.0.19/xlator/nfs/server.so(nfs_fop_create_cbk+0x99) [0x7fc536eee3c9]))) 0-fd: fd is NULL
[2014-06-27 07:41:20.425769] W [nfs3.c:2370:nfs3svc_create_cbk] 0-nfs: 2f5db38b: <gfid:c0a48017-ec23-4c93-b6bd-311a8a814ae8>/a => -1 (No such file or directory)
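When a run produces a large NFS log, the layout misses can be summarized instead of read line by line. The helpers below are a sketch under the assumption that the warning text matches the dht_layout_search lines quoted in this bug; the function names are hypothetical and the log file path is whatever the caller passes in.

```shell
#!/bin/sh
# Hypothetical helper: count "no subvolume for hash" warnings in a log file.
count_layout_misses() {
    grep -c 'dht_layout_search.*no subvolume for hash' "$1"
}

# Hypothetical helper: print each distinct hash value that had no
# subvolume, prefixed by how many times it appeared.
layout_miss_hashes() {
    sed -n 's/.*no subvolume for hash (value) = \([0-9][0-9]*\).*/\1/p' "$1" | sort | uniq -c
}
```

A hash value that repeats, such as 974644454 in the snippet above, suggests the same file name was looked up more than once against a directory that still had no layout.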
Rachana,

I need an answer to the following:

1. Is it guaranteed, from the application's perspective, that creation of directory dir2 was successful before you start creating files within it? This guarantee can be obtained in either of two ways:
   * On the fuse mount, mkdir completes successfully, and you start creating files on the nfs mount _after_ you get that confirmation.
   * On the nfs mount, you attempt to create directory dir2, this creation either succeeds or fails with EEXIST, and only then do you start creating files within dir2.

Only if creation of dir2 is guaranteed to have succeeded from the application's perspective can we treat this as a bug. There are many internal states within dht where you can run into this issue, but the question is whether these internal states are hit when the application is performing legal operations (like the ones explained above).

regards,
Raghavendra.
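The second form of the guarantee (mkdir succeeds or fails with EEXIST, then create files) could be expressed in a test script along these lines. This is a sketch, not an agreed test procedure: ensure_dir and create_files_after_mkdir are hypothetical names, and because the shell cannot read errno directly, "mkdir failed but the path is a directory" is used as an approximation of EEXIST.

```shell
#!/bin/sh
# Hypothetical helper: succeed only once DIR is known to exist, either
# because our mkdir succeeded or because DIR was already there
# (approximating the EEXIST case, since errno is not visible in sh).
ensure_dir() {
    mkdir "$1" 2>/dev/null || [ -d "$1" ]
}

# Create files f1..fN in DIR, but only after directory creation has been
# confirmed from the application's point of view.
create_files_after_mkdir() {
    dir=$1; n=$2; i=1
    ensure_dir "$dir" || return 1
    while [ "$i" -le "$n" ]; do
        touch "$dir/f$i" || return 1
        i=$((i + 1))
    done
}
```

If file creation still fails when driven this way, the failure occurred after the application had confirmation that the directory exists, which is the case described above as a legal sequence of operations.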
Cloning this to 3.1. To be fixed in a future release.