Description of problem:
If a file is deleted on a client mount and a file with the same name is then created on the NUFA mount, ENOENT errors are seen. Creating a file with a new name works fine.

Volume Name: nufa-1
Type: Distribute
Volume ID: d12e0b94-72c5-4e67-8056-742ceb1c3490
Status: Started
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: 10.70.37.72:/rhs/brick1/nufa-1
Brick2: 10.70.37.97:/rhs/brick1/nufa-1
Brick3: 10.70.37.124:/rhs/brick1/nufa-1
Brick4: 10.70.37.82:/rhs/brick1/nufa-1
Options Reconfigured:
cluster.nufa: on

Version-Release number of selected component (if applicable):
glusterfs 3.4.0.19rhs built on Aug 14 2013 00:11:42

How reproducible:
Always

Steps to Reproduce (a condensed shell version is sketched at the end of this report):
1. Create a 4-node distribute volume and turn nufa on.
2. Create some files from a client (the files are distributed across the 4 nodes).
3. Create a mount on one of the servers (this will be the NUFA mount).
4. From the client, remove any one file.
5. Try to create a file with the same name on the NUFA mount; the error below is seen:

[root@boggs nufa-1]# dd if=/dev/zero of=fil1 bs=10M count=20
dd: opening `fil1': No such file or directory

Actual results:
ENOENT

Expected results:
Should be able to create a file with the same name.

Additional info:
Log snippet:
=========================================================
[2013-08-15 10:55:23.698084] E [dht-helper.c:429:dht_subvol_get_hashed] (-->/usr/lib64/libglusterfs.so.0(default_lookup+0x6d) [0x3cb661be7d] (-->/usr/lib64/glusterfs/3.4.0.19rhs/xlator/cluster/nufa.so(nufa_lookup+0x90) [0x7f640dd07f80])) 1-nufa-1-dht: invalid argument: loc->parent
[2013-08-15 10:55:23.699804] E [fuse-bridge.c:1162:fuse_getattr_resume] 0-glusterfs-fuse: 3390: GETATTR 140067732189852 (2677b206-57ad-46dc-a63e-66876bfb88e6) resolution failed
[2013-08-15 11:03:08.168420] W [client-rpc-fops.c:519:client3_3_stat_cbk] 1-nufa-1-client-0: remote operation failed: No such file or directory
[2013-08-15 11:03:08.169552] E [dht-helper.c:429:dht_subvol_get_hashed] (-->/usr/lib64/glusterfs/3.4.0.19rhs/xlator/cluster/distribute.so(dht_migration_complete_check_task+0x11e) [0x7f6414e6c7fe] (-->/usr/lib64/libglusterfs.so.0(syncop_lookup+0x19a) [0x3cb664b56a] (-->/usr/lib64/glusterfs/3.4.0.19rhs/xlator/cluster/nufa.so(nufa_lookup+0x90) [0x7f640dd07f80]))) 1-nufa-1-dht: invalid argument: loc->parent
[2013-08-15 11:03:08.171443] W [fuse-bridge.c:1133:fuse_attr_cbk] 0-glusterfs-fuse: 3396: STAT() /fil5 => -1 (No such file or directory)
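
Condensed shell version of the steps above (a sketch only; the mount points /mnt/nufa-client and /mnt/nufa-local are placeholders, brick paths are the ones from the volume info above):

# On one of the servers: create the 4-brick distribute volume and enable NUFA
gluster volume create nufa-1 10.70.37.72:/rhs/brick1/nufa-1 10.70.37.97:/rhs/brick1/nufa-1 \
    10.70.37.124:/rhs/brick1/nufa-1 10.70.37.82:/rhs/brick1/nufa-1
gluster volume set nufa-1 cluster.nufa on
gluster volume start nufa-1

# On an external client: mount and create some files
mount -t glusterfs 10.70.37.72:/nufa-1 /mnt/nufa-client
for i in $(seq 1 10); do dd if=/dev/zero of=/mnt/nufa-client/fil$i bs=1M count=10; done

# On one of the servers: create the NUFA mount
mount -t glusterfs 10.70.37.72:/nufa-1 /mnt/nufa-local

# Back on the client: remove one file
rm -f /mnt/nufa-client/fil1

# On the NUFA mount: recreating the deleted name fails with ENOENT
dd if=/dev/zero of=/mnt/nufa-local/fil1 bs=10M count=20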
Can you please attach the sos-reports from the clients (and servers if possible)?
Created attachment 788737 [details] sosreports
Attaching sosreports. Volume looks like:

Volume Name: nufa
Type: Distribute
Volume ID: a956aa02-befa-4a7c-bc5a-67a495bff7c6
Status: Started
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: 10.70.37.72:/rhs/brick1/nufa-1
Brick2: 10.70.37.100:/rhs/brick1/nufa-2
Brick3: 10.70.37.124:/rhs/brick1/nufa-3
Brick4: 10.70.37.82:/rhs/brick1/nufa-4

sosreports are named after their corresponding ip-addresses.
A couple of observations:

1. The creation fails when it is done from a client on which the rm -rf was not issued.
2. The failure seems to be in fuse_attr; dht_attr does not find the file (as it should).
3. A remount of the client does fix the issue (sketch below, after the gdb output).
4. Although the file is deleted, the inode still appears to be linked in the inode table (note loc->parent is NULL, matching the "invalid argument: loc->parent" log messages):

Breakpoint 15, dht_stat (frame=0x7f88dccfeb98, this=0x1d0ca90, loc=0x7f88cc02ef50, xdata=0x0) at dht-inode-read.c:259
259     {
(gdb) p *loc
$22 = {path = 0x7f88cc007170 "/file-10", name = 0x0, inode = 0x7f88d3b1f6e0, parent = 0x0,
  gfid = "\254\213\222\027+>LT\211\240\217\355\331\025\307\v", pargfid = '\000' <repeats 15 times>}
(gdb) p *loc->inode
$24 = {table = 0x1db6f50, gfid = "\254\213\222\027+>LT\211\240\217\355\331\025\307\v", lock = 1,
  nlookup = 6, fd_count = 0, ref = 3, ia_type = IA_IFREG,
  fd_list = {next = 0x7f88d3b1f718, prev = 0x7f88d3b1f718},
  dentry_list = {next = 0x7f88d387e320, prev = 0x7f88d387e320},
  hash = {next = 0x7f88d38440c0, prev = 0x7f88d38440c0},
  list = {next = 0x7f88d3b1f094, prev = 0x1db6fb0}, _ctx = 0x1dd4340}
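
Workaround sketch for observation 3 (mount point and volume name are placeholders taken from this setup, not a fix):

# Remount the mount point on which the create was failing
umount /mnt/nufa-local
mount -t glusterfs 10.70.37.72:/nufa-1 /mnt/nufa-local

# The previously failing create now succeeds
dd if=/dev/zero of=/mnt/nufa-local/fil1 bs=10M count=20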
Could we try checking if mounting the clients with --entry-timeout=0 and --attribute-timeout=0 fixes the issue?
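For instance (server address, volume name and mount point are placeholders; either the mount helper options or the glusterfs client options should work):

mount -t glusterfs -o entry-timeout=0,attribute-timeout=0 10.70.37.72:/nufa-1 /mnt/nufa-client

# or, invoking the FUSE client directly:
glusterfs --entry-timeout=0 --attribute-timeout=0 \
    --volfile-server=10.70.37.72 --volfile-id=nufa-1 /mnt/nufa-client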
Same results
Removing the 'blocker' flag as per discussion yesterday.

NUFA supportability scope in Big Bend:

- Only supported when the client mounting a NUFA-enabled volume is present within the trusted storage pool, i.e. co-resident with a Red Hat Storage server.
- Only supported for the FUSE client.
- Only supported with one brick per server.
- When the local brick runs out of space or hits the cluster min-free-disk limit, files will be distributed to other bricks in the same volume as long as there is space, instead of returning ENOSPC (a sketch follows this list).
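
For reference, the threshold mentioned in the last point is the cluster.min-free-disk volume option; a sketch (the 10% value is only an example, not a statement of the product default):

# Minimum free space below which DHT/NUFA stops placing new files on a brick
# (accepts a percentage or an absolute size)
gluster volume set nufa-1 cluster.min-free-disk 10%

# Confirm the reconfigured option
gluster volume info nufa-1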