Description of problem: stat on a dir fails with ENOENT when rebalance is issued. ls -l gives following output. [root@RHEL6 mnt]# ls -lrt ls: cannot access hosa_dir: No such file or directory total 66816 ??????????? ? ? ? ? ? hosa_dir -rw-r--r--. 1 root root 96694860 Aug 15 21:45 linux-3.0.2.tar.gz -rw-r--r--. 1 root root 2540052 Jan 24 03:02 glusterfs-3.3.0qa20.tar.gz drwxrwxr-x. 23 root root 669 Jan 31 10:08 linux-3.0.2 drwxrwxr-x. 11 1031 wheel 853 Jan 31 10:10 glusterfs-3.3.0qa20 [root@RHEL6 mnt]# Version-Release number of selected component (if applicable): glusterfs-3.3.0qa20 How reproducible: 1/1 Steps to Reproduce: 1. Created and started a stripe-replicated volume. 2. Untarred glusterfs and linux kernel source on the mountpoint. 3. created a dir 'hosa_dir' on mountpoint and used the following command to rename a file 10000 times. touch 1 && for i in `seq 1 10000`; do j=$(( i + 1)); mv $i $j; ls; done 4. From mountpoint started removing all the files from linux source with following command. find linux-3.0.2 -type f | xargs rm -f 5. Now added 4 more bricks to make it dist-stripe-replicate. 6. started rebalance. Actual results: find failed with following error. [root@RHEL6 mnt]# find linux-3.0.2 -type f | xargs rm -f find: `linux-3.0.2/arch/arm/mach-mxs/devices/Kconfig': No such file or directory find: `linux-3.0.2/arch/arm/mach-mxs/devices/Makefile': No such file or directory find: `linux-3.0.2/arch/arm/mach-mxs/devices/amba-duart.c': No such file or directory find: `linux-3.0.2/arch/arm/mach-mxs/devices/platform-auart.c': No such file or directory find: `linux-3.0.2/arch/arm/mach-mxs/devices/platform-dma.c': No such file or directory find: `linux-3.0.2/arch/arm/mach-mxs/devices/platform-fec.c': No such file or directory find: `linux-3.0.2/arch/arm/mach-mxs/devices/platform-flexcan.c': No such file or directory find: `linux-3.0.2/arch/arm/mach-mxs/devices/platform-mxs-i2c.c': No such file or directory find: `linux-3.0.2/arch/arm/mach-mxs/devices/platform-mxs-mmc.c': No such file or directory find: `linux-3.0.2/arch/arm/mach-mxs/devices/platform-mxs-pwm.c': No such file or directory find: `linux-3.0.2/arch/arm/mach-mxs/devices/platform-mxsfb.c': No such file or directory and ls -lrt on the mountpoint gave following output. [root@RHEL6 mnt]# ls -lrt ls: cannot access hosa_dir: No such file or directory total 66816 ??????????? ? ? ? ? ? hosa_dir -rw-r--r--. 1 root root 96694860 Aug 15 21:45 linux-3.0.2.tar.gz -rw-r--r--. 1 root root 2540052 Jan 24 03:02 glusterfs-3.3.0qa20.tar.gz drwxrwxr-x. 23 root root 669 Jan 31 10:08 linux-3.0.2 drwxrwxr-x. 11 1031 wheel 853 Jan 31 10:10 glusterfs-3.3.0qa20 The last dir glusterfs-3.3.0qa20 has uid set to 1031, which doesn't exist. Additional info: log entries in fuse client. [2012-01-31 10:08:27.343373] W [client3_1-fops.c:1312:client3_1_entrylk_cbk] 1-hosdu-client-7: remote operation failed: No such file or directory [2012-01-31 10:08:27.349250] W [client3_1-fops.c:1235:client3_1_inodelk_cbk] 1-hosdu-client-7: remote operation failed: No such file or directory [2012-01-31 10:08:27.351229] W [client3_1-fops.c:1235:client3_1_inodelk_cbk] 1-hosdu-client-7: remote operation failed: No such file or directory [2012-01-31 10:08:27.373082] W [client3_1-fops.c:1312:client3_1_entrylk_cbk] 1-hosdu-client-7: remote operation failed: No such file or directory [2012-01-31 10:08:27.375642] W [client3_1-fops.c:1312:client3_1_entrylk_cbk] 1-hosdu-client-7: remote operation failed: No such file or directory [2012-01-31 10:08:27.377702] W [client3_1-fops.c:1235:client3_1_inodelk_cbk] 1-hosdu-client-7: remote operation failed: No such file or directory [2012-01-31 10:08:27.379300] W [client3_1-fops.c:1235:client3_1_inodelk_cbk] 1-hosdu-client-7: remote operation failed: No such file or directory [2012-01-31 10:08:27.796676] I [afr-common.c:1302:afr_launch_self_heal] 1-hosdu-replicate-2: background entry self-heal triggered. path: /, reason: lookup detected pending operations [2012-01-31 10:08:27.800562] I [afr-self-heal-entry.c:2145:afr_sh_entry_fix] 1-hosdu-replicate-2: /: Performing conservative merge [2012-01-31 10:08:27.808168] I [afr-self-heal-common.c:2014:afr_self_heal_completion_cbk] 1-hosdu-replicate-2: background entry self-heal completed on / [2012-01-31 10:08:27.951829] I [afr-common.c:1302:afr_launch_self_heal] 1-hosdu-replicate-2: background meta-data entry self-heal triggered. path: /linux-3.0.2, reason: lookup detected pending operations [2012-01-31 10:08:27.960169] I [afr-self-heal-entry.c:2145:afr_sh_entry_fix] 1-hosdu-replicate-2: /linux-3.0.2: Performing conservative merge [2012-01-31 10:08:27.966739] I [afr-self-heal-common.c:2014:afr_self_heal_completion_cbk] 1-hosdu-replicate-2: background meta-data entry self-heal completed on /linux-3.0.2 [2012-01-31 10:08:27.975335] I [afr-common.c:1302:afr_launch_self_heal] 1-hosdu-replicate-2: background meta-data entry self-heal triggered. path: /linux-3.0.2/arch, reason: lookup detected pending operations [2012-01-31 10:08:27.983807] I [afr-self-heal-entry.c:2145:afr_sh_entry_fix] 1-hosdu-replicate-2: /linux-3.0.2/arch: Performing conservative merge [2012-01-31 10:08:27.991216] I [afr-self-heal-common.c:2014:afr_self_heal_completion_cbk] 1-hosdu-replicate-2: background meta-data entry self-heal completed on /linux-3.0.2/arch [2012-01-31 10:08:28.012626] I [afr-common.c:1302:afr_launch_self_heal] 1-hosdu-replicate-2: background meta-data entry self-heal triggered. path: /linux-3.0.2/arch/arm, reason: lookup detected pending operations [2012-01-31 10:08:28.034134] E [afr-self-heal-common.c:998:afr_sh_common_lookup_resp_handler] 1-hosdu-replicate-2: path /linux-3.0.2/arch/arm/mach-ixp2000 on subvolume hosdu-client-5 => -1 (No such file or directory) [2012-01-31 10:08:28.035383] E [afr-self-heal-common.c:998:afr_sh_common_lookup_resp_handler] 1-hosdu-replicate-2: path /linux-3.0.2/arch/arm/mach-ixp23xx on subvolume hosdu-client-5 => -1 (No such file or directory) [2012-01-31 10:08:28.046615] I [afr-self-heal-common.c:2014:afr_self_heal_completion_cbk] 1-hosdu-replicate-2: background meta-data entry self-heal completed on /linux-3.0.2/arch/arm [2012-01-31 10:08:55.494328] E [fuse-resolve.c:192:fuse_resolve_gfid] 0-glusterfs-fuse: xl is NULL [2012-01-31 10:08:55.496142] I [dht-layout.c:701:dht_layout_dir_mismatch] 1-hosdu-dht: subvol: hosdu-stripe-0; inode layout - 0 - 4294967295; disk layout - 0 - 2147483646 [2012-01-31 10:08:55.496170] I [dht-common.c:600:dht_revalidate_cbk] 1-hosdu-dht: mismatching layouts for /linux-3.0.2 [2012-01-31 10:08:55.496235] I [dht-layout.c:701:dht_layout_dir_mismatch] 1-hosdu-dht: subvol: hosdu-stripe-1; inode layout - 0 - 0; disk layout - 2147483647 - 4294967295 [2012-01-31 10:08:55.496250] I [dht-common.c:600:dht_revalidate_cbk] 1-hosdu-dht: mismatching layouts for /linux-3.0.2 [2012-01-31 10:09:01.599031] E [fuse-resolve.c:192:fuse_resolve_gfid] 0-glusterfs-fuse: xl is NULL [2012-01-31 10:09:26.036434] E [fuse-resolve.c:192:fuse_resolve_gfid] 0-glusterfs-fuse: xl is NULL [2012-01-31 10:09:48.860556] E [fuse-resolve.c:192:fuse_resolve_gfid] 0-glusterfs-fuse: xl is NULL [2012-01-31 10:09:51.396528] E [fuse-resolve.c:192:fuse_resolve_gfid] 0-glusterfs-fuse: xl is NULL [2012-01-31 10:09:58.027263] E [fuse-resolve.c:192:fuse_resolve_gfid] 0-glusterfs-fuse: xl is NULL [2012-01-31 10:12:12.951120] E [fuse-resolve.c:192:fuse_resolve_gfid] 0-glusterfs-fuse: xl is NULL [2012-01-31 10:13:14.337162] E [fuse-resolve.c:192:fuse_resolve_gfid] 0-glusterfs-fuse: xl is NULL [2012-01-31 10:13:17.244777] E [fuse-resolve.c:192:fuse_resolve_gfid] 0-glusterfs-fuse: xl is NULL [2012-01-31 10:17:05.670415] E [fuse-resolve.c:192:fuse_resolve_gfid] 0-glusterfs-fuse: xl is NULL [2012-01-31 10:20:45.825349] E [fuse-resolve.c:192:fuse_resolve_gfid] 0-glusterfs-fuse: xl is NULL [2012-01-31 10:24:43.926470] E [fuse-resolve.c:192:fuse_resolve_gfid] 0-glusterfs-fuse: xl is NULL I'm going to leave the set-up unchanged till someone takes a look at this.
profiling was enabled and stripe-block-size was set to 64MB. Output from volume info. [root@RHEL6 /]# gluster v i Volume Name: hosdu Type: Distributed-Striped-Replicate Volume ID: 34b3f3bd-53d0-4795-bcf0-e5fb81bddeea Status: Started Number of Bricks: 2 x 2 x 2 = 8 Transport-type: tcp Bricks: Brick1: 10.1.11.113:/data/brick/hosdu_brick1 Brick2: 10.1.11.114:/data/brick/hosdu_brick2 Brick3: 10.1.11.136:/data/brick/hosdu_brick3 Brick4: 10.1.11.137:/data/brick/hosdu_brick4 Brick5: 10.1.11.113:/data/brick/hosdu_brick5 Brick6: 10.1.11.114:/data/brick/hosdu_brick6 Brick7: 10.1.11.136:/data/brick/hosdu_brick7 Brick8: 10.1.11.137:/data/brick/hosdu_brick8 Options Reconfigured: cluster.stripe-block-size: 64MB diagnostics.count-fop-hits: on diagnostics.latency-measurement: on
last entries in the log file when stat is issued on 'hosa_dir' tail /usr/local/var/log/glusterfs/mnt-.log [2012-01-31 10:17:05.670415] E [fuse-resolve.c:192:fuse_resolve_gfid] 0-glusterfs-fuse: xl is NULL [2012-01-31 10:20:45.825349] E [fuse-resolve.c:192:fuse_resolve_gfid] 0-glusterfs-fuse: xl is NULL [2012-01-31 10:24:43.926470] E [fuse-resolve.c:192:fuse_resolve_gfid] 0-glusterfs-fuse: xl is NULL [2012-02-01 04:08:22.649230] E [fuse-resolve.c:192:fuse_resolve_gfid] 0-glusterfs-fuse: xl is NULL [2012-02-01 04:08:24.843136] E [fuse-resolve.c:192:fuse_resolve_gfid] 0-glusterfs-fuse: xl is NULL [2012-02-01 04:08:32.447879] E [fuse-resolve.c:192:fuse_resolve_gfid] 0-glusterfs-fuse: xl is NULL [2012-02-01 04:08:33.219337] E [fuse-resolve.c:192:fuse_resolve_gfid] 0-glusterfs-fuse: xl is NULL [2012-02-01 04:09:31.144304] E [fuse-resolve.c:192:fuse_resolve_gfid] 0-glusterfs-fuse: xl is NULL [2012-02-01 04:09:32.049503] E [fuse-resolve.c:192:fuse_resolve_gfid] 0-glusterfs-fuse: xl is NULL [2012-02-01 04:10:30.911489] E [fuse-resolve.c:192:fuse_resolve_gfid] 0-glusterfs-fuse: xl is NULL
looks like the patch to implement proper resolve in fuse fixes this. (http://review.gluster.com/2703)
Can't verify the bug as it's verification is blocked by bug https://bugzilla.redhat.com/show_bug.cgi?id=820355 Because of the above bug I can't go beyond step 2 while verifying.
Followed the setps and issue is not happening. Moving to verified.