Description of problem:
The bug-1113960.t test sometimes hangs on the upstream regression test systems.

Version-Release number of selected component (if applicable):

How reproducible:
Random. Running bug-1113960.t can sometimes cause the mount point to hang.

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:
No hang

Additional info:
The test renames multiple directories in a deep directory structure from one FUSE mount while simultaneous lookup requests are sent from another FUSE mount.

Initial analysis on a hung system using gdb and the logs reveals the following:
1. The hung process is constantly calling __foreach_ancestor_dentry().
2. The directory rename failed for some directories on one subvolume.
3. This seems to lead to a situation where a single directory has 2 dentries, one each for the old and new names.
4. The inode_link() function does a cycle check by calling __foreach_ancestor_dentry().
5. __foreach_ancestor_dentry() walks up the directory tree for each dentry it finds in the parent dentry list. This means that there will be 2^(level at which duplicate dentry exists) calls for each cycle check.

This causes inode_link() to hang, as it takes a very long time to finish the cycle check.

In the process, the rename of olddir12 failed on patchy-client-3.

From the logs:

24363 ? S 0:00 mv /mnt/glusterfs/0/longernamedir1/longernamedir2/longernamedir3/longernamedir4/longernamedir5/longernamedir6/longernamedir7/longernamedir8/longernamedir9/longernamedir10/longernamedir11/longernamedir12/longernamedir13/longernamedir14/longernamedir15/longernamedir16/longernamedir17/longernamedir18/longernamedir19/longernamedir20/longernamedir21/longernamedir22/longernamedir23/longernamedir24/longernamedir25/longernamedir26/longernamedir27/longernamedir28/longernamedir29/longernamedir30/longernamedir31/longernamedir32/longernamedir33/longernamedir34/longernamedir35/longernamedir36/longernamedir37/longernamedir38/longernamedir39/longernamedir40/longernamedir41/longernamedir42/longernamedir43/longernamedir44/longernamedir45/file0
/mnt/glusterfs/0/longernamedir1/longernamedir2/longernamedir3/longernamedir4/longernamedir5/longernamedir6/longernamedir7/longernamedir8/longernamedir9/longernamedir10/longernamedir11/longernamedir12/longernamedir13/longernamedir14/longernamedir15/longernamedir16/longernamedir17/longernamedir18/longernamedir19/longernamedir20/longernamedir21/longernamedir22/longernamedir23/longernamedir24/longernamedir25/longernamedir26/longernamedir27/longernamedir28/longernamedir29/longernamedir30/longernamedir31/longernamedir32/longernamedir33/longernamedir34/longernamedir35/longernamedir36/longernamedir37/longernamedir38/longernamedir39/longernamedir40/longernamedir41/longernamedir42/longernamedir43/longernamedir44/longernamedir45/newfile0

[root@bulk5 ~]# /build/install/sbin/gluster v info

Volume Name: patchy
Type: Distribute
Volume ID: 213b4118-a3e6-4732-9d59-487b47cfda94
Status: Started
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: bulk5.rack.gluster.org:/d/backends/patchy1
Brick2: bulk5.rack.gluster.org:/d/backends/patchy2
Brick3: bulk5.rack.gluster.org:/d/backends/patchy3
Brick4: bulk5.rack.gluster.org:/d/backends/patchy4
[root@bulk5 ~]#

[2015-03-30 20:20:54.753847] I [MSGID: 109036] [dht-common.c:6407:dht_log_new_layout_for_dir_selfheal] 0-patchy-dht: Setting layout of /olddir11 with [Subvol_name: patchy-client-0, Err: -1 , Start: 1073733380 , Stop: 2147466759 ], [Subvol_name: patchy-client-1, Err: -1 , Start: 2147466760 , Stop: 3221200139 ], [Subvol_name: patchy-client-2, Err: -1 , Start: 3221200140 , Stop: 4294967295 ], [Subvol_name: patchy-client-3, Err: -1 , Start: 0 , Stop: 1073733379 ],
[2015-03-30 20:21:01.442079] I [MSGID: 109018] [dht-common.c:772:dht_revalidate_cbk] 0-patchy-dht: Mismatching layouts for /longernamedir1/longernamedir2/longernamedir3/longernamedir4/longernamedir5/longernamedir6/longernamedir7/longernamedir8/longernamedir9/longernamedir10/longernamedir11/olddir12, gfid = 00000000-0000-0000-0000-000000000000
[2015-03-30 20:21:01.442117] I [dht-layout.c:800:dht_layout_dir_mismatch] 0-patchy-dht: subvol: patchy-client-1; inode layout - 3221200140 - 4294967295; disk layout - 2863302660 - 4294967295
[2015-03-30 20:21:01.442288] I [dht-layout.c:800:dht_layout_dir_mismatch] 0-patchy-dht: subvol: patchy-client-2; inode layout - 0 - 1073733379; disk layout - 0 - 1431651329
[2015-03-30 20:21:01.456272] I [dht-rename.c:1344:dht_rename] 0-patchy-dht: renaming /longernamedir1/longernamedir2/longernamedir3/longernamedir4/longernamedir5/longernamedir6/longernamedir7/longernamedir8/longernamedir9/longernamedir10/longernamedir11/olddir12 (hash=patchy-client-0/cache=patchy-client-3) => /longernamedir1/longernamedir2/longernamedir3/longernamedir4/longernamedir5/longernamedir6/longernamedir7/longernamedir8/longernamedir9/longernamedir10/longernamedir11/longernamedir12 (hash=patchy-client-2/cache=<nul>)
[2015-03-30 20:21:01.458541] W [client-rpc-fops.c:2599:client3_3_rename_cbk] 0-patchy-client-3: remote operation failed: No such file or directory
The message "I [MSGID: 109018] [dht-common.c:772:dht_revalidate_cbk] 0-patchy-dht: Mismatching layouts for /longernamedir1/longernamedir2/longernamedir3/longernamedir4/longernamedir5/longernamedir6/longernamedir7/longernamedir8/longernamedir9/longernamedir10/longernamedir11/olddir12, gfid = 00000000-0000-0000-0000-000000000000" repeated 2 times between [2015-03-30 20:21:01.442079] and [2015-03-30 20:21:01.442305]
[2015-03-30 20:21:01.458567] I [MSGID: 109030] [dht-rename.c:49:dht_rename_dir_cbk] 0-patchy-dht: Rename /longernamedir1/longernamedir2/longernamedir3/longernamedir4/longernamedir5/longernamedir6/longernamedir7/longernamedir8/longernamedir9/longernamedir10/longernamedir11/olddir12 -> /longernamedir1/longernamedir2/longernamedir3/longernamedir4/longernamedir5/longernamedir6/longernamedir7/longernamedir8/longernamedir9/longernamedir10/longernamedir11/longernamedir12 on patchy-client-3 failed, (gfid = e2b8a2b5-e60a-4bb2-9593-6417982e3ca9) [No such file or directory]
[2015-03-30 20:21:01.458637] W [fuse-bridge.c:1756:fuse_rename_cbk] 0-glusterfs-fuse: 16792: /longernamedir1/longernamedir2/longernamedir3/longernamedir4/longernamedir5/longernamedir6/longernamedir7/longernamedir8/longernamedir9/longernamedir10/longernamedir11/olddir12 -> /longernamedir1/longernamedir2/longernamedir3/longernamedir4/longernamedir5/longernamedir6/longernamedir7/longernamedir8/longernamedir9/longernamedir10/longernamedir11/longernamedir12 => -1 (No such file or directory)

I need to try to reproduce this on local systems.
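
To illustrate why the cycle check described in the analysis can take so long, here is a minimal Python model of the ancestor walk. It is only a sketch under stated assumptions, not the libglusterfs code: the `Inode` and `Dentry` classes, `build_chain()` and `count_visits()` are hypothetical names, and the model assumes that a directory left with both its old and new dentry has both of them under the same parent. The walk itself follows the description above: visit a dentry, then recurse into every dentry of its parent inode.

```python
class Inode:
    """Hypothetical stand-in for an inode: it only tracks its dentries."""
    def __init__(self):
        self.dentries = []  # every dentry (name) that points at this inode


class Dentry:
    """Hypothetical stand-in for a dentry: a name for `inode` under `parent`."""
    def __init__(self, name, parent, inode):
        self.name, self.parent, self.inode = name, parent, inode
        inode.dentries.append(self)


def foreach_ancestor_dentry(dentry, visit):
    """Visit `dentry`, then walk up through every dentry of its parent inode."""
    visit(dentry)
    for each in dentry.parent.dentries:  # the root inode has no dentries
        foreach_ancestor_dentry(each, visit)


def build_chain(depth, duplicated_levels):
    """Build dir1/dir2/.../dir<depth>; levels listed in `duplicated_levels`
    get a second dentry (an assumed leftover old name) under the same parent."""
    parent = Inode()  # root
    leaf = None
    for level in range(1, depth + 1):
        inode = Inode()
        leaf = Dentry(f"dir{level}", parent, inode)
        if level in duplicated_levels:
            Dentry(f"olddir{level}", parent, inode)
        parent = inode
    return leaf


def count_visits(depth, duplicated_levels=()):
    """Count how many dentries one ancestor walk from the deepest dentry visits."""
    leaf = build_chain(depth, set(duplicated_levels))
    calls = 0

    def visit(_dentry):
        nonlocal calls
        calls += 1

    foreach_ancestor_dentry(leaf, visit)
    return calls


if __name__ == "__main__":
    print(count_visits(16))                # clean chain
    print(count_visits(16, range(1, 17)))  # duplicate dentry at every level
```

In this model a clean chain costs one visit per level, while every level that carries a duplicate dentry roughly doubles the work for everything below it, which is consistent with the exponential behaviour seen on the hung system.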
REVIEW: http://review.gluster.org/14373 (libglusterfs: Skip ancestory check for same parent dentries) posted (#1) for review on master by N Balachandran (nbalacha)
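
For illustration only, the sketch below shows one way redundant ancestor walks can be avoided in a toy model. It is not the patch under review, whose actual change should be read from the review link above. All names (`Inode`, `Dentry`, `walk_once`, `build_duplicated_chain`) are hypothetical, and the model assumes duplicate dentries share the same parent. The idea is that once the dentries of a parent inode have been walked, a second dentry leading to that same parent does not need to trigger the walk again; every dentry is still visited, so a check applied per dentry sees the same set of dentries.

```python
class Inode:
    """Hypothetical stand-in for an inode: it only tracks its dentries."""
    def __init__(self):
        self.dentries = []


class Dentry:
    """Hypothetical stand-in for a dentry: a name for `inode` under `parent`."""
    def __init__(self, name, parent, inode):
        self.name, self.parent, self.inode = name, parent, inode
        inode.dentries.append(self)


def walk_once(dentry, visit, seen=None):
    """Visit `dentry`, then walk its parent's dentries unless that parent
    inode was already expanded earlier in this walk."""
    seen = set() if seen is None else seen
    visit(dentry)
    parent = dentry.parent
    if id(parent) in seen:
        return  # ancestors of this parent were already walked
    seen.add(id(parent))
    for each in parent.dentries:
        walk_once(each, visit, seen)


def build_duplicated_chain(depth):
    """Build a chain where every level has two dentries under the same parent."""
    parent = Inode()  # root
    leaf = None
    for level in range(1, depth + 1):
        inode = Inode()
        leaf = Dentry(f"dir{level}", parent, inode)
        Dentry(f"olddir{level}", parent, inode)
        parent = inode
    return leaf


def count_visits_once(depth):
    """Count dentries visited by one de-duplicated walk from the deepest dentry."""
    visited = []
    walk_once(build_duplicated_chain(depth), visited.append)
    return len(visited)


if __name__ == "__main__":
    print(count_visits_once(16))
```

In this toy model the number of visits grows linearly with depth even when every level has a duplicate dentry, instead of doubling per level.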
This update is done in bulk based on the state of the patch and the time since last activity. If the issue is still seen, please reopen the bug.