Bug 1209340

Summary: Random regression test hang : bug-1113960.t
Product: [Community] GlusterFS Reporter: Nithya Balachandran <nbalacha>
Component: core Assignee: Nithya Balachandran <nbalacha>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: high Docs Contact:
Priority: high    
Version: mainline CC: bugs, sankarshan, vbellur
Target Milestone: --- Keywords: Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-4.1.3 (or later) Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-08-29 03:34:59 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Nithya Balachandran 2015-04-07 06:08:13 UTC
Description of problem:

The bug-1113960.t test sometimes hangs on the upstream regression test systems.

Version-Release number of selected component (if applicable):


How reproducible:
Random. Running bug-1113960.t can sometimes cause the mount point to hang.

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:
No hang

Additional info:

Comment 1 Nithya Balachandran 2015-04-07 06:31:36 UTC
The test renames multiple directories in a deep directory structure from one FUSE mount while simultaneous lookup requests are sent from another FUSE mount.

Initial analysis on a hung system using gdb and logs reveals the following:


1. The hung process is constantly calling __foreach_ancestor_dentry().
2. The directory rename failed for some directories on one subvolume.
3. This seems to lead to a situation where a single directory has 2 dentries, one each for the old and new names.
4. The inode_link() function does a cycle check by calling __foreach_ancestor_dentry().
5. __foreach_ancestor_dentry() walks up the directory tree for each dentry it finds in the parent dentry list. This means that there will be 2^(level at which duplicate dentry exists) calls for each cycle check.

As a result, inode_link() appears to hang, because the cycle check takes a very long time to finish.
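
The blow-up described in the analysis above can be illustrated with a small model. This is a minimal Python sketch, not the libglusterfs code: the Inode and Dentry classes and the build_chain() and count_ancestor_visits() helpers are hypothetical names, and it assumes that every directory on the path has kept both its old and its new dentry under the same parent.

```python
class Inode:
    """Hypothetical stand-in for an inode: it only tracks its dentries."""

    def __init__(self, name):
        self.name = name
        self.dentries = []  # every dentry that refers to this inode


class Dentry:
    """Hypothetical stand-in for a dentry: a name under a parent inode."""

    def __init__(self, name, parent, inode):
        self.name = name
        self.parent = parent
        self.inode = inode
        inode.dentries.append(self)


def build_chain(depth, duplicate):
    """Build /d1/d2/.../d<depth> and return the deepest inode.

    With duplicate=True each directory gets a second (stale) dentry
    under the same parent, modelling a partially failed rename.
    """
    parent = Inode("/")
    for level in range(1, depth + 1):
        child = Inode(f"d{level}")
        Dentry(f"newdir{level}", parent, child)
        if duplicate:
            Dentry(f"olddir{level}", parent, child)
        parent = child
    return parent


def count_ancestor_visits(inode):
    """Count dentries visited when walking up through every dentry."""
    visits = 0
    for dentry in inode.dentries:
        visits += 1
        visits += count_ancestor_visits(dentry.parent)
    return visits


for depth in (5, 10, 15):
    clean = count_ancestor_visits(build_chain(depth, duplicate=False))
    dup = count_ancestor_visits(build_chain(depth, duplicate=True))
    print(depth, clean, dup)
```

In this model a clean chain costs one visit per level, while two dentries at every level cost 2^(depth+1) - 2 visits. At the test's directory depth of 45 that would be on the order of 7 x 10^13 visits if every level were duplicated, which is a worst-case assumption and not something the logs confirm.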

In the process, the rename of olddir12 failed on patchy-client-3.
From the logs:

24363 ?        S      0:00 mv /mnt/glusterfs/0/longernamedir1/longernamedir2/longernamedir3/longernamedir4/longernamedir5/longernamedir6/longernamedir7/longernamedir8/longernamedir9/longernamedir10/longernamedir11/longernamedir12/longernamedir13/longernamedir14/longernamedir15/longernamedir16/longernamedir17/longernamedir18/longernamedir19/longernamedir20/longernamedir21/longernamedir22/longernamedir23/longernamedir24/longernamedir25/longernamedir26/longernamedir27/longernamedir28/longernamedir29/longernamedir30/longernamedir31/longernamedir32/longernamedir33/longernamedir34/longernamedir35/longernamedir36/longernamedir37/longernamedir38/longernamedir39/longernamedir40/longernamedir41/longernamedir42/longernamedir43/longernamedir44/longernamedir45/file0 /mnt/glusterfs/0/longernamedir1/longernamedir2/longernamedir3/longernamedir4/longernamedir5/longernamedir6/longernamedir7/longernamedir8/longernamedir9/longernamedir10/longernamedir11/longernamedir12/longernamedir13/longernamedir14/longernamedir15/longernamedir16/longernamedir17/longernamedir18/longernamedir19/longernamedir20/longernamedir21/longernamedir22/longernamedir23/longernamedir24/longernamedir25/longernamedir26/longernamedir27/longernamedir28/longernamedir29/longernamedir30/longernamedir31/longernamedir32/longernamedir33/longernamedir34/longernamedir35/longernamedir36/longernamedir37/longernamedir38/longernamedir39/longernamedir40/longernamedir41/longernamedir42/longernamedir43/longernamedir44/longernamedir45/newfile0



[root@bulk5 ~]# /build/install/sbin/gluster v info
 
Volume Name: patchy
Type: Distribute
Volume ID: 213b4118-a3e6-4732-9d59-487b47cfda94
Status: Started
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: bulk5.rack.gluster.org:/d/backends/patchy1
Brick2: bulk5.rack.gluster.org:/d/backends/patchy2
Brick3: bulk5.rack.gluster.org:/d/backends/patchy3
Brick4: bulk5.rack.gluster.org:/d/backends/patchy4
[root@bulk5 ~]# 





[2015-03-30 20:20:54.753847] I [MSGID: 109036] [dht-common.c:6407:dht_log_new_layout_for_dir_selfheal] 0-patchy-dht: Setting layout of /olddir11 with [Subvol_name: patchy-client-0, Err: -1 , Start: 1073733380 , Stop: 2147466759 ], [Subvol_name: patchy-client-1, Err: -1 , Start: 2147466760 , Stop: 3221200139 ], [Subvol_name: patchy-client-2, Err: -1 , Start: 3221200140 , Stop: 4294967295 ], [Subvol_name: patchy-client-3, Err: -1 , Start: 0 , Stop: 1073733379 ],




[2015-03-30 20:21:01.442079] I [MSGID: 109018] [dht-common.c:772:dht_revalidate_cbk] 0-patchy-dht: Mismatching layouts for /longernamedir1/longernamedir2/longernamedir3/longernamedir4/longernamedir5/longernamedir6/longernamedir7/longernamedir8/longernamedir9/longernamedir10/longernamedir11/olddir12, gfid = 00000000-0000-0000-0000-000000000000
[2015-03-30 20:21:01.442117] I [dht-layout.c:800:dht_layout_dir_mismatch] 0-patchy-dht: subvol: patchy-client-1; inode layout - 3221200140 - 4294967295; disk layout - 2863302660 - 4294967295
[2015-03-30 20:21:01.442288] I [dht-layout.c:800:dht_layout_dir_mismatch] 0-patchy-dht: subvol: patchy-client-2; inode layout - 0 - 1073733379; disk layout - 0 - 1431651329
[2015-03-30 20:21:01.456272] I [dht-rename.c:1344:dht_rename] 0-patchy-dht: renaming /longernamedir1/longernamedir2/longernamedir3/longernamedir4/longernamedir5/longernamedir6/longernamedir7/longernamedir8/longernamedir9/longernamedir10/longernamedir11/olddir12 (hash=patchy-client-0/cache=patchy-client-3) => /longernamedir1/longernamedir2/longernamedir3/longernamedir4/longernamedir5/longernamedir6/longernamedir7/longernamedir8/longernamedir9/longernamedir10/longernamedir11/longernamedir12 (hash=patchy-client-2/cache=<nul>)
[2015-03-30 20:21:01.458541] W [client-rpc-fops.c:2599:client3_3_rename_cbk] 0-patchy-client-3: remote operation failed: No such file or directory
The message "I [MSGID: 109018] [dht-common.c:772:dht_revalidate_cbk] 0-patchy-dht: Mismatching layouts for /longernamedir1/longernamedir2/longernamedir3/longernamedir4/longernamedir5/longernamedir6/longernamedir7/longernamedir8/longernamedir9/longernamedir10/longernamedir11/olddir12, gfid = 00000000-0000-0000-0000-000000000000" repeated 2 times between [2015-03-30 20:21:01.442079] and [2015-03-30 20:21:01.442305]
[2015-03-30 20:21:01.458567] I [MSGID: 109030] [dht-rename.c:49:dht_rename_dir_cbk] 0-patchy-dht: Rename /longernamedir1/longernamedir2/longernamedir3/longernamedir4/longernamedir5/longernamedir6/longernamedir7/longernamedir8/longernamedir9/longernamedir10/longernamedir11/olddir12 -> /longernamedir1/longernamedir2/longernamedir3/longernamedir4/longernamedir5/longernamedir6/longernamedir7/longernamedir8/longernamedir9/longernamedir10/longernamedir11/longernamedir12 on patchy-client-3 failed, (gfid = e2b8a2b5-e60a-4bb2-9593-6417982e3ca9) [No such file or directory]
[2015-03-30 20:21:01.458637] W [fuse-bridge.c:1756:fuse_rename_cbk] 0-glusterfs-fuse: 16792: /longernamedir1/longernamedir2/longernamedir3/longernamedir4/longernamedir5/longernamedir6/longernamedir7/longernamedir8/longernamedir9/longernamedir10/longernamedir11/olddir12 -> /longernamedir1/longernamedir2/longernamedir3/longernamedir4/longernamedir5/longernamedir6/longernamedir7/longernamedir8/longernamedir9/longernamedir10/longernamedir11/longernamedir12 => -1 (No such file or directory)





I need to try to reproduce this on local systems.

Comment 2 Vijay Bellur 2016-05-17 09:22:07 UTC
REVIEW: http://review.gluster.org/14373 (libglusterfs: Skip ancestory check for same parent dentries) posted (#1) for review on master by N Balachandran (nbalacha)
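
The patch itself is not reproduced in this report. One possible reading of its title, sketched in Python as an assumption only, is that when an inode has several dentries under the same parent, the walk up through that parent is done once instead of once per dentry. The Inode and Dentry classes and the build_duplicated_chain() and count_visits_skip_same_parent() helpers are hypothetical names for this illustration, not libglusterfs APIs.

```python
class Inode:
    """Hypothetical stand-in for an inode: it only tracks its dentries."""

    def __init__(self, name):
        self.name = name
        self.dentries = []


class Dentry:
    """Hypothetical stand-in for a dentry: a name under a parent inode."""

    def __init__(self, name, parent, inode):
        self.name = name
        self.parent = parent
        inode.dentries.append(self)


def build_duplicated_chain(depth):
    """Build a chain in which every directory has an old and a new dentry."""
    parent = Inode("/")
    for level in range(1, depth + 1):
        child = Inode(f"d{level}")
        Dentry(f"newdir{level}", parent, child)
        Dentry(f"olddir{level}", parent, child)
        parent = child
    return parent


def count_visits_skip_same_parent(inode):
    """Walk up the tree, but ascend through each distinct parent only once."""
    visits = 0
    seen_parents = set()
    for dentry in inode.dentries:
        visits += 1
        if id(dentry.parent) in seen_parents:
            continue  # same parent already walked via another dentry
        seen_parents.add(id(dentry.parent))
        visits += count_visits_skip_same_parent(dentry.parent)
    return visits


print(count_visits_skip_same_parent(build_duplicated_chain(45)))
```

Under this assumption the cost of the walk grows linearly with depth (two visits per level in this model) rather than doubling at every level.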

Comment 3 Amar Tumballi 2018-08-29 03:34:59 UTC
This update is done in bulk based on the state of the patch and the time since last activity. If the issue is still seen, please reopen the bug.