Bug 1209340 - Random regression test hang : bug-1113960.t
Summary: Random regression test hang : bug-1113960.t
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Nithya Balachandran
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-04-07 06:08 UTC by Nithya Balachandran
Modified: 2018-08-29 03:34 UTC (History)

Fixed In Version: glusterfs-4.1.3 (or later)
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-08-29 03:34:59 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Nithya Balachandran 2015-04-07 06:08:13 UTC
Description of problem:

The bug-1113960.t test sometimes hangs on the upstream regression test systems.

Version-Release number of selected component (if applicable):


How reproducible:
Random. Running bug-1113960.t can sometimes cause the mount point to hang.

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:
No hang

Additional info:

Comment 1 Nithya Balachandran 2015-04-07 06:31:36 UTC
The test renames multiple directories in a deep directory structure from one FUSE mount while simultaneous lookup requests are sent from another FUSE mount.

Initial analysis on a hung system using gdb and logs reveals the following:


1. The hung process is constantly calling __foreach_ancestor_dentry().
2. The directory rename has failed for some directories on one subvolume.
3. This seems to lead to a situation where a single directory has 2 dentries, one each for the old and new names.
4. The inode_link() function does a cycle check by calling __foreach_ancestor_dentry().
5. __foreach_ancestor_dentry() walks up the directory tree for each dentry it finds in the parent dentry list. This means that there will be 2^(level at which duplicate dentry exists) calls for each cycle check.

This makes inode_link() appear to hang, because the cycle check takes a very long time to finish.
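
To illustrate why the walk becomes so expensive, here is a minimal Python sketch of the recursion described above. It is a toy model and not the libglusterfs code: the Inode and Dentry classes and the build_chain(), count_ancestor_visits() and total_visits() helpers are hypothetical names made up for this example. It assumes a single chain of nested directories in which some levels carry a second dentry (old name and new name under the same parent), and counts how many dentries one upward walk visits.

```python
class Inode:
    """Toy inode: only tracks the dentries that point at it (hypothetical model)."""
    def __init__(self, name):
        self.name = name
        self.dentries = []


class Dentry:
    """Toy dentry: a name inside a parent inode that resolves to an inode."""
    def __init__(self, name, parent, inode):
        self.name = name
        self.parent = parent
        self.inode = inode
        inode.dentries.append(self)


def build_chain(depth, duplicate_levels):
    """Build /dir1/dir2/.../dir<depth>; levels in duplicate_levels get a second dentry."""
    root = Inode("/")
    parent = root
    for level in range(1, depth + 1):
        child = Inode(f"dir{level}")
        Dentry(f"dir{level}", parent, child)
        if level in duplicate_levels:
            # Stale entry left behind by a partially failed rename (assumption).
            Dentry(f"olddir{level}", parent, child)
        parent = child
    return parent  # deepest inode


def count_ancestor_visits(dentry):
    """Visit this dentry, then recurse into every dentry of its parent inode."""
    visits = 1
    for each in dentry.parent.dentries:
        visits += count_ancestor_visits(each)
    return visits


def total_visits(depth, duplicate_levels):
    """Count the dentries visited by one upward walk from the deepest directory."""
    leaf = build_chain(depth, duplicate_levels)
    return count_ancestor_visits(leaf.dentries[0])


if __name__ == "__main__":
    print(total_visits(10, set()))            # no duplicates: one visit per level
    print(total_visits(10, set(range(1, 10))))  # duplicates at levels 1..9
```

In this model, every level that has two dentries doubles the number of walks above it, so a clean 10-level chain costs 10 visits while the same chain with duplicates at levels 1 to 9 costs 2^10 - 1 = 1023. At the 45-level depth used by the test, the same growth would make a single cycle check effectively never finish.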

In this instance, the rename of olddir12 failed on patchy-client-3.
From the logs:

24363 ?        S      0:00 mv /mnt/glusterfs/0/longernamedir1/longernamedir2/longernamedir3/longernamedir4/longernamedir5/longernamedir6/longernamedir7/longernamedir8/longernamedir9/longernamedir10/longernamedir11/longernamedir12/longernamedir13/longernamedir14/longernamedir15/longernamedir16/longernamedir17/longernamedir18/longernamedir19/longernamedir20/longernamedir21/longernamedir22/longernamedir23/longernamedir24/longernamedir25/longernamedir26/longernamedir27/longernamedir28/longernamedir29/longernamedir30/longernamedir31/longernamedir32/longernamedir33/longernamedir34/longernamedir35/longernamedir36/longernamedir37/longernamedir38/longernamedir39/longernamedir40/longernamedir41/longernamedir42/longernamedir43/longernamedir44/longernamedir45/file0 /mnt/glusterfs/0/longernamedir1/longernamedir2/longernamedir3/longernamedir4/longernamedir5/longernamedir6/longernamedir7/longernamedir8/longernamedir9/longernamedir10/longernamedir11/longernamedir12/longernamedir13/longernamedir14/longernamedir15/longernamedir16/longernamedir17/longernamedir18/longernamedir19/longernamedir20/longernamedir21/longernamedir22/longernamedir23/longernamedir24/longernamedir25/longernamedir26/longernamedir27/longernamedir28/longernamedir29/longernamedir30/longernamedir31/longernamedir32/longernamedir33/longernamedir34/longernamedir35/longernamedir36/longernamedir37/longernamedir38/longernamedir39/longernamedir40/longernamedir41/longernamedir42/longernamedir43/longernamedir44/longernamedir45/newfile0



[root@bulk5 ~]# /build/install/sbin/gluster v info
 
Volume Name: patchy
Type: Distribute
Volume ID: 213b4118-a3e6-4732-9d59-487b47cfda94
Status: Started
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: bulk5.rack.gluster.org:/d/backends/patchy1
Brick2: bulk5.rack.gluster.org:/d/backends/patchy2
Brick3: bulk5.rack.gluster.org:/d/backends/patchy3
Brick4: bulk5.rack.gluster.org:/d/backends/patchy4
[root@bulk5 ~]# 





[2015-03-30 20:20:54.753847] I [MSGID: 109036] [dht-common.c:6407:dht_log_new_layout_for_dir_selfheal] 0-patchy-dht: Setting layout of /olddir11 with [Subvol_name: patchy-client-0, Err: -1 , Start: 1073733380 , Stop: 2147466759 ], [Subvol_name: patchy-client-1, Err: -1 , Start: 2147466760 , Stop: 3221200139 ], [Subvol_name: patchy-client-2, Err: -1 , Start: 3221200140 , Stop: 4294967295 ], [Subvol_name: patchy-client-3, Err: -1 , Start: 0 , Stop: 1073733379 ],




[2015-03-30 20:21:01.442079] I [MSGID: 109018] [dht-common.c:772:dht_revalidate_cbk] 0-patchy-dht: Mismatching layouts for /longernamedir1/longernamedir2/longernamedir3/longernamedir4/longernamedir5/longernamedir6/longernamedir7/longernamedir8/longernamedir9/longernamedir10/longernamedir11/olddir12, gfid = 00000000-0000-0000-0000-000000000000
[2015-03-30 20:21:01.442117] I [dht-layout.c:800:dht_layout_dir_mismatch] 0-patchy-dht: subvol: patchy-client-1; inode layout - 3221200140 - 4294967295; disk layout - 2863302660 - 4294967295
[2015-03-30 20:21:01.442288] I [dht-layout.c:800:dht_layout_dir_mismatch] 0-patchy-dht: subvol: patchy-client-2; inode layout - 0 - 1073733379; disk layout - 0 - 1431651329
[2015-03-30 20:21:01.456272] I [dht-rename.c:1344:dht_rename] 0-patchy-dht: renaming /longernamedir1/longernamedir2/longernamedir3/longernamedir4/longernamedir5/longernamedir6/longernamedir7/longernamedir8/longernamedir9/longernamedir10/longernamedir11/olddir12 (hash=patchy-client-0/cache=patchy-client-3) => /longernamedir1/longernamedir2/longernamedir3/longernamedir4/longernamedir5/longernamedir6/longernamedir7/longernamedir8/longernamedir9/longernamedir10/longernamedir11/longernamedir12 (hash=patchy-client-2/cache=<nul>)
[2015-03-30 20:21:01.458541] W [client-rpc-fops.c:2599:client3_3_rename_cbk] 0-patchy-client-3: remote operation failed: No such file or directory
The message "I [MSGID: 109018] [dht-common.c:772:dht_revalidate_cbk] 0-patchy-dht: Mismatching layouts for /longernamedir1/longernamedir2/longernamedir3/longernamedir4/longernamedir5/longernamedir6/longernamedir7/longernamedir8/longernamedir9/longernamedir10/longernamedir11/olddir12, gfid = 00000000-0000-0000-0000-000000000000" repeated 2 times between [2015-03-30 20:21:01.442079] and [2015-03-30 20:21:01.442305]
[2015-03-30 20:21:01.458567] I [MSGID: 109030] [dht-rename.c:49:dht_rename_dir_cbk] 0-patchy-dht: Rename /longernamedir1/longernamedir2/longernamedir3/longernamedir4/longernamedir5/longernamedir6/longernamedir7/longernamedir8/longernamedir9/longernamedir10/longernamedir11/olddir12 -> /longernamedir1/longernamedir2/longernamedir3/longernamedir4/longernamedir5/longernamedir6/longernamedir7/longernamedir8/longernamedir9/longernamedir10/longernamedir11/longernamedir12 on patchy-client-3 failed, (gfid = e2b8a2b5-e60a-4bb2-9593-6417982e3ca9) [No such file or directory]
[2015-03-30 20:21:01.458637] W [fuse-bridge.c:1756:fuse_rename_cbk] 0-glusterfs-fuse: 16792: /longernamedir1/longernamedir2/longernamedir3/longernamedir4/longernamedir5/longernamedir6/longernamedir7/longernamedir8/longernamedir9/longernamedir10/longernamedir11/olddir12 -> /longernamedir1/longernamedir2/longernamedir3/longernamedir4/longernamedir5/longernamedir6/longernamedir7/longernamedir8/longernamedir9/longernamedir10/longernamedir11/longernamedir12 => -1 (No such file or directory)





I need to try to reproduce this on local systems.

Comment 2 Vijay Bellur 2016-05-17 09:22:07 UTC
REVIEW: http://review.gluster.org/14373 (libglusterfs: Skip ancestory check for same parent dentries) posted (#1) for review on master by N Balachandran (nbalacha)

Comment 3 Amar Tumballi 2018-08-29 03:34:59 UTC
This update is done in bulk based on the state of the patch and the time since last activity. If the issue is still seen, please reopen the bug.

