Bug 459843

Summary:	GFS2: deadlock between unlink and rmdir
Product:	Red Hat Enterprise Linux 5	Reporter:	Nate Straz <nstraz>
Component:	kernel	Assignee:	Steve Whitehouse <swhiteho>
Status:	CLOSED DUPLICATE	QA Contact:	Cluster QE <mspqa-list>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	5.3	CC:	cluster-maint, edamato
Target Milestone:	rc
Target Release:	---
Hardware:	All
OS:	Linux
Doc Type:	Bug Fix
Last Closed:	2008-08-26 07:48:26 UTC

Description Nate Straz 2008-08-22 20:50:30 UTC

Description of problem:

While running some distributed metadata tests, I hit a case where an unlink on one node and a rmdir on another node were waiting on each other. I found it with the help of d_metaverify and gfs2_hangalyzer.


There are 2 glocks with waiters.
dash-01, pid 21984 is waiting for glock 3/1018f, which is held by pid 13068
dash-02, pid 13068 is waiting for glock 2/10a008f, which is held by pid 21984
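
The two lines above describe a cycle in a wait-for graph: each process holds the glock the other one is blocked on. The following is a minimal sketch of how such a cycle could be detected from holder and waiter information. It is not the gfs2_hangalyzer implementation; the function name and the data layout are assumptions made for illustration, and only the node names, pids, and glock names come from this report.

```python
# Hypothetical helper, not taken from gfs2_hangalyzer.
def find_deadlock_cycles(holders, waiters):
    """Find wait-for cycles.

    holders: glock name -> (node, pid) currently holding it
    waiters: (node, pid) -> glock name that process is blocked on
    Returns a list of cycles, each a list of (node, pid) tuples.
    """
    cycles = []
    seen = set()
    for start in waiters:
        if start in seen:
            continue
        path = []
        current = start
        # Follow "waits for glock -> held by process" edges until the chain
        # ends or loops back onto a process already on the path.
        while current in waiters and current not in path:
            path.append(current)
            current = holders.get(waiters[current])
        if current in path:
            cycle = path[path.index(current):]
            # Skip cycles already reported from an earlier start point.
            if not seen.intersection(cycle):
                cycles.append(cycle)
        seen.update(path)
    return cycles


# Values copied from the two summary lines in this report.
holders = {
    "3/1018f": ("dash-02", 13068),
    "2/10a008f": ("dash-01", 21984),
}
waiters = {
    ("dash-01", 21984): "3/1018f",
    ("dash-02", 13068): "2/10a008f",
}

cycles = find_deadlock_cycles(holders, waiters)
for cycle in cycles:
    print(" -> ".join(f"{node}:{pid}" for node, pid in cycle))
```

The sketch assumes one holder per glock and one blocked request per process, which is enough for the case reported here.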

d_doio        D ffffffff8852e138     0 21984  21983         21985       (NOTLB)
 ffff8100166a7c88 0000000000000086 0000000000000018 ffffffff88503488
 0000000000000296 0000000000000009 ffff81003eb61100 ffffffff802e5ae0
 0000a183cee091f4 00000000000009ed ffff81003eb612e8 0000000088504dea
Call Trace:
 [<ffffffff88503488>] :dlm:request_lock+0x93/0xa0
 [<ffffffff8852e138>] :gfs2:just_schedule+0x0/0xe
 [<ffffffff8852e141>] :gfs2:just_schedule+0x9/0xe
 [<ffffffff800639be>] __wait_on_bit+0x40/0x6e
 [<ffffffff8852e138>] :gfs2:just_schedule+0x0/0xe
 [<ffffffff80063a58>] out_of_line_wait_on_bit+0x6c/0x78
 [<ffffffff8009e107>] wake_bit_function+0x0/0x23
 [<ffffffff8852e133>] :gfs2:gfs2_glock_wait+0x2b/0x30
 [<ffffffff8853aed7>] :gfs2:gfs2_unlink+0xd1/0x194
 [<ffffffff8853ae6a>] :gfs2:gfs2_unlink+0x64/0x194
 [<ffffffff8853ae88>] :gfs2:gfs2_unlink+0x82/0x194
 [<ffffffff8853aeaa>] :gfs2:gfs2_unlink+0xa4/0x194
 [<ffffffff885408a9>] :gfs2:gfs2_rindex_hold+0x32/0x152
 [<ffffffff80049db5>] vfs_unlink+0xc2/0x108
 [<ffffffff8003c4c1>] do_unlinkat+0xaa/0x141
 [<ffffffff8005d229>] tracesys+0x71/0xe0
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

d_doio        D ffffffff8852c138     0 13068  13064         13069 13067 (NOTLB)
 ffff8100207dfc58 0000000000000086 00000000fffffff5 ffff81000da75000
 0000000000000296 0000000000000009 ffff81000b9ec7a0 ffffffff802e5ae0
 00005db41430fb47 00000000000008ef ffff81000b9ec988 0000000088502dea
Call Trace:
 [<ffffffff8852c138>] :gfs2:just_schedule+0x0/0xe
 [<ffffffff8852c141>] :gfs2:just_schedule+0x9/0xe
 [<ffffffff800639be>] __wait_on_bit+0x40/0x6e
 [<ffffffff8852c138>] :gfs2:just_schedule+0x0/0xe
 [<ffffffff80063a58>] out_of_line_wait_on_bit+0x6c/0x78
 [<ffffffff8009e107>] wake_bit_function+0x0/0x23
 [<ffffffff8852c133>] :gfs2:gfs2_glock_wait+0x2b/0x30
 [<ffffffff8852dada>] :gfs2:gfs2_glock_nq_m+0xa9/0xeb
 [<ffffffff8853903a>] :gfs2:gfs2_rmdir+0xa0/0x182
 [<ffffffff88538ff1>] :gfs2:gfs2_rmdir+0x57/0x182
 [<ffffffff88539009>] :gfs2:gfs2_rmdir+0x6f/0x182
 [<ffffffff8853902d>] :gfs2:gfs2_rmdir+0x93/0x182
 [<ffffffff8853e8a9>] :gfs2:gfs2_rindex_hold+0x32/0x152
 [<ffffffff80049ec9>] vfs_rmdir+0xce/0x11d
 [<ffffffff800df959>] do_rmdir+0x9c/0xde
 [<ffffffff8005d229>] tracesys+0x71/0xe0
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0



d_metaverify -S 13325  -I 13325 -R /local/nstraz/svn/sts-rhel5/sts-root/var/share/resource_files/dash.xml -i 0 -s creat,unlink,rename,mkdir,rmdir -d 10 -n 50 -w .


Version-Release number of selected component (if applicable):
kernel-2.6.18-104.el5

How reproducible:
Unknown

Steps to Reproduce:
1. Insert this case into a dd_io herd file:

d_metaverify -S 13325  -I 13325 -R STS_RESOURCE_FILE -i 0 -s creat,unlink,rename,mkdir,rmdir -d 10 -n 50 -w .
  
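Actual results:
The unlink on dash-01 and the rmdir on dash-02 block, each holding a glock the other is waiting for.

Expected results:
Both operations complete without deadlocking.
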
Additional info:

dash-01   : dash:dashe0: G:  s:UN n:3/1018f f:l t:EX d:EX/0 l:0 a:0 r:4
dash-01   : dash:dashe0:  H: s:EX f:W e:0 p:21984 [d_doio] gfs2_unlink+0xa4/0x194 [gfs2]
dash-02   : dash:dashe0: G:  s:EX n:3/1018f f:D t:EX d:UN/8293380000 l:0 a:0 r:7
dash-02   : dash:dashe0:  H: s:EX f:H e:0 p:13068 [d_doio] gfs2_rmdir+0x93/0x182 [gfs2]
dash-02   : dash:dashe0:  H: s:EX f:W e:0 p:13065 [d_doio] gfs2_unlink+0xa4/0x194 [gfs2]
dash-02   : dash:dashe0:  H: s:EX f:W e:0 p:13067 [d_doio] gfs2_dinode_dealloc+0xf3/0x1a7 [gfs2]
dash-02   : dash:dashe0:  H: s:EX f:W e:0 p:13066 [d_doio] gfs2_unlink+0xa4/0x194 [gfs2]
dash-02   : dash:dashe0:  R: n:65935
dash-03   : dash:dashe0: G:  s:UN n:3/1018f f: t:UN d:EX/0 l:0 a:0 r:2



dash-01   : dash:dashe0: G:  s:EX n:2/10a008f f:D t:EX d:UN/8292012000 l:0 a:0 r:8
dash-01   : dash:dashe0:  H: s:EX f:H e:0 p:21984 [d_doio] gfs2_unlink+0x64/0x194 [gfs2]
dash-01   : dash:dashe0:  H: s:SH f:aW e:0 p:21987 [d_doio] gfs2_permission+0x7b/0xd5 [gfs2]
dash-01   : dash:dashe0:  H: s:SH f:aW e:0 p:21986 [d_doio] gfs2_permission+0x7b/0xd5 [gfs2]
dash-01   : dash:dashe0:  H: s:SH f:aW e:0 p:21985 [d_doio] gfs2_permission+0x7b/0xd5 [gfs2]
dash-01   : dash:dashe0:  H: s:SH f:aW e:0 p:21997 [ls] gfs2_getattr+0x7d/0xc4 [gfs2]
dash-01   : dash:dashe0:  I: n:25/17432719 t:4 f:0x00000010
dash-02   : dash:dashe0: G:  s:UN n:2/10a008f f:l t:EX d:UN/0 l:0 a:0 r:6
dash-02   : dash:dashe0:  H: s:EX f:W e:0 p:13068 [d_doio] gfs2_rmdir+0x57/0x182 [gfs2]
dash-02   : dash:dashe0:  H: s:SH f:aW e:0 p:13069 [d_doio] gfs2_permission+0x7b/0xd5 [gfs2]
dash-02   : dash:dashe0:  H: s:SH f:aW e:0 p:13079 [ls] gfs2_getattr+0x7d/0xc4 [gfs2]
dash-03   : dash:dashe0: G:  s:UN n:2/10a008f f:l t:EX d:UN/0 l:0 a:0 r:5
dash-03   : dash:dashe0:  H: s:EX f:W e:0 p:13002 [d_doio] gfs2_rename+0xfc/0x640 [gfs2]
dash-03   : dash:dashe0:  H: s:SH f:aW e:0 p:13010 [ls] gfs2_getattr+0x7d/0xc4 [gfs2]
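
The holder and waiter relationships can be pulled out of the dump above mechanically. The following is a rough sketch that assumes the line layout seen in this report (node name, lockspace, then a `G:` glock line followed by its `H:` holder lines) and treats `H` in the holder flags as "granted" and `W` as "waiting", which matches how the summary at the top of this report reads the same data. The regular expressions and the function name are illustrative assumptions, not part of any GFS2 tool.

```python
import re
from collections import defaultdict

# Layout inferred only from the dump lines in this report:
#   "<node> : <lockspace>: G:  s:<state> n:<type>/<hex number> ..."
#   "<node> : <lockspace>:  H: s:<state> f:<flags> e:<n> p:<pid> [<comm>] <func> ..."
GLOCK_RE = re.compile(r"^(\S+)\s*:\s*\S+:\s+G:\s+s:(\S+)\s+n:(\S+)")
HOLDER_RE = re.compile(
    r"^(\S+)\s*:\s*\S+:\s+H:\s+s:(\S+)\s+f:(\S*)\s+e:\d+\s+p:(\d+)\s+\[(\S+)\]\s+(\S+)"
)


def parse_glock_dump(lines):
    """Return (holders, waiters): glock name -> list of (node, pid, comm, func)."""
    holders = defaultdict(list)
    waiters = defaultdict(list)
    current = None  # glock name of the most recent G: line
    for line in lines:
        m = GLOCK_RE.match(line)
        if m:
            current = m.group(3)
            continue
        m = HOLDER_RE.match(line)
        if m and current is not None:
            node, _state, flags, pid, comm, func = m.groups()
            entry = (node, int(pid), comm, func)
            if "H" in flags:
                holders[current].append(entry)
            elif "W" in flags:
                waiters[current].append(entry)
    return holders, waiters


# A few lines copied verbatim from the dump above.
sample = [
    "dash-01   : dash:dashe0: G:  s:UN n:3/1018f f:l t:EX d:EX/0 l:0 a:0 r:4",
    "dash-01   : dash:dashe0:  H: s:EX f:W e:0 p:21984 [d_doio] gfs2_unlink+0xa4/0x194 [gfs2]",
    "dash-02   : dash:dashe0: G:  s:EX n:3/1018f f:D t:EX d:UN/8293380000 l:0 a:0 r:7",
    "dash-02   : dash:dashe0:  H: s:EX f:H e:0 p:13068 [d_doio] gfs2_rmdir+0x93/0x182 [gfs2]",
    "dash-01   : dash:dashe0: G:  s:EX n:2/10a008f f:D t:EX d:UN/8292012000 l:0 a:0 r:8",
    "dash-01   : dash:dashe0:  H: s:EX f:H e:0 p:21984 [d_doio] gfs2_unlink+0x64/0x194 [gfs2]",
    "dash-01   : dash:dashe0:  H: s:SH f:aW e:0 p:21987 [d_doio] gfs2_permission+0x7b/0xd5 [gfs2]",
    "dash-02   : dash:dashe0: G:  s:UN n:2/10a008f f:l t:EX d:UN/0 l:0 a:0 r:6",
    "dash-02   : dash:dashe0:  H: s:EX f:W e:0 p:13068 [d_doio] gfs2_rmdir+0x57/0x182 [gfs2]",
]

holders, waiters = parse_glock_dump(sample)
for glock, entries in holders.items():
    for node, pid, comm, func in entries:
        print(f"{glock} held by {node} pid {pid} ({comm}) at {func}")
```

Entries for the same glock from different nodes are merged under one key, so a glock's holder on one node and its waiters on another end up side by side.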

Comment 1 Nate Straz 2008-08-25 22:04:55 UTC

I was able to reproduce this again in 40 minutes while running d_metaverify as stated in the original bug.  Here are the locks and the backtraces for the processes:

on dash-03:
G:  s:EX n:3/11 f:D t:EX d:UN/5638767000 l:0 a:0 r:8
 H: s:EX f:H e:0 p:9049 [d_doio] gfs2_rmdir+0x93/0x182 [gfs2]
 R: n:17
G:  s:UN n:2/880e98 f:l t:EX d:UN/0 l:0 a:0 r:4
 H: s:EX f:W e:0 p:9049 [d_doio] gfs2_rmdir+0x57/0x182 [gfs2]

d_doio        D ffffffff8852e138     0  9049   9048          9050       (NOTLB)
 ffff8100325b7c58 0000000000000086 0000000000000000 ffff810031eda000
 0000000000000296 0000000000000007 ffff81002ded1100 ffffffff802e5ae0
 00000480c7093da0 0000000000000926 ffff81002ded12e8 0000000088504dea
Call Trace:
 [<ffffffff8852e138>] :gfs2:just_schedule+0x0/0xe
 [<ffffffff8852e141>] :gfs2:just_schedule+0x9/0xe
 [<ffffffff800639be>] __wait_on_bit+0x40/0x6e
 [<ffffffff8852e138>] :gfs2:just_schedule+0x0/0xe
 [<ffffffff80063a58>] out_of_line_wait_on_bit+0x6c/0x78
 [<ffffffff8009e107>] wake_bit_function+0x0/0x23
 [<ffffffff8852e133>] :gfs2:gfs2_glock_wait+0x2b/0x30
 [<ffffffff8852fada>] :gfs2:gfs2_glock_nq_m+0xa9/0xeb
 [<ffffffff8853b03a>] :gfs2:gfs2_rmdir+0xa0/0x182
 [<ffffffff8853aff1>] :gfs2:gfs2_rmdir+0x57/0x182
 [<ffffffff8853b009>] :gfs2:gfs2_rmdir+0x6f/0x182
 [<ffffffff8853b02d>] :gfs2:gfs2_rmdir+0x93/0x182
 [<ffffffff885408a9>] :gfs2:gfs2_rindex_hold+0x32/0x152
 [<ffffffff80049ec9>] vfs_rmdir+0xce/0x11d
 [<ffffffff800df959>] do_rmdir+0x9c/0xde
 [<ffffffff8005d229>] tracesys+0x71/0xe0
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

on dash-02:
G:  s:UN n:3/11 f:l t:EX d:EX/0 l:0 a:0 r:8
 H: s:EX f:W e:0 p:9044 [d_doio] gfs2_unlink+0xa4/0x194 [gfs2]
G:  s:EX n:2/880e98 f:D t:EX d:UN/5614934000 l:0 a:0 r:4
 H: s:EX f:H e:0 p:9044 [d_doio] gfs2_unlink+0x64/0x194 [gfs2]
 I: n:1061170/8916632 t:4 f:0x00000010

d_doio        D ffffffff88527138     0  9044   9043          9045       (NOTLB)
 ffff81002a84bc88 0000000000000082 0000000000000018 ffffffff884fc488
 0000000000000296 0000000000000007 ffff81002ee540c0 ffffffff802e5ae0
 00000480d01f3655 0000000000000c01 ffff81002ee542a8 00000000884fddea
Call Trace:
 [<ffffffff884fc488>] :dlm:request_lock+0x93/0xa0
 [<ffffffff88527138>] :gfs2:just_schedule+0x0/0xe
 [<ffffffff88527141>] :gfs2:just_schedule+0x9/0xe
 [<ffffffff800639be>] __wait_on_bit+0x40/0x6e
 [<ffffffff88527138>] :gfs2:just_schedule+0x0/0xe
 [<ffffffff80063a58>] out_of_line_wait_on_bit+0x6c/0x78
 [<ffffffff8009e107>] wake_bit_function+0x0/0x23
 [<ffffffff88527133>] :gfs2:gfs2_glock_wait+0x2b/0x30
 [<ffffffff88533ed7>] :gfs2:gfs2_unlink+0xd1/0x194
 [<ffffffff88533e6a>] :gfs2:gfs2_unlink+0x64/0x194
 [<ffffffff88533e88>] :gfs2:gfs2_unlink+0x82/0x194
 [<ffffffff88533eaa>] :gfs2:gfs2_unlink+0xa4/0x194
 [<ffffffff885398a9>] :gfs2:gfs2_rindex_hold+0x32/0x152
 [<ffffffff80049db5>] vfs_unlink+0xc2/0x108
 [<ffffffff8003c4c1>] do_unlinkat+0xaa/0x141
 [<ffffffff8005d229>] tracesys+0x71/0xe0
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0
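
For reading the dumps, it helps to note that the glock name after `n:` is a lock type followed by a hexadecimal number, and that the number matches the decimal value on the `R:` or `I:` line under the same glock (for example `n:3/11` and `R: n:17`). Here is a small sketch that decodes the names seen in this bug. The mapping of type 2 to inode and type 3 to resource group reflects my understanding of GFS2 lock types and should be treated as an assumption; the helper name is hypothetical.

```python
# Assumed mapping of GFS2 lock type numbers; only the two types that appear
# in this report are listed.
GLOCK_TYPES = {2: "inode", 3: "rgrp"}


def decode_glock_name(name):
    """Split a glock name like '3/1018f' into (type label, decimal number)."""
    type_str, number_hex = name.split("/")
    lock_type = int(type_str)
    label = GLOCK_TYPES.get(lock_type, f"type {lock_type}")
    return label, int(number_hex, 16)


# Glock names taken from the dumps in this report.
for name in ("3/1018f", "2/10a008f", "3/11", "2/880e98"):
    label, number = decode_glock_name(name)
    print(f"{name}: {label} {number}")
```

Read this way, both reproductions show the same pattern: the unlink holds an inode glock and waits for a resource group glock, while the rmdir holds that resource group glock and waits for the inode glock.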

Comment 2 Steve Whitehouse 2008-08-26 07:48:26 UTC

This looks like a dup to me

*** This bug has been marked as a duplicate of bug 458289 ***