Bug 459843 - GFS2: deadlock between unlink and rmdir
Keywords:
Status: CLOSED DUPLICATE of bug 458289
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.3
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Steve Whitehouse
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-08-22 20:50 UTC by Nate Straz
Modified: 2009-05-28 03:36 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-08-26 07:48:26 UTC
Target Upstream Version:
Embargoed:


Description Nate Straz 2008-08-22 20:50:30 UTC
Description of problem:

While running some distributed metadata tests, I hit a case where an unlink and an rmdir were waiting on each other. I found it with the help of d_metaverify and gfs2_hangalyzer.


There are 2 glocks with waiters.
dash-01, pid 21984 is waiting for glock 3/1018f, which is held by pid 13068
dash-02, pid 13068 is waiting for glock 2/10a008f, which is held by pid 21984
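
The two lines above describe a cycle in the wait-for graph: each process waits for a glock that the other holds. A minimal Python sketch of that check follows, using only the nodes, pids and glock numbers reported above. The function name and data layout are illustrative assumptions; this is not how gfs2_hangalyzer itself is implemented.

```python
# Wait-for edges taken from the two summary lines above:
# (node, waiting pid) -> (glock waited for, (node, pid) of the holder).
# Pids are keyed together with the node name because pids are only
# unique per node in a cluster.
waits_for = {
    ("dash-01", 21984): ("3/1018f", ("dash-02", 13068)),
    ("dash-02", 13068): ("2/10a008f", ("dash-01", 21984)),
}


def find_cycle(waits_for, start):
    """Follow holder links from `start`; return the chain if it loops back."""
    chain = [start]
    proc = start
    while proc in waits_for:
        _glock, holder = waits_for[proc]
        if holder == start:
            return chain  # the chain closes on itself: a deadlock
        if holder in chain:
            return None  # a cycle exists, but it does not include `start`
        chain.append(holder)
        proc = holder
    return None  # the chain ends at a process that is not waiting


print(find_cycle(waits_for, ("dash-01", 21984)))
```

A process that holds a glock without waiting on anything ends the chain, so the function returns None for it.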

d_doio        D ffffffff8852e138     0 21984  21983         21985       (NOTLB)
 ffff8100166a7c88 0000000000000086 0000000000000018 ffffffff88503488
 0000000000000296 0000000000000009 ffff81003eb61100 ffffffff802e5ae0
 0000a183cee091f4 00000000000009ed ffff81003eb612e8 0000000088504dea
Call Trace:
 [<ffffffff88503488>] :dlm:request_lock+0x93/0xa0
 [<ffffffff8852e138>] :gfs2:just_schedule+0x0/0xe
 [<ffffffff8852e141>] :gfs2:just_schedule+0x9/0xe
 [<ffffffff800639be>] __wait_on_bit+0x40/0x6e
 [<ffffffff8852e138>] :gfs2:just_schedule+0x0/0xe
 [<ffffffff80063a58>] out_of_line_wait_on_bit+0x6c/0x78
 [<ffffffff8009e107>] wake_bit_function+0x0/0x23
 [<ffffffff8852e133>] :gfs2:gfs2_glock_wait+0x2b/0x30
 [<ffffffff8853aed7>] :gfs2:gfs2_unlink+0xd1/0x194
 [<ffffffff8853ae6a>] :gfs2:gfs2_unlink+0x64/0x194
 [<ffffffff8853ae88>] :gfs2:gfs2_unlink+0x82/0x194
 [<ffffffff8853aeaa>] :gfs2:gfs2_unlink+0xa4/0x194
 [<ffffffff885408a9>] :gfs2:gfs2_rindex_hold+0x32/0x152
 [<ffffffff80049db5>] vfs_unlink+0xc2/0x108
 [<ffffffff8003c4c1>] do_unlinkat+0xaa/0x141
 [<ffffffff8005d229>] tracesys+0x71/0xe0
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

d_doio        D ffffffff8852c138     0 13068  13064         13069 13067 (NOTLB)
 ffff8100207dfc58 0000000000000086 00000000fffffff5 ffff81000da75000
 0000000000000296 0000000000000009 ffff81000b9ec7a0 ffffffff802e5ae0
 00005db41430fb47 00000000000008ef ffff81000b9ec988 0000000088502dea
Call Trace:
 [<ffffffff8852c138>] :gfs2:just_schedule+0x0/0xe
 [<ffffffff8852c141>] :gfs2:just_schedule+0x9/0xe
 [<ffffffff800639be>] __wait_on_bit+0x40/0x6e
 [<ffffffff8852c138>] :gfs2:just_schedule+0x0/0xe
 [<ffffffff80063a58>] out_of_line_wait_on_bit+0x6c/0x78
 [<ffffffff8009e107>] wake_bit_function+0x0/0x23
 [<ffffffff8852c133>] :gfs2:gfs2_glock_wait+0x2b/0x30
 [<ffffffff8852dada>] :gfs2:gfs2_glock_nq_m+0xa9/0xeb
 [<ffffffff8853903a>] :gfs2:gfs2_rmdir+0xa0/0x182
 [<ffffffff88538ff1>] :gfs2:gfs2_rmdir+0x57/0x182
 [<ffffffff88539009>] :gfs2:gfs2_rmdir+0x6f/0x182
 [<ffffffff8853902d>] :gfs2:gfs2_rmdir+0x93/0x182
 [<ffffffff8853e8a9>] :gfs2:gfs2_rindex_hold+0x32/0x152
 [<ffffffff80049ec9>] vfs_rmdir+0xce/0x11d
 [<ffffffff800df959>] do_rmdir+0x9c/0xde
 [<ffffffff8005d229>] tracesys+0x71/0xe0
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0



d_metaverify -S 13325  -I 13325 -R /local/nstraz/svn/sts-rhel5/sts-root/var/share/resource_files/dash.xml -i 0 -s creat,unlink,rename,mkdir,rmdir -d 10 -n 50 -w .


Version-Release number of selected component (if applicable):
kernel-2.6.18-104.el5

How reproducible:
Unknown

Steps to Reproduce:
1. Insert this case into a dd_io herd file:

d_metaverify -S 13325  -I 13325 -R STS_RESOURCE_FILE -i 0 -s creat,unlink,rename,mkdir,rmdir -d 10 -n 50 -w .
  
Actual results:


Expected results:


Additional info:

dash-01   : dash:dashe0: G:  s:UN n:3/1018f f:l t:EX d:EX/0 l:0 a:0 r:4
dash-01   : dash:dashe0:  H: s:EX f:W e:0 p:21984 [d_doio] gfs2_unlink+0xa4/0x194 [gfs2]
dash-02   : dash:dashe0: G:  s:EX n:3/1018f f:D t:EX d:UN/8293380000 l:0 a:0 r:7
dash-02   : dash:dashe0:  H: s:EX f:H e:0 p:13068 [d_doio] gfs2_rmdir+0x93/0x182 [gfs2]
dash-02   : dash:dashe0:  H: s:EX f:W e:0 p:13065 [d_doio] gfs2_unlink+0xa4/0x194 [gfs2]
dash-02   : dash:dashe0:  H: s:EX f:W e:0 p:13067 [d_doio] gfs2_dinode_dealloc+0xf3/0x1a7 [gfs2]
dash-02   : dash:dashe0:  H: s:EX f:W e:0 p:13066 [d_doio] gfs2_unlink+0xa4/0x194 [gfs2]
dash-02   : dash:dashe0:  R: n:65935
dash-03   : dash:dashe0: G:  s:UN n:3/1018f f: t:UN d:EX/0 l:0 a:0 r:2



dash-01   : dash:dashe0: G:  s:EX n:2/10a008f f:D t:EX d:UN/8292012000 l:0 a:0 r:8
dash-01   : dash:dashe0:  H: s:EX f:H e:0 p:21984 [d_doio] gfs2_unlink+0x64/0x194 [gfs2]
dash-01   : dash:dashe0:  H: s:SH f:aW e:0 p:21987 [d_doio] gfs2_permission+0x7b/0xd5 [gfs2]
dash-01   : dash:dashe0:  H: s:SH f:aW e:0 p:21986 [d_doio] gfs2_permission+0x7b/0xd5 [gfs2]
dash-01   : dash:dashe0:  H: s:SH f:aW e:0 p:21985 [d_doio] gfs2_permission+0x7b/0xd5 [gfs2]
dash-01   : dash:dashe0:  H: s:SH f:aW e:0 p:21997 [ls] gfs2_getattr+0x7d/0xc4 [gfs2]
dash-01   : dash:dashe0:  I: n:25/17432719 t:4 f:0x00000010
dash-02   : dash:dashe0: G:  s:UN n:2/10a008f f:l t:EX d:UN/0 l:0 a:0 r:6
dash-02   : dash:dashe0:  H: s:EX f:W e:0 p:13068 [d_doio] gfs2_rmdir+0x57/0x182 [gfs2]
dash-02   : dash:dashe0:  H: s:SH f:aW e:0 p:13069 [d_doio] gfs2_permission+0x7b/0xd5 [gfs2]
dash-02   : dash:dashe0:  H: s:SH f:aW e:0 p:13079 [ls] gfs2_getattr+0x7d/0xc4 [gfs2]
dash-03   : dash:dashe0: G:  s:UN n:2/10a008f f:l t:EX d:UN/0 l:0 a:0 r:5
dash-03   : dash:dashe0:  H: s:EX f:W e:0 p:13002 [d_doio] gfs2_rename+0xfc/0x640 [gfs2]
dash-03   : dash:dashe0:  H: s:SH f:aW e:0 p:13010 [ls] gfs2_getattr+0x7d/0xc4 [gfs2]
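
The holder/waiter relationships can be read out of a dump like the one above mechanically. The following Python sketch assumes the line layout shown here (a node prefix, then G: lines naming a glock and H: lines naming a holder) and assumes that an H in the holder flags marks a granted holder and a W marks a waiter. The regular expressions and function names are hypothetical helpers written for this example, and only a subset of the dump lines is included.

```python
import re
from collections import defaultdict

# A subset of the dump lines above, limited to the two deadlocked processes.
DUMP = """\
dash-01   : dash:dashe0: G:  s:UN n:3/1018f f:l t:EX d:EX/0 l:0 a:0 r:4
dash-01   : dash:dashe0:  H: s:EX f:W e:0 p:21984 [d_doio] gfs2_unlink+0xa4/0x194 [gfs2]
dash-02   : dash:dashe0: G:  s:EX n:3/1018f f:D t:EX d:UN/8293380000 l:0 a:0 r:7
dash-02   : dash:dashe0:  H: s:EX f:H e:0 p:13068 [d_doio] gfs2_rmdir+0x93/0x182 [gfs2]
dash-01   : dash:dashe0: G:  s:EX n:2/10a008f f:D t:EX d:UN/8292012000 l:0 a:0 r:8
dash-01   : dash:dashe0:  H: s:EX f:H e:0 p:21984 [d_doio] gfs2_unlink+0x64/0x194 [gfs2]
dash-02   : dash:dashe0: G:  s:UN n:2/10a008f f:l t:EX d:UN/0 l:0 a:0 r:6
dash-02   : dash:dashe0:  H: s:EX f:W e:0 p:13068 [d_doio] gfs2_rmdir+0x57/0x182 [gfs2]
"""

GLOCK = re.compile(r"^(\S+)\s*: \S+ G:\s+s:\S+ n:(\S+)")
HOLDER = re.compile(r"^(\S+)\s*: \S+\s+H: s:(\S+) f:(\S+) e:\d+ p:(\d+)")


def parse(dump):
    """Return (holders, waiters): glock number -> list of (node, pid)."""
    holders, waiters = defaultdict(list), defaultdict(list)
    current = None  # glock number of the most recent G: line
    for line in dump.splitlines():
        m = GLOCK.match(line)
        if m:
            current = m.group(2)
            continue
        m = HOLDER.match(line)
        if m and current is not None:
            node, _state, flags, pid = m.groups()
            if "H" in flags:
                holders[current].append((node, int(pid)))
            elif "W" in flags:
                waiters[current].append((node, int(pid)))
    return holders, waiters


def wait_edges(holders, waiters):
    """Return the set of (waiter, glock, holder) triples across all nodes."""
    return {
        (w, glock, h)
        for glock, ws in waiters.items()
        for w in ws
        for h in holders.get(glock, [])
        if w != h
    }


holders, waiters = parse(DUMP)
for edge in sorted(wait_edges(holders, waiters)):
    print(edge)
```

Holders and waiters are matched by glock number across nodes, because the holder of a glock and the processes queued for it can sit on different cluster members, as they do here.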

Comment 1 Nate Straz 2008-08-25 22:04:55 UTC
I was able to reproduce this again in 40 minutes while running d_metaverify as stated in the original bug.  Here are the locks and the backtraces for the processes:

on dash-03:
G:  s:EX n:3/11 f:D t:EX d:UN/5638767000 l:0 a:0 r:8
 H: s:EX f:H e:0 p:9049 [d_doio] gfs2_rmdir+0x93/0x182 [gfs2]
 R: n:17
G:  s:UN n:2/880e98 f:l t:EX d:UN/0 l:0 a:0 r:4
 H: s:EX f:W e:0 p:9049 [d_doio] gfs2_rmdir+0x57/0x182 [gfs2]

d_doio        D ffffffff8852e138     0  9049   9048          9050       (NOTLB)
 ffff8100325b7c58 0000000000000086 0000000000000000 ffff810031eda000
 0000000000000296 0000000000000007 ffff81002ded1100 ffffffff802e5ae0
 00000480c7093da0 0000000000000926 ffff81002ded12e8 0000000088504dea
Call Trace:
 [<ffffffff8852e138>] :gfs2:just_schedule+0x0/0xe
 [<ffffffff8852e141>] :gfs2:just_schedule+0x9/0xe
 [<ffffffff800639be>] __wait_on_bit+0x40/0x6e
 [<ffffffff8852e138>] :gfs2:just_schedule+0x0/0xe
 [<ffffffff80063a58>] out_of_line_wait_on_bit+0x6c/0x78
 [<ffffffff8009e107>] wake_bit_function+0x0/0x23
 [<ffffffff8852e133>] :gfs2:gfs2_glock_wait+0x2b/0x30
 [<ffffffff8852fada>] :gfs2:gfs2_glock_nq_m+0xa9/0xeb
 [<ffffffff8853b03a>] :gfs2:gfs2_rmdir+0xa0/0x182
 [<ffffffff8853aff1>] :gfs2:gfs2_rmdir+0x57/0x182
 [<ffffffff8853b009>] :gfs2:gfs2_rmdir+0x6f/0x182
 [<ffffffff8853b02d>] :gfs2:gfs2_rmdir+0x93/0x182
 [<ffffffff885408a9>] :gfs2:gfs2_rindex_hold+0x32/0x152
 [<ffffffff80049ec9>] vfs_rmdir+0xce/0x11d
 [<ffffffff800df959>] do_rmdir+0x9c/0xde
 [<ffffffff8005d229>] tracesys+0x71/0xe0
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

on dash-02:
G:  s:UN n:3/11 f:l t:EX d:EX/0 l:0 a:0 r:8
 H: s:EX f:W e:0 p:9044 [d_doio] gfs2_unlink+0xa4/0x194 [gfs2]
G:  s:EX n:2/880e98 f:D t:EX d:UN/5614934000 l:0 a:0 r:4
 H: s:EX f:H e:0 p:9044 [d_doio] gfs2_unlink+0x64/0x194 [gfs2]
 I: n:1061170/8916632 t:4 f:0x00000010

d_doio        D ffffffff88527138     0  9044   9043          9045       (NOTLB)
 ffff81002a84bc88 0000000000000082 0000000000000018 ffffffff884fc488
 0000000000000296 0000000000000007 ffff81002ee540c0 ffffffff802e5ae0
 00000480d01f3655 0000000000000c01 ffff81002ee542a8 00000000884fddea
Call Trace:
 [<ffffffff884fc488>] :dlm:request_lock+0x93/0xa0
 [<ffffffff88527138>] :gfs2:just_schedule+0x0/0xe
 [<ffffffff88527141>] :gfs2:just_schedule+0x9/0xe
 [<ffffffff800639be>] __wait_on_bit+0x40/0x6e
 [<ffffffff88527138>] :gfs2:just_schedule+0x0/0xe
 [<ffffffff80063a58>] out_of_line_wait_on_bit+0x6c/0x78
 [<ffffffff8009e107>] wake_bit_function+0x0/0x23
 [<ffffffff88527133>] :gfs2:gfs2_glock_wait+0x2b/0x30
 [<ffffffff88533ed7>] :gfs2:gfs2_unlink+0xd1/0x194
 [<ffffffff88533e6a>] :gfs2:gfs2_unlink+0x64/0x194
 [<ffffffff88533e88>] :gfs2:gfs2_unlink+0x82/0x194
 [<ffffffff88533eaa>] :gfs2:gfs2_unlink+0xa4/0x194
 [<ffffffff885398a9>] :gfs2:gfs2_rindex_hold+0x32/0x152
 [<ffffffff80049db5>] vfs_unlink+0xc2/0x108
 [<ffffffff8003c4c1>] do_unlinkat+0xaa/0x141
 [<ffffffff8005d229>] tracesys+0x71/0xe0
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

Comment 2 Steve Whitehouse 2008-08-26 07:48:26 UTC
This looks like a duplicate to me.

*** This bug has been marked as a duplicate of bug 458289 ***

