Description of problem:
While running some distributed metadata tests I hit a case where an unlink
and a rmdir were waiting on each other. Found with the help of d_metaverify
and gfs2_hangalyzer.

There are 2 glocks with waiters:
- dash-01, pid 21984 is waiting for glock 3/1018f, which is held by pid 13068
- dash-02, pid 13068 is waiting for glock 2/10a008f, which is held by pid 21984

d_doio        D ffffffff8852e138  0 21984  21983 21985 (NOTLB)
 ffff8100166a7c88 0000000000000086 0000000000000018 ffffffff88503488
 0000000000000296 0000000000000009 ffff81003eb61100 ffffffff802e5ae0
 0000a183cee091f4 00000000000009ed ffff81003eb612e8 0000000088504dea
Call Trace:
 [<ffffffff88503488>] :dlm:request_lock+0x93/0xa0
 [<ffffffff8852e138>] :gfs2:just_schedule+0x0/0xe
 [<ffffffff8852e141>] :gfs2:just_schedule+0x9/0xe
 [<ffffffff800639be>] __wait_on_bit+0x40/0x6e
 [<ffffffff8852e138>] :gfs2:just_schedule+0x0/0xe
 [<ffffffff80063a58>] out_of_line_wait_on_bit+0x6c/0x78
 [<ffffffff8009e107>] wake_bit_function+0x0/0x23
 [<ffffffff8852e133>] :gfs2:gfs2_glock_wait+0x2b/0x30
 [<ffffffff8853aed7>] :gfs2:gfs2_unlink+0xd1/0x194
 [<ffffffff8853ae6a>] :gfs2:gfs2_unlink+0x64/0x194
 [<ffffffff8853ae88>] :gfs2:gfs2_unlink+0x82/0x194
 [<ffffffff8853aeaa>] :gfs2:gfs2_unlink+0xa4/0x194
 [<ffffffff885408a9>] :gfs2:gfs2_rindex_hold+0x32/0x152
 [<ffffffff80049db5>] vfs_unlink+0xc2/0x108
 [<ffffffff8003c4c1>] do_unlinkat+0xaa/0x141
 [<ffffffff8005d229>] tracesys+0x71/0xe0
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

d_doio        D ffffffff8852c138  0 13068  13064 13069 13067 (NOTLB)
 ffff8100207dfc58 0000000000000086 00000000fffffff5 ffff81000da75000
 0000000000000296 0000000000000009 ffff81000b9ec7a0 ffffffff802e5ae0
 00005db41430fb47 00000000000008ef ffff81000b9ec988 0000000088502dea
Call Trace:
 [<ffffffff8852c138>] :gfs2:just_schedule+0x0/0xe
 [<ffffffff8852c141>] :gfs2:just_schedule+0x9/0xe
 [<ffffffff800639be>] __wait_on_bit+0x40/0x6e
 [<ffffffff8852c138>] :gfs2:just_schedule+0x0/0xe
 [<ffffffff80063a58>] out_of_line_wait_on_bit+0x6c/0x78
 [<ffffffff8009e107>] wake_bit_function+0x0/0x23
 [<ffffffff8852c133>] :gfs2:gfs2_glock_wait+0x2b/0x30
 [<ffffffff8852dada>] :gfs2:gfs2_glock_nq_m+0xa9/0xeb
 [<ffffffff8853903a>] :gfs2:gfs2_rmdir+0xa0/0x182
 [<ffffffff88538ff1>] :gfs2:gfs2_rmdir+0x57/0x182
 [<ffffffff88539009>] :gfs2:gfs2_rmdir+0x6f/0x182
 [<ffffffff8853902d>] :gfs2:gfs2_rmdir+0x93/0x182
 [<ffffffff8853e8a9>] :gfs2:gfs2_rindex_hold+0x32/0x152
 [<ffffffff80049ec9>] vfs_rmdir+0xce/0x11d
 [<ffffffff800df959>] do_rmdir+0x9c/0xde
 [<ffffffff8005d229>] tracesys+0x71/0xe0
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

The test command line was:
d_metaverify -S 13325 -I 13325 -R /local/nstraz/svn/sts-rhel5/sts-root/var/share/resource_files/dash.xml -i 0 -s creat,unlink,rename,mkdir,rmdir -d 10 -n 50 -w .

Version-Release number of selected component (if applicable):
kernel-2.6.18-104.el5

How reproducible:
Unknown

Steps to Reproduce:
1. Insert this case into a dd_io herd file:
   d_metaverify -S 13325 -I 13325 -R STS_RESOURCE_FILE -i 0 -s creat,unlink,rename,mkdir,rmdir -d 10 -n 50 -w .

Actual results:

Expected results:

Additional info:

dash-01 : dash:dashe0: G: s:UN n:3/1018f f:l t:EX d:EX/0 l:0 a:0 r:4
dash-01 : dash:dashe0:  H: s:EX f:W e:0 p:21984 [d_doio] gfs2_unlink+0xa4/0x194 [gfs2]
dash-02 : dash:dashe0: G: s:EX n:3/1018f f:D t:EX d:UN/8293380000 l:0 a:0 r:7
dash-02 : dash:dashe0:  H: s:EX f:H e:0 p:13068 [d_doio] gfs2_rmdir+0x93/0x182 [gfs2]
dash-02 : dash:dashe0:  H: s:EX f:W e:0 p:13065 [d_doio] gfs2_unlink+0xa4/0x194 [gfs2]
dash-02 : dash:dashe0:  H: s:EX f:W e:0 p:13067 [d_doio] gfs2_dinode_dealloc+0xf3/0x1a7 [gfs2]
dash-02 : dash:dashe0:  H: s:EX f:W e:0 p:13066 [d_doio] gfs2_unlink+0xa4/0x194 [gfs2]
dash-02 : dash:dashe0:  R: n:65935
dash-03 : dash:dashe0: G: s:UN n:3/1018f f: t:UN d:EX/0 l:0 a:0 r:2

dash-01 : dash:dashe0: G: s:EX n:2/10a008f f:D t:EX d:UN/8292012000 l:0 a:0 r:8
dash-01 : dash:dashe0:  H: s:EX f:H e:0 p:21984 [d_doio] gfs2_unlink+0x64/0x194 [gfs2]
dash-01 : dash:dashe0:  H: s:SH f:aW e:0 p:21987 [d_doio] gfs2_permission+0x7b/0xd5 [gfs2]
dash-01 : dash:dashe0:  H: s:SH f:aW e:0 p:21986 [d_doio] gfs2_permission+0x7b/0xd5 [gfs2]
dash-01 : dash:dashe0:  H: s:SH f:aW e:0 p:21985 [d_doio] gfs2_permission+0x7b/0xd5 [gfs2]
dash-01 : dash:dashe0:  H: s:SH f:aW e:0 p:21997 [ls] gfs2_getattr+0x7d/0xc4 [gfs2]
dash-01 : dash:dashe0:  I: n:25/17432719 t:4 f:0x00000010
dash-02 : dash:dashe0: G: s:UN n:2/10a008f f:l t:EX d:UN/0 l:0 a:0 r:6
dash-02 : dash:dashe0:  H: s:EX f:W e:0 p:13068 [d_doio] gfs2_rmdir+0x57/0x182 [gfs2]
dash-02 : dash:dashe0:  H: s:SH f:aW e:0 p:13069 [d_doio] gfs2_permission+0x7b/0xd5 [gfs2]
dash-02 : dash:dashe0:  H: s:SH f:aW e:0 p:13079 [ls] gfs2_getattr+0x7d/0xc4 [gfs2]
dash-03 : dash:dashe0: G: s:UN n:2/10a008f f:l t:EX d:UN/0 l:0 a:0 r:5
dash-03 : dash:dashe0:  H: s:EX f:W e:0 p:13002 [d_doio] gfs2_rename+0xfc/0x640 [gfs2]
dash-03 : dash:dashe0:  H: s:SH f:aW e:0 p:13010 [ls] gfs2_getattr+0x7d/0xc4 [gfs2]
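For anyone reading the dumps above cold: this is the classic two-lock ordering cycle. Each process holds one of the two glocks exclusively and waits forever for the other. A minimal sketch of that shape (pure illustration, not GFS2 code; the lock names, barriers, and timeout-based detection are mine):

```python
import threading

glock_a = threading.Lock()      # stands in for glock 3/1018f
glock_b = threading.Lock()      # stands in for glock 2/10a008f

start = threading.Barrier(2)    # wait until both threads hold their first lock
done = threading.Barrier(2)     # neither releases until both have timed out
deadlocked = []

def unlink_path():
    # Mirrors pid 21984: holds one glock, waits for the other.
    with glock_b:
        start.wait()
        if not glock_a.acquire(timeout=1):
            deadlocked.append("unlink")
        else:
            glock_a.release()
        done.wait()

def rmdir_path():
    # Mirrors pid 13068: takes the same two locks in the opposite order.
    with glock_a:
        start.wait()
        if not glock_b.acquire(timeout=1):
            deadlocked.append("rmdir")
        else:
            glock_b.release()
        done.wait()

t1 = threading.Thread(target=unlink_path)
t2 = threading.Thread(target=rmdir_path)
t1.start(); t2.start()
t1.join(); t2.join()
print(sorted(deadlocked))       # ['rmdir', 'unlink'] — the reported hang, bounded here by the timeout
```

In the kernel there is no timeout: both gfs2_glock_wait() calls sleep in just_schedule() indefinitely, which is the D-state pair in the backtraces.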
I was able to reproduce this again in 40 minutes while running d_metaverify
as stated in the original bug. Here are the locks and the backtraces for the
processes:

on dash-03:
G: s:EX n:3/11 f:D t:EX d:UN/5638767000 l:0 a:0 r:8
 H: s:EX f:H e:0 p:9049 [d_doio] gfs2_rmdir+0x93/0x182 [gfs2]
 R: n:17
G: s:UN n:2/880e98 f:l t:EX d:UN/0 l:0 a:0 r:4
 H: s:EX f:W e:0 p:9049 [d_doio] gfs2_rmdir+0x57/0x182 [gfs2]

d_doio        D ffffffff8852e138  0  9049   9048  9050 (NOTLB)
 ffff8100325b7c58 0000000000000086 0000000000000000 ffff810031eda000
 0000000000000296 0000000000000007 ffff81002ded1100 ffffffff802e5ae0
 00000480c7093da0 0000000000000926 ffff81002ded12e8 0000000088504dea
Call Trace:
 [<ffffffff8852e138>] :gfs2:just_schedule+0x0/0xe
 [<ffffffff8852e141>] :gfs2:just_schedule+0x9/0xe
 [<ffffffff800639be>] __wait_on_bit+0x40/0x6e
 [<ffffffff8852e138>] :gfs2:just_schedule+0x0/0xe
 [<ffffffff80063a58>] out_of_line_wait_on_bit+0x6c/0x78
 [<ffffffff8009e107>] wake_bit_function+0x0/0x23
 [<ffffffff8852e133>] :gfs2:gfs2_glock_wait+0x2b/0x30
 [<ffffffff8852fada>] :gfs2:gfs2_glock_nq_m+0xa9/0xeb
 [<ffffffff8853b03a>] :gfs2:gfs2_rmdir+0xa0/0x182
 [<ffffffff8853aff1>] :gfs2:gfs2_rmdir+0x57/0x182
 [<ffffffff8853b009>] :gfs2:gfs2_rmdir+0x6f/0x182
 [<ffffffff8853b02d>] :gfs2:gfs2_rmdir+0x93/0x182
 [<ffffffff885408a9>] :gfs2:gfs2_rindex_hold+0x32/0x152
 [<ffffffff80049ec9>] vfs_rmdir+0xce/0x11d
 [<ffffffff800df959>] do_rmdir+0x9c/0xde
 [<ffffffff8005d229>] tracesys+0x71/0xe0
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

on dash-02:
G: s:UN n:3/11 f:l t:EX d:EX/0 l:0 a:0 r:8
 H: s:EX f:W e:0 p:9044 [d_doio] gfs2_unlink+0xa4/0x194 [gfs2]
G: s:EX n:2/880e98 f:D t:EX d:UN/5614934000 l:0 a:0 r:4
 H: s:EX f:H e:0 p:9044 [d_doio] gfs2_unlink+0x64/0x194 [gfs2]
 I: n:1061170/8916632 t:4 f:0x00000010

d_doio        D ffffffff88527138  0  9044   9043  9045 (NOTLB)
 ffff81002a84bc88 0000000000000082 0000000000000018 ffffffff884fc488
 0000000000000296 0000000000000007 ffff81002ee540c0 ffffffff802e5ae0
 00000480d01f3655 0000000000000c01 ffff81002ee542a8 00000000884fddea
Call Trace:
 [<ffffffff884fc488>] :dlm:request_lock+0x93/0xa0
 [<ffffffff88527138>] :gfs2:just_schedule+0x0/0xe
 [<ffffffff88527141>] :gfs2:just_schedule+0x9/0xe
 [<ffffffff800639be>] __wait_on_bit+0x40/0x6e
 [<ffffffff88527138>] :gfs2:just_schedule+0x0/0xe
 [<ffffffff80063a58>] out_of_line_wait_on_bit+0x6c/0x78
 [<ffffffff8009e107>] wake_bit_function+0x0/0x23
 [<ffffffff88527133>] :gfs2:gfs2_glock_wait+0x2b/0x30
 [<ffffffff88533ed7>] :gfs2:gfs2_unlink+0xd1/0x194
 [<ffffffff88533e6a>] :gfs2:gfs2_unlink+0x64/0x194
 [<ffffffff88533e88>] :gfs2:gfs2_unlink+0x82/0x194
 [<ffffffff88533eaa>] :gfs2:gfs2_unlink+0xa4/0x194
 [<ffffffff885398a9>] :gfs2:gfs2_rindex_hold+0x32/0x152
 [<ffffffff80049db5>] vfs_unlink+0xc2/0x108
 [<ffffffff8003c4c1>] do_unlinkat+0xaa/0x141
 [<ffffffff8005d229>] tracesys+0x71/0xe0
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0
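For context on what a fix has to achieve: the standard cure for this shape of hang is to acquire multiple locks in one global order, so no cycle can form. (Whether the actual resolution in the duplicate bug works this way is not claimed here.) A hypothetical ordered-acquire helper, sorting by an arbitrary stable key the way a filesystem might sort by lock number:

```python
import threading

lock_a = threading.Lock()   # stands in for glock 2/880e98
lock_b = threading.Lock()   # stands in for glock 3/11

def acquire_in_order(*locks):
    # Take every lock in a single global order (here Python object id;
    # a filesystem would use the on-disk lock number). With a total
    # order, two threads can never wait on each other in a cycle.
    ordered = sorted(locks, key=id)
    for lk in ordered:
        lk.acquire()
    return ordered

def release_all(locks):
    # Release in reverse acquisition order.
    for lk in reversed(locks):
        lk.release()

results = []

def worker(name, first, second):
    # Callers may pass the locks in either order; the helper normalizes it.
    held = acquire_in_order(first, second)
    results.append(name)
    release_all(held)

t1 = threading.Thread(target=worker, args=("unlink", lock_a, lock_b))
t2 = threading.Thread(target=worker, args=("rmdir", lock_b, lock_a))
t1.start(); t2.start()
t1.join(); t2.join()
print(sorted(results))      # ['rmdir', 'unlink'] — both complete, no hang
```

Run with the same opposite-order callers as the failing unlink/rmdir pair, both threads finish because the helper imposes one order underneath.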
This looks like a dup to me.

*** This bug has been marked as a duplicate of bug 458289 ***