Description of problem:
Started up a metadata test on a four node cluster. One node ftruncated a file and hit the following stack:

/tmp/foo TCP for communications
dlm: connecting to 1
dlm: got connection from 2
dlm: got connection from 1
dlm: connecting to 4
SCSI device sdb: 571400192 512-byte hdwr sectors (292557 MB)
sdb: Write Protect is off
SCSI device sdb: drive cache: write back
SCSI device sdc: 571400192 512-byte hdwr sectors (292557 MB)
sdc: Write Protect is off
SCSI device sdc: drive cache: write back
dlm: Using TCP for communications
dlm: connecting to 1
dlm: got connection from 2
dlm: got connection from 1
dlm: connecting to 4
Module sctp cannot be unloaded due to unsafe usage in net/sctp/protocol.c:1218
GFS2: fsid=link_ia64:link_ia640.2: fatal: assertion "test_bit(HIF_HOLDER, &gh->gh_iflags)" failed
GFS2: fsid=link_ia64:link_ia640.2: function = glock_wait_internal, file = fs/gfs2/glock.c, line = 1036
GFS2: fsid=link_ia64:link_ia640.2: about to withdraw this file system
kernel BUG at fs/gfs2/lm.c:107!
d_doio[5778]: bugcheck!
0 [1] Modules linked in: sctp gnbd(U) lock_nolock gfs(U) lock_dlm gfs2 dlm configfs ipv6 xfrm_nalgo crypto_api autofs4 hidp rfcomm l2cap bluetooth sunrpc vfat fat dm_multipath button parport_pc lp parport sg lpfc scsi_transport_fc e100 tg3 mii ide_cd cdrom dm_snapshot dm_zero dm_mirror dm_mod mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd Pid: 5778, CPU 0, comm: d_doio psr : 00001010085a6010 ifs : 800000000000060f ip : [<a000000201023d20>] Tainted: G ip is at gfs2_lm_withdraw+0x160/0x220 [gfs2] unat: 0000000000000000 pfs : 000000000000060f rsc : 0000000000000003 rnat: 0000000000000000 bsps: a0000001009f8490 pr : 0000000000556565 ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f csd : 0000000000000000 ssd : 0000000000000000 b0 : a000000201023d20 b6 : a000000100037880 b7 : a00000010000b820 f6 : 1003e0000009e7ef5eccc f7 : 1003e0000000000000384 f8 : 1003e0000009e7ef5e948 f9 : 1003e0000000000000001 f10 : 0fffbccccccccc8c00000 f11 : 1003e0000000000000000 r1 : a000000100be0270 r2 : a0000001009f80e0 r3 : a0000001009e1530 r8 : 0000000000000023 r9 : a0000001009f8110 r10 : a0000001009f8110 r11 : 0000000000000000 r12 : e000000027c07cd0 r13 : e000000027c00000 r14 : a0000001009f80e0 r15 : 0000000000000000 r16 : 0000000000000001 r17 : c0000000ff5e0001 r18 : 000000000000000d r19 : c0000000ff5e0000 r20 : a000000100835280 r21 : a0000001009e08a8 r22 : a0000001009f80e8 r23 : a0000001009f80e8 r24 : a000000100928fe0 r25 : a000000100928fe0 r26 : a0000001009e0a10 r27 : 0000000000000000 r28 : 0000000000000034 r29 : 0000000000000034 r30 : 0000000000000000 r31 : a0000001009f846c Call Trace: [<a000000100013ae0>] show_stack+0x40/0xa0 sp=e000000027c07860 bsp=e000000027c015d0 [<a0000001000143e0>] show_regs+0x840/0x880 sp=e000000027c07a30 bsp=e000000027c01578 [<a000000100037bc0>] die+0x1c0/0x2c0 sp=e000000027c07a30 bsp=e000000027c01530 [<a000000100037d10>] die_if_kernel+0x50/0x80 sp=e000000027c07a50 bsp=e000000027c01500 
[<a000000100633350>] ia64_bad_break+0x270/0x4a0 sp=e000000027c07a50 bsp=e000000027c014d8
[<a00000010000c020>] __ia64_leave_kernel+0x0/0x280 sp=e000000027c07b00 bsp=e000000027c014d8
[<a000000201023d20>] gfs2_lm_withdraw+0x160/0x220 [gfs2] sp=e000000027c07cd0 bsp=e000000027c01460
[<a0000002010546d0>] gfs2_assert_withdraw_i+0x50/0xa0 [gfs2] sp=e000000027c07d10 bsp=e000000027c01418
[<a00000020101acb0>] glock_wait_internal+0x3d0/0x700 [gfs2] sp=e000000027c07d10 bsp=e000000027c013c8
[<a00000020101b580>] gfs2_glock_nq+0x5a0/0x640 [gfs2] sp=e000000027c07d10 bsp=e000000027c01378
[<a00000020104ad70>] gfs2_inplace_reserve_i+0x570/0xf60 [gfs2] sp=e000000027c07d10 bsp=e000000027c012d0
[<a000000201004580>] gfs2_truncatei+0x200/0x1220 [gfs2] sp=e000000027c07d70 bsp=e000000027c01260
[<a00000020103a1e0>] gfs2_setattr+0x220/0x7e0 [gfs2] sp=e000000027c07d80 bsp=e000000027c011f0
[<a0000001001a6310>] notify_change+0x350/0x7c0 sp=e000000027c07dc0 bsp=e000000027c01198
[<a0000001001603e0>] do_truncate+0xa0/0xe0 sp=e000000027c07de0 bsp=e000000027c01150
[<a0000001001614e0>] sys_ftruncate+0x220/0x280 sp=e000000027c07e30 bsp=e000000027c010d8
[<a00000010000bdb0>] __ia64_trace_syscall+0xd0/0x110 sp=e000000027c07e30 bsp=e000000027c010d8
[<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400 sp=e000000027c08000 bsp=e000000027c010d8

Version-Release number of selected component (if applicable):
2.6.18-79.el5

How reproducible:
Unknown -- will re-run

Steps to Reproduce:
1. d_metaverify -i 120s -I 1234 -w /mnt/link_ia640 # on driver
2. d_doio -I 1234 -P fore # on each cluster node
As I was reporting the above, another node attempted a create and also died a horrible death: GFS2: fsid=link_ia64:link_ia640.0: fatal: assertion "test_bit(HIF_HOLDER, &gh->gh_iflags)" failed GFS2: fsid=link_ia64:link_ia640.0: function = glock_wait_internal, file = fs/gfs2/glock.c, line = 1036 GFS2: fsid=link_ia64:link_ia640.0: about to withdraw this file system kernel BUG at fs/gfs2/lm.c:107! d_doio[6063]: bugcheck! 0 [1] Modules linked in: sctp gnbd(U) lock_nolock gfs(U) lock_dlm gfs2 dlm configfs ipv6 xfrm_nalgo crypto_api autofs4 hidp rfcomm l2cap bluetooth sunrpc vfat fat dm_multipath button parport_pc lp parport lpfc sg scsi_transport_fc e100 tg3 mii ide_cd cdrom dm_snapshot dm_zero dm_mirror dm_mod mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd Pid: 6063, CPU 0, comm: d_doio psr : 00001010085a6010 ifs : 800000000000060f ip : [<a000000201003d20>] Tainted: G ip is at gfs2_lm_withdraw+0x160/0x220 [gfs2] unat: 0000000000000000 pfs : 000000000000060f rsc : 0000000000000003 rnat: a000000100abae68 bsps: 0000000000000004 pr : 000000000055a559 ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f csd : 0000000000000000 ssd : 0000000000000000 b0 : a000000201003d20 b6 : a0000001000110a0 b7 : a00000010000b820 f6 : 1003e00000000000000a0 f7 : 1003e20c49ba5e353f7cf f8 : 1003e00000000000004e2 f9 : 1003e000000000fa00000 f10 : 1003e000000003b9aca00 f11 : 1003e431bde82d7b634db r1 : a000000100be0270 r2 : a0000001009f80e0 r3 : a000000100928fe0 r8 : 0000000000000023 r9 : a0000001009f8110 r10 : a0000001009f8110 r11 : 0000000000000000 r12 : e00000001d8bfbf0 r13 : e00000001d8b8000 r14 : a0000001009f80e0 r15 : 0000000000000000 r16 : a000000100928fe8 r17 : e00000002168fe18 r18 : 0000000000000000 r19 : 0000000000000001 r20 : a000000100835280 r21 : a0000001009e08a8 r22 : a0000001009f80e8 r23 : a0000001009f80e8 r24 : e00000003f889054 r25 : 0000000000000000 r26 : e00000003f88905c r27 : e00000003f889040 r28 : e00000003f888008 
r29 : e000000021688060 r30 : e00000003f88802c r31 : e00000002168802c Call Trace: [<a000000100013ae0>] show_stack+0x40/0xa0 sp=e00000001d8bf780 bsp=e00000001d8b9758 [<a0000001000143e0>] show_regs+0x840/0x880 sp=e00000001d8bf950 bsp=e00000001d8b9700 [<a000000100037bc0>] die+0x1c0/0x2c0 sp=e00000001d8bf950 bsp=e00000001d8b96b8 [<a000000100037d10>] die_if_kernel+0x50/0x80 sp=e00000001d8bf970 bsp=e00000001d8b9688 [<a000000100633350>] ia64_bad_break+0x270/0x4a0 sp=e00000001d8bf970 bsp=e00000001d8b9660 [<a00000010000c020>] __ia64_leave_kernel+0x0/0x280 sp=e00000001d8bfa20 bsp=e00000001d8b9660 [<a000000201003d20>] gfs2_lm_withdraw+0x160/0x220 [gfs2] sp=e00000001d8bfbf0 bsp=e00000001d8b95e0 [<a0000002010346d0>] gfs2_assert_withdraw_i+0x50/0xa0 [gfs2] sp=e00000001d8bfc30 bsp=e00000001d8b9598 [<a000000200ffacb0>] glock_wait_internal+0x3d0/0x700 [gfs2] sp=e00000001d8bfc30 bsp=e00000001d8b9550 [<a000000200ffb580>] gfs2_glock_nq+0x5a0/0x640 [gfs2] sp=e00000001d8bfc30 bsp=e00000001d8b9500 [<a00000020102ad70>] gfs2_inplace_reserve_i+0x570/0xf60 [gfs2] sp=e00000001d8bfc30 bsp=e00000001d8b9458 [<a0000002010024a0>] gfs2_createi+0xa00/0x1d20 [gfs2] sp=e00000001d8bfc90 bsp=e00000001d8b93a0 [<a00000020101bd80>] gfs2_create+0xa0/0x300 [gfs2] sp=e00000001d8bfd20 bsp=e00000001d8b9348 [<a000000100187690>] vfs_create+0x2b0/0x3c0 sp=e00000001d8bfd90 bsp=e00000001d8b92f8 [<a00000010018f470>] open_namei+0x390/0x1100 sp=e00000001d8bfd90 bsp=e00000001d8b9288 [<a00000010015fcb0>] do_filp_open+0x50/0xc0 sp=e00000001d8bfda0 bsp=e00000001d8b9250 [<a00000010015fda0>] do_sys_open+0x80/0x1a0 sp=e00000001d8bfe30 bsp=e00000001d8b9200 [<a00000010015ff90>] sys_open+0x50/0x80 sp=e00000001d8bfe30 bsp=e00000001d8b91a0 [<a00000010000bdb0>] __ia64_trace_syscall+0xd0/0x110 sp=e00000001d8bfe30 bsp=e00000001d8b91a0 [<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400 sp=e00000001d8c0000 bsp=e00000001d8b91a0
Hit an issue again while attempting to recreate: This time in mkdir(). The initial error seems the same, so I will include it here rather than open a new bz. GFS2: fsid=link_ia64:link_ia640.1: fatal: assertion "test_bit(HIF_HOLDER, &gh->gh_iflags)" failed GFS2: fsid=link_ia64:link_ia640.1: function = glock_wait_internal, file = fs/gfs2/glock.c, line = 1036 GFS2: fsid=link_ia64:link_ia640.1: about to withdraw this file system kernel BUG at fs/gfs2/lm.c:107! d_doio[5883]: bugcheck! 0 [1] Modules linked in: sctp gnbd(U) lock_nolock gfs(U) lock_dlm gfs2 dlm configfs ipv6 xfrm_nalgo crypto_api autofs4 hidp rfcomm l2cap bluetooth sunrpc vfat fat dm_multipath button parport_pc lp parport sg ide_cd cdrom lpfc scsi_transport_fc tg3 e100 mii dm_snapshot dm_zero dm_mirror dm_mod mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd Pid: 5883, CPU 1, comm: d_doio psr : 00001010085a6010 ifs : 800000000000060f ip : [<a00000020108fd20>] Tainted: G ip is at gfs2_lm_withdraw+0x160/0x220 [gfs2] unat: 0000000000000000 pfs : 000000000000060f rsc : 0000000000000003 rnat: a000000100abae68 bsps: 0000000000000004 pr : 000000000055a559 ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f csd : 0000000000000000 ssd : 0000000000000000 b0 : a00000020108fd20 b6 : a0000001000110a0 b7 : a00000010000b820 f6 : 1003e00000000000000a0 f7 : 1003e20c49ba5e353f7cf f8 : 1003e00000000000004e2 f9 : 1003e000000000fa00000 f10 : 1003e000000003b9aca00 f11 : 1003e431bde82d7b634db r1 : a000000100be0270 r2 : a0000001009f80e0 r3 : a000000100928fe0 r8 : 0000000000000023 r9 : a0000001009f8110 r10 : a0000001009f8110 r11 : 0000000000000000 r12 : e00000002a78fce0 r13 : e00000002a788000 r14 : a0000001009f80e0 r15 : 0000000000000000 r16 : a000000100928fe8 r17 : e000000038387e18 r18 : 0000000000000000 r19 : 0000000000000000 r20 : a000000100835280 r21 : a0000001009e08a8 r22 : a0000001009f80e8 r23 : a0000001009f80e8 r24 : a0000001007b9054 r25 : 0000000000000000 r26 : 
a0000001007b905c r27 : a0000001007b9040 r28 : a0000001007b8008 r29 : e000000038380060 r30 : a0000001007b802c r31 : e00000003838002c Call Trace: [<a000000100013ae0>] show_stack+0x40/0xa0 sp=e00000002a78f870 bsp=e00000002a7896d8 [<a0000001000143e0>] show_regs+0x840/0x880 sp=e00000002a78fa40 bsp=e00000002a789680 [<a000000100037bc0>] die+0x1c0/0x2c0 sp=e00000002a78fa40 bsp=e00000002a789638 [<a000000100037d10>] die_if_kernel+0x50/0x80 sp=e00000002a78fa60 bsp=e00000002a789608 [<a000000100633350>] ia64_bad_break+0x270/0x4a0 sp=e00000002a78fa60 bsp=e00000002a7895d8 [<a00000010000c020>] __ia64_leave_kernel+0x0/0x280 sp=e00000002a78fb10 bsp=e00000002a7895d8 [<a00000020108fd20>] gfs2_lm_withdraw+0x160/0x220 [gfs2] sp=e00000002a78fce0 bsp=e00000002a789560 [<a0000002010c06d0>] gfs2_assert_withdraw_i+0x50/0xa0 [gfs2] sp=e00000002a78fd20 bsp=e00000002a789518 [<a000000201086cb0>] glock_wait_internal+0x3d0/0x700 [gfs2] sp=e00000002a78fd20 bsp=e00000002a7894d0 [<a000000201087580>] gfs2_glock_nq+0x5a0/0x640 [gfs2] sp=e00000002a78fd20 bsp=e00000002a789480 [<a0000002010aa690>] gfs2_delete_inode+0x130/0x440 [gfs2] sp=e00000002a78fd20 bsp=e00000002a789440 [<a0000001001a3170>] generic_delete_inode+0x2b0/0x420 sp=e00000002a78fd60 bsp=e00000002a789410 [<a0000001001a3320>] generic_drop_inode+0x40/0x400 sp=e00000002a78fd60 bsp=e00000002a7893d8 [<a0000002010aad50>] gfs2_drop_inode+0xb0/0xe0 [gfs2] sp=e00000002a78fd60 bsp=e00000002a7893b8 [<a0000001001a21a0>] iput+0x1c0/0x200 sp=e00000002a78fd60 bsp=e00000002a789398 [<a00000010019c0d0>] dentry_iput+0x190/0x1e0 sp=e00000002a78fd60 bsp=e00000002a789370 [<a00000010019ec00>] prune_one_dentry+0xe0/0x160 sp=e00000002a78fd60 bsp=e00000002a789348 [<a00000010019ef50>] prune_dcache+0x2d0/0x420 sp=e00000002a78fd60 bsp=e00000002a7892f8 [<a00000010019f210>] shrink_dcache_parent+0x30/0x280 sp=e00000002a78fd60 bsp=e00000002a7892b8 [<a00000020109ef40>] gfs2_drevalidate+0x460/0x5e0 [gfs2] sp=e00000002a78fd60 bsp=e00000002a789258 [<a000000100188fd0>] 
__lookup_hash+0x190/0x2e0 sp=e00000002a78fda0 bsp=e00000002a789210 [<a000000100189150>] lookup_hash+0x30/0x60 sp=e00000002a78fda0 bsp=e00000002a7891e8 [<a000000100189230>] lookup_create+0xb0/0x180 sp=e00000002a78fda0 bsp=e00000002a7891c0 [<a00000010018e400>] sys_mkdirat+0xa0/0x1e0 sp=e00000002a78fda0 bsp=e00000002a789140 [<a00000010018e570>] sys_mkdir+0x30/0x60 sp=e00000002a78fe30 bsp=e00000002a7890e8 [<a00000010000bdb0>] __ia64_trace_syscall+0xd0/0x110 sp=e00000002a78fe30 bsp=e00000002a7890e8 [<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400 sp=e00000002a790000 bsp=e00000002a7890e8 <0>Kernel panic - not syncing: Fatal exception
I recreated this on my roth cluster; investigating now.
As I suspected, the problem seems to have been caused by this patch (from upstream): http://git.kernel.org/?p=linux/kernel/git/steve/gfs2-2.6-nmw.git;a=commitdiff;h=d1c792b30d2d09683b4fdcc8b7035593ab98dcec That's the oom patch where we got rid of gh_req_bh. Not sure what's wrong with the patch though; still looking at it. I'm hoping maybe Steve will see what's wrong. With that one patch reverted, these tests are not panicking the kernel. However, when I don't specify a scenario, I'm getting a stat verify error. Maybe this is the same invalidation / cluster coherency problem that we were seeing with 428751? Jury's still out. The test runs fine if I run the server with -s chmod,rename,creat or any of the three, but if I don't specify -s, it fails with a nack.
The stat verify error is probably related to hard link handling. d_metaverify doesn't understand that a change made through one hard link shows up through all the other links to the same file. For now, don't specify -s link. We need to figure out how to handle those cases in our test tool.
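For anyone who wants to see the hard link behaviour outside the test tool, here is a minimal sketch in plain POSIX C. It is not d_metaverify code; the function name, the file names, and the 0640 mode are made up for illustration. It creates a file, links a second name to it, changes the mode through the first name, and checks that stat on the second name sees the change.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns 1 if a chmod through one name is visible through a second
 * hard link to the same inode, 0 if it is not, and -1 on setup errors.
 */
int chmod_visible_through_link(const char *dir)
{
    char a[512], b[520];
    struct stat sa, sb;
    int fd, result = -1;

    /* Build two names in the given directory; the second is the hard link. */
    if (snprintf(a, sizeof a, "%s/hl_demo_%ld", dir, (long)getpid()) >= (int)sizeof a)
        return -1;
    snprintf(b, sizeof b, "%s.lnk", a);

    fd = open(a, O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd < 0)
        return -1;
    close(fd);

    if (link(a, b) == 0) {
        /* Change the mode via the first name, then look through both names. */
        if (chmod(a, 0640) == 0 && stat(a, &sa) == 0 && stat(b, &sb) == 0)
            result = sa.st_ino == sb.st_ino && sb.st_nlink == 2 &&
                     (sb.st_mode & 0777) == 0640;
        unlink(b);
    }
    unlink(a);
    return result;
}
```

A verifier that tracks expected attributes per path rather than per inode would flag the second name as wrong after an operation like this, which is the kind of false stat verify error described above.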
Created attachment 294607
Patch that works

This patch is a hybrid between the original code and the current code. It fixes the problem by introducing two new glock bits: one to indicate whether a bh (bottom half) is required, and one to indicate whether it needs to be an xmote or a drop. That means it does not require any more memory in the glocks. It has been tested across my roth cluster (three nodes) with this scenario and found to work correctly:

d_metaverify -i 120s -I 0535 -w /mnt/gfs2 -R /usr/tests/sts-rhel5.2/var/share/resource_files/roth.xml -s creat,trunc,rename,unlink,stat,mkdir,rmdir,chmod,chown,access

There might be a better way to fix the code. This patch just fixes the problem by restoring the original code logic. Since Steve wrote the original patch, I'll defer to him whether to use this patch or come up with a better version. But I know this one works.
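To illustrate the two-bit idea without reading the attachment, here is a rough userspace sketch. The bit names, the enum, and both helpers are hypothetical and are not taken from the patch; the only point is that "a bh is needed" and "which kind" fit into an existing flags word, so the structure does not grow.

```c
#include <assert.h>

/* Hypothetical bit numbers; the names and values in the real patch may differ. */
enum {
    GLF_BH_PENDING = 0,  /* a bottom half still has to run for this glock */
    GLF_BH_IS_DROP = 1   /* set: run the drop path; clear: run the xmote path */
};

enum bh_kind { BH_NONE, BH_XMOTE, BH_DROP };

/* Record that a bottom half is needed, and which kind, using only two bits. */
void bh_request(unsigned long *flags, enum bh_kind kind)
{
    assert(kind == BH_XMOTE || kind == BH_DROP);
    if (kind == BH_DROP)
        *flags |= 1UL << GLF_BH_IS_DROP;
    else
        *flags &= ~(1UL << GLF_BH_IS_DROP);
    *flags |= 1UL << GLF_BH_PENDING;
}

/* Consume the request: report which bottom half to run and clear both bits. */
enum bh_kind bh_take(unsigned long *flags)
{
    enum bh_kind kind;

    if (!(*flags & (1UL << GLF_BH_PENDING)))
        return BH_NONE;
    kind = (*flags & (1UL << GLF_BH_IS_DROP)) ? BH_DROP : BH_XMOTE;
    *flags &= ~((1UL << GLF_BH_PENDING) | (1UL << GLF_BH_IS_DROP));
    return kind;
}
```

The sketch uses plain, non-atomic bit operations to keep it short; kernel code would use the atomic bitops (set_bit, clear_bit, test_bit) on the glock flags instead.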
Incidentally, according to Dean, the stat verify error mentioned in comment #4 turned out to be a problem with the test case. That's why I specified the test the way I did above.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Sorry. The first code change in the patch was unintentional code churn. When researching the problem, I happened to notice that function rq_mutex did something very similar to function gfs2_holder_wake. I changed it temporarily to help me straighten out my brain, and it unintentionally crept into the patch. The only difference between the two is "smp_mb__after_clear_bit" versus "smp_mb". The smp_mb did, in fact, appear after a clear_bit. I don't know if this needs to be reverted or not.
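For reference, the pattern both functions follow is "clear a bit, issue a barrier, then wake waiters". In the kernel, clear_bit() is not guaranteed to imply a memory barrier, which is why one follows it; smp_mb() is a full barrier, so it should be at least as strong as smp_mb__after_clear_bit() in that position. Below is a userspace analogue using C11 atomics, not kernel code; the function name is made up, and the seq_cst fence stands in for the kernel barrier as an assumption.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Userspace stand-in for "clear_bit(); barrier;" (hypothetical helper).
 * Clears bit nr in *word with no ordering of its own, then issues a full
 * fence so the cleared bit is ordered before whatever comes next, such as
 * checking for and waking waiters. Returns whether the bit was set before.
 */
bool clear_bit_then_fence(atomic_ulong *word, unsigned nr)
{
    unsigned long mask = 1UL << nr;
    unsigned long old = atomic_fetch_and_explicit(word, ~mask, memory_order_relaxed);

    atomic_thread_fence(memory_order_seq_cst);
    return (old & mask) != 0;
}
```

Whether the weaker or the stronger barrier is used, a waiter that observes the wake-up should also observe the cleared bit, so the difference is likely one of cost rather than correctness.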
Created attachment 294646
A potential fix

Here is an alternative proposal for a fix. It seems that the lock state is not enough on its own, so adding in the gh (or lack of one) should tell us which is the correct routine to run. Also, there was some unreachable code, which I've removed at the same time. It does at least explain the problems we've been seeing with the "try" locks recently, and I wouldn't be surprised if this fixes several of the recently reported bugs.
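A rough sketch of the idea, not the code from the attachment: all names here are hypothetical, and the "failed try lock ends up unlocked" case is an assumption about how the ambiguity could arise. It contrasts picking the completion routine from the resulting lock state alone with picking it from the state plus the presence of a requesting holder.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types, for illustration only. */
enum lock_state { ST_UNLOCKED, ST_SHARED, ST_EXCLUSIVE };
enum completion { RUN_XMOTE, RUN_DROP };

struct holder {
    enum lock_state wanted;  /* state the holder asked for */
    bool try_lock;           /* request may fail instead of blocking */
};

/* State-only guess: an unlocked result is taken to mean a drop finished. */
enum completion pick_by_state(enum lock_state result)
{
    return result == ST_UNLOCKED ? RUN_DROP : RUN_XMOTE;
}

/*
 * State plus holder: if a holder is waiting on the request, it was a
 * promotion attempt, so the xmote completion runs even when the result is
 * unlocked (assumed here to be what a failed "try" request looks like).
 */
enum completion pick_by_state_and_holder(enum lock_state result,
                                         const struct holder *req_gh)
{
    if (req_gh != NULL)
        return RUN_XMOTE;
    return pick_by_state(result);
}
```

With the state-only version, a holder waiting on a failed try request would never be woken by the right path, which is one way the HIF_HOLDER assertion could end up firing.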
I've tested Steve's patch from comment #10 against the failing scenario and it worked perfectly. This is a much better solution. I recommend we send this off to rhkernel-list ASAP.
All my "hell" tests pass on this version as well.
in 2.6.18-81.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2008-0314.html