Description of problem:
During normal operation of a two-node RHEL 5.4 cluster using GFS2 for shared Maildir mailboxes, one node panicked.

Version-Release number of selected component (if applicable):
RHEL 5.4 64 bit, kernel 2.6.18-164.6.1.el5
DLM (built Oct 27 2009 11:29:06) installed
GFS2 (built Oct 27 2009 11:29:46) installed
Lock_DLM (built Oct 27 2009 11:29:52) installed

How reproducible:
Not predictably reproducible.

Steps to Reproduce:
1.
2.
3.

Actual results:
Node panics with associated crash dump information. Remainder of cluster (only one additional node in our case) continues to run.

Expected results:
Node should not panic.

Additional info:
original: gfs2_rename+0x19d/0x63b [gfs2] pid: 12810 lock type: 3 req lock state: 1
new: gfs2_rlist_alloc+0x5c/0x6a [gfs2] pid: 12810 lock type: 3 req lock state: 1
G:  s:EX n:3/33d0327 f:y t:EX d:EX/0 l:0 a:5 r:4
 H: s:EX f:H e:0 p:12810 [imap] gfs2_rename+0x19d/0x63b [gfs2]
 R: n:54330151 f:05 b:274/274 i:1121
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at fs/gfs2/glock.c:1074
invalid opcode: 0000 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:0a.0/0000:02:02.0/irq
CPU 1
Modules linked in: nfs fscache nfs_acl lock_dlm gfs2 dlm configfs lockd sunrpc ipv6 xfrm_nalgo crypto_api ipt_LOG xt_state ip_conntrack nfnetlink xt_tcpudp iptable_filter ip_tables x_tables 8021q dm_multipath scsi_dh video backlight sbs i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport i2c_amd756 k8temp ide_cd i2c_core hwmon sg amd_rng cdrom k8_edac pcspkr tg3 floppy edac_mc e1000 dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod qla2xxx scsi_transport_fc shpchp mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod raid1 ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 12810, comm: imap Not tainted 2.6.18-164.6.1.el5 #1
RIP: 0010:[<ffffffff8862a6df>]  [<ffffffff8862a6df>] :gfs2:gfs2_glock_nq+0x231/0x273
RSP: 0018:ffff8101ba8d9868  EFLAGS: 00010292
RAX: 0000000000000000 RBX: ffff8101ba8d9cb0 RCX: 0000000000000461
RDX: ffff8101ffe27a98 RSI: ffffffff80309c28 RDI: ffffffff80309c20
RBP: ffff8101860b1340 R08: ffffffff80309c28 R09: 000000000000003f
R10: ffff8101ba8d9368 R11: 0000000000000000 R12: ffff8100e87ea590
R13: ffff8100e87ea590 R14: ffff8100ed24e000 R15: 0000000000000000
FS:  00002b18a78ac530(0000) GS:ffff810103901940(0000) knlGS:00000000acbfbb90
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002b70cf5cf000 CR3: 00000001b4d4a000 CR4: 00000000000006e0
Process imap (pid: 12810, threadinfo ffff8101ba8d8000, task ffff8101ffe277e0)
Stack:  ffff8101860b1340 0000000000000001 ffff8100b3e1b000 ffff8100b3e1a0e8
 0000000000000000 ffffffff8862a74e 0000000000000038 ffff810184e88368
 0000000000000001 ffffffff800caa0b 0000000000000005 ffff810184e88368
Call Trace:
 [<ffffffff8862a74e>] :gfs2:gfs2_glock_nq_m+0x2d/0xf4
 [<ffffffff800caa0b>] __kzalloc+0x9/0x21
 [<ffffffff88622831>] :gfs2:do_strip+0x175/0x349
 [<ffffffff886217e2>] :gfs2:recursive_scan+0xf2/0x175
 [<ffffffff886218fe>] :gfs2:trunc_dealloc+0x99/0xe7
 [<ffffffff886226bc>] :gfs2:do_strip+0x0/0x349
 [<ffffffff80090000>] sched_exit+0xb4/0xb5
 [<ffffffff88638dda>] :gfs2:gfs2_delete_inode+0xdd/0x191
 [<ffffffff88638d43>] :gfs2:gfs2_delete_inode+0x46/0x191
 [<ffffffff88628e77>] :gfs2:gfs2_glock_schedule_for_reclaim+0x5d/0x9a
 [<ffffffff88638cfd>] :gfs2:gfs2_delete_inode+0x0/0x191
 [<ffffffff8002f48f>] generic_delete_inode+0xc6/0x143
 [<ffffffff8863d9a4>] :gfs2:gfs2_inplace_reserve_i+0x63b/0x691
 [<ffffffff886248c4>] :gfs2:gfs2_dirent_find_space+0x0/0x41
 [<ffffffff88623983>] :gfs2:gfs2_dirent_search+0x147/0x16e
 [<ffffffff886377c5>] :gfs2:gfs2_rename+0x3be/0x63b
 [<ffffffff88637506>] :gfs2:gfs2_rename+0xff/0x63b
 [<ffffffff8863754c>] :gfs2:gfs2_rename+0x145/0x63b
 [<ffffffff88637571>] :gfs2:gfs2_rename+0x16a/0x63b
 [<ffffffff886375a4>] :gfs2:gfs2_rename+0x19d/0x63b
 [<ffffffff88629e29>] :gfs2:gfs2_holder_uninit+0xd/0x1f
 [<ffffffff886385bf>] :gfs2:gfs2_permission+0xaf/0xd4
 [<ffffffff88633124>] :gfs2:gfs2_drevalidate+0x158/0x214
 [<ffffffff8000d902>] permission+0x81/0xc8
 [<ffffffff8002a7d9>] vfs_rename+0x2f4/0x471
 [<ffffffff80036c20>] sys_renameat+0x180/0x1eb
 [<ffffffff800b66f5>] audit_syscall_entry+0x180/0x1b3
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0
Code: 0f 0b 68 f8 27 64 88 c2 32 04 be 01 00 00 00 4c 89 ef e8 df
RIP  [<ffffffff8862a6df>] :gfs2:gfs2_glock_nq+0x231/0x273
 RSP <ffff8101ba8d9868>
<0>Kernel panic - not syncing: Fatal exception
Killed by signal 15.
Reassigning to Steve Whitehouse, since he talked to you about it.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Conditions required to hit this bug:
1. The rename must result in the unlinking of an inode.
2. The rename must require the allocation of a block in order to satisfy the space requirement for adding the new directory entry.
3. The resource group containing the old inode and the resource group from which the new blocks are allocated must be the same.

At that point we end up trying to take two resource group locks at the same time. The second condition is relatively unlikely, since not only will the initial unlink have created some space in the target directory, but we also only need to allocate new blocks occasionally, as a directory grows. In fact we only need to search the same resource group for the block allocation, rather than actually select it, which makes it a bit more likely that this bug will trigger. On the other hand, any large filesystem will contain a lot of resource groups, so the chances of this happening reduce with filesystem size.

We can drop the lock on the rgrp early with a very simple patch, and that will prevent us from hitting this bug again. That's not quite the whole story, though, as there is still an issue with respect to the locking of the two resource groups and their relative ordering. That will need to be addressed in order to avoid distributed deadlock. Bearing in mind the complexity of that, and the likelihood of two nodes hitting this at the same time (considering that it's tricky to hit even on a single node), it might be better to do the simple fix first. That will no doubt cover the majority of cases.
The slightly odd thing about this bug is that the only time we need to add a new block to a directory is when there isn't enough space in it already. Given that the only time we unlink an inode during a rename is when there is a target directory entry with the same name as the source inode's directory entry, there should always be enough space (since the target inode's entry will have been removed). So there might be more to this issue than is immediately apparent.
Changing the name of this bug so that I don't confuse myself again. Also, I think I might have a fix for it now. Just testing the upstream version and a RHEL5 version will be on its way once I've done some testing upstream.
Created attachment 373130 [details] Proposed patch This is an upstream patch aimed at fixing the reported issue.
Created attachment 373134 [details] RHEL5 version of patch This is the RHEL5 version of the original patch.
Allen, if we supply you with a test kernel with the patch from comment #11, are you in a position to see if it fixes the bug?
Steve, I would be happy to. Of course, the issue is so relatively rare that it will be a bit difficult to know for sure. Thanks for all of the quick work on this!
In case this is useful, here are copies of the fsck logs that I generated over the weekend. You'll note that numerous errors were corrected, despite fsck having been run pretty recently. Perhaps these contributed to triggering this bug.
Created attachment 373253 [details] fsck log of first filesystem
Created attachment 373254 [details] fsck log of second filesystem
Created attachment 375402 [details] Test kernel Allen, please find attached a test kernel rpm. If you need the other bits and bobs (kernel headers, debug stuff, etc) then let me know and I'll attach that too. Let us know how you get on.
post2:/root # uname -a Linux post2.isye.gatech.edu 2.6.18-175.gfs2abhi.001 #1 SMP Tue Dec 1 09:59:50 EST 2009 x86_64 x86_64 x86_64 GNU/Linux Both nodes are up and running on the test kernel. Nothing unusual so far. Since this is such a rare issue, it may be a while before I can confidently state that "the problem is gone", but nothing is grossly broken. Thanks! Allen
Allen, any more news? If you've not hit any further issues then I'm seriously considering pushing this patch into our next version...
Hi Steve, I've seen no further occurrences of the problem described in this bug, and the patched kernel hasn't added any new problems that I can see. Should be safe to go for it, thanks.
in kernel-2.6.18-180.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please update the appropriate value in the Verified field (cf_verified) to indicate this fix has been successfully verified. Include a comment with verification details.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2010-0178.html