Description of problem:
We have been seeing a very occasional crash on boot due to the CONFIG_DEBUG_SPINLOCK check in _read_lock triggering. We have observed this with the 2.6.9-67.0.20.EL and 2.6.9-67.0.15.EL kernels, although by inspection I think the bug is also present in 2.6.9-78.EL.

This issue was fixed upstream by
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=30c4cf577fb5b68c16e5750d6bdbd7072e42b279
This doesn't apply cleanly to 2.6.9 due to
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=8814c4b533817df825485ff32ce6ac406c3a54d1
however the backport (attached) is pretty straightforward.

Although we see this under Xen, I don't see any reason for it to be Xen specific. It's possible that the virtualised environment allows packets to be received much sooner, though.

------------[ cut here ]------------
kernel BUG at include/asm/mach-xen/asm/spinlock.h:201!
invalid operand: 0000 [#1]
SMP
Modules linked in: ipt_REJECT(U) ipt_state(U) ip_conntrack(U) iptable_filter(U) ip_tables(U) loop(U) xennet(U) dm_snapshot(U) dm_zero(U) dm_mirror(U) ext3(U) z jbd(U) dm_mod(U) xenblk(U) sd_mod(U) scsi_mod(U)
CPU:    0
EIP:    0061:[<c026c80b>]    Not tainted VLI
EFLAGS: 00010213   (2.6.9-67.0.20.EL.xs4.1.911.31xenU)
EIP is at _read_lock+0x9/0x1d
eax: dfd8e150   ebx: 00000000   ecx: 00000000   edx: df7a5180
esi: 00000011   edi: dfd8e140   ebp: fb0000e0   esp: c031fe38
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 0, threadinfo=c031f000 task=c029ba40)
Stack: c0256840 dfd8e140 fb0000e0 6a0fdc0a decf8000 c022d67a dfd8e140 fb0000e0
       6a0fdc0a 00000011 00cf8000 df7a5180 00000000 decd4820 df7a5180 decf8000
       c022f726 df7a5180 fb0000e0 6a0fdc0a 00000000 decf8000 00000000 00000000
Call Trace:
 [<c0256840>] ip_check_mc+0x1b/0x95
 [<c022d67a>] ip_route_input+0xd7/0x17d
 [<c022f726>] ip_rcv_finish+0x26/0x230
 [<c022f700>] ip_rcv_finish+0x0/0x230
 [<c0222eae>] nf_hook_slow+0x87/0xc9
 [<c022f6b3>] ip_rcv+0x3ed/0x43a
 [<c022f700>] ip_rcv_finish+0x0/0x230
 [<c02199ed>] netif_receive_skb+0x29f/0x2e4
 [<e0886f08>] netif_poll+0x4a8/0x64c [xennet]
 [<c0117056>] try_to_wake_up+0x2cb/0x2d6
 [<c01246d2>] __mod_timer+0xd3/0x109
 [<c012d336>] __rcu_process_callbacks+0xf7/0x110
 [<c0219c69>] net_rx_action+0xde/0x1e1
 [<c0120d44>] __do_softirq+0x64/0xdd
 [<c010a35a>] do_softirq+0x61/0x89
 =======================
 [<c0109ba6>] do_IRQ+0x1a8/0x1b5
 [<c01fa2bc>] evtchn_do_upcall+0x84/0xb8
 [<c01075c8>] hypervisor_callback+0x2c/0x34
 [<c010e164>] safe_halt+0x1a/0x32
 [<c0105308>] kernel_thread_helper+0x0/0xb
 [<c01050cd>] cpu_idle+0xa3/0xbc
 [<c02f876a>] start_kernel+0x1b2/0x1b6
Code: 5b c3 81 78 04 ed 1e af de 74 08 0f 0b d1 00 fb 69 27 c0 f0 81 28 00 00 00 01 74 05 e8 bf ed ff ff c3 81 78 04 ed 1e af de 74 08 <0f> 0b c9 00 fb 69 27 c0 \
 f0 83 28 01 79 05 e8 c2 ed ff ff c3 81
<0>Kernel panic - not syncing: Fatal exception in interrupt

How reproducible:
Quite hard: we've seen it exactly twice, and we boot RHEL 4 VMs an awful lot during automated testing.
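For anyone reading the trace without the kernel source to hand, here is a small userspace sketch of the kind of race that would produce it. It rests on an assumption that this report only hints at: that the device's IPv4 state becomes visible to the receive path before its rwlock has been initialised, so an early packet reaches the debug magic check on a zeroed lock. The value 0xdeaf1eed is taken from the "Code:" bytes in the oops above; every struct, field and function name below is a hypothetical stand-in, not the kernel's actual definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Magic value compared just before the faulting instruction in the oops. */
#define RWLOCK_MAGIC 0xdeaf1eedu

/* Simplified model of a debug-enabled rwlock (not the kernel's layout). */
struct model_rwlock {
    uint32_t lock;
    uint32_t magic;
};

/* Hypothetical stand-in for the per-device IPv4 state. */
struct model_in_device {
    struct model_rwlock mc_lock;
};

/* Hypothetical stand-in for the network device. */
struct model_net_device {
    struct model_in_device *ip_ptr;
};

static void model_rwlock_init(struct model_rwlock *rw)
{
    rw->lock = 0;
    rw->magic = RWLOCK_MAGIC;
}

/* Returns false where a debug kernel would hit BUG(). */
static bool model_read_lock_ok(const struct model_rwlock *rw)
{
    return rw->magic == RWLOCK_MAGIC;
}

/* Models one packet arriving: only touches the lock if state is published. */
static bool model_rx_packet(const struct model_net_device *dev)
{
    if (dev->ip_ptr == NULL)
        return true; /* no IPv4 state visible yet, nothing to lock */
    return model_read_lock_ok(&dev->ip_ptr->mc_lock);
}

/* Assumed buggy order: publish the pointer, then initialise the lock. */
static bool setup_publish_first(struct model_net_device *dev,
                                struct model_in_device *in_dev)
{
    assert(dev->ip_ptr == NULL);
    memset(in_dev, 0, sizeof(*in_dev));
    dev->ip_ptr = in_dev;                 /* receive path can see it now */
    bool ok = model_rx_packet(dev);       /* packet arrives in the window */
    model_rwlock_init(&in_dev->mc_lock);  /* too late for that packet */
    return ok;
}

/* Assumed fixed order: initialise everything, publish the pointer last. */
static bool setup_init_first(struct model_net_device *dev,
                             struct model_in_device *in_dev)
{
    assert(dev->ip_ptr == NULL);
    memset(in_dev, 0, sizeof(*in_dev));
    model_rwlock_init(&in_dev->mc_lock);
    bool ok_before = model_rx_packet(dev); /* pointer still NULL */
    dev->ip_ptr = in_dev;
    bool ok_after = model_rx_packet(dev);  /* lock already initialised */
    return ok_before && ok_after;
}
```

In this model setup_publish_first() reports a failed check for the packet that lands inside the window, while setup_init_first() never does. That would also fit how rarely we see the crash: the packet has to arrive in a very narrow interval during boot.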
Created attachment 312638
Backport of git 30c4cf577fb5b68c16e5750d6bdbd7072e42b279 to 2.6.9-67.0.22.EL
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Committed in 78.24.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/
*** Bug 444215 has been marked as a duplicate of this bug. ***
Any updates here? Has this issue been resolved in the RHEL 4.8 Beta, or in a later kernel?
An advisory has been issued which should help with the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2009-1024.html