Bug 456653

Summary: Crash due to incorrect inet{,6} device initialization order
Product: Red Hat Enterprise Linux 4
Reporter: Ian Campbell <ijc>
Component: kernel
Assignee: Thomas Graf <tgraf>
Status: CLOSED ERRATA
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: low
Priority: low
Version: 4.6.z
CC: cward, davem, dmair, fleite, nhorman, rkhan, tao, vgoyal
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2009-05-18 19:37:25 UTC
Attachments:
Backport of git 30c4cf577fb5b68c16e5750d6bdbd7072e42b279 to 2.6.9-67.0.22.EL (flags: none)

Description Ian Campbell 2008-07-25 11:31:35 UTC
Description of problem:

We have been seeing a very occasional crash on boot due to the
CONFIG_DEBUG_SPINLOCK check in _read_lock triggering. We have observed this with
the 2.6.9-67.0.20.EL and 2.6.9-67.0.15.EL kernels, although by inspection I think
the bug is also present in 2.6.9-78.EL.
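For readers unfamiliar with that check: with CONFIG_DEBUG_SPINLOCK, the i386 rwlock carries a magic word next to the lock word, and _read_lock verifies it before taking the lock; a mismatch hits BUG(). The sketch below is only a userspace model of that behaviour, assuming a layout of {lock word, magic} and the magic value 0xdeaf1eed that appears in the oops "Code:" bytes. The type and function names are hypothetical, not kernel API, and the model returns -1 where the kernel would BUG().

```c
#include <assert.h>
#include <stdint.h>

/* Value compared against the magic word (matches the immediate
 * 0xdeaf1eed visible in the oops "Code:" bytes). */
#define MODEL_RWLOCK_MAGIC 0xdeaf1eedu
/* Initial value of an unlocked rwlock word (i386 RW_LOCK_BIAS). */
#define MODEL_RW_LOCK_BIAS 0x01000000u

/* Hypothetical model of a debug rwlock: lock word at offset 0,
 * magic word at offset 4. */
typedef struct {
    uint32_t lock;
    uint32_t magic;
} model_rwlock_t;

/* Stand-in for rwlock_init(): has to run before any reader
 * can reach the lock. */
static void model_rwlock_init(model_rwlock_t *rw)
{
    rw->lock = MODEL_RW_LOCK_BIAS;
    rw->magic = MODEL_RWLOCK_MAGIC;
}

/* Stand-in for _read_lock() with CONFIG_DEBUG_SPINLOCK. The kernel would
 * BUG() on a bad magic; this returns -1 instead. Single-threaded only:
 * the real lock word is updated with an atomic "lock; subl". */
static int model_read_lock(model_rwlock_t *rw)
{
    if (rw->magic != MODEL_RWLOCK_MAGIC)
        return -1;
    rw->lock -= 1;   /* take one read reference */
    return 0;
}

static void model_read_unlock(model_rwlock_t *rw)
{
    assert(rw->magic == MODEL_RWLOCK_MAGIC);   /* only valid on an initialized lock */
    rw->lock += 1;
}
```

A lock that was allocated but never passed through the init function (for example zero-filled memory) fails the magic check on the very first read_lock, which is the situation the oops below reports.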

This issue was fixed upstream by
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=30c4cf577fb5b68c16e5750d6bdbd7072e42b279
That commit doesn't apply cleanly to 2.6.9 because of
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=8814c4b533817df825485ff32ce6ac406c3a54d1
However, the backport (attached) is pretty straightforward.
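The bug summary attributes the crash to the order in which the inet{,6} device is initialized. As a rough illustration of the general pattern (our reading of the summary, not the text of the upstream commit or the attached backport), the sketch below fully initializes the per-device state, including the lock magic, and only then publishes the pointer that the receive path looks up. All names are hypothetical stand-ins for struct in_device / struct net_device, and C11 release/acquire atomics stand in for whatever barrier the kernel code actually uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_RWLOCK_MAGIC 0xdeaf1eedu

/* Hypothetical stand-ins for struct in_device / struct net_device. */
struct model_in_device {
    uint32_t mc_lock_magic;   /* set by the "rwlock_init()" step */
    int      mc_count;        /* other per-device state */
};

struct model_net_device {
    /* Readers in the receive path load this without taking a lock. */
    _Atomic(struct model_in_device *) ip_ptr;
};

/* Fixed order: build the object completely, then publish it. The release
 * store makes the initializing writes visible to any reader whose acquire
 * load observes the non-NULL pointer. */
static struct model_in_device *model_inetdev_init(struct model_net_device *dev)
{
    assert(atomic_load_explicit(&dev->ip_ptr, memory_order_relaxed) == NULL);

    struct model_in_device *in_dev = calloc(1, sizeof(*in_dev));
    if (in_dev == NULL)
        return NULL;

    in_dev->mc_lock_magic = MODEL_RWLOCK_MAGIC;   /* initialize locks first */
    in_dev->mc_count = 0;

    atomic_store_explicit(&dev->ip_ptr, in_dev, memory_order_release);   /* publish last */
    return in_dev;
}

/* Receive-path reader, loosely modelled on the ip_route_input() ->
 * ip_check_mc() frames in the oops. Returns 0 if there is no in_device yet,
 * 1 if the lock is usable, -1 if it would have hit the debug BUG(). */
static int model_rx_check(struct model_net_device *dev)
{
    struct model_in_device *in_dev =
        atomic_load_explicit(&dev->ip_ptr, memory_order_acquire);
    if (in_dev == NULL)
        return 0;
    return in_dev->mc_lock_magic == MODEL_RWLOCK_MAGIC ? 1 : -1;
}
```

With this ordering a reader either sees no device yet or sees a fully initialized one; it can never observe the published pointer with an uninitialized lock behind it.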

Although we see this under Xen, I don't see any reason for it to be Xen-specific.
It's possible, though, that the virtualised environment allows packets to be
received much sooner.
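The crash needs a packet to reach the receive path after the device pointer is visible but before its lock is initialized; if the virtualised environment delivers packets earlier, that window is simply easier to hit. This deterministic sketch enumerates every point at which a packet could arrive during setup and counts the arrival points that would find an uninitialized lock. It assumes a simplified two-step setup (real initialization has more steps), and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical two-step model of inet device setup. */
enum init_step { STEP_INIT_LOCK, STEP_PUBLISH_PTR };

struct model_state {
    bool lock_initialized;   /* rwlock_init() has run */
    bool ptr_published;      /* device pointer is visible to the receive path */
};

static void apply_step(struct model_state *s, enum init_step step)
{
    if (step == STEP_INIT_LOCK)
        s->lock_initialized = true;
    else
        s->ptr_published = true;
}

/* A packet arriving in state s crashes only if the receive path can find
 * the in_device but its lock is not initialized yet. */
static bool packet_would_crash(const struct model_state *s)
{
    return s->ptr_published && !s->lock_initialized;
}

/* Count the arrival points (before, between and after the steps) at which
 * a packet would hit the uninitialized lock. */
static int count_crash_windows(const enum init_step *order, size_t n)
{
    struct model_state s = { false, false };
    int windows = packet_would_crash(&s) ? 1 : 0;   /* arrival before the first step */
    for (size_t i = 0; i < n; i++) {
        apply_step(&s, order[i]);
        if (packet_would_crash(&s))
            windows++;
    }
    assert(s.lock_initialized && s.ptr_published);   /* model expects both steps */
    return windows;
}
```

Publishing first leaves exactly one bad arrival point; initializing the lock first leaves none. The model ignores compiler and CPU reordering, so on real SMP hardware the publication also needs a write barrier.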

------------[ cut here ]------------
kernel BUG at include/asm/mach-xen/asm/spinlock.h:201!
invalid operand: 0000 [#1]
SMP 
Modules linked in: ipt_REJECT(U) ipt_state(U) ip_conntrack(U) iptable_filter(U)
ip_tables(U) loop(U) xennet(U) dm_snapshot(U) dm_zero(U) dm_mirror(U) ext3(U) z
jbd(U) dm_mod(U) xenblk(U) sd_mod(U) scsi_mod(U)
CPU:    0
EIP:    0061:[<c026c80b>]    Not tainted VLI
EFLAGS: 00010213   (2.6.9-67.0.20.EL.xs4.1.911.31xenU) 
EIP is at _read_lock+0x9/0x1d
eax: dfd8e150   ebx: 00000000   ecx: 00000000   edx: df7a5180
esi: 00000011   edi: dfd8e140   ebp: fb0000e0   esp: c031fe38
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 0, threadinfo=c031f000 task=c029ba40)
Stack: c0256840 dfd8e140 fb0000e0 6a0fdc0a decf8000 c022d67a dfd8e140 fb0000e0 
       6a0fdc0a 00000011 00cf8000 df7a5180 00000000 decd4820 df7a5180 decf8000 
       c022f726 df7a5180 fb0000e0 6a0fdc0a 00000000 decf8000 00000000 00000000 
Call Trace:
 [<c0256840>] ip_check_mc+0x1b/0x95
 [<c022d67a>] ip_route_input+0xd7/0x17d
 [<c022f726>] ip_rcv_finish+0x26/0x230
 [<c022f700>] ip_rcv_finish+0x0/0x230
 [<c0222eae>] nf_hook_slow+0x87/0xc9
 [<c022f6b3>] ip_rcv+0x3ed/0x43a
 [<c022f700>] ip_rcv_finish+0x0/0x230
 [<c02199ed>] netif_receive_skb+0x29f/0x2e4
 [<e0886f08>] netif_poll+0x4a8/0x64c [xennet]
 [<c0117056>] try_to_wake_up+0x2cb/0x2d6
 [<c01246d2>] __mod_timer+0xd3/0x109
 [<c012d336>] __rcu_process_callbacks+0xf7/0x110
 [<c0219c69>] net_rx_action+0xde/0x1e1
 [<c0120d44>] __do_softirq+0x64/0xdd
 [<c010a35a>] do_softirq+0x61/0x89
 =======================
 [<c0109ba6>] do_IRQ+0x1a8/0x1b5
 [<c01fa2bc>] evtchn_do_upcall+0x84/0xb8
 [<c01075c8>] hypervisor_callback+0x2c/0x34
 [<c010e164>] safe_halt+0x1a/0x32
 [<c0105308>] kernel_thread_helper+0x0/0xb
 [<c01050cd>] cpu_idle+0xa3/0xbc
 [<c02f876a>] start_kernel+0x1b2/0x1b6
Code: 5b c3 81 78 04 ed 1e af de 74 08 0f 0b d1 00 fb 69 27 c0 f0 81 28 00 00 00
01 74 05 e8 bf ed ff ff c3 81 78 04 ed 1e af de 74 08 <0f> 0b c9 00 fb 69 27 c0 \
f0 83 28 01 79 05 e8 c2 ed ff ff c3 81 
 <0>Kernel panic - not syncing: Fatal exception in interrupt
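The faulting check can be read off the "Code:" line. Our reading: the bytes 81 78 04 ed 1e af de just before the marked <0f> encode cmpl $0xdeaf1eed,0x4(%eax), 74 08 is a je that skips the next 8 bytes when the magic matches, and 0f 0b is ud2, which this kernel's BUG() emits followed by a 16-bit line number and a file pointer. The two bytes after ud2, c9 00, give line 0x00c9 = 201, matching "spinlock.h:201" in the BUG message, and eax = dfd8e150 is the lock whose magic at offset 4 was wrong. The hypothetical helpers below decode only these two byte patterns, as a sanity check of that reading; they are not a general x86 decoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Result of decoding "cmpl $imm32, disp8(%reg)" (opcode 0x81 /7, mod=01). */
struct cmp_imm32 {
    int      base_reg;   /* ModRM rm field: 0 = eax, 1 = ecx, ... */
    int8_t   disp;       /* 8-bit displacement */
    uint32_t imm;        /* 32-bit immediate, little-endian in the stream */
};

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the bytes are not this exact form.
 * Deliberately tiny: no SIB byte, no other opcodes. */
static int decode_cmp_imm32_disp8(const uint8_t *code, size_t len,
                                  struct cmp_imm32 *out)
{
    if (len < 7 || code[0] != 0x81)
        return -1;
    uint8_t modrm = code[1];
    int mod = modrm >> 6, reg = (modrm >> 3) & 7, rm = modrm & 7;
    if (mod != 1 || reg != 7 || rm == 4)   /* need mod=01, /7 = CMP, no SIB */
        return -1;
    out->base_reg = rm;
    out->disp = (int8_t)code[2];
    out->imm = le32(&code[3]);
    return 0;
}

/* "0f 0b" is ud2; here BUG() follows it with a 16-bit line number. */
static int bug_line_after_ud2(const uint8_t *code, size_t len)
{
    if (len < 4 || code[0] != 0x0f || code[1] != 0x0b)
        return -1;
    return code[2] | (code[3] << 8);
}
```

Fed the bytes from the oops, the first helper yields base register eax, displacement 4 and immediate 0xdeaf1eed, and the second yields line 201.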

How reproducible:

Quite hard: we've seen it exactly twice, and we boot RHEL 4 VMs an awful lot
during automated testing.

Comment 1 Ian Campbell 2008-07-25 11:31:35 UTC
Created attachment 312638 [details]
Backport of git 30c4cf577fb5b68c16e5750d6bdbd7072e42b279 to 2.6.9-67.0.22.EL

Comment 4 RHEL Program Management 2008-12-17 20:19:09 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 5 Vivek Goyal 2009-01-07 14:36:50 UTC
Committed in 78.24.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/

Comment 7 Linda Wang 2009-04-06 20:58:44 UTC
*** Bug 444215 has been marked as a duplicate of this bug. ***

Comment 8 Chris Ward 2009-05-05 13:57:15 UTC
Any updates here? Has this issue been resolved in the RHEL 4.8 Beta or a later kernel?

Comment 10 errata-xmlrpc 2009-05-18 19:37:25 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1024.html