456653 – Crash due to incorrect inet{,6} device initialization order

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	4.6.z
Hardware:	All
OS:	Linux
Priority:	low
Severity:	low
Target Milestone:	rc
Target Release:	---
Assignee:	Thomas Graf
QA Contact:	Red Hat Kernel QE team
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	444215
Depends On:
Blocks:

Reported:	2008-07-25 11:31 UTC by Ian Campbell
Modified:	2018-10-20 01:24 UTC
CC List:	8 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2009-05-18 19:37:25 UTC
Target Upstream Version:
Embargoed:

Attachments
Backport of git 30c4cf577fb5b68c16e5750d6bdbd7072e42b279 to 2.6.9-67.0.22.EL (2.11 KB, patch) 2008-07-25 11:31 UTC, Ian Campbell

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2009:1024	0	normal	SHIPPED_LIVE	Important: Red Hat Enterprise Linux 4.8 kernel security and bug fix update	2009-05-18 14:57:26 UTC

Description Ian Campbell 2008-07-25 11:31:35 UTC

Description of problem:

We have been seeing a very occasional crash on boot caused by the
CONFIG_DEBUG_SPINLOCK check in _read_lock triggering. We have observed this with
the 2.6.9-67.0.20.EL and 2.6.9-67.0.15.EL kernels and, by inspection, I think the
bug is also present in 2.6.9-78.EL.

This issue was fixed upstream by
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=30c4cf577fb5b68c16e5750d6bdbd7072e42b279
This doesn't apply cleanly to 2.6.9 due to
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=8814c4b533817df825485ff32ce6ac406c3a54d1
however, the backport (attached) is pretty straightforward.

Although we see this under Xen, I don't see any reason for it to be Xen-specific.
It's possible that the virtualised environment allows packets to be received much
sooner, though.
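
To make the suspected failure mode concrete, here is a minimal user-space sketch.
It is not the kernel code and not the attached patch. All demo_* names are
hypothetical, and the struct layout is an assumption chosen to mirror the
"Code:" bytes in the oops below, where 81 78 04 ed 1e af de compares the word at
offset 4 of the lock against 0xdeaf1eed before the BUG. The sketch assumes, as
the bug title suggests, that the per-device inet structure becomes visible to
the receive path before its lock has been initialised.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Magic value taken from the compare in the oops "Code:" bytes. */
#define DEMO_RWLOCK_MAGIC 0xdeaf1eedu

/* Hypothetical stand-in for a debug rwlock: magic lives at offset 4. */
struct demo_rwlock {
    uint32_t lock;
    uint32_t magic;   /* only valid after demo_rwlock_init() */
};

/* Hypothetical stand-in for the per-device inet state. */
struct demo_in_device {
    struct demo_rwlock mc_lock;
};

/* Hypothetical stand-in for the net device; ip_ptr is what rx dereferences. */
struct demo_net_device {
    struct demo_in_device *ip_ptr;
};

static void demo_rwlock_init(struct demo_rwlock *l)
{
    l->lock = 0;
    l->magic = DEMO_RWLOCK_MAGIC;
}

/*
 * Models a packet arriving on dev.
 * Returns -1 if no inet device is published yet (packet is ignored),
 *          1 if the lock is initialised and may be taken,
 *          0 where a debug kernel would hit the magic check and BUG.
 */
static int demo_rx_check(const struct demo_net_device *dev)
{
    assert(dev != NULL);
    if (dev->ip_ptr == NULL)
        return -1;
    return dev->ip_ptr->mc_lock.magic == DEMO_RWLOCK_MAGIC;
}

/* Buggy order: publish the pointer, then initialise the lock. */
static int demo_setup_buggy(struct demo_net_device *dev,
                            struct demo_in_device *in_dev)
{
    int seen;

    dev->ip_ptr = in_dev;               /* visible to rx too early */
    seen = demo_rx_check(dev);          /* packet arrives in the window */
    demo_rwlock_init(&in_dev->mc_lock);
    return seen;
}

/* Fixed order: initialise everything, publish the pointer last. */
static int demo_setup_fixed(struct demo_net_device *dev,
                            struct demo_in_device *in_dev)
{
    int seen;

    demo_rwlock_init(&in_dev->mc_lock);
    seen = demo_rx_check(dev);          /* same window, nothing published */
    dev->ip_ptr = in_dev;
    return seen;
}
```

With zero-initialised structures, demo_setup_buggy() returns 0 (the case that
would trip the debug check), while demo_setup_fixed() returns -1 during setup
and demo_rx_check() returns 1 afterwards. The real kernel window is a race
between setup and the receive softirq, which this single-threaded sketch only
models by calling the receive check at the point where the packet would land.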

------------[ cut here ]------------
kernel BUG at include/asm/mach-xen/asm/spinlock.h:201!
invalid operand: 0000 [#1]
SMP 
Modules linked in: ipt_REJECT(U) ipt_state(U) ip_conntrack(U) iptable_filter(U)
ip_tables(U) loop(U) xennet(U) dm_snapshot(U) dm_zero(U) dm_mirror(U) ext3(U) z
jbd(U) dm_mod(U) xenblk(U) sd_mod(U) scsi_mod(U)
CPU:    0
EIP:    0061:[<c026c80b>]    Not tainted VLI
EFLAGS: 00010213   (2.6.9-67.0.20.EL.xs4.1.911.31xenU) 
EIP is at _read_lock+0x9/0x1d
eax: dfd8e150   ebx: 00000000   ecx: 00000000   edx: df7a5180
esi: 00000011   edi: dfd8e140   ebp: fb0000e0   esp: c031fe38
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 0, threadinfo=c031f000 task=c029ba40)
Stack: c0256840 dfd8e140 fb0000e0 6a0fdc0a decf8000 c022d67a dfd8e140 fb0000e0 
       6a0fdc0a 00000011 00cf8000 df7a5180 00000000 decd4820 df7a5180 decf8000 
       c022f726 df7a5180 fb0000e0 6a0fdc0a 00000000 decf8000 00000000 00000000 
Call Trace:
 [<c0256840>] ip_check_mc+0x1b/0x95
 [<c022d67a>] ip_route_input+0xd7/0x17d
 [<c022f726>] ip_rcv_finish+0x26/0x230
 [<c022f700>] ip_rcv_finish+0x0/0x230
 [<c0222eae>] nf_hook_slow+0x87/0xc9
 [<c022f6b3>] ip_rcv+0x3ed/0x43a
 [<c022f700>] ip_rcv_finish+0x0/0x230
 [<c02199ed>] netif_receive_skb+0x29f/0x2e4
 [<e0886f08>] netif_poll+0x4a8/0x64c [xennet]
 [<c0117056>] try_to_wake_up+0x2cb/0x2d6
 [<c01246d2>] __mod_timer+0xd3/0x109
 [<c012d336>] __rcu_process_callbacks+0xf7/0x110
 [<c0219c69>] net_rx_action+0xde/0x1e1
 [<c0120d44>] __do_softirq+0x64/0xdd
 [<c010a35a>] do_softirq+0x61/0x89
 =======================
 [<c0109ba6>] do_IRQ+0x1a8/0x1b5
 [<c01fa2bc>] evtchn_do_upcall+0x84/0xb8
 [<c01075c8>] hypervisor_callback+0x2c/0x34
 [<c010e164>] safe_halt+0x1a/0x32
 [<c0105308>] kernel_thread_helper+0x0/0xb
 [<c01050cd>] cpu_idle+0xa3/0xbc
 [<c02f876a>] start_kernel+0x1b2/0x1b6
Code: 5b c3 81 78 04 ed 1e af de 74 08 0f 0b d1 00 fb 69 27 c0 f0 81 28 00 00 00
01 74 05 e8 bf ed ff ff c3 81 78 04 ed 1e af de 74 08 <0f> 0b c9 00 fb 69 27 c0 \
f0 83 28 01 79 05 e8 c2 ed ff ff c3 81 
 <0>Kernel panic - not syncing: Fatal exception in interrupt

How reproducible:

Quite hard: we've seen it exactly twice, and we boot RHEL 4 VMs an awful lot
during automated testing.

Comment 1 Ian Campbell 2008-07-25 11:31:35 UTC

Created attachment 312638
Backport of git 30c4cf577fb5b68c16e5750d6bdbd7072e42b279 to 2.6.9-67.0.22.EL

Comment 4 RHEL Program Management 2008-12-17 20:19:09 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 5 Vivek Goyal 2009-01-07 14:36:50 UTC

Committed in 78.24.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/

Comment 7 Linda Wang 2009-04-06 20:58:44 UTC

*** Bug 444215 has been marked as a duplicate of this bug. ***

Comment 8 Chris Ward 2009-05-05 13:57:15 UTC

Any updates here? Has this issue been resolved in the RHEL 4.8 Beta or a later kernel?

Comment 10 errata-xmlrpc 2009-05-18 19:37:25 UTC

An advisory has been issued which should help with the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1024.html
