Description of problem:

Customer uploaded a vmcore with a kernel panic and wants to know the root cause.

Warning: kfree_skb passed an skb still on a list (from 39f082c0).
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:228!
invalid operand: 0000 [#1]
SMP
Modules linked in: nfs lockd nfs_acl mptctl mptbase sg gab(U) llt(U) netconsole netdump i2c_dev i2c_core sunrpc ide_dump cciss_dump scsi_dump diskdump zlib_deflate lpfcdfc dmpaa(U) vxspec(U) vxio(U) vxdmp(U) fdd(U) vxportal(U) vxfs(U) dm_mod joydev button battery ac ohci_hcd hw_random tg3 e1000 floppy ext3 jbd lpfc scsi_transport_fc cciss sd_mod scsi_mod
CPU:    3
EIP:    0060:[<022756c0>]    Tainted: PF     VLI
EFLAGS: 00010246   (2.6.9-42.0.8.EL.IT122348HFhugemem)
EIP is at __kfree_skb+0x19/0xf7
eax: 00000045   ebx: f65c0240   ecx: 023c8f70   edx: 022ffe37
esi: f75e9200   edi: f88f285c   ebp: 0000006b   esp: 023c8f6c
ds: 007b   es: 007b   ss: 0068
Process tibemsd (pid: 13849, threadinfo=023c8000 task=f083f430)
Stack: 022ffe37 39f082c0 7b8eaa00 f65c0240 f893922e 00000001 00000000 0000006b
       39f7e6b0 f65c0000 f65c0240 7b8eaa38 f65c0240 f64c9400 f65c02dc 00000040
       f8939070 00000000 00000000 023c8fd4 00000000 00000000 f64c9400 0b0bbd80
Call Trace:
 [<f893922e>] e1000_clean_tx_irq+0xe5/0x293 [e1000]
 [<f8939070>] e1000_clean+0x9d/0x176 [e1000]
 [<0227aaab>] net_rx_action+0xae/0x160
 [<02126424>] __do_softirq+0x4c/0xb1
 [<021084af>] do_softirq+0x4f/0x56
 =======================
 [<0211745a>] smp_apic_timer_interrupt+0x9a/0x9c
 [<0211254b>] get_offset_pmtmr+0x19/0xace
 [<0210b0ce>] do_gettimeofday+0x1a/0x9c
 [<02125d81>] sys_gettimeofday+0x17/0x55
Code: Bad EIP value.

Version-Release number of selected component (if applicable):
2.6.9-42.0.8.EL.IT122348HFhugemem

How reproducible:
One-time event

Steps to Reproduce:
Not reproduced anymore

Additional info:
* Core is on seg.rdu.redhat.com/export/nfs/samfw/136995 case_136955_vmcore
* The kernel is a hotfix kernel

Analysis done so far:

PID = 13849, ...

#9 [ 23c8f7c] e1000_clean_tx_irq at f8939229
   [RA: f8939070  SP: 23c8f80  FP: 23c8fac  SIZE: 48]
     23c8f80: 00000001  00000000  0000006b  39f7e6b0
     23c8f90: f65c0000  f65c0240  7b8eaa38  f65c0240
     23c8fa0: f64c9400  f65c02dc  00000040  f8939070
...

There is a (struct e1000_adapter *) at 0xf65c0240:

crash> e1000_adapter.tx_ring 0xf65c0240
  tx_ring = 0xf75e9200,

and

crash> e1000_tx_ring 0xf75e9200
struct e1000_tx_ring {
  desc = 0x39f7e000,
  dma = 938991616,
  size = 4096,
  count = 256,                 <-------- how many buffers allocated
  next_to_use = 56,
  next_to_clean = 107,
  buffer_info = 0xf88f2000,    <-------- buffers
  tx_lock = {
    lock = 1,
    magic = 3735899821
  },
  tdh = 14352,
  tdt = 14360,
  last_tx_tso = FALSE
}

Iterating from the first buffer_info to the last one we can get every sk_buff and then every ->list pointer, for example:

crash> e1000_buffer.skb 0xf88f2000
  skb = 0xf647b6c0,
crash> sk_buff.list 0xf647b6c0
  list = 0x0,

The result is that only one sk_buff has ->list different from 0x0:

...
crash> sk_buff.list 0x7b8eaa00
  list = 0xb0bbd8c,            <-------- BUG()
crash> sk_buff.list 0x52ba0100
  list = 0x0,
crash> sk_buff.list 0xf21d3180
  list = 0x0
...

We can confirm that by checking the __kfree_skb() stack frame for 0x7b8eaa00:

#8 [ 23c8f68] __kfree_skb at 22756c0
   [RA: f893922e  SP: 23c8f6c  FP: 23c8f7c  SIZE: 20]
     23c8f6c: 022ffe37  39f082c0  *7b8eaa00*  f65c0240
     23c8f7c: f893922e

Checking the sk_buff itself, the fields look correct to me, so it is a valid sk_buff and __kfree_skb() is right to BUG() on it.
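For reference, the check that fires here is the one at the top of __kfree_skb() in net/core/skbuff.c, which refuses to free an skb whose ->list pointer is still set. This is a sketch based on the upstream 2.6.9 source, not on the IT122348 hotfix tree, so the exact wording and line numbers may differ slightly:

void __kfree_skb(struct sk_buff *skb)
{
        /* an skb handed to kfree_skb() must already be unlinked
         * from any sk_buff_head queue */
        if (skb->list) {
                printk(KERN_WARNING "Warning: kfree_skb passed an skb still "
                       "on a list (from %p).\n", NET_CALLER(skb));
                BUG();          /* <-- net/core/skbuff.c:228 in this report */
        }
        ...
}

So the BUG() only tells us that ->list (0xb0bbd8c here) was non-NULL at free time; it does not tell us who queued or recycled the skb.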
What is catching my attention now are the input_dev and dev fields:

crash> sk_buff.dev 7b8eaa00
  dev = 0x233c160,
crash> sk_buff.input_dev 7b8eaa00
  input_dev = 0x233c160,
crash> sk_buff.real_dev 7b8eaa00
  real_dev = 0x0,

Note that dev == input_dev, and:

crash> net_device.name 0x233c160
  name = "lo\000\000\000\000\000\000\000\000\000\000\000\000\000",

This sk_buff was actually received from the loopback interface and should never be on the e1000 tx queue.

The eth0 interface is used by netlog/netdump, so that might be related to our bug, but there is no similar report so far and I didn't find anything pointing in that direction either.

The other processes running are java (pid=4439), cp (pid=9008) and java (pid=16174); they have nothing related to networking, nor anything in their backtraces that could affect our scenario, such as a race condition.

The tx_queue_lock is held and the magic number is valid, so the queue is safe at that point; the other struct fields are ok too.

The next_to_clean is 107, which points to our buffer_info and to the board descriptor (at 0x39f7e6b0):

crash> e1000_tx_desc 39f7e6b0
struct e1000_tx_desc {
  buffer_addr = 0x37efee02,
  lower = {
    data = 0x8b00004b,
    flags = {
      length = 0x4b,
      cso = 0x0,
      cmd = 0x8b
    }
  },
  upper = {
    data = 0x1,
    fields = {
      status = 0x1,
      css = 0x0,
      special = 0x0
    }
  }
}

The e1000_buffer:

crash> e1000_buffer.skb,next_to_watch,dma f88f285c
  skb = 0x7b8eaa00,
  next_to_watch = 0x6b
  dma = 0x37efee02,

Note that descriptor and buffer are consistent with each other and look correct; e1000_buffer->skb points to a valid sk_buff, but that sk_buff carries unexpected contents.

Searching the vmcore memory for references to the skb from other areas, we can see the following:

Addr referencing: Addr referenced

 23c8ebc: 7b8eaa00 --\
 23c8ed8: 7b8eaa00 --+--- softirq_stack
 23c8f74: 7b8eaa00 --/

 b0bbd8c: 7b8eaa00 ----+-- another sk_buff (sk_buff_head)
 b0bbd90: 7b8eaa00 ---/

39fe01b0: 7b8eaa00 -----\
39fe30e8: 7b8eaa00 ------+-- SLAB size-512 / Unidentified
39fe30f0: 7b8eaa00 -----/

f77d61d8: 7b8eaa00 ----+---- Unidentified
f77d61e0: 7b8eaa00 ----/

f88f285c: 7b8eaa00 --------- diskdump storing softirq_stack on disk

There are still two unidentified areas that might tell us what was using this socket buffer and give a clue to the root cause, but this is a partial dump, so we may be missing important information:

crash> sys
      KERNEL: /usr/lib/debug/lib/modules/2.6.9-42.0.8.EL.IT122348HFhugemem/vmlinux
    DUMPFILE: case_136955_vmcore  [PARTIAL DUMP]
                                  ^^^^^^^^^^^^^^

I'm thinking that sk_buff 0x7b8eaa00 was correct earlier, because to get into the tx_ring it must have had ->dev set correctly for the dev->hard_start_xmit() call. The entries in the tx_ring are correct, and so are the other sk_buffs; 0x7b8eaa00 is the only entry with the ->list pointer filled in.

The e1000 driver does:

e1000_unmap_and_free_tx_resource(struct e1000_adapter *adapter,
                                 struct e1000_buffer *buffer_info)
{
...
        if (buffer_info->skb)
                dev_kfree_skb_any(buffer_info->skb);

        memset(buffer_info, 0, sizeof(struct e1000_buffer));
}

The code above memset()s the entire buffer_info to zero, including buffer_info->skb, which means this buffer_info was still live in the tx_ring and had not been zeroed yet.

I'm thinking of the following event sequence:

1. the sk_buff was correct and added to the e1000 tx_ring
2. the sk_buff was transmitted but NOT freed yet
3. some time later a softirq woke up and tried to clean all sent sk_buffs from the tx_ring (see the sketch of that path below), finding this sk_buff already altered.
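For step 3 above, here is an abridged sketch of the tx-clean path, paraphrased from the upstream e1000 driver of that era (the IT122348 hotfix source may differ; error handling, TSO and queue-wake logic are omitted). It shows how next_to_clean = 107 leads the softirq to buffer_info[107] = 0xf88f285c, whose end-of-packet descriptor (next_to_watch = 0x6b, i.e. 0x39f7e000 + 0x6b*16 = 0x39f7e6b0) has status = 0x1 (the DD/done bit), and from there to dev_kfree_skb_any() on skb 0x7b8eaa00:

static boolean_t e1000_clean_tx_irq(struct e1000_adapter *adapter,
                                    struct e1000_tx_ring *tx_ring)
{
        struct e1000_tx_desc *eop_desc;
        struct e1000_buffer *buffer_info;
        unsigned int i, eop;
        boolean_t cleaned;

        i = tx_ring->next_to_clean;                   /* 107 in our vmcore */
        eop = tx_ring->buffer_info[i].next_to_watch;  /* 0x6b */
        eop_desc = E1000_TX_DESC(*tx_ring, eop);      /* 0x39f7e6b0 */

        /* the hardware set the DD bit, so the packet was transmitted and
         * its buffers can be released */
        while (eop_desc->upper.data & cpu_to_le32(E1000_TXD_STAT_DD)) {
                for (cleaned = FALSE; !cleaned; ) {
                        buffer_info = &tx_ring->buffer_info[i];
                        cleaned = (i == eop);

                        /* frees buffer_info->skb via dev_kfree_skb_any()
                         * and then memset()s buffer_info to zero */
                        e1000_unmap_and_free_tx_resource(adapter, buffer_info);

                        if (unlikely(++i == tx_ring->count))
                                i = 0;
                }
                eop = tx_ring->buffer_info[i].next_to_watch;
                eop_desc = E1000_TX_DESC(*tx_ring, eop);
        }
        tx_ring->next_to_clean = i;
        ...
}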
I'm almost certain that e1000 isn't the culprit here; nothing points to it so far, and the new/corrupted sk_buff comes from loopback and is itself correct/valid. With that in mind, I can imagine these two scenarios:

1) The caller of dev->hard_start_xmit() passed an already-freed sk_buff.

   The theory here is that the sk_buff was freed _before_ being passed to
   the e1000 driver and then recycled through the slab cache to another
   interface (loopback), so when the softirq woke up it found an unexpected
   sk_buff (some iptables rules, traffic control, etc.). Low probability.

2) The caller of dev->hard_start_xmit() freed the sk_buff right after
   handing it over.

   The theory here is that the sk_buff was delivered to the e1000 driver
   and then freed, so it was transmitted but also recycled through the slab
   cache, causing the softirq to find an unexpected sk_buff. (A purely
   hypothetical sketch of this pattern is at the end of this comment.)

The problem is finding out where this sk_buff was freed, assuming my reasoning is correct.

Any idea what to do next to find the root cause?

Flavio
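To make scenario 2 concrete, this is a purely hypothetical pattern; bad_xmit() is an invented name and this is not code found in this kernel, only an illustration of the ownership violation I have in mind:

/* Hypothetical -- NOT code from this kernel. */
static int bad_xmit(struct sk_buff *skb, struct net_device *dev)
{
        int rc = dev->hard_start_xmit(skb, dev); /* driver owns the skb now
                                                  * and will free it later
                                                  * from e1000_clean_tx_irq() */

        /* BUG: double ownership.  Between this kfree_skb() and the tx-clean
         * softirq, the slab cache can hand the very same object to another
         * user (e.g. the loopback rx path), which queues it on an
         * sk_buff_head.  When e1000_clean_tx_irq() finally calls
         * dev_kfree_skb_any(), it sees a "foreign" skb with ->list set and
         * __kfree_skb() hits the BUG(). */
        kfree_skb(skb);
        return rc;
}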