Bug 248352 - Unable to handle kernel paging request oops (slab_get_obj)
Summary: Unable to handle kernel paging request oops (slab_get_obj)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: realtime-kernel
Version: 1.0
Hardware: All
OS: Linux
Priority: low
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Clark Williams
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2007-07-16 12:00 UTC by IBM Bug Proxy
Modified: 2008-03-25 23:32 UTC (History)
CC List: 0 users

Fixed In Version: -43
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-10-22 21:27:03 UTC
Target Upstream Version:
Embargoed:


Attachments
out (98.64 KB, text/plain)
2007-08-04 01:20 UTC, IBM Bug Proxy
mini17.patch (2.56 KB, text/plain)
2007-08-07 02:55 UTC, IBM Bug Proxy
Steven's recent patch against -31 (4.88 KB, text/plain)
2007-08-10 20:15 UTC, IBM Bug Proxy
Don't run the softirq routine from the hard IRQ thread (2.71 KB, patch)
2007-08-24 14:50 UTC, Steven Rostedt


Links
IBM Linux Technology Center 36397

Description IBM Bug Proxy 2007-07-16 12:00:45 UTC
LTC Owner is: jstultz.com
LTC Originator is: dbhilley.com


Problem description:
  Stressing the network interface causes a kernel oops:

c1f2bc2n13 kernel: Unable to handle kernel paging request at ffff81051fbe06b4 RIP: 
c1f2bc2n13 kernel:  [<ffffffff802dbaed>] slab_get_obj+0x14/0x21
c1f2bc2n13 kernel: PGD 8063 PUD 0 
c1f2bc2n13 kernel: Oops: 0000 [1] PREEMPT SMP 

If this is a customer issue, please indicate the impact to the customer:


If this is not an installation problem,
       Describe any custom patches installed.
GPFS is compiled and loaded as a kernel module.

       Provide output from "uname -a", if possible:
Linux c1f2bc2n13 2.6.21-31.el5rt #1 SMP PREEMPT RT Mon Jun 18 16:44:12 EDT 2007
x86_64 x86_64 x86_64 GNU/Linux

Hardware Environment
    Machine type (p650, x235, SF2, etc.): IBM eServer BladeCenter LS20 -[8850Z47]-
    Cpu type (Power4, Power5, IA-64, etc.): x86_64
    Describe any special hardware you think might be relevant to this problem:


Please provide contact information if the submitter is not the primary contact.


Please provide access information for the machine if it is available.


Is this reproducible?  Yes
    If so, how long does it (did it) take to reproduce it?
    Describe the steps:

Simply stress the network interface using iperf for a few hours.  I've done this
several times.

    If not, describe how the bug was encountered:


Is the system (not just the application) hung?
    If so, describe how you determined this:


Did the system produce an OOPS message on the console?
    If so, copy it here:

On Jul 13:
c1f2bc2n13 kernel: Unable to handle kernel paging request at ffff81051fbe06b4 RIP: 
c1f2bc2n13 kernel:  [<ffffffff802dbaed>] slab_get_obj+0x14/0x21
c1f2bc2n13 kernel: PGD 8063 PUD 0 
c1f2bc2n13 kernel: Oops: 0000 [1] PREEMPT SMP 

On Jul 8:
c1f2bc2n13 kernel: Unable to handle kernel paging request at ffff81051fbe06b4 RIP: 
c1f2bc2n13 kernel:  [<ffffffff802dbaed>] slab_get_obj+0x14/0x21
c1f2bc2n13 kernel: PGD 8063 PUD 0 
c1f2bc2n13 kernel: Oops: 0000 [1] PREEMPT SMP 

Is the system sitting in a debugger right now?
    If so, how long may it stay there?


Additional information:
Since this is caused by stressing the network, it may be related to bug
#36015 (RH246711).

Reproduced again with a slightly different function in the backtrace and more
info in /var/log/messages after reboot:

c1f2bc2n13 kernel: Unable to handle kernel paging request at ffff810174607e80 RIP: 
c1f2bc2n13 kernel:  [<ffffffff8022f8fe>] __alloc_skb+0xbe/0x140
c1f2bc2n13 kernel: PGD 8063 PUD 0 
c1f2bc2n13 kernel: Oops: 0002 [1] PREEMPT SMP 
c1f2bc2n13 kernel: CPU 3 
c1f2bc2n13 kernel: Modules linked in: autofs4 hidp rfcomm l2cap bluetooth sunrpc
ipv6 ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi
scsi_transport_iscsi dm_mirror dm_multipath dm_mod video sbs i2c_ec dock button
battery asus_acpi ac parport_pc lp parport sg shpchp pcspkr amd_rng i2c_amd756
i2c_core k8temp hwmon tg3 rtc_cmos rtc_core rtc_lib serio_raw usb_storage
qla2xxx scsi_transport_fc mptspi mptscsih mptbase scsi_transport_spi sd_mod
scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
c1f2bc2n13 kernel: Pid: 46, comm: softirq-net-rx/ Not tainted 2.6.21-31.el5rt #1
c1f2bc2n13 kernel: RIP: 0010:[<ffffffff8022f8fe>]  [<ffffffff8022f8fe>]
__alloc_skb+0xbe/0x140
c1f2bc2n13 kernel: RSP: 0018:ffff81011fdcfcc0  EFLAGS: 00010212
c1f2bc2n13 kernel: RAX: ffff810174607e80 RBX: ffff810174607800 RCX: 0000000000000000
c1f2bc2n13 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff810072a29a90
c1f2bc2n13 kernel: RBP: ffff81011fdcfd00 R08: 0000000000000000 R09: ffff810072a299c0
c1f2bc2n13 kernel: R10: ffff81011fdcfbb0 R11: ffff81007cb29e30 R12: 0000000000000680
c1f2bc2n13 kernel: R13: ffff810072a299c0 R14: 0000000000000680 R15: 0000000000000020
c1f2bc2n13 kernel: FS:  0000000040800940(0000) GS:ffff81011fd9f6c0(0000)
knlGS:0000000000000000
c1f2bc2n13 kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
c1f2bc2n13 kernel: CR2: ffff810174607e80 CR3: 00000001155cc000 CR4: 00000000000006e0
c1f2bc2n13 kernel: Process softirq-net-rx/ (pid: 46, threadinfo
ffff81011fdce000, task ffff810037c77040)
c1f2bc2n13 kernel: Stack:  ffff81011fdcfd20 0000000080236999 ffff81011f838480
ffff81007fe88000
c1f2bc2n13 kernel:  ffff81007fe88900 ffff81007db718d0 ffff81007cad31a0
0000000000000642
c1f2bc2n13 kernel:  ffff81011fdcfd20 ffffffff80429e12 ffff81007fe88000
00000000000000c5
c1f2bc2n13 kernel: Call Trace:
c1f2bc2n13 kernel:  [<ffffffff80429e12>] __netdev_alloc_skb+0x31/0x4e
c1f2bc2n13 kernel:  [<ffffffff88234e1d>] :tg3:tg3_alloc_rx_skb+0xe1/0x177
c1f2bc2n13 kernel:  [<ffffffff8823c668>] :tg3:tg3_poll+0x470/0xa39
c1f2bc2n13 kernel:  [<ffffffff8020c441>] net_rx_action+0xbe/0x1f2
c1f2bc2n13 kernel:  [<ffffffff80295ab0>] ksoftirqd+0x16c/0x271
c1f2bc2n13 kernel:  [<ffffffff80233d76>] kthread+0xf5/0x128
c1f2bc2n13 kernel:  [<ffffffff8025ff68>] child_rip+0xa/0x12
c1f2bc2n13 kernel: 
c1f2bc2n13 kernel: 
c1f2bc2n13 kernel: Code: c7 00 01 00 00 00 66 c7 40 04 00 00 66 c7 40 06 00 00
66 c7 
c1f2bc2n13 kernel: RIP  [<ffffffff8022f8fe>] __alloc_skb+0xbe/0x140
c1f2bc2n13 kernel:  RSP <ffff81011fdcfcc0>
c1f2bc2n13 kernel: CR2: ffff810174607e80

Again, tg3 is involved, which makes me wonder if it is possibly related to bug
#36015 (RH246711).

Comment 1 Tim Burke 2007-07-19 18:58:18 UTC
Changed state to NEEDINFO as IBM is trying to isolate whether this problem
occurs if GPFS is not loaded.

Comment 2 IBM Bug Proxy 2007-07-19 19:25:13 UTC
----- Additional Comments From dbhilley.com  2007-07-19 15:19 EDT -------
Yes, it occurs when GPFS is not loaded.  Check the second backtrace I posted as
it contains a list of "modules linked in."  You can verify that mmfs is not
among them. 

Comment 3 Tim Burke 2007-07-23 20:37:31 UTC
Can you give us any suggestions on how to reproduce?


Comment 4 Tim Burke 2007-07-23 20:40:31 UTC
Can it be reproduced on a different NIC, e.g., e1000?


Comment 5 IBM Bug Proxy 2007-07-25 01:05:24 UTC
----- Additional Comments From jstultz.com (prefers email at johnstul.com)  2007-07-24 21:02 EDT -------
I'm working to reproduce w/ iperf. 

Comment 6 IBM Bug Proxy 2007-07-25 04:55:13 UTC
----- Additional Comments From dbhilley.com  2007-07-25 00:53 EDT -------
My iperf testing setup is very simple.  I have two nodes, and each node runs
both a unidirectional iperf server and a client pointed at the other node, with
a very long runtime value.  All other settings are left at their defaults
(TCP, as fast as possible, default buffer size, etc.).

On node1 I'd run:
$ iperf -s -p 9999
$ iperf -c node2 -p 9999 -t 9999999

On node2 I'd run:
$ iperf -s -p 9999
$ iperf -c node1 -p 9999 -t 9999999

I just run this continuously, and within 2-24 hours one of the nodes will
almost always crash.  Sometimes I get nothing in the messages output; sometimes
I get some variation on the "Unable to handle kernel paging request" oops (as
posted).  Other configurations (UDP, UDP capped at 100 Mbit, bidirectional
client on one machine and server on the other, etc.) have also elicited
crashes.

As for testing with another NIC: these are LS20 blades with two built-in
Broadcom gigE NICs, and I'm not sure adding another NIC is feasible.

Comment 8 IBM Bug Proxy 2007-07-27 21:45:20 UTC
----- Additional Comments From jstultz.com (prefers email at johnstul.com)  2007-07-27 17:42 EDT -------
I tried the iperf test for two days over loopback on a single box and did not
reproduce the issue; it seems the actual network driver needs to be in use.
I'll set up a multi-system test as described soon.

Comment 9 IBM Bug Proxy 2007-07-28 00:40:24 UTC
----- Additional Comments From jstultz.com (prefers email at johnstul.com)  2007-07-27 20:37 EDT -------
Currently running a half-duplex version of the above, with the server on
ltcrt18 and the client on ltcrt15.

Comment 10 IBM Bug Proxy 2007-07-28 00:55:18 UTC
----- Additional Comments From jstultz.com (prefers email at johnstul.com)  2007-07-27 20:51 EDT -------
Cool. That didn't take long to trigger. The issue is fairly quickly reproduced
with one-way transfers, with the client being the side that crashes.

Comment 11 IBM Bug Proxy 2007-07-30 23:10:19 UTC
----- Additional Comments From jstultz.com (prefers email at johnstul.com)  2007-07-30 19:08 EDT -------
I'm checking to see if this can be reproduced w/ upstream 2.6.22.1-rt9 

Comment 12 IBM Bug Proxy 2007-07-31 01:30:36 UTC
----- Additional Comments From jstultz.com (prefers email at johnstul.com)  2007-07-30 21:24 EDT -------
Could not easily reproduce w/ 2.6.22.1-rt9. 

Comment 13 IBM Bug Proxy 2007-08-01 01:10:34 UTC
----- Additional Comments From jstultz.com (prefers email at johnstul.com)  2007-07-31 21:06 EDT -------
While trying to reproduce on 2.6.21-31, I got the following:
tg3: eth0: transmit timed out, resetting
tg3: DEBUG: MAC_TX_STATUS[00000008] MAC_RX_STATUS[00000000]
tg3: DEBUG: RDMAC_STATUS[00000000] WDMAC_STATUS[00000000]

Over and over. Finally followed by a few:
PM: Writing back config space on device 0000:02:01.0 at offset b (was 164814e4,
writing 3011014)
PM: Writing back config space on device 0000:02:01.0 at offset 3 (was 804000,
writing 804010)
PM: Writing back config space on device 0000:02:01.0 at offset 2 (was 2000000,
writing 2000010)
PM: Writing back config space on device 0000:02:01.0 at offset 1 (was 2b00000,
writing 2b00146)
PM: Writing back config space on device 0000:02:01.0 at offset 0 (was 164914e4,
writing 16a814e4)

tg3: eth0: Link is down.

Followed by some more "PM: Writing back.." messages, then finally:
tg3: eth0: Link is up at 1000 Mbps, full duplex.
tg3: eth0: Flow control is off for TX and off for RX.

The box was still up and usable, but network connections were dropped. 

Comment 14 IBM Bug Proxy 2007-08-01 03:15:24 UTC
changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P2                          |P3




------- Additional Comments From dbhilley.com  2007-07-31 23:09 EDT -------
I got that same repeated output on bug #36015, which I believe may be related:
tg3: eth0: transmit timed out, resetting
tg3: DEBUG: MAC_TX_STATUS[00000008] MAC_RX_STATUS[00000000]
tg3: DEBUG: RDMAC_STATUS[00000000] WDMAC_STATUS[00000000]

except in that case the messages repeated for hours and hours and the
interface never came back up; I had to restart the machines remotely.  When
encountering this bug, however, I never got those messages.

On the last excerpt I posted, where __alloc_skb was the offending function,
there was no output at all after the backtrace; the next entry is the syslogd
restart (12+ hours later).  In the first two excerpts, where slab_get_obj is
the offending function, I didn't get a backtrace or a list of modules at all:
the log simply ends there.

Comment 15 IBM Bug Proxy 2007-08-01 23:00:12 UTC
----- Additional Comments From jstultz.com (prefers email at johnstul.com)  2007-08-01 18:57 EDT -------
Possibly related; triggered on a 2.6.21.4-rt10-based kernel running iperf:

BUG: at mm/slab.c:2795 slab_put_obj()

Call Trace:
 [<ffffffff810af10a>] free_block+0x14a/0x220
 [<ffffffff810af2c9>] drain_array+0xe9/0x150
 [<ffffffff810af44a>] __drain_alien_cache+0x8a/0xa0
 [<ffffffff810b0a5d>] cache_reap+0x12d/0x2c0
 [<ffffffff810b0930>] cache_reap+0x0/0x2c0
 [<ffffffff810567d1>] run_workqueue+0xc1/0x120
 [<ffffffff8105784b>] worker_thread+0x15b/0x1a0
 [<ffffffff8103bd50>] default_wake_function+0x0/0x20
 [<ffffffff81037586>] __wake_up_common+0x46/0x80
 [<ffffffff810576f0>] worker_thread+0x0/0x1a0
 [<ffffffff8105a5f9>] kthread+0x129/0x160
 [<ffffffff8100b538>] child_rip+0xa/0x12
 [<ffffffff8105a4d0>] kthread+0x0/0x160
 [<ffffffff8100b52e>] child_rip+0x0/0x12

---------------------------
| preempt count: 00000000 ]
| 0-level deep critical section nesting:
---------------------------------------- 

Comment 16 IBM Bug Proxy 2007-08-03 20:51:15 UTC
----- Additional Comments From jstultz.com (prefers email at johnstul.com)  2007-08-03 16:48 EDT -------
I'm still hammering on this, but so far have not narrowed it down much.

I have found that running "iperf -c <client> -t 999999 -P 4" helps reproduce the
issue much faster. 

Comment 17 IBM Bug Proxy 2007-08-04 01:20:28 UTC
Created attachment 160683 [details]
out

Comment 18 IBM Bug Proxy 2007-08-04 01:20:43 UTC
----- Additional Comments From jstultz.com (prefers email at johnstul.com)  2007-08-03 21:17 EDT -------
 
dmesg and sysrq-t output

I got into a situation where the box hung, then recovered after a bit (this
happens about 1 out of every 10 times).  I also ran sysrq-t while the box was
hung and printing "tg3: eth0: transmit timed out, resetting" messages.

This is the resulting dmesg output. 

Comment 19 IBM Bug Proxy 2007-08-07 00:00:26 UTC
----- Additional Comments From jstultz.com (prefers email at johnstul.com)  2007-08-06 19:57 EDT -------
Tried to get a better grip on the issue and bisected down to the following:

2.6.21.4-rt10: Does hang.
2.6.21.4-rt11: Does hang.
2.6.21.5-rt14: Does hang.
2.6.21.5-rt15: Does hang.
2.6.21.5-rt16: NA (doesn't build)
2.6.21.5-rt17: Does not hang.
2.6.21.6-rt19: Does not hang.

I'm working to narrow down the -15 to -17 diff. 

Comment 20 IBM Bug Proxy 2007-08-07 02:55:28 UTC
Created attachment 160793 [details]
mini17.patch

Comment 21 IBM Bug Proxy 2007-08-07 02:55:34 UTC
----- Additional Comments From jstultz.com (prefers email at johnstul.com)  2007-08-06 22:52 EDT -------
 
Steven's restrict irqd from migrating

Apparent fix for the issue. I'm still testing, but with this patch,
2.6.21.4-rt10 appears not to hang under iperf stress testing.

The issue, of course, is that this binds each irqd to one CPU of the affinity
mask so it does not migrate, which makes it possible for an RT task to starve
the interrupt should it land on the selected CPU.  I've previously sent Ingo
patches that basically revert this change. :(

Talking with Steven, he suggested we look at stopping softirqs from being
called from irqds. I'm still mulling over my own thoughts as to what the right
fix would be.
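
For reference, a minimal C sketch of what such binding looks like (this is not
the attached mini17.patch; pin_irq_thread is a hypothetical helper,
desc->thread is assumed to be the -rt IRQ thread pointer, and first_cpu,
cpumask_of_cpu, and set_cpus_allowed are the 2.6.21-era cpumask APIs):

#include <linux/irq.h>
#include <linux/sched.h>
#include <linux/cpumask.h>

/*
 * Hypothetical sketch, not the attached mini17.patch: pin an IRQ's
 * kernel thread to one CPU chosen from the IRQ's affinity mask, so
 * the scheduler can never migrate it.
 */
static void pin_irq_thread(struct irq_desc *desc)
{
	int cpu = first_cpu(desc->affinity);	/* pick one CPU from the mask */

	if (cpu >= NR_CPUS || !desc->thread)	/* empty mask or no irqd */
		return;

	/*
	 * Bind the irqd to that single CPU.  This is exactly the
	 * trade-off described above: the thread no longer follows the
	 * rest of the mask, so an RT hog landing on this CPU can
	 * starve the interrupt.
	 */
	set_cpus_allowed(desc->thread, cpumask_of_cpu(cpu));
}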

Comment 22 IBM Bug Proxy 2007-08-08 02:05:29 UTC
changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |FIXEDAWAITINGTEST
         Resolution|                            |FIX_ALREADY_AVAIL




------- Additional Comments From jstultz.com (prefers email at johnstul.com)  2007-08-07 22:00 EDT -------
I'm going to mark this as fixed, however its likely we'll need to open a new bug
to track the softirq migration vs starvation issue. 

Comment 23 IBM Bug Proxy 2007-08-10 20:15:42 UTC
Created attachment 161080 [details]
Steven's recent patch against -31

Patch from Steven against RH's -31 kernel to avoid calling softirqs directly
from irqds. This resolves the network hang as well as the previous patch did,
and avoids IRQ starvation.

Comment 24 IBM Bug Proxy 2007-08-13 19:25:59 UTC
------- Comment From jstultz.com 2007-08-13 15:21 EDT-------
So with the last patch attached, I had the box fall over under heavy stress
testing. Not sure whether this is caused by starvation or not. Continuing
testing; I will provide more details when I can.

Comment 25 IBM Bug Proxy 2007-08-14 00:15:40 UTC
------- Comment From jstultz.com 2007-08-13 20:13 EDT-------
Verified as a starvation issue. Steven's patch still improves on -31, so it
should likely go in, but it is still possible for a single RT hog to lock up
the box (after some time).

Comment 26 Steven Rostedt 2007-08-24 14:48:39 UTC
The RT starvation issue is due to RT hogs preempting an IRQ thread that is
bound to a CPU. If the IRQ affinity mask is set to more than one CPU, the
expected behaviour (from the user's point of view) is that the IRQ thread
would migrate over to the other CPUs.  Unfortunately, binding the IRQ thread
prevents that migration from happening, and thus we get an IRQ starvation
issue.

The fix is simply to stop the hard-IRQ thread from running any softirq
routines, until we can make it possible for the softirq routines to migrate
themselves. We can still starve softirqs (and possibly more than one where
locks are involved), but that's a separate issue.

I'll attach a patch that stops the IRQ thread from running softirq routines,
so the thread no longer needs to be bound to fewer CPUs than its IRQ affinity
mask specifies.
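
For reference, the shape of that change as a hedged C sketch (hypothetical
code, not the attached patch; irq_thread_body and run_hardirq_handlers are
made-up names standing in for the -rt IRQ thread loop, and wakeup_softirqd is
assumed to be reachable from this context):

#include <linux/interrupt.h>
#include <linux/irq.h>

/* Hypothetical stand-in for running the threaded hard-IRQ handlers. */
static void run_hardirq_handlers(struct irq_desc *desc);

/*
 * Hypothetical sketch, not the attached patch: the hard-IRQ thread no
 * longer runs softirq handlers inline.  Softirqs raised by the
 * handler stay pending and are picked up by the dedicated softirq
 * threads, which remain free to run wherever the scheduler puts them.
 */
static void irq_thread_body(struct irq_desc *desc)
{
	run_hardirq_handlers(desc);

	/*
	 * Previously something like do_softirq() ran here, doing the
	 * softirq work in this (possibly CPU-bound) thread's context.
	 * Now we only wake the softirq daemon and return:
	 */
	if (local_softirq_pending())
		wakeup_softirqd();
}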

Comment 27 Steven Rostedt 2007-08-24 14:50:01 UTC
Created attachment 172422 [details]
Don't run the softirq routine from the hard IRQ thread

Comment 28 Clark Williams 2007-10-19 15:24:47 UTC
Has this bug been seen since the patch listed in comment #27 was applied
(2.6.21-36)?


Comment 29 IBM Bug Proxy 2008-03-25 23:32:54 UTC
------- Comment From jstultz.com 2008-03-25 19:29 EDT-------
This fix has been included in RH's tree, so it can be marked accepted.

