Bug 236149 - [RHEL5 RT] BUG: Unable to handle kernel paging request at 000041592fc43ad0
Summary: [RHEL5 RT] BUG: Unable to handle kernel paging request at 000041592fc43ad0
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: realtime-kernel
Version: 1.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Arnaldo Carvalho de Melo
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2007-04-12 03:38 UTC by IBM Bug Proxy
Modified: 2008-02-27 19:56 UTC (History)

Fixed In Version: -rt6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-06-01 20:03:40 UTC
Target Upstream Version:
Embargoed:



Links
System                       ID     Private  Priority  Status  Summary  Last Updated
IBM Linux Technology Center  33449  0        None      None    None     Never

Description IBM Bug Proxy 2007-04-12 03:38:09 UTC
LTC Owner is: jstultz.com
LTC Originator is: jstultz.com


Booted RHEL5-rt w/ /etc/selinux/config set w/ SELINUX=disabled and got the
following Oops:

Unable to handle kernel paging request at 000041592fc43ad0 RIP: 
 [<ffffffff802dff66>] free_block+0xb7/0x164
PGD 0 
Oops: 0002 [1] PREEMPT SMP 
CPU 2 
Modules linked in: nf_conntrack_netbios_ns ipt_REJECT nf_conntrack_ipv4 xt_state
iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables
x_tables ipv6 video sbs i2c_ec i2c_core dock button battery asus_acpi backlight
ac parport_pc lp parport sg pcspkr shpchp k8temp hwmon bnx2 serio_raw
dm_snapshot dm_zero dm_mirror dm_mod usb_storage mptsas scsi_transport_sas
mptscsih mptbase sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Pid: 56, comm: events/2 Not tainted 2.6.20-0119.rt8 #1
RIP: 0010:[<ffffffff802dff66>]  [<ffffffff802dff66>] free_block+0xb7/0x164
RSP: 0018:ffff81022ff6dd30  EFLAGS: 00010282
RAX: 000041592fc43ad0 RBX: ffff81012fc45280 RCX: ffff81012fc43ac0
RDX: ffff81012eb25040 RSI: ffff81012ee0f000 RDI: ffff81012ee0f980
RBP: ffff81022ff6dd60 R08: ffff81022ff6dd84 R09: 0000000000000003
R10: 0000000000000003 R11: 00000000000007cd R12: ffff81012fce5850
R13: 0000000000000001 R14: ffff81022ff6dd84 R15: 0000000000000000
FS:  00002b3fc413c6f0(0000) GS:ffff81012fc85ac0(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000041592fc43ad0 CR3: 000000012f2ab000 CR4: 00000000000006e0
Process events/2 (pid: 56, threadinfo ffff81022ff6c000, task ffff81012fa720c0)
Stack:  0000000b0000000b ffff81012fce5848 000000000000000b ffff81012fce5800
 ffff81012fc43b00 0000000000000000 ffff81022ff6ddb0 ffffffff802e0266
 ffff81012fc45280 ffffffff802e0083 000000022ff6ddd4 0000000000000045
Call Trace:
 [<ffffffff802e0266>] drain_array+0xd6/0x132
 [<ffffffff802e0544>] cache_reap+0x10d/0x2a0
 [<ffffffff8024f4e0>] run_workqueue+0x9f/0xf9
 [<ffffffff8024be44>] worker_thread+0x11b/0x152
 [<ffffffff8023405e>] kthread+0xee/0x11a
 [<ffffffff802600f8>] child_rip+0xa/0x12

---------------------------
| preempt count: 00000001 ]
| 1-level deep critical section nesting:
----------------------------------------
.. [<ffffffff8026657b>] .... __spin_trylock+0x13/0x4f
.....[<ffffffff80267111>] ..   ( <= oops_begin+0x23/0x6f)


Code: 48 89 10 89 f8 48 c7 06 00 01 10 00 48 c7 46 08 00 02 20 00 
RIP  [<ffffffff802dff66>] free_block+0xb7/0x164
 RSP <ffff81022ff6dd30>
CR2: 000041592fc43ad0

The issue did not reproduce on the following boot.

I still see this occasionally. I mailed the log to Ingo.
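
For anyone reading the header line of the trace above, the "Oops: 0002" value can be decoded from the low bits of the x86 page-fault error code. The following is a minimal sketch in Python; the function name is made up for illustration and only the three lowest bits are handled.

```python
# Low three bits of the x86 page-fault error code, as printed after "Oops:".
PF_PROT = 0x1   # 0 = page not present, 1 = protection violation
PF_WRITE = 0x2  # 0 = read access,      1 = write access
PF_USER = 0x4   # 0 = kernel mode,      1 = user mode


def decode_oops_error_code(code):
    """Hypothetical helper: describe the low three bits of an oops error code."""
    return {
        "cause": "protection violation" if code & PF_PROT else "page not present",
        "access": "write" if code & PF_WRITE else "read",
        "mode": "user" if code & PF_USER else "kernel",
    }


if __name__ == "__main__":
    # "Oops: 0002" from the trace in this report
    print(decode_oops_error_code(int("0002", 16)))
```

Read this way, the oops is a kernel-mode write to an unmapped address, which fits CR2 and RAX holding the same value in the register dump.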

Comment 1 Tim Burke 2007-04-13 19:59:52 UTC
Do you only see this when selinux is disabled?

None of the guys I have checked with have seen this bug, but we tend to run with
selinux enabled.



Comment 2 Tim Burke 2007-04-13 20:01:15 UTC
What hardware configuration?

Comment 3 IBM Bug Proxy 2007-04-13 20:15:53 UTC
----- Additional Comments From jstultz.com (prefers email at johnstul.com)  2007-04-13 16:10 EDT -------
I believe this was seen with and without SElinux, however I'll try to find some
time to reproduce it. It only happens every so often, so it's hard to reproduce.

How much hardware detail do you need? It's an LS21, 2 dualcore cpus. 

Comment 4 Jeff Burke 2007-04-13 21:16:14 UTC
John,
 [John Asked] How much hardware detail do you need? It's an LS21, 2 dualcore cpus.
  That should be enough for now. Just wanted to make sure we had the same
hardware in house.

 I will see if I can reproduce this on the system we have here.

Jeff

Comment 5 IBM Bug Proxy 2007-05-02 12:05:24 UTC
----- Additional Comments From ankigarg.com (prefers email at ankita.com)  2007-05-02 08:00 EDT -------
In all the testing on RHEL5 RT, I have not hit this Oops. 

Comment 6 IBM Bug Proxy 2007-05-10 22:10:51 UTC
----- Additional Comments From dvhltc.com  2007-05-10 18:05 EDT -------
I just saw what appears to be the same bug on an LS41 (pretty much two LS21's
bolted together), 4 dualcore Opterons.


INIT: Entering runlevel: 3
Entering non-interactive startup
Starting background readahead: [  OK  ]
Checking for hardware changes [  OK  ]
Applying ip6tables firewall rules: [  OK  ]
Applying iptables firewall rules: [  OK  ]
Loading additional iptables modules: ip_conntrack_netbios_ns [  OK  ]
Bringing up loopback interface:  [  OK  ]
Bringing up interface eth1:  Unable to handle kernel paging request at
000041613df82000 RIP: 
 [<ffffffff802dff66>] free_block+0xb7/0x164
PGD 0 
Oops: 0002 [1] PREEMPT SMP 
CPU 4 
Modules linked in: nf_conntrack_netbios_ns ipt_REJECT nf_conntrack_ipv4 xt_state
iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables
x_tables ipv6 dm_mirror dm_mod video sbs i2c_ec i2c_core dock button battery
asus_acpi backlight ac parport_pc lp parport sr_mod cdrom sg pcspkr k8temp hwmon
shpchp bnx2 serio_raw usb_storage mptsas scsi_transport_sas mptscsih mptbase
sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Pid: 110, comm: events/4 Not tainted 2.6.20-0119.rt8 #1
RIP: 0010:[<ffffffff802dff66>]  [<ffffffff802dff66>] free_block+0xb7/0x164
RSP: 0018:ffff81013fbe3d30  EFLAGS: 00010282
RAX: 000041613df82000 RBX: ffff81013fc4b300 RCX: ffff81013fc432c0
RDX: ffff81013e9d7000 RSI: ffff81013eeff000 RDI: ffff81013eeff660
RBP: ffff81013fbe3d60 R08: ffff81013fbe3d84 R09: 0000000000000003
R10: 0000000000000003 R11: 00000000000000ea R12: ffff81013fcb2048
R13: 0000000000000000 R14: ffff81013fbe3d84 R15: 0000000000000000
FS:  00002b0301ffdb00(0000) GS:ffff81013fc856c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000041613df82000 CR3: 0000000000201000 CR4: 00000000000006e0
Process events/4 (pid: 110, threadinfo ffff81013fbe2000, task ffff8101461a40c0)
Stack:  0000000600000018 ffff81013fcb2048 0000000000000006 ffff81013fcb2000
 ffff81013fc43300 0000000000000000 ffff81013fbe3db0 ffffffff802e0266
 ffff81013fc4b300 ffff810146258af0 000000048f725a4b 0000000000000032
Call Trace:
 [<ffffffff802e0266>] drain_array+0xd6/0x132
 [<ffffffff802e0544>] cache_reap+0x10d/0x2a0
 [<ffffffff8024f4e0>] run_workqueue+0x9f/0xf9
 [<ffffffff8024be44>] worker_thread+0x11b/0x152
 [<ffffffff8023405e>] kthread+0xee/0x11a
 [<ffffffff802600f8>] child_rip+0xa/0x12

---------------------------
| preempt count: 00000001 ]
| 1-level deep critical section nesting:
----------------------------------------
.. [<ffffffff8026657b>] .... __spin_trylock+0x13/0x4f
.....[<ffffffff80267111>] ..   ( <= oops_begin+0x23/0x6f)


Code: 48 89 10 89 f8 48 c7 06 00 01 10 00 48 c7 46 08 00 02 20 00 
RIP  [<ffffffff802dff66>] free_block+0xb7/0x164
 RSP <ffff81013fbe3d30>
CR2: 000041613df82000
 <7>eth1: no IPv6 routers present 
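
One way to check the "appears to be the same bug" impression is to compare the frames of the two traces mechanically. The sketch below is only an illustration: the regular expression and helper name are assumptions, and the two excerpts are copied from the LS21 and LS41 oopses in this report.

```python
import re

# Matches frames such as " [<ffffffff802dff66>] free_block+0xb7/0x164"
FRAME_RE = re.compile(r"\[<([0-9a-f]{16})>\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def frames(oops_text):
    """Hypothetical helper: return (address, symbol, offset) for each frame."""
    return [
        (int(addr, 16), sym, int(off, 16))
        for addr, sym, off, _size in FRAME_RE.findall(oops_text)
    ]


# Excerpts from the two oopses in this report (RIP line plus call trace).
LS21_OOPS = """
 [<ffffffff802dff66>] free_block+0xb7/0x164
 [<ffffffff802e0266>] drain_array+0xd6/0x132
 [<ffffffff802e0544>] cache_reap+0x10d/0x2a0
 [<ffffffff8024f4e0>] run_workqueue+0x9f/0xf9
"""
LS41_OOPS = """
 [<ffffffff802dff66>] free_block+0xb7/0x164
 [<ffffffff802e0266>] drain_array+0xd6/0x132
 [<ffffffff802e0544>] cache_reap+0x10d/0x2a0
 [<ffffffff8024f4e0>] run_workqueue+0x9f/0xf9
"""

if __name__ == "__main__":
    same = frames(LS21_OOPS) == frames(LS41_OOPS)
    print("identical frames:", same)
```

Both machines fault at the same instruction in free_block, reached through the same cache_reap path, on the same 2.6.20-0119.rt8 kernel.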

Comment 7 IBM Bug Proxy 2007-05-10 22:11:23 UTC
------- Additional Comments From dvhltc.com  2007-05-10 18:05 EDT -------
(In reply to comment #12)
> I just saw what appears to be the same bug on an LS41 (pretty much two LS21's
> bolted together), 4 dualcore Opterons.

This was with SELINUX=enforcing 

Comment 8 Arnaldo Carvalho de Melo 2007-05-23 14:41:28 UTC
Questions:

1) Which kernel was used? 2.6.20-rt or 2.6.21-rt? Which exact RPM release?
2) From the logs it seems bnx2 is the only networking interface driver present,
can you please confirm this?

My current impression is that the answer for #1 is 2.6.20-rt and that the
problem is fixed by this changeset:

-----
commit 1b2f922f6869eb13dadfe1ba3f8337bd42e50a2e
Author: Michael Chan <mchan>
Date:   Thu May 3 13:20:19 2007 -0700

    [BNX2]: Fix race conditions when calling register_netdev().

    Hot-plug scripts can call bnx2_open() as soon as register_netdev() is
    called in bnx2_init_one().  We need to call pci_set_drvdata() and
    setup everything before calling register_netdev(). netif_carrier_off()
    also needs to be moved to bnx2_open() to avoid race conditions with
    the irq.

    Signed-off-by: Michael Chan <mchan>
    Signed-off-by: David S. Miller <davem>
-----

As it is fairly recent, it is possible that even with 2.6.21-rt based kernels we
would occasionally see this race condition trigger. I will check if the latest
2.6.21-rt RHEL-RT rpm has this fix.
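
The ordering problem described in that changeset can be pictured with a toy model. This is not bnx2 code: every class and function name below is hypothetical, and the "stack" simply invokes an open callback at registration time to stand in for a hot-plug script racing with the probe routine.

```python
class ToyDevice:
    """Hypothetical device; drvdata stands in for state set by pci_set_drvdata()."""

    def __init__(self):
        self.drvdata = None


class ToyStack:
    """Hypothetical network stack: a device is visible as soon as it is registered."""

    def __init__(self, on_register):
        self.on_register = on_register

    def register(self, dev):
        # Models a hot-plug script opening the device right after registration.
        return self.on_register(dev)


def toy_open(dev):
    # The open path relies on driver state being fully set up.
    return "ok" if dev.drvdata is not None else "opened before setup"


def probe_register_first(stack):
    dev = ToyDevice()
    result = stack.register(dev)  # device published too early
    dev.drvdata = object()        # setup completes only afterwards
    return result


def probe_setup_first(stack):
    dev = ToyDevice()
    dev.drvdata = object()        # finish setup first
    return stack.register(dev)    # then publish the device


if __name__ == "__main__":
    stack = ToyStack(toy_open)
    print(probe_register_first(stack))
    print(probe_setup_first(stack))
```

The model only shows why publishing an object before it is initialised is unsafe; whether that ordering explains the free_block oops in this report is exactly what the later comments question.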

Comment 9 Arnaldo Carvalho de Melo 2007-05-23 14:52:31 UTC
I checked and this changeset was committed after 2.6.21 was released. We'll have
to do a backport.

Comment 10 IBM Bug Proxy 2007-05-23 18:51:09 UTC
----- Additional Comments From jstultz.com (prefers email at johnstul.com)  2007-05-23 14:49 EDT -------
Arnaldo: Looking at the trace, I'm not sure I see how register_netdev is connected.

Further, so far I've not seen this issue w/ RH's 2.6.21 based kernels. I'd like
to do a bit more testing, but I suspect this can be closed soon if the issue
doesn't reappear. 
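
As a driver-independent way of reading the trace, the "Code:" bytes from both oopses can be inspected directly. The sketch below assumes the dump starts at the faulting RIP; under that assumption the first bytes (48 89 10) encode mov %rdx,(%rax), a store through RAX, which holds the same value as CR2. The two following stores carry immediates that match the kernel's LIST_POISON1 and LIST_POISON2 values from include/linux/list.h, which would suggest a list_del() on a corrupted list pointer. The helper name is made up for illustration.

```python
import struct

# "Code:" line shared by both oopses in this report
CODE = bytes.fromhex("48 89 10 89 f8 48 c7 06 00 01 10 00 48 c7 46 08 00 02 20 00")

# Poison values written by list_del() in kernels of this era
LIST_POISON1 = 0x00100100
LIST_POISON2 = 0x00200200


def imm32_after(code, prefix):
    """Hypothetical helper: little-endian imm32 that follows an opcode prefix."""
    i = code.find(prefix)
    if i < 0:
        return None
    return struct.unpack_from("<I", code, i + len(prefix))[0]


if __name__ == "__main__":
    # 48 c7 06 imm32    -> movq $imm32, (%rsi)
    # 48 c7 46 08 imm32 -> movq $imm32, 0x8(%rsi)
    first = imm32_after(CODE, bytes.fromhex("48c706"))
    second = imm32_after(CODE, bytes.fromhex("48c74608"))
    print(hex(first), first == LIST_POISON1)
    print(hex(second), second == LIST_POISON2)
```

This does not identify what corrupted the pointer; it only narrows the faulting operation to list manipulation inside free_block.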

Comment 11 Arnaldo Carvalho de Melo 2007-05-23 19:54:28 UTC
Loading additional iptables modules: ip_conntrack_netbios_ns [  OK  ]
Bringing up loopback interface:  [  OK  ]
Bringing up interface eth1:  Unable to handle kernel paging request at
000041613df82000 RIP: 
 [<ffffffff802dff66>] free_block+0xb7/0x164
PGD 0 
Oops: 0002 [1] PREEMPT SMP 

Just before the trace, the problem seems to happen while the eth1 interface is
being brought up.  Now let me do what I should have done in the first place:
look at the bnx2.c code to see what kind of problems the race fixed by the
patch mentioned above could cause.

Comment 12 Arnaldo Carvalho de Melo 2007-06-01 00:06:43 UTC
John, has the issue reappeared?

Comment 13 IBM Bug Proxy 2007-06-01 00:20:20 UTC
----- Additional Comments From jstultz.com (prefers email at johnstul.com)  2007-05-31 20:17 EDT -------
No, so far this issue has not been seen. 

Comment 14 Tim Burke 2007-06-01 20:03:40 UTC
setting to closed/currentrelease.  reopen if problem reappears.

Comment 15 IBM Bug Proxy 2007-06-19 23:16:39 UTC
changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|OPEN                        |REJECTED
         Resolution|                            |UNREPRODUCIBLE




------- Additional Comments From jstultz.com (prefers email at johnstul.com)  2007-06-19 19:13 EDT -------
Hasn't been seen in a long while. Marking unreproducible. Please reopen if you
see this again. 

