586352 – oops in cnic when running on xen and disable_lro

Summary: oops in cnic when running on xen and disable_lro

Keywords:
Status:	CLOSED DUPLICATE of bug 582367
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	5.5
Hardware:	All
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Stanislaw Gruszka
QA Contact:	Red Hat Kernel QE team
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	595548

Reported:	2010-04-27 11:37 UTC by Stanislaw Gruszka
Modified:	2010-08-02 09:24 UTC
CC List:	1 user
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2010-08-02 09:24:56 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
proposed workaround/fix (717 bytes, text/plain) 2010-04-27 11:45 UTC, Stanislaw Gruszka
packed addional patches (6.34 KB, application/x-bzip2) 2010-04-27 11:49 UTC, Stanislaw Gruszka

Description Stanislaw Gruszka 2010-04-27 11:37:51 UTC

Description of problem:
I'm adding patches to automatically disable LRO on bnx2x. With these new patches, cnic oopses on the xen kernel when I enable bridges.

Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
 [<ffffffff8840d50f>] :uio:uio_event_notify+0x1/0x31
PGD 157c4b067 PUD 15395e067 PMD 0
Oops: 0000 [1] SMP
last sysfs file: /class/net/lo/ifindex
CPU 0
Modules linked in: ipt_MASQUERADE iptable_nat ip_nat bridge autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc ip_conntrack_netbios_ns ipt_REJECT xt_state ip_conntrack nfnetlink iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables be2iscsi ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic ipv6 xfrm_nalgo crypto_api uio cxgb3i cxgb3 libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi loop dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec i2c_core dell_wmi wmi button battery asus_acpi ac parport_pc lp parport sg shpchp hpilo pcspkr bnx2x 8021q serial_core serio_raw dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 14, comm: events/0 Not tainted 2.6.18-194.el5.bnx2x_v3xen #1
RIP: e030:[<ffffffff8840d50f>]  [<ffffffff8840d50f>] :uio:uio_event_notify+0x1/0x31
RSP: e02b:ffffffff80684e88  EFLAGS: 00010297
RAX: ffff880151c05191 RBX: ffff8801590fd0a8 RCX: 0000000000000000
RDX: ffff88015b135191 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff88015a990500 R08: ffff88015e44e000 R09: 0000000000000001
R10: ffff88015a990500 R11: 00000000000000c8 R12: 0000000000000001
R13: 0000000000000001 R14: ffff88015e44fcf8 R15: ffff88015e44fcf8
FS:  00002b22f5712260(0000) GS:ffffffff805d2000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000
Process events/0 (pid: 14, threadinfo ffff88015e44e000, task ffff88015e44d7a0)
Stack:  ffff8801590fd0a8  ffffffff884a301f  0000000000000002  ffffffff881cdcba
 ffffffff80684ea0  0000000000000000  ffff88015ba62c40  0000000000000015
 0000000000000000  ffffffff8021152a
Call Trace:
 <IRQ>  [<ffffffff884a301f>] :cnic:cnic_service_bnx2x+0x69/0x6d
 [<ffffffff881cdcba>] :bnx2x:bnx2x_interrupt+0x19a/0x211
 [<ffffffff8021152a>] handle_IRQ_event+0x55/0xae
 [<ffffffff802b3642>] __do_IRQ+0xa4/0x103
 [<ffffffff80290528>] _local_bh_enable+0x61/0xc5
 [<ffffffff8026df62>] do_IRQ+0xe7/0xf5
 [<ffffffff803b3b8f>] evtchn_do_upcall+0x13b/0x1fb
 [<ffffffff802608d6>] do_hypervisor_callback+0x1e/0x2c
 <EOI>  [<ffffffff881c7116>] :bnx2x:bnx2x_release_hw_lock+0x8d/0xe0
 [<ffffffff881c717d>] :bnx2x:bnx2x_release_phy_lock+0x14/0x21
 [<ffffffff881d7d0a>] :bnx2x:bnx2x_nic_load+0x108d/0x1371
 [<ffffffff80299a8a>] queue_delayed_work+0x75/0x7e
 [<ffffffff881d0220>] :bnx2x:bnx2x_nic_unload+0x836/0x844
 [<ffffffff881d9f9f>] :bnx2x:bnx2x_reset_task+0x0/0x31
 [<ffffffff881d9fca>] :bnx2x:bnx2x_reset_task+0x2b/0x31
 [<ffffffff8024fa5f>] run_workqueue+0x94/0xe4
 [<ffffffff8024c318>] worker_thread+0x0/0x122
 [<ffffffff8024c408>] worker_thread+0xf0/0x122
 [<ffffffff8028906a>] default_wake_function+0x0/0xe
 [<ffffffff80233e47>] kthread+0xfe/0x132
 [<ffffffff80260b2c>] child_rip+0xa/0x12
 [<ffffffff80233d49>] kthread+0x0/0x132
 [<ffffffff80260b22>] child_rip+0x0/0x12


Version-Release number of selected component (if applicable):
2.6.18-194.el5 + new bnx2x patches.

How reproducible:
Always

Steps to Reproduce:
1. Boot into xen domain 0 with libvirtd and xend disabled
2. Run /etc/init.d/libvirtd start

Comment 1 Stanislaw Gruszka 2010-04-27 11:45:42 UTC

Created attachment 409446
proposed workaround/fix

This patch prevents the oops, but I'm pretty sure it is not the right fix, just a workaround.

Comment 2 Stanislaw Gruszka 2010-04-27 11:49:37 UTC

Created attachment 409449
packed addional patches

Patches for the 2.6.18-194.el5 kernel that make this bug reproducible.

Comment 3 Stanislaw Gruszka 2010-04-27 11:58:36 UTC

Michael,

The problem is that we get an interrupt when the cnic driver is not ready for it. On xen we have the legacy INTx# interrupt. This happens when we reset the device during bnx2x_nic_load().

Do you have any ideas about better fix?
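
To make the failure mode concrete, here is a minimal userspace sketch of an interrupt service routine that is called before its device is fully set up. It is not the attached workaround and not the upstream patch; every name in it (fake_cnic_dev, fake_uio_info, fake_event_notify, fake_cnic_service) is hypothetical and only stands in for the real cnic/uio code. The assumption is that the notify routine dereferences its argument unconditionally, which would match the NULL RDI seen in the oops, so the service routine has to check readiness before forwarding the event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real cnic/uio structures. */
struct fake_uio_info {
    int events;                  /* notifications delivered so far */
};

struct fake_cnic_dev {
    bool up;                     /* set only after setup has finished */
    struct fake_uio_info *uinfo; /* NULL until the device is registered */
};

/* Mimics a notify routine that dereferences its argument unconditionally. */
void fake_event_notify(struct fake_uio_info *info)
{
    info->events++;              /* would fault if info were NULL */
}

/*
 * Service routine as it might be called from a legacy (line-based)
 * interrupt handler. Returns 1 if the event was forwarded and 0 if it
 * was ignored because the device was not ready yet.
 */
int fake_cnic_service(struct fake_cnic_dev *dev)
{
    if (!dev->up || dev->uinfo == NULL)
        return 0;                /* early interrupt: nothing to notify */

    fake_event_notify(dev->uinfo);
    return 1;
}
```

With the guard in place, an interrupt that arrives while the device is being reset is dropped instead of reaching the notify routine with a NULL pointer. Whether the check belongs in cnic or in the bnx2x reset path is exactly the open question here.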

Comment 4 Michael Chan 2010-04-27 12:30:01 UTC

I think this upstream patch should fix it.

http://git.kernel.org/?p=linux/kernel/git/davem/net-2.6.git;a=commit;h=94824f3dbe0d3f62470603bbb18efb5510aaf07c

We saw a similar issue during MTU change and fixed it with the above patch.  Thanks.

Comment 5 Stanislaw Gruszka 2010-04-27 13:56:06 UTC

Yes, the patch fixes the problem. Thanks, Michael.

Comment 6 Stanislaw Gruszka 2010-08-02 09:24:56 UTC

This patch was applied with the patch series for bug 582367.

*** This bug has been marked as a duplicate of bug 582367 ***
