Bug 507520

Summary: xen kernel, modprobe -r popup call trace and error msg
Product: Red Hat Enterprise Linux 5
Reporter: Peng ZhenFei <zhenfei.peng>
Component: kernel-xen
Assignee: Chris Lalancette <clalance>
Status: CLOSED ERRATA
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium
Priority: low
Version: 5.4
CC: adaora.onyia, bill.hayes, clalance, dchapman, ddutile, dzickus, emcnabb, martine.silbermann, rick.hester, sghosh, shengliang.lv, tao, xen-maint
Target Milestone: rc
Keywords: Regression
Hardware: ia64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-09-02 08:55:22 UTC
Attachments:
sysreport (flags: none)
Skip calling PHYSDEVOP_manage_pci_remove on PCI teardown for ia64 (flags: none)

Description Peng ZhenFei 2009-06-23 06:15:25 UTC
Description of problem:

Install the RHEL 5.4 alpha on ia64 with the Xen kernel. Use modprobe -r to remove a PCI device driver. The system prints an error message and a call trace.


[root@maxcv ~]# modprobe -r e1000e
BUG: warning at drivers/xen/core/pci.c:41/pci_bus_remove_wrapper() (Tainted: G     )

Call Trace:
 [<a00000010001d240>] show_stack+0x40/0xa0
                                sp=e00000019016fbf0 bsp=e000000190169298
 [<a00000010001d2d0>] dump_stack+0x30/0x60
                                sp=e00000019016fdc0 bsp=e000000190169280
 [<a000000100415540>] pci_bus_remove_wrapper+0x120/0x140
                                sp=e00000019016fdc0 bsp=e000000190169260
 [<a000000100400da0>] __device_release_driver+0x160/0x1c0
                                sp=e00000019016fdd0 bsp=e000000190169228
 [<a0000001004015f0>] driver_detach+0x170/0x200
                                sp=e00000019016fdd0 bsp=e0000001901691f0
 [<a0000001003ff5e0>] bus_remove_driver+0x120/0x180
                                sp=e00000019016fdd0 bsp=e0000001901691c0
 [<a000000100401700>] driver_unregister+0x20/0x60
                                sp=e00000019016fdd0 bsp=e0000001901691a0
 [<a0000001003058b0>] pci_unregister_driver+0x50/0x120
                                sp=e00000019016fdd0 bsp=e000000190169170
 [<a0000002015a5850>] e1000_exit_module+0x30/0x1530 [e1000e]
                                sp=e00000019016fdd0 bsp=e000000190169158
 [<a0000001000da3c0>] sys_delete_module+0x3c0/0x460
                                sp=e00000019016fdd0 bsp=e0000001901690e8
 [<a00000010006ae00>] xen_trace_syscall+0x100/0x140
                                sp=e00000019016fe30 bsp=e0000001901690e8
 [<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400
                                sp=e000000190170000 bsp=e0000001901690e8



Version-Release number of selected component (if applicable):

xen-3.0.3-87.el5
kernel-xen-2.6.18-152.el5
Linux maxcv.rx3600-11.test 2.6.18-152.el5xen #1 SMP Wed Jun 3 19:21:01 EDT 2009 ia64 ia64 ia64 GNU/Linux

How reproducible:
always

Steps to Reproduce:
1. modprobe -r device-driver-name
  
Actual results:
An error message and a call trace are printed.

Expected results:
No message is printed.

Additional info:

Comment 2 Peng ZhenFei 2009-06-24 06:09:42 UTC
If we do the following:
1) Run modprobe -r on a device driver (e1000e here).
2) As soon as modprobe -r has finished, quickly run "modprobe e1000e".
Then we see:

[root@maxcv ~]# modprobe  e1000e
map irq failed
0000:52:00.0: Failed to initialize MSI interrupts.  Falling back to legacy interrupts.
map irq failed
0000:52:00.1: Failed to initialize MSI interrupts.  Falling back to legacy interrupts.
map irq failed
0000:8b:00.0: Failed to initialize MSI interrupts.  Falling back to legacy interrupts.
map irq failed
0000:8b:00.1: Failed to initialize MSI interrupts.  Falling back to legacy interrupts.

Comment 3 Peng ZhenFei 2009-06-24 06:11:21 UTC
Created attachment 349198 [details]
sysreport

Comment 4 Chris Lalancette 2009-07-09 11:46:50 UTC
Can you try this same test on 5.3 (that would be kernel-xen-2.6.18-128.el5) and report back whether this is a regression?

Thanks,
Chris Lalancette

Comment 5 Peng ZhenFei 2009-07-10 01:46:46 UTC
We can reproduce this on all of our ia64 hardware, and we don't see it happen on RHEL 5.3.

We also see it on RHEL 5.4b1.

Comment 6 Chris Lalancette 2009-07-10 07:35:59 UTC
OK, thanks for confirming.  I'll mark this as a regression then.

Chris Lalancette

Comment 8 Chris Lalancette 2009-07-10 10:08:36 UTC
Gah.  I see what the problem is now.

When we added the VT-d stuff, we added some code in drivers/xen/core/pci.c (where the bug message is coming from) that looks like this:

static int pci_bus_remove_wrapper(struct device *dev)
{
	int r;
	struct pci_dev *pci_dev = to_pci_dev(dev);
	struct physdev_manage_pci manage_pci;
	manage_pci.bus = pci_dev->bus->number;
	manage_pci.devfn = pci_dev->devfn;

	r = pci_bus_remove(dev);
	/* dev and pci_dev are no longer valid!! */

	WARN_ON(HYPERVISOR_physdev_op(PHYSDEVOP_manage_pci_remove,
		&manage_pci));
	return r;
}

However, our ia64 currently doesn't implement PHYSDEVOP_manage_pci_remove, so the hypercall returns an error and the WARN_ON fires; that's what causes the error message.  Upstream xen-unstable c/s 18686 does implement this, so we'll probably need to backport that.

Chris Lalancette
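
To make the failure mode in comment 8 concrete, here is a minimal user-space model of it. This is only a sketch, not kernel code: every demo_* name is hypothetical, the command number is a stand-in rather than the real value from the Xen headers, and the -ENOSYS return for an unimplemented operation is an assumption about how the ia64 hypervisor rejects the call.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for the real hypercall command number. */
enum { DEMO_OP_MANAGE_PCI_REMOVE = 2 };

/* Counts how many warnings the model has emitted. */
static int demo_warnings;

/* Model of WARN_ON: complain when cond is nonzero, then hand cond back. */
static int demo_warn_on(int cond, const char *where)
{
	if (cond) {
		demo_warnings++;
		fprintf(stderr, "BUG: warning at %s\n", where);
	}
	return cond;
}

/*
 * Model of the hypercall. A hypervisor that lacks the operation is
 * ASSUMED to return -ENOSYS; one that has it returns 0.
 */
static int demo_physdev_op(int hv_implements_op, int cmd)
{
	(void)cmd;
	return hv_implements_op ? 0 : -ENOSYS;
}

/* Same shape as the wrapper: remove first, then notify the hypervisor. */
static int demo_remove_wrapper(int hv_implements_op)
{
	int r = 0; /* pretend the underlying bus remove succeeded */

	demo_warn_on(demo_physdev_op(hv_implements_op,
				     DEMO_OP_MANAGE_PCI_REMOVE),
		     "demo_remove_wrapper()");
	return r;
}
```

Calling demo_remove_wrapper(0) models the ia64 case: the remove itself still returns 0, but one warning is emitted, which matches the symptom of a successful modprobe -r accompanied by a call trace.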

Comment 9 Chris Lalancette 2009-07-10 11:33:36 UTC
Created attachment 351247 [details]
Skip calling PHYSDEVOP_manage_pci_remove on PCI teardown for ia64

(In reply to comment #8)
> However, our ia64 currently doesn't implement PHYSDEVOP_manage_pci_remove, so
> that's what causes the error message.  Upstream xen-unstable c/s 18686 does
> implement this, so we'll probably need to backport that.

Nix this last part.  That requires pulling in basically all of ia64 VT-d support, which is way too risky at this stage in 5.4.  Instead, I've tested the attached patch, which seems to fix the issue for me.

Chris Lalancette
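
The contents of the attached patch are not reproduced in this report. As a rough illustration of the approach its title describes (skipping the remove hypercall on ia64), the following user-space sketch guards the call with an architecture check. The DEMO_HAVE_MANAGE_PCI macro and all demo_* functions are hypothetical and only model the idea; the real patch may be structured differently.

```c
#include <stdio.h>

/* Hypothetical switch: assume ia64 lacks the manage_pci operations. */
#if defined(__ia64__)
#define DEMO_HAVE_MANAGE_PCI 0
#else
#define DEMO_HAVE_MANAGE_PCI 1
#endif

/* Counts how many times the modelled hypercall was issued. */
static int demo_hypercalls;

/* Stand-in for the remove hypercall; always succeeds in this model. */
static int demo_manage_pci_remove(unsigned int bus, unsigned int devfn)
{
	(void)bus;
	(void)devfn;
	demo_hypercalls++;
	return 0;
}

/* Teardown path: only talk to the hypervisor where the op exists. */
static int demo_teardown(unsigned int bus, unsigned int devfn)
{
	int r = 0; /* pretend the underlying bus remove succeeded */

	if (DEMO_HAVE_MANAGE_PCI && demo_manage_pci_remove(bus, devfn) != 0)
		fprintf(stderr, "warning: manage_pci_remove failed\n");
	return r;
}
```

With this shape, a build for ia64 never issues the call and so never trips the warning, while other architectures keep the existing behaviour.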

Comment 11 Don Zickus 2009-07-21 19:36:52 UTC
in kernel-2.6.18-159.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.

Comment 13 Peng ZhenFei 2009-07-24 07:14:41 UTC
The new kernel looks good. Works for me.

Comment 16 errata-xmlrpc 2009-09-02 08:55:22 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html