Bug 507520 - xen kernel, modprobe -r popup call trace and error msg
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.4
Hardware: ia64 Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Chris Lalancette
QA Contact: Red Hat Kernel QE team
Keywords: Regression
Depends On:
Blocks:
Reported: 2009-06-23 02:15 EDT by Peng ZhenFei
Modified: 2010-10-23 06:18 EDT

Doc Type: Bug Fix
Last Closed: 2009-09-02 04:55:22 EDT

Attachments
sysreport (978.02 KB, application/octet-stream)
2009-06-24 02:11 EDT, Peng ZhenFei
Skip calling PHYSDEVOP_manage_pci_remove on PCI teardown for ia64 (419 bytes, patch)
2009-07-10 07:33 EDT, Chris Lalancette

Description Peng ZhenFei 2009-06-23 02:15:25 EDT
Description of problem:

Install the ia64 RHEL 5.4 alpha with the Xen kernel. Use modprobe -r to remove a PCI device driver. The system prints an error message and a call trace.


[root@maxcv ~]# modprobe -r e1000e
BUG: warning at drivers/xen/core/pci.c:41/pci_bus_remove_wrapper() (Tainted: G     )

Call Trace:
 [<a00000010001d240>] show_stack+0x40/0xa0
                                sp=e00000019016fbf0 bsp=e000000190169298
 [<a00000010001d2d0>] dump_stack+0x30/0x60
                                sp=e00000019016fdc0 bsp=e000000190169280
 [<a000000100415540>] pci_bus_remove_wrapper+0x120/0x140
                                sp=e00000019016fdc0 bsp=e000000190169260
 [<a000000100400da0>] __device_release_driver+0x160/0x1c0
                                sp=e00000019016fdd0 bsp=e000000190169228
 [<a0000001004015f0>] driver_detach+0x170/0x200
                                sp=e00000019016fdd0 bsp=e0000001901691f0
 [<a0000001003ff5e0>] bus_remove_driver+0x120/0x180
                                sp=e00000019016fdd0 bsp=e0000001901691c0
 [<a000000100401700>] driver_unregister+0x20/0x60
                                sp=e00000019016fdd0 bsp=e0000001901691a0
 [<a0000001003058b0>] pci_unregister_driver+0x50/0x120
                                sp=e00000019016fdd0 bsp=e000000190169170
 [<a0000002015a5850>] e1000_exit_module+0x30/0x1530 [e1000e]
                                sp=e00000019016fdd0 bsp=e000000190169158
 [<a0000001000da3c0>] sys_delete_module+0x3c0/0x460
                                sp=e00000019016fdd0 bsp=e0000001901690e8
 [<a00000010006ae00>] xen_trace_syscall+0x100/0x140
                                sp=e00000019016fe30 bsp=e0000001901690e8
 [<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400
                                sp=e000000190170000 bsp=e0000001901690e8



Version-Release number of selected component (if applicable):

xen-3.0.3-87.el5
kernel-xen-2.6.18-152.el5
Linux maxcv.rx3600-11.test 2.6.18-152.el5xen #1 SMP Wed Jun 3 19:21:01 EDT 2009 ia64 ia64 ia64 GNU/Linux

How reproducible:
always

Steps to Reproduce:
1. modprobe -r device-driver-name
  
Actual results:
An error message and a call trace are printed.

Expected results:
No error message or call trace.

Additional info:
Comment 2 Peng ZhenFei 2009-06-24 02:09:42 EDT
If we do the following:
1) modprobe -r the device driver (e1000e)
2) as soon as modprobe -r has finished, quickly run "modprobe e1000e"
we see:

[root@maxcv ~]# modprobe  e1000e
map irq failed
0000:52:00.0: Failed to initialize MSI interrupts.  Falling back to legacy interrupts.
map irq failed
0000:52:00.1: Failed to initialize MSI interrupts.  Falling back to legacy interrupts.
map irq failed
0000:8b:00.0: Failed to initialize MSI interrupts.  Falling back to legacy interrupts.
map irq failed
0000:8b:00.1: Failed to initialize MSI interrupts.  Falling back to legacy interrupts.
Comment 3 Peng ZhenFei 2009-06-24 02:11:21 EDT
Created attachment 349198 [details]
sysreport
Comment 4 Chris Lalancette 2009-07-09 07:46:50 EDT
Can you try this same test on the 5.3 (that would be kernel-xen-2.6.18-128.el5) and report back if this is a regression or not?

Thanks,
Chris Lalancette
Comment 5 Peng ZhenFei 2009-07-09 21:46:46 EDT
We can reproduce this on all of our IA64 hardware, and we don't see it happen on RHEL 5.3.

We also found this on RHEL 5.4b1.
Comment 6 Chris Lalancette 2009-07-10 03:35:59 EDT
OK, thanks for confirming.  I'll mark this as a regression then.

Chris Lalancette
Comment 8 Chris Lalancette 2009-07-10 06:08:36 EDT
Gah.  I see what the problem is now.

When we added the VT-d stuff, we added some code in drivers/xen/core/pci.c (where the bug message is coming from) that looks like this:

static int pci_bus_remove_wrapper(struct device *dev)
{
	int r;
	struct pci_dev *pci_dev = to_pci_dev(dev);
	struct physdev_manage_pci manage_pci;
	manage_pci.bus = pci_dev->bus->number;
	manage_pci.devfn = pci_dev->devfn;

	r = pci_bus_remove(dev);
	/* dev and pci_dev are no longer valid!! */

	WARN_ON(HYPERVISOR_physdev_op(PHYSDEVOP_manage_pci_remove,
		&manage_pci));
	return r;
}

However, our ia64 currently doesn't implement PHYSDEVOP_manage_pci_remove, so that's what causes the error message.  Upstream xen-unstable c/s 18686 does implement this, so we'll probably need to backport that.

Chris Lalancette
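
The failure mode described in comment 8 can be illustrated with a small user-space model. This is only a sketch, not kernel code: `fake_physdev_op_unimplemented`, `warn_on_model` and `remove_wrapper_model` are hypothetical names, and the `-ENOSYS` return value is an assumption about how an unimplemented physdev op reports failure (the bug report does not state the actual error code). It shows that any nonzero return from the hypercall trips the warning even though the removal itself succeeded.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for HYPERVISOR_physdev_op() on a hypervisor that
 * does not implement the requested op. -ENOSYS is an assumed error code. */
int fake_physdev_op_unimplemented(void)
{
	return -ENOSYS;
}

/* Simplified model of WARN_ON(): print a message when cond is nonzero
 * and report whether the warning fired. */
int warn_on_model(int cond, const char *where)
{
	if (cond)
		fprintf(stderr, "BUG: warning at %s\n", where);
	return cond != 0;
}

/* Model of the wrapper: the removal succeeds (r == 0), then the result
 * of the hypercall is passed straight to the warning check. */
int remove_wrapper_model(int (*physdev_op)(void), int *warned)
{
	int r = 0; /* pretend pci_bus_remove() succeeded */

	assert(warned != NULL);
	*warned = warn_on_model(physdev_op(), "pci_bus_remove_wrapper()");
	return r;
}
```

In this model the wrapper still returns 0, which matches the report: the driver is removed, but every removal logs a warning and a stack dump.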
Comment 9 Chris Lalancette 2009-07-10 07:33:36 EDT
Created attachment 351247 [details]
Skip calling PHYSDEVOP_manage_pci_remove on PCI teardown for ia64

(In reply to comment #8)
> However, our ia64 currently doesn't implement PHYSDEVOP_manage_pci_remove, so
> that's what causes the error message.  Upstream xen-unstable c/s 18686 does
> implement this, so we'll probably need to backport that.

Nix this last part.  That requires pulling in basically all of ia64 VT-d support, which is way too risky at this stage in 5.4.  Instead, I've tested the attached patch, which seems to fix the issue for me.

Chris Lalancette
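
The attached patch itself is not reproduced in this report. As a rough illustration of the approach its title describes (skipping the call where the op is not implemented), here is a user-space sketch. All names (`op_stats`, `fake_physdev_op`, `remove_wrapper_skip_model`) are hypothetical, the `-ENOSYS` value is assumed, and the runtime flag stands in for what would presumably be a compile-time architecture check in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Counts how often the modelled hypercall is issued. */
struct op_stats {
	int calls;
};

/* Hypothetical stand-in for the hypercall; always fails with an
 * assumed -ENOSYS, as on a hypervisor lacking the op. */
int fake_physdev_op(struct op_stats *s)
{
	s->calls++;
	return -ENOSYS;
}

/* Returns 1 if a warning would fire, 0 otherwise. When the architecture
 * does not implement the op, the hypercall is skipped entirely, so no
 * warning can be raised. */
int remove_wrapper_skip_model(int arch_implements_op, struct op_stats *s)
{
	int warned = 0;

	assert(s != NULL);
	if (arch_implements_op)
		warned = (fake_physdev_op(s) != 0);
	return warned;
}
```

Calling `remove_wrapper_skip_model(0, &s)` models the ia64 case: the hypercall count stays unchanged and no warning is produced.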
Comment 11 Don Zickus 2009-07-21 15:36:52 EDT
in kernel-2.6.18-159.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.
Comment 13 Peng ZhenFei 2009-07-24 03:14:41 EDT
The new kernel looks good. Works for me.
Comment 16 errata-xmlrpc 2009-09-02 04:55:22 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html
