Bug 972381

Summary: kernel panic when attaching a device to a PCIe switch
Product: Red Hat Enterprise Linux 7
Reporter: Suqin Huang <shuang>
Component: qemu-kvm
Assignee: Radim Krčmář <rkrcmar>
Status: CLOSED WONTFIX
QA Contact: Virtualization Bugs <virt-bugs>
Severity: low
Priority: medium
Version: 7.0
CC: acathrow, chayang, juzhang, qiguo, rkrcmar, shuang, sluo, virt-maint, xfu, zhzhang
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2013-08-26 14:21:18 UTC

Description Suqin Huang 2013-06-09 03:16:30 UTC
Description of problem:
The guest kernel panics at boot when a device is attached to a PCIe switch.

Version-Release number of selected component (if applicable):
qemu-kvm-1.5.0-2.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot the guest with the following command:

/usr/libexec/qemu-kvm -M q35 -monitor stdio -vnc :0 \
-drive file=/root/RHEL-Server-7.0-64-virtio.qcow2,id=disk1,if=none,format=qcow2,media=disk,cache=none \
-device virtio-blk-pci,bus=pcie.0,id=virtio-disk1,addr=0x4,drive=disk1 \
-chardev socket,id=serial_info,path=/tmp/serial-rhel7,server,nowait \
-device isa-serial,chardev=serial_info \
-device x3130-upstream,bus=pcie.0,id=upstream,addr=0x5 \
-device xio3130-downstream,bus=upstream,id=downstream0,chassis=1 \
-device nec-usb-xhci,bus=downstream0,id=usb_controller \
-drive file=/root/usb-s.qcow2,if=none,format=qcow2,media=disk,id=usb_disk \
-device usb-storage,drive=usb_disk,id=usb_d,bus=usb_controller.0 \
-device virtio-net-pci,netdev=idmbEdhe,mac=9a:20:d8:63:50:40,id=ndev00idmbEdhe,bus=pcie.0,addr=0x3 \
-netdev tap,id=idmbEdhe,vhost=on,script=/etc/qemu-ifup \
-m 2048 -smp 2,cores=1,threads=1,sockets=2 -cpu SandyBridge -vga std \
-rtc base=utc,clock=host,driftfix=slew -boot order=cdn,once=d,menu=off \
-no-kvm-pit-reinjection -no-shutdown -enable-kvm


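Actual results:
The guest kernel panics during PCI bus enumeration (NULL pointer dereference in pcie_aspm_init_link_state; see the serial output below).

Expected results:
The guest boots without a kernel panic.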

Additional info:

Serial console output:

[    0.078750] BUG: unable to handle kernel NULL pointer dereference at 0000000000000088
[    0.079000] IP: [<ffffffff8131dbfa>] pcie_aspm_init_link_state+0x30a/0x7b0
[    0.079000] PGD 0 
[    0.079000] Oops: 0000 [#1] SMP 
[    0.079000] Modules linked in:
[    0.079000] CPU 0 
[    0.079000] Pid: 1, comm: swapper/0 Not tainted 3.7.0-0.36.el7.x86_64 #1 Bochs Bochs
[    0.079000] RIP: 0010:[<ffffffff8131dbfa>]  [<ffffffff8131dbfa>] pcie_aspm_init_link_state+0x30a/0x7b0
[    0.079000] RSP: 0000:ffff880005d25928  EFLAGS: 00010246
[    0.079000] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
[    0.079000] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880005f0ecf8
[    0.079000] RBP: ffff880005d259b8 R08: 0000000000016de0 R09: ffff880005f0ecc0
[    0.079000] R10: 0000000000000000 R11: 00000000000000c9 R12: ffff880005f0ecc0
[    0.079000] R13: ffff880005f20000 R14: ffff880005f0ecd8 R15: 0000000000000000
[    0.079000] FS:  0000000000000000(0000) GS:ffff880007c00000(0000) knlGS:0000000000000000
[    0.079000] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[    0.079000] CR2: 0000000000000088 CR3: 00000000018c3000 CR4: 00000000000006f0
[    0.079000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    0.079000] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[    0.079000] Process swapper/0 (pid: 1, threadinfo ffff880005d24000, task ffff880005d80000)
[    0.079000] Stack:
[    0.079000]  ffff880005d25958 ffff880005f22000 ffff880005f28000 ffff880005f22000
[    0.079000]  ffff880005f28000 0000000000000000 ffff880005d25978 ffffffff81310143
[    0.079000]  ffff880005f22000 ffff880005f28000 ffff880005d259b8 ffffffff815cd287
[    0.079000] Call Trace:
[    0.079000]  [<ffffffff81310143>] ? pci_device_add+0xf3/0x100
[    0.079000]  [<ffffffff815cd287>] ? pci_scan_single_device+0xa7/0xc0
[    0.079000]  [<ffffffff8130efb0>] ? next_trad_fn+0x20/0x20
[    0.079000]  [<ffffffff81310295>] pci_scan_slot+0x145/0x160
[    0.079000]  [<ffffffff815cfe90>] pci_scan_child_bus+0x4d/0x123
[    0.079000]  [<ffffffff815cfadf>] pci_scan_bridge+0x1c1/0x525
[    0.079000]  [<ffffffff815cff1a>] pci_scan_child_bus+0xd7/0x123
[    0.079000]  [<ffffffff815cfadf>] pci_scan_bridge+0x1c1/0x525
[    0.079000]  [<ffffffff815cd244>] ? pci_scan_single_device+0x64/0xc0
[    0.079000]  [<ffffffff81310776>] ? pci_create_root_bus+0x326/0x3f0
[    0.079000]  [<ffffffff815cff1a>] pci_scan_child_bus+0xd7/0x123
[    0.079000]  [<ffffffff815d5ac7>] pci_acpi_scan_root+0x43c/0x4e6
[    0.079000]  [<ffffffff815d21e9>] acpi_pci_root_add+0x19d/0x45d
[    0.079000]  [<ffffffff8134bde3>] acpi_device_probe+0x50/0x11d
[    0.079000]  [<ffffffff813d463b>] driver_probe_device+0x8b/0x390
[    0.079000]  [<ffffffff813d4940>] ? driver_probe_device+0x390/0x390
[    0.079000]  [<ffffffff813d49eb>] __driver_attach+0xab/0xb0
[    0.079000]  [<ffffffff813d4940>] ? driver_probe_device+0x390/0x390
[    0.079000]  [<ffffffff813d26c5>] bus_for_each_dev+0x55/0x90
[    0.079000]  [<ffffffff813d3fae>] driver_attach+0x1e/0x20
[    0.079000]  [<ffffffff813d3be0>] bus_add_driver+0x1a0/0x290
[    0.079000]  [<ffffffff81a0bd2a>] ? find_dock+0x22/0x22
[    0.079000]  [<ffffffff81a0bd2a>] ? find_dock+0x22/0x22
[    0.079000]  [<ffffffff813d50b7>] driver_register+0x77/0x170
[    0.079000]  [<ffffffff81a0bd2a>] ? find_dock+0x22/0x22
[    0.079000]  [<ffffffff8134c58d>] acpi_bus_register_driver+0x3e/0x48
[    0.079000]  [<ffffffff81a0bd4f>] acpi_pci_root_init+0x25/0x2d
[    0.079000]  [<ffffffff8100216a>] do_one_initcall+0x12a/0x180
[    0.079000]  [<ffffffff815c9d8c>] kernel_init+0x2cc/0x450
[    0.079000]  [<ffffffff819d8614>] ? do_early_param+0x8c/0x8c
[    0.079000]  [<ffffffff815c9ac0>] ? rest_init+0x80/0x80
[    0.079000]  [<ffffffff815fc1ac>] ret_from_fork+0x7c/0xb0
[    0.079000]  [<ffffffff815c9ac0>] ? rest_init+0x80/0x80
[    0.079000] Code: ff e9 02 fe ff ff 41 83 e6 01 b9 01 00 00 00 e9 53 ff ff ff 41 bc 10 00 00 00 e9 3e fe ff ff 49 8b 45 10 48 8b 40 10 48 8b 40 38 <48> 8b 80 88 00 00 00 48 85 c0 0f 84 84 04 00 00 49 89 44 24 10 
[    0.079000] RIP  [<ffffffff8131dbfa>] pcie_aspm_init_link_state+0x30a/0x7b0
[    0.079000]  RSP <ffff880005d25928>
[    0.079000] CR2: 0000000000000088
[    0.079005] ---[ end trace a0ff03ecb1cf5882 ]---
[    0.080008] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
[    0.080008]
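A short worked reading of the oops above: the bytes starting at `<48>` in the `Code:` line are the faulting instruction, and adding its displacement to RAX from the register dump reproduces CR2. The sketch below assumes that instruction is `48 8b 80 88 00 00 00` (REX.W, opcode 8b = MOV r64, r/m64, ModRM 0x80 = load from [rax + disp32] into rax) and handles only that one addressing form; it is not a general x86 decoder.

```python
# Minimal sketch: recompute the faulting address of the oops above from the
# instruction bytes in the "Code:" line and the register dump.
# Assumption: the instruction is "mov rax, [rax + disp32]" (48 8b 80 ...);
# this is NOT a general x86 decoder.

CODE_AT_RIP = bytes.fromhex("48 8b 80 88 00 00 00")  # bytes starting at <48>
REGS = {"rax": 0x0000000000000000}                    # RAX from the dump
CR2 = 0x0000000000000088                              # reported fault address


def faulting_address(code: bytes, regs: dict) -> int:
    rex, opcode, modrm = code[0], code[1], code[2]
    assert rex == 0x48, "expected REX.W prefix (64-bit operand)"
    assert opcode == 0x8B, "expected MOV r64, r/m64"
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    # mod=10 means a 32-bit displacement; reg=0/rm=0 both select rax.
    assert (mod, reg, rm) == (0b10, 0, 0), "only [rax + disp32] -> rax handled"
    disp32 = int.from_bytes(code[3:7], "little", signed=True)
    return (regs["rax"] + disp32) & 0xFFFFFFFFFFFFFFFF


addr = faulting_address(CODE_AT_RIP, REGS)
print(hex(addr))  # 0x88: a load through a NULL pointer
assert addr == CR2
```

The three loads just before it (`49 8b 45 10`, `48 8b 40 10`, `48 8b 40 38`) decode to `mov rax,[r13+0x10]`, `mov rax,[rax+0x10]` and `mov rax,[rax+0x38]`, a chain of pointer dereferences whose last step yielded NULL in RAX. Which kernel structure fields they correspond to is not determined here.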

Comment 2 Radim Krčmář 2013-06-25 14:02:17 UTC
The PCIe specification does not allow a switch upstream port to be connected directly to the root complex.

A root port has to be created, and the switch connected through it:
  -M q35 -device ioh3420,bus=pcie.0,id=root.0 \
  -device x3130-upstream,bus=root.0,id=upstream \
  -device xio3130-downstream,bus=upstream,id=downstream,chassis=1

The upstream kernel is not keen on adding a check for misconfigured QEMU, so such topologies should be avoided or prevented in userspace.
(QEMU allows even more nonsensical topologies, for example a downstream port that is not connected to an upstream port.)

Was this command generated by libvirt?
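The placement rule from comment 2 can be expressed as a small check over the `-device` options. This is a hypothetical sketch (nothing in this report says QEMU or libvirt performs such a check): it knows only the device models used here (`ioh3420` root port, `x3130-upstream`, `xio3130-downstream`), treats any bus that is not the `id` of one of those devices (such as `pcie.0`) as not a valid parent, and also accepts a downstream port as the parent of an upstream port, since switches can be cascaded.

```python
# Hypothetical check of the switch placement rule from comment 2.
# Assumptions: only the device models used in this report are known,
# "pcie.0" is the q35 root complex bus, and every option has the form key=value.
from typing import List

ROOT_PORTS = {"ioh3420"}
UPSTREAM_PORTS = {"x3130-upstream"}
DOWNSTREAM_PORTS = {"xio3130-downstream"}


def parse_device(spec: str) -> dict:
    """Split a -device argument such as 'ioh3420,bus=pcie.0,id=root.0'."""
    model, *opts = spec.split(",")
    props = dict(opt.split("=", 1) for opt in opts)
    props["model"] = model
    return props


def check_switch_topology(specs: List[str]) -> List[str]:
    devices = [parse_device(s) for s in specs]
    model_by_id = {d["id"]: d["model"] for d in devices if "id" in d}
    errors = []
    for d in devices:
        bus = d.get("bus")
        parent = model_by_id.get(bus)  # None for pcie.0 or an unknown bus
        name = d.get("id", d["model"])
        if d["model"] in UPSTREAM_PORTS and parent not in ROOT_PORTS | DOWNSTREAM_PORTS:
            # Upstream port must hang off a root port (or a cascaded switch's
            # downstream port), never directly off the root complex.
            errors.append(f"{name}: upstream port on bus {bus!r}")
        elif d["model"] in DOWNSTREAM_PORTS and parent not in UPSTREAM_PORTS:
            # Downstream port must belong to a switch upstream port.
            errors.append(f"{name}: downstream port on bus {bus!r}")
    return errors


original = ["x3130-upstream,bus=pcie.0,id=upstream,addr=0x5",
            "xio3130-downstream,bus=upstream,id=downstream0,chassis=1"]
fixed = ["ioh3420,bus=pcie.0,id=root.0",
         "x3130-upstream,bus=root.0,id=upstream",
         "xio3130-downstream,bus=upstream,id=downstream,chassis=1"]
print(check_switch_topology(original))  # ["upstream: upstream port on bus 'pcie.0'"]
print(check_switch_topology(fixed))     # []
```

The `original` list is the switch part of the reproducer command; `fixed` is the topology suggested in this comment.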

Comment 3 zhonglinzhang 2013-07-04 10:37:26 UTC
(In reply to Radim Krčmář from comment #2)
> PCIe specification does not allow direct connection of upstream port to the
> root hub (complex).
> 
> We have to create root port and connect throught it:
>   -M q35 -device ioh3420,bus=pcie.0,id=root.0 \
>   -device x3130-upstream,bus=root.0,id=upstream \
>   -device xio3130-downstream,bus=upstream,id=downstream,chassis=1
> 
I re-tested this issue with the command you provided and hit the same panic.

--- my command line ---
/usr/libexec/qemu-kvm -M q35 -device ioh3420,bus=pcie.0,id=root.0 \
-device x3130-upstream,bus=root.0,addr=0x4,id=upstream \
-device xio3130-downstream,bus=upstream,id=downstream0,chassis=1 \
-drive file=/home/rhel7_switch.qcow,if=none,id=drive-system-disk,media=disk,format=qcow2,aio=native,werror=stop,rerror=stop \
-device virtio-blk-pci,bus=downstream0,drive=drive-system-disk,id=system-disk,bootindex=1

Hi Radim,

Could you please take another look? If any further testing is needed, please let me know.

Comment 4 Radim Krčmář 2013-07-04 14:07:34 UTC
The kernel boots without "addr=0x4", or with "addr=0x0".
Also the backtrace now goes through "pci_subsys_init" and not "acpi_init", so the problem is a bit different.

How are the addresses chosen?
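The observation above (the guest boots when the upstream port behind the root port is left without `addr` or placed at `addr=0x0`) matches the point-to-point nature of a PCIe link: the device below a root port or switch downstream port is normally device number 0. A hypothetical check for that, assuming QEMU's `addr=` value is `slot[.function]` in hex and treating a missing `addr` as acceptable because this comment reports that it boots:

```python
# Hypothetical check: devices directly below a PCIe root port or switch
# downstream port should use slot 0 (assumption drawn from comment 4 and the
# point-to-point nature of PCIe links). addr is assumed to be "slot[.function]".
from typing import List, Optional

LINK_PORTS = {"ioh3420", "xio3130-downstream"}  # ports with a single link below


def parse_device(spec: str) -> dict:
    """Split a -device argument such as 'ioh3420,bus=pcie.0,id=root.0'."""
    model, *opts = spec.split(",")
    props = dict(opt.split("=", 1) for opt in opts)
    props["model"] = model
    return props


def slot_of(addr: Optional[str]) -> Optional[int]:
    """Return the slot number from a hex 'slot[.function]' string, or None."""
    if addr is None:
        return None
    return int(addr.split(".")[0], 16)


def check_link_slots(specs: List[str]) -> List[str]:
    devices = [parse_device(s) for s in specs]
    model_by_id = {d["id"]: d["model"] for d in devices if "id" in d}
    errors = []
    for d in devices:
        if model_by_id.get(d.get("bus")) in LINK_PORTS:
            slot = slot_of(d.get("addr"))
            if slot not in (None, 0):
                errors.append(f"{d.get('id', d['model'])}: slot {slot:#x} behind a PCIe port")
    return errors


comment3 = ["ioh3420,bus=pcie.0,id=root.0",
            "x3130-upstream,bus=root.0,addr=0x4,id=upstream",
            "xio3130-downstream,bus=upstream,id=downstream0,chassis=1",
            "virtio-blk-pci,bus=downstream0,drive=drive-system-disk,id=system-disk"]
fixed = [s.replace("addr=0x4", "addr=0x0") for s in comment3]
print(check_link_slots(comment3))  # ['upstream: slot 0x4 behind a PCIe port']
print(check_link_slots(fixed))     # []
```

Downstream ports are deliberately not checked against slot 0, because they sit on the switch's internal bus below the upstream port, where several slots are normal.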

Comment 5 zhonglinzhang 2013-07-05 02:24:47 UTC
I re-tested this issue without "addr=0x4" and with "addr=0x0": the guest boots successfully, with no kernel panic.

Regarding comment 3: is this a new issue? Should I open a new bug to track it?

Comment 6 Radim Krčmář 2013-08-26 14:21:18 UTC
The upstream kernel decided to drop a simple fix for this issue, in the hope that someone will rewrite ASPM support instead.

Modeling hardware configurations is a QEMU feature, so we won't be fixing this.
(I don't have enough information about where these parameters came from to open new bugs.)