Bug 1628433 - Guest crashed when attach multiple e1000 device to the guest which has already started with ich6 sound device and rtl8139/e1000 device
Summary: Guest crashed when attach multiple e1000 device to the guest which has alread...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: 8.0
Hardware: Unspecified
OS: Unspecified
medium
unspecified
Target Milestone: rc
: 8.0
Assignee: Gerd Hoffmann
QA Contact: jingzhao
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2018-09-13 03:10 UTC by Meina Li
Modified: 2020-01-21 08:04 UTC (History)
18 users (show)

Fixed In Version: qemu-kvm-4.1.0-1.module+el8.1.0+3966+4a23dca1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-11-06 07:12:03 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
The xml used in reproduced bug (7.16 KB, text/plain)
2018-09-13 03:10 UTC, Meina Li
no flags Details
dmesg log (80.75 KB, text/plain)
2019-08-20 09:38 UTC, jingzhao
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:3723 0 None None None 2019-11-06 07:12:47 UTC

Description Meina Li 2018-09-13 03:10:18 UTC
Created attachment 1482921 [details]
The xml used in reproduced bug

Description of problem:
Guest crashed when attach multiple e1000 device to the guest which has already started with ich6 sound device and rtl8139/e1000 device

Version-Release number of selected component (if applicable):
libvirt-4.5.0-9.el7.x86_64
qemu-kvm-rhev-2.12.0-13.el7.x86_64
kernel-3.10.0-944.el7.x86_64

How reproducible:
100%

Steps to Reproduce:

1. Start a q35 guest with ich6 sound device and rtl8139/e1000 device.
# virsh dumpxml q35
...
<interface type='network'>
      <mac address='52:54:00:e4:04:17'/>
      <source network='default' bridge='virbr0'/>
      <target dev='vnet0'/>
      <model type='rtl8139'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x02' function='0x0'/>
    </interface>
  <sound model='ich6'>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x01' function='0x0'/>
    </sound>
...

2. Hotplug multiple e1000 device to the guest.
# cat interface.xml
<interface type='network'>
      <source network='default'/>
      <model type='e1000'/>
    </interface>

# for i in {1..32};do virsh attach-device q35 interface.xml;done
...
Device attached successfully

Device attached successfully

Device attached successfully

Device attached successfully

Device attached successfully
...
error: Failed to attach device from interface.xml
error: internal error: No more available PCI slots

3. After a while, the status of guest change to shutoff.
# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     q35                            shut off


Actual results:
Guest crashed after attaching.

Expected results:
Guest run successfully.

Additional info:
1) When attach >=9 e1000 device, it will reproduce the issue.
2) qemu log:
2018-09-12T07:17:56.676932Z qemu-kvm: -chardev pty,id=charserial0: char device redirected to /dev/pts/1 (label charserial0)
qemu-kvm: hw/audio/intel-hda.c:961: intel_hda_reg_write: Assertion `reg->offset != 0' failed.
2018-09-12 07:18:34.073+0000: shutting down, reason=crashed
3) debug log:
2018-09-12 07:24:16.200+0000: 29284: error : qemuMonitorIO:718 : internal error: End of file from qemu monitor
2018-09-12 07:24:16.200+0000: 29284: debug : qemuMonitorIO:765 : Error on monitor internal error: End of file from qemu monitor
2018-09-12 07:24:16.200+0000: 29284: info : virEventPollUpdateHandle:152 : EVENT_POLL_UPDATE_HANDLE: watch=16 events=12
2018-09-12 07:24:16.200+0000: 29284: debug : virEventPollInterruptLocked:722 : Skip interrupt, 1 139940568922304
2018-09-12 07:24:16.201+0000: 29284: debug : qemuMonitorIO:786 : Triggering EOF callback
2018-09-12 07:24:16.201+0000: 29284: debug : qemuProcessHandleMonitorEOF:293 : Received EOF on 0x7f45fc3288c0 'q35'

Comment 2 Han Han 2018-09-13 03:34:56 UTC
Since qemu-kvm's assertion is failed, the component should be qemu-kvm-rhev. Please update the gdb backtrace.

Comment 3 Meina Li 2018-09-13 06:45:22 UTC
(gdb) bt
#0  0x00007f1be4032207 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
#1  0x00007f1be40338f8 in __GI_abort () at abort.c:90
#2  0x00007f1be402b026 in __assert_fail_base (fmt=0x7f1be4185ea0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x557c69a3b586 "reg->offset != 0", 
    file=file@entry=0x557c69a3b4f2 "hw/audio/intel-hda.c", line=line@entry=961, function=function@entry=0x557c69a3be90 <__PRETTY_FUNCTION__.25847> "intel_hda_reg_write") at assert.c:92
#3  0x00007f1be402b0d2 in __GI___assert_fail (assertion=assertion@entry=0x557c69a3b586 "reg->offset != 0", file=file@entry=0x557c69a3b4f2 "hw/audio/intel-hda.c", line=line@entry=961, 
    function=function@entry=0x557c69a3be90 <__PRETTY_FUNCTION__.25847> "intel_hda_reg_write") at assert.c:101
#4  0x0000557c6977fa45 in intel_hda_reg_write (wmask=4294967295, val=157, reg=0x557c69f54f40 <regtab+11648>, d=0x557c6c0c6d00) at hw/audio/intel-hda.c:961
#5  intel_hda_mmio_write (opaque=0x557c6c0c6d00, addr=<optimized out>, val=157, size=<optimized out>) at hw/audio/intel-hda.c:1053
#6  0x0000557c69663a63 in memory_region_write_accessor (mr=<optimized out>, addr=<optimized out>, value=<optimized out>, size=<optimized out>, shift=<optimized out>, mask=<optimized out>, 
    attrs=...) at /usr/src/debug/qemu-2.12.0/memory.c:530
#7  0x0000557c69661779 in access_with_adjusted_size (addr=addr@entry=208, value=value@entry=0x7f1bda33b708, size=size@entry=4, access_size_min=<optimized out>, 
    access_size_max=<optimized out>, access_fn=access_fn@entry=0x557c69663a20 <memory_region_write_accessor>, mr=mr@entry=0x557c6c0c7890, attrs=attrs@entry=...)
    at /usr/src/debug/qemu-2.12.0/memory.c:597
#8  0x0000557c69665815 in memory_region_dispatch_write (mr=<optimized out>, addr=208, data=157, size=<optimized out>, attrs=...) at /usr/src/debug/qemu-2.12.0/memory.c:1474
#9  0x0000557c69615831 in flatview_write (fv=0x557c6c5b0700, addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>) at /usr/src/debug/qemu-2.12.0/exec.c:3089
#10 0x0000557c696191ff in address_space_write (as=<optimized out>, addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>) at /usr/src/debug/qemu-2.12.0/exec.c:3261
#11 0x0000557c696192a5 in address_space_rw (as=<optimized out>, addr=<optimized out>, attrs=..., attrs@entry=..., buf=buf@entry=0x7f1bfd6d9028 <Address 0x7f1bfd6d9028 out of bounds>, 
    len=<optimized out>, is_write=<optimized out>) at /usr/src/debug/qemu-2.12.0/exec.c:3272
#12 0x0000557c69674128 in kvm_cpu_exec (cpu=cpu@entry=0x557c6be68000) at /usr/src/debug/qemu-2.12.0/accel/kvm/kvm-all.c:1992
#13 0x0000557c69651916 in qemu_kvm_cpu_thread_fn (arg=0x557c6be68000) at /usr/src/debug/qemu-2.12.0/cpus.c:1215
#14 0x00007f1be43d0dd5 in start_thread (arg=0x7f1bda33e700) at pthread_create.c:307
#15 0x00007f1be40f9ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Comment 8 Yiqian Wei 2018-09-14 08:34:06 UTC
reproduce this bug version:

For rhel7.6 :
kernel-3.10.0-948.el7.x86_64 & qemu-kvm-rhev-2.12.0-15.el7

For rhel7.5.z:
kernel-3.10.0-948.el7.x86_64 & qemu-kvm-rhev-2.10.0-21.el7_5.7

test steps:
1.boot a q35 rhel7.6 guest with pcie-root-port--->pcie-pci-bridge--->device [1]
2.Hotplug multiple e1000 device to the guest.
# sh hotplug.sh 9

test results:
After step2,qemu core dumped.
(qemu) qemu-kvm: hw/audio/intel-hda.c:960: intel_hda_reg_write: Assertion `reg->offset != 0' failed.
cmd.sh: line 39: 14984 Aborted                 (core dumped) /usr/libexec/qemu-kvm -name guest=q35,debug-threads=on -machine pc-q35-rhel7.5.0,accel=kvm,usb=off,vmport=off,dump-guest-core=off ...

Additional info:
1)for rhel7.5.z,not support pcie-to-pci-bridge,i replace it with pci-bridge.
pcie-root-port--->pci-bridge--->device:
-device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 \
-device pci-bridge,id=pci.3,bus=pci.1,chassis_nr=1,addr=0x0 \
-device intel-hda,id=sound0,bus=pci.3,addr=0x1 \
2)cat hotplug.sh
cat hotplug.sh
for i in $(seq 1 $1); 
do
        echo    
        echo "====== The $i iterations ======"
        echo "netdev_add tap,id=hostnet-$i" | nc -U /tmp/monitor2
        sleep 5
        echo "device_add e1000,netdev=hostnet-$i,mac=52:54:00:e4:04:$(printf "%02x" $((i))),id=e1000-$i,bus=pci.3,addr=0x$(printf "%02x" $((i+2)))" | nc -U /tmp/monitor2
        sleep 5
        echo "info network" | nc -U /tmp/monitor2
        sleep 5
done
[1]boot a guest with cmd 
/usr/libexec/qemu-kvm  \
-name guest=q35,debug-threads=on \
-machine pc-q35-rhel7.6.0,accel=kvm,usb=off,vmport=off,dump-guest-core=off \
-cpu SandyBridge,enforce \
-m 4G \
-vga qxl \
-realtime mlock=off \
-smp 2,sockets=2,cores=1,threads=1 \
-uuid 9be93d67-6542-4aff-98b2-e65aa58e51a5 \
-nodefaults \
-rtc base=utc,driftfix=slew \
-global ICH9-LPC.disable_s3=1 \
-global ICH9-LPC.disable_s4=1 \
-boot strict=on \
-device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 \
-device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 \
-device pcie-pci-bridge,id=pci.3,bus=pci.1,addr=0x0 \
-device pcie-root-port,port=0x12,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x2 \
-device pcie-root-port,port=0x13,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x3 \
-device pcie-root-port,port=0x14,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x4 \
-device pcie-root-port,port=0x15,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x5 \
-device ich9-usb-ehci1,id=usb,bus=pcie.0,addr=0x1d.0x7 \
-device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pcie.0,multifunction=on,addr=0x1d \
-device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pcie.0,addr=0x1d.0x1 \
-device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pcie.0,addr=0x1d.0x2 \
-device virtio-serial-pci,id=virtio-serial0,bus=pci.4,addr=0x0 \
-drive file=/home/kvm_autotest_root/images/rhel76-64-virtio.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 \
-device virtio-blk-pci,scsi=off,bus=pci.5,addr=0x0,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
-netdev tap,id=hostnet0 \
-device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:e4:04:17,bus=pci.3,addr=0x2 \
-device usb-tablet,id=input0,bus=usb.0,port=1 \
-spice port=5900,disable-ticketing  \
-device intel-hda,id=sound0,bus=pci.3,addr=0x1 \
-device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 \
-device virtio-balloon-pci,id=balloon0,bus=pci.6,addr=0x0 \
-msg timestamp=on \
-monitor stdio \
-monitor unix:/tmp/monitor2,server,nowait \

Comment 10 Gerd Hoffmann 2018-11-22 13:35:52 UTC
Guest triggerable assert in intel-hda.  Happens if the guest tries to writes a readonly register.  Not nice, but also not critical.  I guess we can just fix it upstream and pick up via rebase.  Which makes it RHEl-8 material.

https://patchwork.ozlabs.org/patch/1001778/

Comment 11 Gerd Hoffmann 2019-01-10 09:44:11 UTC
upstream commit 7ec910675929a593c9890f412125c31c578cde6e (in qemu 3.1)

Comment 12 Gerd Hoffmann 2019-03-05 08:42:26 UTC
Moving to advanced virt.

Comment 13 Gerd Hoffmann 2019-08-14 09:30:42 UTC
Rebase should have picked up this.

Comment 16 jingzhao 2019-08-20 09:38:46 UTC
Created attachment 1606029 [details]
dmesg log

Comment 17 Gerd Hoffmann 2019-08-20 12:00:48 UTC
Looking through the log, there are a bunch of issues.

[    0.213486] pci 0000:02:01.0: [8086:2668] type 00 class 0x040300
[    0.214625] pci 0000:02:01.0: reg 0x10: [mem 0xfc040000-0xfc043fff]

This is the intel-hda sound card.  Note MMIO is at 0xfc040000.

[   90.144845] shpchp 0000:01:00.0: Latch close on Slot(3)
[   90.144871] shpchp 0000:01:00.0: Button pressed on Slot(3)
[   90.144887] shpchp 0000:01:00.0: Card present on Slot(3)
[   90.145037] shpchp 0000:01:00.0: PCI slot #3 - powering on due to button press
[   96.242847] pci 0000:02:03.0: [8086:100e] type 00 class 0x020000
[   96.243048] pci 0000:02:03.0: reg 0x10: [mem 0x00000000-0x0001ffff]
[   96.243132] pci 0000:02:03.0: reg 0x14: [io  0x0000-0x003f]
[   96.243455] pci 0000:02:03.0: reg 0x30: [mem 0x00000000-0x0003ffff pref]
[   96.243794] pci 0000:02:03.0: BAR 6: assigned [mem 0xfc080000-0xfc0bffff pref]
[   96.243798] pci 0000:02:03.0: BAR 0: assigned [mem 0xfc060000-0xfc07ffff]
[   96.243836] pci 0000:02:03.0: BAR 1: assigned [io  0xc400-0xc43f]
[   96.243867] shpchp 0000:01:00.0: PCI bridge to [bus 02]
[   96.243883] shpchp 0000:01:00.0:   bridge window [io  0xc000-0xcfff]
[   96.250364] shpchp 0000:01:00.0:   bridge window [mem 0xfc000000-0xfc1fffff]
[   96.289866] e1000: Intel(R) PRO/1000 Network Driver - version 7.3.21-k8-NAPI
[   96.289868] e1000: Copyright (c) 1999-2006 Intel Corporation.
[   96.289869] e1000: E1000 MODULE IS NOT SUPPORTED
[   96.289950] e1000 0000:02:03.0: enabling device (0000 -> 0003)
[   96.695345] e1000 0000:02:03.0 eth1: (PCI:33MHz:32-bit) 52:54:00:e4:04:01
[   96.695353] e1000 0000:02:03.0 eth1: Intel(R) PRO/1000 Network Connection

Hotplugged first e1000 device.  All fine.
Same goes for #2 -> #5

[  157.172030] pci 0000:02:07.0: BAR 6: no space for [mem size 0x00040000 pref]
[  157.172034] pci 0000:02:07.0: BAR 6: failed to assign [mem size 0x00040000 pref]
[  157.172038] pci 0000:02:07.0: BAR 0: assigned [mem 0xfc1e0000-0xfc1fffff]
[  157.172082] pci 0000:02:07.0: BAR 1: assigned [io  0xc800-0xc83f]
[  157.172115] shpchp 0000:01:00.0: PCI bridge to [bus 02]
[  157.172132] shpchp 0000:01:00.0:   bridge window [io  0xc000-0xcfff]
[  157.177481] shpchp 0000:01:00.0:   bridge window [mem 0xfc000000-0xfc1fffff]
[  157.192360] e1000 0000:02:07.0: enabling device (0000 -> 0003)
[  157.601318] e1000 0000:02:07.0 eth5: (PCI:33MHz:32-bit) 52:54:00:e4:04:05
[  157.601324] e1000 0000:02:07.0 eth5: Intel(R) PRO/1000 Network Connection

Hotplugged e1000 #5

No more address space in the bridge window, so bar 6 isn't initialized.
Driver initializes nevertheless.  Smells like a bug.  But possibly the
device can be operated (with limitations) even if bar 6 is not available.

Hotplugged e1000 #6

[  172.531512] pci 0000:02:08.0: BAR 6: no space for [mem size 0x00040000 pref]
[  172.531515] pci 0000:02:08.0: BAR 6: failed to assign [mem size 0x00040000 pref]
[  172.531518] pci 0000:02:08.0: BAR 0: no space for [mem size 0x00020000]
[  172.531520] pci 0000:02:08.0: BAR 0: failed to assign [mem size 0x00020000]
[  172.531525] pci 0000:02:08.0: BAR 1: assigned [io  0xc840-0xc87f]
[  172.531565] shpchp 0000:01:00.0: PCI bridge to [bus 02]
[  172.531582] shpchp 0000:01:00.0:   bridge window [io  0xc000-0xcfff]
[  172.536862] shpchp 0000:01:00.0:   bridge window [mem 0xfc000000-0xfc1fffff]

Still no more address space in the bridge window, so bar 6 and 0 are not initialized.

[  172.545125] PCI: No. 2 try to assign unassigned res
[  172.545130] release child resource [mem 0xfc000000-0xfc03ffff pref]
[  172.545132] release child resource [mem 0xfc040000-0xfc043fff]
[  172.545133] release child resource [mem 0xfc040000-0xfc043fff]
[  172.545135] release child resource [mem 0xfc044000-0xfc0440ff]
[  172.545137] release child resource [mem 0xfc044000-0xfc0440ff]
[  172.545138] release child resource [mem 0xfc060000-0xfc07ffff]
[  172.545140] release child resource [mem 0xfc060000-0xfc07ffff]
[  172.545141] release child resource [mem 0xfc080000-0xfc0bffff pref]
[  172.545143] release child resource [mem 0xfc0c0000-0xfc0fffff pref]
[  172.545144] release child resource [mem 0xfc100000-0xfc11ffff]
[  172.545146] release child resource [mem 0xfc100000-0xfc11ffff]
[  172.545147] release child resource [mem 0xfc120000-0xfc13ffff]
[  172.545149] release child resource [mem 0xfc120000-0xfc13ffff]
[  172.545150] release child resource [mem 0xfc140000-0xfc17ffff pref]
[  172.545152] release child resource [mem 0xfc180000-0xfc1bffff pref]
[  172.545153] release child resource [mem 0xfc1c0000-0xfc1dffff]
[  172.545154] release child resource [mem 0xfc1c0000-0xfc1dffff]
[  172.545156] release child resource [mem 0xfc1e0000-0xfc1fffff]
[  172.545157] release child resource [mem 0xfc1e0000-0xfc1fffff]
[  172.545161] shpchp 0000:01:00.0: resource 14 [mem 0xfc000000-0xfc1fffff] released
[  172.545164] shpchp 0000:01:00.0: PCI bridge to [bus 02]

Looks like linux wants reshuffle the pci bars, to fit them all.
But it does so while drivers are already using those pci bars.
That certainly isn't good.

[  172.549950] shpchp 0000:01:00.0: BAR 14: no space for [mem size 0x00300000]
[  172.549953] shpchp 0000:01:00.0: BAR 14: failed to assign [mem size 0x00300000]
[  172.549962] 8139cp 0000:02:02.0: BAR 6: no space for [mem size 0x00040000 pref]
[  172.549976] 8139cp 0000:02:02.0: BAR 6: failed to assign [mem size 0x00040000 pref]
[ ... more of these ... ]

Seems shpchp tried to make the memory window larger and allocate space for it.
Failed.  All child (re-allocations) fail too.  Guess the bridge window is gone
at that point.

[  182.637570] irq 21: nobody cared (try booting with the "irqpoll" option)
[ ... ]
[  182.638478] handlers:
[  182.638762] [<000000008f5990d6>] qxl_irq_handler [qxl]
[  182.639386] [<00000000aa3a80bb>] e1000_intr [e1000]
[  182.639970] [<00000000aa3a80bb>] e1000_intr [e1000]

Probably the e1000 driver fails to access the registers (no pci bridge window)
and therefore can't silence the IRQ.

[  182.790704] shpchp 0000:01:00.0: Latch close on Slot(9)
[  182.790731] shpchp 0000:01:00.0: Button pressed on Slot(9)
[  182.790749] shpchp 0000:01:00.0: Card present on Slot(9)

Hotplugged e1000 #7.

[  183.364556] e1000 0000:02:08.0: enabling device (0000 -> 0001)
[  183.368843] e1000 0000:02:08.0: can't ioremap BAR 0: [??? 0x00000000 flags 0x0]
[  183.370761] e1000: probe of 0000:02:08.0 failed with error -5

e1000 #6 fails initialization.

[  183.371134] shpchp 0000:01:00.0: PCI slot #9 - powering on due to button press
[  189.674454] pci 0000:02:09.0: [8086:100e] type 00 class 0x020000
[  190.012529] pci 0000:02:09.0: reg 0x10: [mem 0x00000000-0x0001ffff]
[  190.012565] pci 0000:02:09.0: reg 0x14: [io  0x0000-0x003f]
[  190.012735] pci 0000:02:09.0: reg 0x30: [mem 0x00000000-0x0003ffff pref]
[  190.012984] shpchp 0000:01:00.0: BAR 14: assigned [mem 0xfc000000-0xfc0fffff]

The pci bridge window is back.  Old location, but smaller.

[  190.012987] pci 0000:02:09.0: BAR 6: assigned [mem 0xfc000000-0xfc03ffff pref]
[  190.012989] pci 0000:02:09.0: BAR 0: assigned [mem 0xfc040000-0xfc05ffff]

e1000 #7 memory bars are placed there.
Note the intel-hda mmio registers are at 0xfc040000 too.

This is probably the point where the e1000 driver accessed the intel-hda registers
and triggered the qemu assert() that way.  This is fixed now, good.

But the kernel certainly should not have done that in the first place.
So, we still have a bug, in the linux kernel.

Comment 21 errata-xmlrpc 2019-11-06 07:12:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3723


Note You need to log in before you can comment on or make changes to this bug.