Created attachment 1482921 [details] The xml used in reproduced bug Description of problem: Guest crashed when attach multiple e1000 device to the guest which has already started with ich6 sound device and rtl8139/e1000 device Version-Release number of selected component (if applicable): libvirt-4.5.0-9.el7.x86_64 qemu-kvm-rhev-2.12.0-13.el7.x86_64 kernel-3.10.0-944.el7.x86_64 How reproducible: 100% Steps to Reproduce: 1. Start a q35 guest with ich6 sound device and rtl8139/e1000 device. # virsh dumpxml q35 ... <interface type='network'> <mac address='52:54:00:e4:04:17'/> <source network='default' bridge='virbr0'/> <target dev='vnet0'/> <model type='rtl8139'/> <alias name='net0'/> <address type='pci' domain='0x0000' bus='0x03' slot='0x02' function='0x0'/> </interface> <sound model='ich6'> <address type='pci' domain='0x0000' bus='0x03' slot='0x01' function='0x0'/> </sound> ... 2. Hotplug multiple e1000 device to the guest. # cat interface.xml <interface type='network'> <source network='default'/> <model type='e1000'/> </interface> # for i in {1..32};do virsh attach-device q35 interface.xml;done ... Device attached successfully Device attached successfully Device attached successfully Device attached successfully Device attached successfully ... error: Failed to attach device from interface.xml error: internal error: No more available PCI slots 3. After a while, the status of guest change to shutoff. # virsh list --all Id Name State ---------------------------------------------------- - q35 shut off Actual results: Guest crashed after attaching. Expected results: Guest run successfully. Additional info: 1) When attach >=9 e1000 device, it will reproduce the issue. 2) qemu log: 2018-09-12T07:17:56.676932Z qemu-kvm: -chardev pty,id=charserial0: char device redirected to /dev/pts/1 (label charserial0) qemu-kvm: hw/audio/intel-hda.c:961: intel_hda_reg_write: Assertion `reg->offset != 0' failed. 2018-09-12 07:18:34.073+0000: shutting down, reason=crashed 3) debug log: 2018-09-12 07:24:16.200+0000: 29284: error : qemuMonitorIO:718 : internal error: End of file from qemu monitor 2018-09-12 07:24:16.200+0000: 29284: debug : qemuMonitorIO:765 : Error on monitor internal error: End of file from qemu monitor 2018-09-12 07:24:16.200+0000: 29284: info : virEventPollUpdateHandle:152 : EVENT_POLL_UPDATE_HANDLE: watch=16 events=12 2018-09-12 07:24:16.200+0000: 29284: debug : virEventPollInterruptLocked:722 : Skip interrupt, 1 139940568922304 2018-09-12 07:24:16.201+0000: 29284: debug : qemuMonitorIO:786 : Triggering EOF callback 2018-09-12 07:24:16.201+0000: 29284: debug : qemuProcessHandleMonitorEOF:293 : Received EOF on 0x7f45fc3288c0 'q35'
Since qemu-kvm's assertion is failed, the component should be qemu-kvm-rhev. Please update the gdb backtrace.
(gdb) bt #0 0x00007f1be4032207 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55 #1 0x00007f1be40338f8 in __GI_abort () at abort.c:90 #2 0x00007f1be402b026 in __assert_fail_base (fmt=0x7f1be4185ea0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x557c69a3b586 "reg->offset != 0", file=file@entry=0x557c69a3b4f2 "hw/audio/intel-hda.c", line=line@entry=961, function=function@entry=0x557c69a3be90 <__PRETTY_FUNCTION__.25847> "intel_hda_reg_write") at assert.c:92 #3 0x00007f1be402b0d2 in __GI___assert_fail (assertion=assertion@entry=0x557c69a3b586 "reg->offset != 0", file=file@entry=0x557c69a3b4f2 "hw/audio/intel-hda.c", line=line@entry=961, function=function@entry=0x557c69a3be90 <__PRETTY_FUNCTION__.25847> "intel_hda_reg_write") at assert.c:101 #4 0x0000557c6977fa45 in intel_hda_reg_write (wmask=4294967295, val=157, reg=0x557c69f54f40 <regtab+11648>, d=0x557c6c0c6d00) at hw/audio/intel-hda.c:961 #5 intel_hda_mmio_write (opaque=0x557c6c0c6d00, addr=<optimized out>, val=157, size=<optimized out>) at hw/audio/intel-hda.c:1053 #6 0x0000557c69663a63 in memory_region_write_accessor (mr=<optimized out>, addr=<optimized out>, value=<optimized out>, size=<optimized out>, shift=<optimized out>, mask=<optimized out>, attrs=...) at /usr/src/debug/qemu-2.12.0/memory.c:530 #7 0x0000557c69661779 in access_with_adjusted_size (addr=addr@entry=208, value=value@entry=0x7f1bda33b708, size=size@entry=4, access_size_min=<optimized out>, access_size_max=<optimized out>, access_fn=access_fn@entry=0x557c69663a20 <memory_region_write_accessor>, mr=mr@entry=0x557c6c0c7890, attrs=attrs@entry=...) at /usr/src/debug/qemu-2.12.0/memory.c:597 #8 0x0000557c69665815 in memory_region_dispatch_write (mr=<optimized out>, addr=208, data=157, size=<optimized out>, attrs=...) at /usr/src/debug/qemu-2.12.0/memory.c:1474 #9 0x0000557c69615831 in flatview_write (fv=0x557c6c5b0700, addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>) at /usr/src/debug/qemu-2.12.0/exec.c:3089 #10 0x0000557c696191ff in address_space_write (as=<optimized out>, addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>) at /usr/src/debug/qemu-2.12.0/exec.c:3261 #11 0x0000557c696192a5 in address_space_rw (as=<optimized out>, addr=<optimized out>, attrs=..., attrs@entry=..., buf=buf@entry=0x7f1bfd6d9028 <Address 0x7f1bfd6d9028 out of bounds>, len=<optimized out>, is_write=<optimized out>) at /usr/src/debug/qemu-2.12.0/exec.c:3272 #12 0x0000557c69674128 in kvm_cpu_exec (cpu=cpu@entry=0x557c6be68000) at /usr/src/debug/qemu-2.12.0/accel/kvm/kvm-all.c:1992 #13 0x0000557c69651916 in qemu_kvm_cpu_thread_fn (arg=0x557c6be68000) at /usr/src/debug/qemu-2.12.0/cpus.c:1215 #14 0x00007f1be43d0dd5 in start_thread (arg=0x7f1bda33e700) at pthread_create.c:307 #15 0x00007f1be40f9ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
reproduce this bug version: For rhel7.6 : kernel-3.10.0-948.el7.x86_64 & qemu-kvm-rhev-2.12.0-15.el7 For rhel7.5.z: kernel-3.10.0-948.el7.x86_64 & qemu-kvm-rhev-2.10.0-21.el7_5.7 test steps: 1.boot a q35 rhel7.6 guest with pcie-root-port--->pcie-pci-bridge--->device [1] 2.Hotplug multiple e1000 device to the guest. # sh hotplug.sh 9 test results: After step2,qemu core dumped. (qemu) qemu-kvm: hw/audio/intel-hda.c:960: intel_hda_reg_write: Assertion `reg->offset != 0' failed. cmd.sh: line 39: 14984 Aborted (core dumped) /usr/libexec/qemu-kvm -name guest=q35,debug-threads=on -machine pc-q35-rhel7.5.0,accel=kvm,usb=off,vmport=off,dump-guest-core=off ... Additional info: 1)for rhel7.5.z,not support pcie-to-pci-bridge,i replace it with pci-bridge. pcie-root-port--->pci-bridge--->device: -device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 \ -device pci-bridge,id=pci.3,bus=pci.1,chassis_nr=1,addr=0x0 \ -device intel-hda,id=sound0,bus=pci.3,addr=0x1 \ 2)cat hotplug.sh cat hotplug.sh for i in $(seq 1 $1); do echo echo "====== The $i iterations ======" echo "netdev_add tap,id=hostnet-$i" | nc -U /tmp/monitor2 sleep 5 echo "device_add e1000,netdev=hostnet-$i,mac=52:54:00:e4:04:$(printf "%02x" $((i))),id=e1000-$i,bus=pci.3,addr=0x$(printf "%02x" $((i+2)))" | nc -U /tmp/monitor2 sleep 5 echo "info network" | nc -U /tmp/monitor2 sleep 5 done [1]boot a guest with cmd /usr/libexec/qemu-kvm \ -name guest=q35,debug-threads=on \ -machine pc-q35-rhel7.6.0,accel=kvm,usb=off,vmport=off,dump-guest-core=off \ -cpu SandyBridge,enforce \ -m 4G \ -vga qxl \ -realtime mlock=off \ -smp 2,sockets=2,cores=1,threads=1 \ -uuid 9be93d67-6542-4aff-98b2-e65aa58e51a5 \ -nodefaults \ -rtc base=utc,driftfix=slew \ -global ICH9-LPC.disable_s3=1 \ -global ICH9-LPC.disable_s4=1 \ -boot strict=on \ -device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 \ -device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 \ -device pcie-pci-bridge,id=pci.3,bus=pci.1,addr=0x0 \ -device pcie-root-port,port=0x12,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x2 \ -device pcie-root-port,port=0x13,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x3 \ -device pcie-root-port,port=0x14,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x4 \ -device pcie-root-port,port=0x15,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x5 \ -device ich9-usb-ehci1,id=usb,bus=pcie.0,addr=0x1d.0x7 \ -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pcie.0,multifunction=on,addr=0x1d \ -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pcie.0,addr=0x1d.0x1 \ -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pcie.0,addr=0x1d.0x2 \ -device virtio-serial-pci,id=virtio-serial0,bus=pci.4,addr=0x0 \ -drive file=/home/kvm_autotest_root/images/rhel76-64-virtio.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 \ -device virtio-blk-pci,scsi=off,bus=pci.5,addr=0x0,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \ -netdev tap,id=hostnet0 \ -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:e4:04:17,bus=pci.3,addr=0x2 \ -device usb-tablet,id=input0,bus=usb.0,port=1 \ -spice port=5900,disable-ticketing \ -device intel-hda,id=sound0,bus=pci.3,addr=0x1 \ -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 \ -device virtio-balloon-pci,id=balloon0,bus=pci.6,addr=0x0 \ -msg timestamp=on \ -monitor stdio \ -monitor unix:/tmp/monitor2,server,nowait \
Guest triggerable assert in intel-hda. Happens if the guest tries to writes a readonly register. Not nice, but also not critical. I guess we can just fix it upstream and pick up via rebase. Which makes it RHEl-8 material. https://patchwork.ozlabs.org/patch/1001778/
upstream commit 7ec910675929a593c9890f412125c31c578cde6e (in qemu 3.1)
Moving to advanced virt.
Rebase should have picked up this.
Created attachment 1606029 [details] dmesg log
Looking through the log, there are a bunch of issues. [ 0.213486] pci 0000:02:01.0: [8086:2668] type 00 class 0x040300 [ 0.214625] pci 0000:02:01.0: reg 0x10: [mem 0xfc040000-0xfc043fff] This is the intel-hda sound card. Note MMIO is at 0xfc040000. [ 90.144845] shpchp 0000:01:00.0: Latch close on Slot(3) [ 90.144871] shpchp 0000:01:00.0: Button pressed on Slot(3) [ 90.144887] shpchp 0000:01:00.0: Card present on Slot(3) [ 90.145037] shpchp 0000:01:00.0: PCI slot #3 - powering on due to button press [ 96.242847] pci 0000:02:03.0: [8086:100e] type 00 class 0x020000 [ 96.243048] pci 0000:02:03.0: reg 0x10: [mem 0x00000000-0x0001ffff] [ 96.243132] pci 0000:02:03.0: reg 0x14: [io 0x0000-0x003f] [ 96.243455] pci 0000:02:03.0: reg 0x30: [mem 0x00000000-0x0003ffff pref] [ 96.243794] pci 0000:02:03.0: BAR 6: assigned [mem 0xfc080000-0xfc0bffff pref] [ 96.243798] pci 0000:02:03.0: BAR 0: assigned [mem 0xfc060000-0xfc07ffff] [ 96.243836] pci 0000:02:03.0: BAR 1: assigned [io 0xc400-0xc43f] [ 96.243867] shpchp 0000:01:00.0: PCI bridge to [bus 02] [ 96.243883] shpchp 0000:01:00.0: bridge window [io 0xc000-0xcfff] [ 96.250364] shpchp 0000:01:00.0: bridge window [mem 0xfc000000-0xfc1fffff] [ 96.289866] e1000: Intel(R) PRO/1000 Network Driver - version 7.3.21-k8-NAPI [ 96.289868] e1000: Copyright (c) 1999-2006 Intel Corporation. [ 96.289869] e1000: E1000 MODULE IS NOT SUPPORTED [ 96.289950] e1000 0000:02:03.0: enabling device (0000 -> 0003) [ 96.695345] e1000 0000:02:03.0 eth1: (PCI:33MHz:32-bit) 52:54:00:e4:04:01 [ 96.695353] e1000 0000:02:03.0 eth1: Intel(R) PRO/1000 Network Connection Hotplugged first e1000 device. All fine. Same goes for #2 -> #5 [ 157.172030] pci 0000:02:07.0: BAR 6: no space for [mem size 0x00040000 pref] [ 157.172034] pci 0000:02:07.0: BAR 6: failed to assign [mem size 0x00040000 pref] [ 157.172038] pci 0000:02:07.0: BAR 0: assigned [mem 0xfc1e0000-0xfc1fffff] [ 157.172082] pci 0000:02:07.0: BAR 1: assigned [io 0xc800-0xc83f] [ 157.172115] shpchp 0000:01:00.0: PCI bridge to [bus 02] [ 157.172132] shpchp 0000:01:00.0: bridge window [io 0xc000-0xcfff] [ 157.177481] shpchp 0000:01:00.0: bridge window [mem 0xfc000000-0xfc1fffff] [ 157.192360] e1000 0000:02:07.0: enabling device (0000 -> 0003) [ 157.601318] e1000 0000:02:07.0 eth5: (PCI:33MHz:32-bit) 52:54:00:e4:04:05 [ 157.601324] e1000 0000:02:07.0 eth5: Intel(R) PRO/1000 Network Connection Hotplugged e1000 #5 No more address space in the bridge window, so bar 6 isn't initialized. Driver initializes nevertheless. Smells like a bug. But possibly the device can be operated (with limitations) even if bar 6 is not available. Hotplugged e1000 #6 [ 172.531512] pci 0000:02:08.0: BAR 6: no space for [mem size 0x00040000 pref] [ 172.531515] pci 0000:02:08.0: BAR 6: failed to assign [mem size 0x00040000 pref] [ 172.531518] pci 0000:02:08.0: BAR 0: no space for [mem size 0x00020000] [ 172.531520] pci 0000:02:08.0: BAR 0: failed to assign [mem size 0x00020000] [ 172.531525] pci 0000:02:08.0: BAR 1: assigned [io 0xc840-0xc87f] [ 172.531565] shpchp 0000:01:00.0: PCI bridge to [bus 02] [ 172.531582] shpchp 0000:01:00.0: bridge window [io 0xc000-0xcfff] [ 172.536862] shpchp 0000:01:00.0: bridge window [mem 0xfc000000-0xfc1fffff] Still no more address space in the bridge window, so bar 6 and 0 are not initialized. [ 172.545125] PCI: No. 2 try to assign unassigned res [ 172.545130] release child resource [mem 0xfc000000-0xfc03ffff pref] [ 172.545132] release child resource [mem 0xfc040000-0xfc043fff] [ 172.545133] release child resource [mem 0xfc040000-0xfc043fff] [ 172.545135] release child resource [mem 0xfc044000-0xfc0440ff] [ 172.545137] release child resource [mem 0xfc044000-0xfc0440ff] [ 172.545138] release child resource [mem 0xfc060000-0xfc07ffff] [ 172.545140] release child resource [mem 0xfc060000-0xfc07ffff] [ 172.545141] release child resource [mem 0xfc080000-0xfc0bffff pref] [ 172.545143] release child resource [mem 0xfc0c0000-0xfc0fffff pref] [ 172.545144] release child resource [mem 0xfc100000-0xfc11ffff] [ 172.545146] release child resource [mem 0xfc100000-0xfc11ffff] [ 172.545147] release child resource [mem 0xfc120000-0xfc13ffff] [ 172.545149] release child resource [mem 0xfc120000-0xfc13ffff] [ 172.545150] release child resource [mem 0xfc140000-0xfc17ffff pref] [ 172.545152] release child resource [mem 0xfc180000-0xfc1bffff pref] [ 172.545153] release child resource [mem 0xfc1c0000-0xfc1dffff] [ 172.545154] release child resource [mem 0xfc1c0000-0xfc1dffff] [ 172.545156] release child resource [mem 0xfc1e0000-0xfc1fffff] [ 172.545157] release child resource [mem 0xfc1e0000-0xfc1fffff] [ 172.545161] shpchp 0000:01:00.0: resource 14 [mem 0xfc000000-0xfc1fffff] released [ 172.545164] shpchp 0000:01:00.0: PCI bridge to [bus 02] Looks like linux wants reshuffle the pci bars, to fit them all. But it does so while drivers are already using those pci bars. That certainly isn't good. [ 172.549950] shpchp 0000:01:00.0: BAR 14: no space for [mem size 0x00300000] [ 172.549953] shpchp 0000:01:00.0: BAR 14: failed to assign [mem size 0x00300000] [ 172.549962] 8139cp 0000:02:02.0: BAR 6: no space for [mem size 0x00040000 pref] [ 172.549976] 8139cp 0000:02:02.0: BAR 6: failed to assign [mem size 0x00040000 pref] [ ... more of these ... ] Seems shpchp tried to make the memory window larger and allocate space for it. Failed. All child (re-allocations) fail too. Guess the bridge window is gone at that point. [ 182.637570] irq 21: nobody cared (try booting with the "irqpoll" option) [ ... ] [ 182.638478] handlers: [ 182.638762] [<000000008f5990d6>] qxl_irq_handler [qxl] [ 182.639386] [<00000000aa3a80bb>] e1000_intr [e1000] [ 182.639970] [<00000000aa3a80bb>] e1000_intr [e1000] Probably the e1000 driver fails to access the registers (no pci bridge window) and therefore can't silence the IRQ. [ 182.790704] shpchp 0000:01:00.0: Latch close on Slot(9) [ 182.790731] shpchp 0000:01:00.0: Button pressed on Slot(9) [ 182.790749] shpchp 0000:01:00.0: Card present on Slot(9) Hotplugged e1000 #7. [ 183.364556] e1000 0000:02:08.0: enabling device (0000 -> 0001) [ 183.368843] e1000 0000:02:08.0: can't ioremap BAR 0: [??? 0x00000000 flags 0x0] [ 183.370761] e1000: probe of 0000:02:08.0 failed with error -5 e1000 #6 fails initialization. [ 183.371134] shpchp 0000:01:00.0: PCI slot #9 - powering on due to button press [ 189.674454] pci 0000:02:09.0: [8086:100e] type 00 class 0x020000 [ 190.012529] pci 0000:02:09.0: reg 0x10: [mem 0x00000000-0x0001ffff] [ 190.012565] pci 0000:02:09.0: reg 0x14: [io 0x0000-0x003f] [ 190.012735] pci 0000:02:09.0: reg 0x30: [mem 0x00000000-0x0003ffff pref] [ 190.012984] shpchp 0000:01:00.0: BAR 14: assigned [mem 0xfc000000-0xfc0fffff] The pci bridge window is back. Old location, but smaller. [ 190.012987] pci 0000:02:09.0: BAR 6: assigned [mem 0xfc000000-0xfc03ffff pref] [ 190.012989] pci 0000:02:09.0: BAR 0: assigned [mem 0xfc040000-0xfc05ffff] e1000 #7 memory bars are placed there. Note the intel-hda mmio registers are at 0xfc040000 too. This is probably the point where the e1000 driver accessed the intel-hda registers and triggered the qemu assert() that way. This is fixed now, good. But the kernel certainly should not have done that in the first place. So, we still have a bug, in the linux kernel.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:3723