Bug 1690256 - QEMU core dumped after unplug balloon device under q35 with Win2019 guest
Summary: QEMU core dumped after unplug balloon device under q35 with Win2019 guest
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: 8.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Julia Suvorova
QA Contact: Yumei Huang
URL:
Whiteboard:
Depends On:
Blocks: 1743098 1744438 1897025 1948358
 
Reported: 2019-03-19 07:21 UTC by Yumei Huang
Modified: 2023-03-14 14:43 UTC

Fixed In Version: qemu-kvm-6.0.0-17.module+el8.5.0+11173+c9fce0bb
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1743098 (view as bug list)
Environment:
Last Closed: 2021-11-16 07:49:54 UTC
Type: Bug
Target Upstream Version:
Embargoed:



Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHBA-2021:4684  0        None      None    None     2021-11-16 07:50:58 UTC

Description Yumei Huang 2019-03-19 07:21:47 UTC
Description of problem:
Boot a Win2019 guest, hotplug a balloon device, inflate the balloon to evict guest memory, then unplug the device. QEMU dumps core.

Version-Release number of selected component (if applicable):
qemu-kvm-3.1.0-20.module+el8+2888+cdc893a8
kernel-4.18.0-80.el8.x86_64

How reproducible:
always

Steps to Reproduce:
1. Boot guest with q35 machine type

2. Run the following script to hotplug the balloon device, inflate the balloon, and unplug the device.

# cat balloon-hotplug.sh 
for i in $(seq 10);
do
	echo "=========================== round $i ============"
	echo "info balloon" | nc -U /tmp/monitor3
	# Hotplug the balloon device behind root port 5.
	echo "device_add virtio-balloon-pci,id=balloon0,bus=pcie.0-root-port-5,addr=0x0" | nc -U /tmp/monitor3
	echo "info balloon" | nc -U /tmp/monitor3
	# Ask the guest to shrink to 4096 MB.
	echo "balloon 4096" | nc -U /tmp/monitor3
	echo "info balloon" | nc -U /tmp/monitor3
	sleep 30
	echo "info balloon" | nc -U /tmp/monitor3
	# May need to wait longer for the balloon to take effect.
	sleep 20
	echo "info balloon" | nc -U /tmp/monitor3
	# Unplug the balloon device.
	echo "device_del balloon0" | nc -U /tmp/monitor3
	sleep 10
done
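
The fixed sleeps in the script make the timing guesswork. A possible alternative, sketched here and not taken from the original test, is to poll "info balloon" until the reported size reaches the target before unplugging. It assumes the monitor replies with a line of the form "balloon: actual=N" (N in MB); the function names hmp, balloon_actual and wait_for_balloon are hypothetical, and the socket path is the one used in the script.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; the socket path comes from the report.
MONITOR=${MONITOR:-/tmp/monitor3}

# Send one HMP command to the monitor socket and print the reply.
hmp() {
    echo "$1" | nc -U "$MONITOR"
}

# Read monitor output on stdin and print N from the last "balloon: actual=N" line.
balloon_actual() {
    sed -n 's/^.*balloon: actual=\([0-9][0-9]*\).*$/\1/p' | tail -n 1
}

# Poll until the balloon reports at most $1 MB, or give up after $2 seconds.
wait_for_balloon() {
    target=$1
    timeout=${2:-120}
    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        actual=$(hmp "info balloon" | balloon_actual)
        if [ -n "$actual" ] && [ "$actual" -le "$target" ]; then
            return 0
        fi
        sleep 5
        waited=$((waited + 5))
    done
    return 1
}
```

Inside the loop this could replace the two sleeps, for example: hmp "balloon 4096"; wait_for_balloon 4096 120 && hmp "device_del balloon0".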

Actual results:
QEMU core dumped:
(qemu) ./win2019-pci.sh: line 24: 23692 Segmentation fault      (core dumped) /usr/libexec/qemu-kvm -name 'avocado-vt-vm1' -machine q35 -nodefaults -device VGA,bus=pcie.0,addr=0x1 -device pvpanic,ioport=0x505,id=id5SK4co -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie.0,addr=0x3 -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/win2019-64-virtio-scsi.qcow2 -device scsi-hd,id=image1,drive=drive_image1 -device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 -device virtio-net-pci,mac=9a:39:3a:3b:3c:3d,id=idzyzw7g,vectors=4,netdev=idhia6GM,bus=pcie.0-root-port-4,addr=0x0 -netdev tap,id=idhia6GM -m 8192 -smp 16,maxcpus=16,cores=8,threads=1,sockets=2 -cpu 'IvyBridge',+kvm_pv_unhalt -vnc :1 -rtc base=utc,clock=host,driftfix=slew -boot order=cdn,once=c,menu=off,strict=off -enable-kvm -device pcie-root-port,id=pcie.0-root-port-5,slot=5,chassis=5,addr=0x5,bus=pcie.0 -monitor stdio -serial tcp:0:4445,server,nowait -monitor unix:/tmp/monitor3,server,nowait

Expected results:
The balloon device is removed and the guest keeps working.

Additional info:
1. QEMU cli:
# /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -machine q35  \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x1  \
    -device pvpanic,ioport=0x505,id=id5SK4co  \
    -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie.0,addr=0x3 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/win2019-64-virtio-scsi.qcow2 \
    -device scsi-hd,id=image1,drive=drive_image1 \
    -device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
    -device virtio-net-pci,mac=9a:39:3a:3b:3c:3d,id=idzyzw7g,vectors=4,netdev=idhia6GM,bus=pcie.0-root-port-4,addr=0x0  \
    -netdev tap,id=idhia6GM \
    -m 8192  \
    -smp 16,maxcpus=16,cores=8,threads=1,sockets=2  \
    -cpu 'IvyBridge',+kvm_pv_unhalt \
    -vnc :1  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off,strict=off \
    -enable-kvm \
    -device pcie-root-port,id=pcie.0-root-port-5,slot=5,chassis=5,addr=0x5,bus=pcie.0 \
    -monitor stdio \
    -serial tcp:0:4445,server,nowait \
    -monitor unix:/tmp/monitor3,server,nowait

Comment 1 Yumei Huang 2019-03-19 08:57:13 UTC
Hit the same issue with a Win2016 guest as well, but it is harder to reproduce because of https://bugzilla.redhat.com/show_bug.cgi?id=1553633#c10. Ballooning may only take effect after a "system_reset"; unplugging the device after that sometimes reproduces the crash.

Comment 2 Yumei Huang 2019-03-20 03:01:14 UTC
The balloon driver version for the Win2019 guest is virtio-win-prewhql-163 (and 169), and the balloon service is not installed.

Comment 3 Yumei Huang 2019-03-20 05:41:40 UTC
Only hit the issue with q35; it works fine with pc.

Can reproduce with qemu-kvm-core-2.12.0-34.el8+2018+8f9f13ec.

Comment 4 Yumei Huang 2019-07-01 09:32:45 UTC
Hit the same issue with qemu-kvm-4.0.0-4.module+el8.1.0+3356+cda7f1ee and qemu-kvm-2.12.0-78.module+el8.1.0+3434+46ed87c2.

Guest: Win10.x86_64, Win2016, Win2019
balloon driver: virtio-win-prewhql-0.1-172
host kernel: 4.18.0-107.el8.x86_64

Comment 5 Yumei Huang 2019-08-19 05:19:17 UTC
Reproduced with qemu-kvm-4.1.0-1.module+el8.1.0+3966+4a23dca1.

(gdb) bt
#0  0x000055a4b8b28fdd in virtio_pci_notify_write
    (opaque=0x55a4bb06a060, addr=0, val=<optimized out>, size=<optimized out>)
    at hw/virtio/virtio-pci.c:1306
#1  0x000055a4b895b053 in memory_region_write_accessor
    (mr=<optimized out>, addr=<optimized out>, value=<optimized out>, size=<optimized out>, shift=<optimized out>, mask=<optimized out>, attrs=...)
    at /usr/src/debug/qemu-kvm-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64/memory.c:508
#2  0x000055a4b8959266 in access_with_adjusted_size
    (addr=addr@entry=0, value=value@entry=0x7f25373fe548, size=size@entry=2, access_size_min=<optimized out>, access_size_max=<optimized out>, access_fn=access_fn@entry=
    0x55a4b895b000 <memory_region_write_accessor>, mr=0x55a4bb062bc0, attrs=...)
    at /usr/src/debug/qemu-kvm-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64/memory.c:574
#3  0x000055a4b895d200 in memory_region_dispatch_write
    (mr=0x55a4bb062bc0, addr=0, data=<optimized out>, size=2, attrs=...)
    at /usr/src/debug/qemu-kvm-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64/memory.c:1502
#4  0x000055a4b890a2f3 in flatview_write_continue
    (fv=0x7f25300a1ea0, addr=4244647936, attrs=..., buf=0x7f2551fda028 <error: Cannot access memory at address 0x7f2551fda028>, len=2, addr1=<optimized out>, l=<optimized out>, mr=0x55a4bb062bc0)
    at /usr/src/debug/qemu-kvm-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64/exec.c:3337
#5  0x000055a4b890a516 in flatview_write
    (fv=0x7f25300a1ea0, addr=4244647936, attrs=..., buf=0x7f2551fda028 <error: Cannot access memory at address 0x7f2551fda028>, len=2)
    at /usr/src/debug/qemu-kvm-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64/exec.c:3376
#6  0x000055a4b890e73f in address_space_write
    (as=<optimized out>, addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>) at /usr/src/debug/qemu-kvm-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64/exec.c:3466
#7  0x000055a4b896be9a in kvm_cpu_exec (cpu=<optimized out>) at /usr/src/debug/qemu-kvm-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64/accel/kvm/kvm-all.c:2298
#8  0x000055a4b8950f3e in qemu_kvm_cpu_thread_fn (arg=0x55a4bb11a650) at /usr/src/debug/qemu-kvm-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64/cpus.c:1285
#9  0x000055a4b8c70174 in qemu_thread_start (args=0x55a4bb13df00) at util/qemu-thread-posix.c:502
#10 0x00007f254cba82de in start_thread (arg=<optimized out>) at pthread_create.c:486
#11 0x00007f254c8d9133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
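
A backtrace like this can also be collected without an interactive gdb session. The following is a minimal sketch, assuming the core file is on disk and the qemu-kvm debuginfo is installed; dump_backtrace is a hypothetical helper name, and the core file path in the usage line is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper, not part of the report: print backtraces of all
# threads from a core file.
dump_backtrace() {
    binary=$1
    core=$2
    if [ ! -r "$core" ]; then
        echo "core file not readable: $core" >&2
        return 1
    fi
    # -batch runs the -ex commands and exits; batch mode also disables the pager.
    gdb -batch -ex 'thread apply all bt' "$binary" "$core"
}
```

Usage, with a placeholder core path: dump_backtrace /usr/libexec/qemu-kvm ./core.qemu-kvm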

Comment 10 xiagao 2020-02-04 05:56:24 UTC
Also hit this issue on win1032 and win2019 under q35 machine type.
pkg:
virtio-win-prewhql-177
qemu-kvm-4.2.0-4.module+el8.2.0+5220+e82621dc.x86_64
kernel-4.18.0-161.el8.x86_64

Comment 11 Ademar Reis 2020-02-05 22:55:47 UTC
QEMU has been recently split into sub-components and as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks

Comment 12 xiagao 2020-02-12 09:17:06 UTC
Hi developer, 

I have recently been working on the balloon device delete test and found that QEMU almost always dumps core after the device is deleted: 9 hits in 10 runs.

The steps are the same as in comment 0.

guest: win10-64
driver version: virtio-win-0.1-179
host info:
qemu-kvm-4.2.0-9.module+el8.2.0+5699+b5331ee5.x86_64
kernel-4.18.0-176.el8.x86_64

cmd line:
/usr/libexec/qemu-kvm -name vm1 -enable-kvm -m 8192 -smp 24,maxcpus=24,cores=12,threads=1,sockets=2 -nodefaults -cpu 'EPYC',hv_stimer,hv_synic,hv_vpindex,hv_reset,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv-tlbflush,+kvm_pv_unhalt -rtc base=localtime,driftfix=none -boot order=cd,menu=on -monitor stdio -M q35 -vga std -vnc :11 -qmp tcp:0:4444,server,nowait \
-device piix3-usb-uhci,id=usb -device usb-tablet,id=input0 \
-device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x3 \
-device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x3.0x1 \
-device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x3.0x2 \
-device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x3.0x3 \
-device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x3.0x4 \
-drive file=Win10-64.qcow2,if=none,id=drive_system_disk,format=qcow2,cache=none,aio=native,werror=stop,rerror=stop \
-device virtio-scsi-pci,id=scsi0,bus=pci.1 -device scsi-hd,drive=drive_system_disk,bus=scsi0.0,id=system_disk,bootindex=0 \
-netdev tap,script=/etc/qemu-ifup,downscript=no,id=hostnet0,vhost=on,queues=4 \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:52:11:36:3f:0d,bus=pci.2,mq=on,vectors=10 \
-device virtio-balloon-pci,bus=pci.5,id=balloon0

Comment 15 Yumei Huang 2020-05-14 05:47:23 UTC
Reproduced on 8.2.1.

qemu-kvm-4.2.0-21.module+el8.2.1+6586+8b7713b9
kernel-4.18.0-193.1.2.el8_2.x86_64

Comment 16 Amnon Ilan 2020-06-14 15:08:05 UTC
*** Bug 1828654 has been marked as a duplicate of this bug. ***

Comment 18 Yumei Huang 2020-07-27 02:34:38 UTC
Reproduced on 8.3-av.

qemu-kvm-5.1.0-0.scrmod+el8.3.0+7384+2e5aeafb.wrb200716
kernel-4.18.0-227.el8.x86_64

Comment 24 RHEL Program Management 2021-03-15 07:34:29 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 25 Yumei Huang 2021-03-23 08:17:49 UTC
Reopening, as it still reproduces with qemu-kvm-5.2.0-13.module+el8.4.0+10397+65cef07b, though less often.

Comment 27 Yumei Huang 2021-03-23 08:47:47 UTC
(In reply to Yumei Huang from comment #25)
> Reopening, as it still reproduces with
> qemu-kvm-5.2.0-13.module+el8.4.0+10397+65cef07b, though less often.

Add test env:
qemu-kvm-5.2.0-13.module+el8.4.0+10397+65cef07b
kernel-4.18.0-295.el8.x86_64
virtio-win-prewhql-196
guest: win2019

Comment 28 Amnon Ilan 2021-04-06 09:32:47 UTC
The fix was merged upstream:
https://github.com/qemu/qemu/commit/c3fd706165e9875a10606453ee2785dd51e987a5

Comment 33 Danilo de Paula 2021-06-08 00:28:38 UTC
Upstream feature already present in qemu-6.0.
Marked as TestOnly and moved directly to ON_QA

Comment 34 Yumei Huang 2021-06-11 08:15:40 UTC
Test packages:
    qemu-kvm-6.0.0-18.module+el8.5.0+11243+5269aaa1 
    kernel-4.18.0-310.el8.x86_64

Guests: 
    Win10.x86_64, Win2016, Win2019, Win2022

Tested unplugging the balloon device under q35 repeatedly: no core dump, but sometimes the balloon device failed to unplug, as in bug 1942011.


Hi Julia, 

Do you think we can verify this bug as the core dump issue is gone, and track the unplug failure by bug 1942011? Thanks.

Comment 36 Julia Suvorova 2021-06-18 15:13:57 UTC
(In reply to Yumei Huang from comment #34)
> Test packages:
>     qemu-kvm-6.0.0-18.module+el8.5.0+11243+5269aaa1
>     kernel-4.18.0-310.el8.x86_64
> 
> Guests: 
>     Win10.x86_64, Win2016, Win2019, Win2022
> 
> Tested unplugging the balloon device under q35 repeatedly: no core dump,
> but sometimes the balloon device failed to unplug, as in bug 1942011.
> 
> 
> Hi Julia, 
> 
> Do you think we can verify this bug as the core dump issue is gone, and
> track the unplug failure by bug 1942011? Thanks.

Sure, you can verify this one, as we already have bug 1942011 to track
the issue you've hit.

Comment 37 Yumei Huang 2021-06-21 02:03:08 UTC
Thanks Julia.

Moving to verified per above comments.

Comment 43 errata-xmlrpc 2021-11-16 07:49:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (virt:av bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4684

