Bug 1964846 - QEMU core dumped after unplug vcpu under ovmf
Summary: QEMU core dumped after unplug vcpu under ovmf
Keywords:
Status: CLOSED DUPLICATE of bug 1849172
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: 8.5
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 8.5
Assignee: Virtualization Maintenance
QA Contact: Yumei Huang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-05-26 08:14 UTC by Yumei Huang
Modified: 2021-06-02 01:44 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-26 09:33:32 UTC
Type: ---
Target Upstream Version:
Embargoed:



Description Yumei Huang 2021-05-26 08:14:22 UTC
Description of problem:
Boot a guest with OVMF and vCPU devices. After the guest boots up, unplug the vCPU devices; QEMU sometimes core dumps.

Version-Release number of selected component (if applicable):
qemu-kvm-6.0.0-16.module+el8.5.0+10848+2dccc46d
edk2-ovmf-20210519git15ee7b76891a-1.el8.1938238pre2.noarch
Host kernel: 4.18.0-307.el8.x86_64
Guest kernel:  4.18.0-308.el8.x86_64 

How reproducible:
1/10

Steps to Reproduce:
1. Boot the guest with OVMF and vCPU devices (full command line under Additional info below)

2. Unplug the vCPU devices via QMP (see the sketch below):
{"execute": "device_del", "arguments": {"id": "vcpu4"}, "id": "4roB9mNx"}
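
Since the crash hits only about 1 run in 10, scripting the unplug helps. A minimal sketch, assuming the QMP monitor is reachable at /tmp/qmp.sock (hypothetical path; the run above uses an avocado-generated monitor-qmpmonitor1-* socket, see the CLI below):

# Hypothetical socket path; substitute the qmpmonitor1 socket from the command line.
socat - UNIX-CONNECT:/tmp/qmp.sock <<'EOF'
{"execute": "qmp_capabilities"}
{"execute": "device_del", "arguments": {"id": "vcpu4"}}
EOF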


Actual results:
Sometimes QEMU core dumps with a segmentation fault.

Expected results:
The guest keeps working after the vCPUs are unplugged.

Additional info:
1. Gdb bt:
(gdb) bt
#0  0x000055ff7c9943c4 in kvm_cpu_kick (cpu=<optimized out>) at ../accel/kvm/kvm-all.c:2967
#1  kvm_ipi_signal (sig=10) at ../accel/kvm/kvm-all.c:2967
#2  <signal handler called>
#3  0x00007f7019fc68ab in madvise () at ../sysdeps/unix/syscall-template.S:78
#4  0x00007f701a29c404 in advise_stack_range (guardsize=<optimized out>, pd=140109666055936, 
    size=<optimized out>, mem=0x7f6dd25ff000) at allocatestack.c:392
#5  start_thread (arg=<optimized out>) at pthread_create.c:569
#6  0x00007f7019fcbdc3 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
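
For context, frames #0/#1 are QEMU's SIG_IPI handler. In the QEMU 6.0 sources, accel/kvm/kvm-all.c around line 2967 reads approximately as follows (upstream excerpt; the trace suggests the handler fired for a vCPU whose kvm_run mapping had already been torn down during unplug, so the dereference faults):

static void kvm_cpu_kick(CPUState *cpu)
{
    /* Faults if cpu->kvm_run was already unmapped when the vCPU was destroyed. */
    qatomic_set(&cpu->kvm_run->immediate_exit, 1);
}

static void kvm_ipi_signal(int sig)
{
    if (current_cpu) {
        assert(kvm_immediate_exit);
        kvm_cpu_kick(current_cpu);
    }
}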


2. QEMU cli:
 /usr/libexec/qemu-kvm \
    -S  \
    -name 'avocado-vt-vm1'  \
    -sandbox on  \
    -blockdev node-name=file_ovmf_code,driver=file,filename=/usr/share/OVMF/OVMF_CODE.secboot.fd,auto-read-only=on,discard=unmap \
    -blockdev node-name=drive_ovmf_code,driver=raw,read-only=on,file=file_ovmf_code \
    -blockdev node-name=file_ovmf_vars,driver=file,filename=/home/kvm_autotest_root/images/avocado-vt-vm1_rhel850-64-virtio-scsi.qcow2_VARS.fd,auto-read-only=on,discard=unmap \
    -blockdev node-name=drive_ovmf_vars,driver=raw,read-only=off,file=file_ovmf_vars \
    -machine q35,pflash0=drive_ovmf_code,pflash1=drive_ovmf_vars \
    -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
    -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0  \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x2 \
    -m 8192  \
    -smp 12,maxcpus=16,cores=8,threads=1,dies=1,sockets=2  \
    -cpu 'Haswell-noTSX',+kvm_pv_unhalt \
    -device Haswell-noTSX-x86_64-cpu,id=vcpu1,socket-id=1,die-id=0,core-id=4,thread-id=0 \
    -device Haswell-noTSX-x86_64-cpu,id=vcpu2,socket-id=1,die-id=0,core-id=5,thread-id=0 \
    -device Haswell-noTSX-x86_64-cpu,id=vcpu3,socket-id=1,die-id=0,core-id=6,thread-id=0 \
    -device Haswell-noTSX-x86_64-cpu,id=vcpu4,socket-id=1,die-id=0,core-id=7,thread-id=0 \
    -chardev socket,path=/tmp/avocado_3vtlgkr3/monitor-qmpmonitor1-20210526-012614-vnbY8ake,wait=off,server=on,id=qmp_id_qmpmonitor1  \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -chardev socket,path=/tmp/avocado_3vtlgkr3/monitor-catch_monitor-20210526-012614-vnbY8ake,wait=off,server=on,id=qmp_id_catch_monitor  \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idqy9n33 \
    -chardev socket,path=/tmp/avocado_3vtlgkr3/serial-serial0-20210526-012614-vnbY8ake,wait=off,server=on,id=chardev_serial0 \
    -device isa-serial,id=serial0,chardev=chardev_serial0  \
    -chardev socket,id=seabioslog_id_20210526-012614-vnbY8ake,path=/tmp/avocado_3vtlgkr3/seabios-20210526-012614-vnbY8ake,server=on,wait=off \
    -device isa-debugcon,chardev=seabioslog_id_20210526-012614-vnbY8ake,iobase=0x402 \
    -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
    -device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-2,addr=0x0 \
    -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/rhel850-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
    -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
    -device virtio-net-pci,mac=9a:6e:7d:a6:50:4f,id=idmhAlC2,netdev=idcIBRQY,bus=pcie-root-port-3,addr=0x0  \
    -netdev tap,id=idcIBRQY,vhost=on,vhostfd=20,fd=16  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot menu=off,order=cdn,once=c,strict=off \
    -enable-kvm \
    -device pcie-root-port,id=pcie_extra_root_port_0,multifunction=on,bus=pcie.0,addr=0x3,chassis=5
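
The four vcpuN devices above are cold-plugged on the command line; the device_del in step 2 is their hot-unplug counterpart. For reference, hot-plugging vcpu4 back at runtime would look roughly like this over QMP (same properties as the -device line above):

{"execute": "device_add", "arguments": {"driver": "Haswell-noTSX-x86_64-cpu", "id": "vcpu4", "socket-id": 1, "die-id": 0, "core-id": 7, "thread-id": 0}}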

Comment 1 Laszlo Ersek 2021-05-26 09:30:08 UTC
I think the stack trace in comment 0 is incomplete; it does not cover all QEMU threads. The backtrace does not seem to be from the thread that crashed.
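
For reference, a per-thread backtrace from the core would show which thread actually faulted; standard gdb usage (core path hypothetical):

$ gdb /usr/libexec/qemu-kvm /path/to/core
(gdb) thread apply all bt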

Comment 2 Laszlo Ersek 2021-05-26 09:31:20 UTC
Additionally, I don't think this is a regression. VCPU hot-unplug is a new feature.

Comment 3 Laszlo Ersek 2021-05-26 09:33:32 UTC
... what I meant was, "VCPU hot-unplug with SMI" is a new feature. From bug 1849172.

In fact, I think this BZ should be closed as a duplicate of bug 1849172, and the above result should be captured and discussed as part of the ON_QA bug state, on bug 1849172.

*** This bug has been marked as a duplicate of bug 1849172 ***

