Bug 2021454

Summary: Core Dumped When Executing system_reset Repeatedly While The Guest Over NVMe block Is Booting
Product: Red Hat Enterprise Linux 8 Reporter: Tingting Mao <timao>
Component: qemu-kvm Assignee: Stefan Hajnoczi <stefanha>
qemu-kvm sub component: NVMe QA Contact: Tingting Mao <timao>
Status: CLOSED CURRENTRELEASE Docs Contact:
Severity: medium    
Priority: low CC: coli, hreitz, jinzhao, juzhang, kkiwi, virt-maint, xuwei
Version: 8.6 Keywords: Triaged
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Clone Of: 2007932 Environment:
Last Closed: 2022-08-11 08:35:16 UTC Type: ---
Bug Depends On: 2007932    
Bug Blocks:    

Description Tingting Mao 2021-11-09 10:07:00 UTC
+++ This bug was initially created as a clone of Bug #2007932 +++

The issue still reproduces on RHEL 8.6.


Tested with:
qemu-kvm-6.1.50-4.scrmod+el8.6.0+13148+60ec5265.wrb211103
kernel-4.18.0-348.4.el8.kpq0.x86_64


Steps:
Boot a guest from an NVMe image that holds a working installation, then execute system_reset repeatedly in HMP while the guest is booting.
# sh qemu.sh 
QEMU 6.1.50 monitor - type 'help' for more information
(qemu) cont
(qemu) sys
system_powerdown  system_reset      system_wakeup     
(qemu) system_reset 
[... system_reset entered 40 times in total ...]
(qemu) system_reset 
(qemu) free(): invalid size
qemu.sh: line 33:  5258 Aborted                 (core dumped) /usr/libexec/qemu-kvm -S -name 'avocado-vt-vm1' -sandbox on -machine q35 -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0 -nodefaults -device VGA,bus=pcie.0,addr=0x2 -m 15360 -smp 16,maxcpus=16,cores=8,threads=1,dies=1,sockets=2 -cpu 'Haswell-noTSX',+kvm_pv_unhalt -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 -device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 -object iothread,id=iothread0 -object iothread,id=iothread1 -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 -device virtio-net-pci,mac=9a:1c:0c:0d:e3:4c,id=idjmZXQS,netdev=idEFQ4i1,bus=pcie-root-port-3,addr=0x0 -netdev tap,id=idEFQ4i1,vhost=on -vnc :0 -rtc base=utc,clock=host,driftfix=slew -boot menu=off,order=cdn,once=c,strict=off -enable-kvm -monitor stdio -chardev socket,server=on,path=/var/tmp/monitor-qmpmonitor1-20210721-024113-AsZ7KYro,id=qmp_id_qmpmonitor1,wait=off -mon chardev=qmp_id_qmpmonitor1,mode=control -device pcie-root-port,id=pcie-root-port-5,port=0x5,addr=0x1.0x5,bus=pcie.0,chassis=5 -device virtio-scsi-pci,id=virtio_scsi_pci1,bus=pcie-root-port-5,addr=0x0,iothread=iothread1 -blockdev node-name=nvme_image1,driver=nvme,device=0000:bc:00.0,namespace=1,auto-read-only=on,discard=unmap -blockdev node-name=drive_nvme1,driver=qcow2,file=nvme_image1,read-only=off,discard=unmap -device scsi-hd,id=nvme1,drive=drive_nvme1

Note:
The command line used to boot the guest:
# cat qemu.sh 
/usr/libexec/qemu-kvm \
    -S  \
    -name 'avocado-vt-vm1'  \
    -sandbox on  \
    -machine q35 \
    -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
    -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0  \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x2 \
    -m 15360  \
    -smp 16,maxcpus=16,cores=8,threads=1,dies=1,sockets=2  \
    -cpu 'Haswell-noTSX',+kvm_pv_unhalt \
    -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
    -device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -object iothread,id=iothread0 \
    -object iothread,id=iothread1 \
    -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
    -device virtio-net-pci,mac=9a:1c:0c:0d:e3:4c,id=idjmZXQS,netdev=idEFQ4i1,bus=pcie-root-port-3,addr=0x0  \
    -netdev tap,id=idEFQ4i1,vhost=on  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot menu=off,order=cdn,once=c,strict=off \
    -enable-kvm \
    -monitor stdio \
    -chardev socket,server=on,path=/var/tmp/monitor-qmpmonitor1-20210721-024113-AsZ7KYro,id=qmp_id_qmpmonitor1,wait=off  \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -device pcie-root-port,id=pcie-root-port-5,port=0x5,addr=0x1.0x5,bus=pcie.0,chassis=5 \
    -device virtio-scsi-pci,id=virtio_scsi_pci1,bus=pcie-root-port-5,addr=0x0,iothread=iothread1 \
    -blockdev node-name=nvme_image1,driver=nvme,device=0000:bc:00.0,namespace=1,auto-read-only=on,discard=unmap \
    -blockdev node-name=drive_nvme1,driver=qcow2,file=nvme_image1,read-only=off,discard=unmap \
    -device scsi-hd,id=nvme1,drive=drive_nvme1 \
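
A minimal sketch, not part of the original report, of how the repeated reset might be scripted through the QMP socket that qemu.sh already defines, instead of typing system_reset by hand in HMP. The function name qmp_reset_stream, the default count of 40 (the number of resets in the transcript above), and the default one-second delay are assumptions for illustration; qmp_capabilities and system_reset are standard QMP commands.

```shell
#!/bin/sh
# Hypothetical helper: print a QMP command stream on stdout.
#   $1 = number of system_reset commands (assumed default: 40)
#   $2 = delay in seconds between resets (assumed default: 1)
qmp_reset_stream() {
    count=${1:-40}
    delay=${2:-1}

    # QMP requires capabilities negotiation before any other command.
    printf '%s\n' '{"execute": "qmp_capabilities"}'

    i=0
    while [ "$i" -lt "$count" ]; do
        printf '%s\n' '{"execute": "system_reset"}'
        sleep "$delay"
        i=$((i + 1))
    done
}
```

Assuming socat is installed, the stream could be piped to the socket from qemu.sh with: qmp_reset_stream 40 1 | socat - UNIX-CONNECT:/var/tmp/monitor-qmpmonitor1-20210721-024113-AsZ7KYro. Because the guest is started with -S, it still has to be resumed (cont) before the resets reach a booting guest. Whether scripted resets hit the same timing window as the manual ones has not been verified here.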

Comment 2 John Ferlan 2021-11-17 14:27:08 UTC
Philippe - assigning directly to you since you own the RHEL9 clone bug 2007932

Comment 4 Klaus Heinrich Kiwi 2022-01-03 14:05:24 UTC
It's always a dilemma whether to create duplicate bugs for each version before or after reaching a solution. I'll keep this one where it is, but with a lower priority and a dependency on Bug 2007932. For next time, let's consider tracking the bug only in the next release (or in the latest released version if it is not reproducible in the next release), and creating duplicates only once we have determined that the root cause has been found and fixed and we need to bring the fix to the N-1/N-2 releases.

Comment 7 Klaus Heinrich Kiwi 2022-02-07 14:29:55 UTC
Assigning to Stefan for consistency with Bug 2007932, where the investigation is actually happening

Comment 8 Klaus Heinrich Kiwi 2022-02-21 13:51:27 UTC
(In reply to Klaus Heinrich Kiwi from comment #7)
> Assigning to Stefan for consistency with Bug 2007932, where the
> investigation is actually happening

Actually doing this now

Comment 10 Tingting Mao 2022-08-11 08:35:16 UTC
The issue no longer reproduces on the latest RHEL 8 build, so I am closing this bug.

Tested with:
qemu-kvm-6.2.0-18.module+el8.7.0+15999+d24f860e
kernel-4.18.0-414.el8.kpq1.g315e.x86_64