Bug 2030708

Summary: qemu-kvm coredump when writing data reaches the threshold of mirroring node
Product: Red Hat Enterprise Linux 9
Reporter: Han Han <hhan>
Component: qemu-kvm
Assignee: Virtualization Maintenance <virt-maint>
qemu-kvm sub component: Block Jobs
QA Contact: aihua liang <aliang>
Status: CLOSED DUPLICATE
Docs Contact:
Severity: high
Priority: unspecified
CC: chwen, coli, meili, virt-maint, yisun
Version: 9.0
Keywords: Security
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2021-12-10 01:58:52 UTC
Type: Bug
Attachments: VM xml, full backtrace, and scripts (Flags: none)

Description Han Han 2021-12-09 14:27:36 UTC
Created attachment 1845478
VM xml, full backtrace, and scripts

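Description of problem:
As the summary says: qemu-kvm crashes with a segmentation fault and dumps core when data written in the guest crosses the block threshold set on the mirroring node during an active block commit.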

Version-Release number of selected component (if applicable):
qemu-kvm-6.1.0-8.el9.x86_64
libvirt-7.10.0-1.el9.x86_64

How reproducible:
30%

Steps to Reproduce:
1. Prepare a VM named avocado-vt-vm1.
2. Monitor the libvirt events of the domain:
# virsh event avocado-vt-vm1 --loop --all

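3. Run the following script, which repeatedly starts an active block commit, sets a block threshold on the mirroring node, and writes data in the guest beyond that threshold:
#!/bin/bash -
IP=192.168.122.156  # the IP address of the guest
VM=avocado-vt-vm1
while true; do
    virsh start "$VM"
    sleep 30
    # Create an external disk-only snapshot, then actively commit the top layer
    virsh snapshot-create "$VM" --no-metadata --disk-only
    virsh blockcommit "$VM" vda --active
    # Set a 512M write threshold on the commit target vda[1]
    virsh domblkthreshold "$VM" 'vda[1]' 512M
    # Write data in the guest; dd does a single short read here, so it copies
    # only 33554431 bytes (32 MiB), as the output below shows
    ssh "hhan@$IP" dd if=/dev/urandom of=file bs=1G count=1
    sleep "$(shuf -i 1-10 -n1)"
    virsh blockjob "$VM" vda --pivot
    # Stop looping once destroy fails, i.e. qemu is no longer running
    if ! virsh destroy "$VM"; then
        break
    fi
done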

Actual results:
Sometimes qemu crashes with a segmentation fault:

Domain 'avocado-vt-vm1' destroyed

Domain 'avocado-vt-vm1' started

Domain snapshot 1639059023 created
Active Block Commit started

Warning: Permanently added '192.168.122.156' (ED25519) to the list of known hosts.
0+1 records in
0+1 records out
33554431 bytes (34 MB, 32 MiB) copied, 0.466541 s, 71.9 MB/s
error: Requested operation is not valid: domain is not running

error: Failed to destroy domain 'avocado-vt-vm1'
error: Requested operation is not valid: domain is not running

The event log:
event 'agent-lifecycle' for domain 'avocado-vt-vm1': state: 'connected' reason: 'channel event'
event 'block-job' for domain 'avocado-vt-vm1': Active Block Commit for /var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.1639058313 ready
event 'block-job-2' for domain 'avocado-vt-vm1': Active Block Commit for vda ready
event 'block-threshold' for domain 'avocado-vt-vm1': dev: vda[1](/var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2) 536870912 140509184
event 'lifecycle' for domain 'avocado-vt-vm1': Stopped Failed
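The two numbers at the end of the 'block-threshold' event are, assuming the field order of libvirt's block-threshold event callback, the configured threshold and the excess (bytes written beyond the threshold). A small shell calculation decodes them:

```shell
#!/bin/bash
# Decode the numbers from the 'block-threshold' event in the log above.
# Assumption: the first number is the threshold and the second is the
# excess (bytes written beyond the threshold), as in libvirt's event callback.
threshold=536870912
excess=140509184
mib=$(( 1024 * 1024 ))

echo "threshold:          $(( threshold / mib )) MiB"   # the 512M set via domblkthreshold
echo "excess:             $(( excess / mib )) MiB"
echo "threshold + excess: $(( threshold + excess )) bytes ($(( (threshold + excess) / mib )) MiB)"
```

With the values from the event, this reports a 512 MiB threshold and an excess of 134 MiB, so writes to vda[1] had reached 677380096 bytes (646 MiB) when the event fired, shortly before the 'Stopped Failed' lifecycle event.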

The backtrace:
(gdb) bt
#0  mirror_wait_on_conflicts (self=0x0, s=<optimized out>, offset=<optimized out>, bytes=<optimized out>) at ../block/mirror.c:172
#1  0x0000557f8b044299 in mirror_iteration (s=0x557f8d628800) at ../block/mirror.c:491
#2  0x0000557f8b042b62 in mirror_run (job=0x557f8d628800, errp=<optimized out>) at ../block/mirror.c:1025
#3  0x0000557f8b00c8c6 in job_co_entry (opaque=0x557f8d628800) at ../job.c:917
#4  0x0000557f8b1d2006 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at ../util/coroutine-ucontext.c:173
#5  0x00007ff9e49c1820 in ?? () at ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91 from /lib64/libc.so.6
#6  0x00007ff9e39968a0 in ?? ()
#7  0x0000000000000000 in ?? ()


Expected results:
qemu does not crash with a segmentation fault.

Additional info:
1. Since an ordinary user can trigger this qemu segmentation fault, it is a denial-of-service (DoS) vulnerability, so I added the Security keyword.
2. The qemu cmdline of the VM:
 /usr/libexec/qemu-kvm -name guest=avocado-vt-vm1,debug-threads=on -S -object {"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-20-avocado-vt-vm1/master-key.aes"} -machine pc-q35-rhel8.5.0,usb=off,dump-guest-core=off,memory-backend=pc.ram -accel kvm -cpu Skylake-Client,ds=on,acpi=on,ss=on,ht=on,tm=on,pbe=on,dtes64=on,ds-cpl=on,vmx=on,smx=on,est=on,tm2=on,xtpr=on,pdcm=on,dca=on,tsc-adjust=on,intel-pt=on,pdpe1gb=on,mpx=off,xsavec=off,xgetbv1=off -m 2048 -object {"qom-type":"memory-backend-ram","id":"pc.ram","size":2147483648} -overcommit mem-lock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 2e32e254-1837-4c71-bfbc-e0dfa18981d2 -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=23,server=on,wait=off -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global ICH9-LPC.disable_s3=1 -global ICH9-LPC.disable_s4=1 -boot strict=on -device pcie-root-port,port=16,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 -device pcie-root-port,port=17,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 -device pcie-root-port,port=18,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 -device pcie-root-port,port=19,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 -device pcie-root-port,port=20,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 -device pcie-root-port,port=21,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 -device pcie-root-port,port=22,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x6 -device qemu-xhci,p2=15,p3=15,id=usb,bus=pci.2,addr=0x0 -device virtio-scsi-pci,id=scsi0,bus=pci.7,addr=0x0 -device virtio-serial-pci,id=virtio-serial0,bus=pci.3,addr=0x0 -blockdev {"driver":"file","filename":"/var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2","node-name":"libvirt-2-storage","auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-2-format","read-only":true,"driver":"qcow2","file":"libvirt-2-storage","backing":null} -blockdev 
{"driver":"file","filename":"/var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.1639059023","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-1-format","read-only":false,"driver":"qcow2","file":"libvirt-1-storage","backing":"libvirt-2-format"} -device virtio-blk-pci,bus=pci.4,addr=0x0,drive=libvirt-1-format,id=virtio-disk0,bootindex=1 -netdev tap,fd=25,id=hostnet0,vhost=on,vhostfd=26 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:6b:4e:21,bus=pci.1,addr=0x0 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,fd=27,server=on,wait=off -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -device usb-tablet,id=input0,bus=usb.0,port=1 -audiodev {"id":"audio1","driver":"none"} -vnc 127.0.0.1:0,audiodev=audio1 -device virtio-vga,id=video0,max_outputs=1,bus=pcie.0,addr=0x1 -device virtio-balloon-pci,id=balloon0,bus=pci.5,addr=0x0 -object {"qom-type":"rng-random","id":"objrng0","filename":"/dev/urandom"} -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.6,addr=0x0 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on

3. The VM xml, full backtrace, and scripts are in the attachment.

Comment 1 Meina Li 2021-12-10 01:19:52 UTC
I think this bug may be a duplicate of bug 2001404 and bug 2002607.

Comment 2 aihua liang 2021-12-10 01:50:38 UTC
Yes, I also hit this issue with qemu-kvm-6.1.0-8.el9.x86_64. As Meina commented, this bug has the same coredump information as bug 2001404.

Comment 3 Han Han 2021-12-10 01:58:52 UTC
(In reply to Meina Li from comment #1)
> I think this bug may be a duplicate of bug 2001404 and bug 2002607.

Agree

*** This bug has been marked as a duplicate of bug 2001404 ***