Description of problem:
qemu crashed when doing an external memory and disk snapshot

Version-Release number of selected component (if applicable):
qemu-kvm-4.2.0-2.module+el8.2.0+5135+ed3b2489.x86_64
libvirt-5.10.0-1.module+el8.2.0+5135+ed3b2489.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Have a test vm avocado-vt-vm1:
(.libvirt-ci-venv-ci-runtest-AglsHs) [root@hp-dl385g8-01 tmp]# virsh list
 Id   Name             State
--------------------------------
 1    gls              running
 10   avocado-vt-vm1   running

2. Prepare a snapshot xml as follows:
(.libvirt-ci-venv-ci-runtest-AglsHs) [root@hp-dl385g8-01 tmp]# cat snap.xml
<domainsnapshot>
  <name>local-external</name>
  <memory snapshot='external' file='/tmp/test_api_snapshot.mem'/>
  <disk name='vda' snapshot='external'>
    <source file='/var/lib/libvirt/images/test_api_snapshot.disk'/>
  </disk>
</domainsnapshot>

3. Do the snapshot:
(.libvirt-ci-venv-ci-runtest-AglsHs) [root@hp-dl385g8-01 tmp]# virsh snapshot-create avocado-vt-vm1 --xmlfile snap.xml
Domain snapshot local-external created from 'snap.xml'

4. Check the running vm (avocado-vt-vm1 will still be running for a while):
(.libvirt-ci-venv-ci-runtest-AglsHs) [root@hp-dl385g8-01 tmp]# virsh list
 Id   Name             State
--------------------------------
 1    gls              running
 10   avocado-vt-vm1   running
...
The crash happens after a few seconds:
(.libvirt-ci-venv-ci-runtest-AglsHs) [root@hp-dl385g8-01 tmp]# virsh list
 Id   Name   State
----------------------
 1    gls    running

5. The libvirt qemu log is as follows:
(.libvirt-ci-venv-ci-runtest-AglsHs) [root@hp-dl385g8-01 tmp]# cat /var/log/libvirt/qemu/avocado-vt-vm1.log
...
2019-12-10 10:05:59.294+0000: 998: debug : virFileClose:113 : Closed fd 46
2019-12-10 10:05:59.294+0000: 998: debug : virFileClose:113 : Closed fd 47
2019-12-10 10:05:59.294+0000: 998: debug : virCommandHandshakeChild:412 : Handshake with parent is done
char device redirected to /dev/pts/1 (label charserial0)
qemu-kvm: block/io.c:1879: bdrv_co_write_req_prepare: Assertion `child->perm & BLK_PERM_WRITE' failed.
2019-12-10 10:06:33.876+0000: shutting down, reason=crashed

6. gdb backtrace info of the qemu process is as follows:
# gdb -p <pid_of_qemu_process>
...
Thread 6 "CPU 1/KVM" received signal SIGUSR1, User defined signal 1.
[Switching to Thread 0x7f5e43fff700 (LWP 3591)]
0x00007f5e5a5298ed in __lll_lock_wait () from target:/lib64/libpthread.so.0
(gdb) c
Continuing.

Thread 5 "CPU 0/KVM" received signal SIGUSR1, User defined signal 1.
[Switching to Thread 0x7f5e50fd2700 (LWP 3590)]
0x00007f5e5a5298ed in __lll_lock_wait () from target:/lib64/libpthread.so.0
(gdb) c
Continuing.

Thread 6 "CPU 1/KVM" received signal SIGUSR1, User defined signal 1.
[Switching to Thread 0x7f5e43fff700 (LWP 3591)]
0x00007f5e5a5298ed in __lll_lock_wait () from target:/lib64/libpthread.so.0
(gdb) c
Continuing.

Thread 5 "CPU 0/KVM" received signal SIGUSR1, User defined signal 1.
[Switching to Thread 0x7f5e50fd2700 (LWP 3590)]
0x00007f5e5a5298ed in __lll_lock_wait () from target:/lib64/libpthread.so.0
(gdb) c
Continuing.
[New Thread 0x7f5e52116700 (LWP 3623)]

Thread 5 "CPU 0/KVM" received signal SIGUSR1, User defined signal 1.
0x00007f5e5a52648c in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
(gdb) c
Continuing.

Thread 6 "CPU 1/KVM" received signal SIGUSR1, User defined signal 1.
[Switching to Thread 0x7f5e43fff700 (LWP 3591)]
0x00007f5e5a52648c in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
(gdb) c
Continuing.
[Thread 0x7f5e52116700 (LWP 3623) exited]
[New Thread 0x7f5e52116700 (LWP 3626)]

Thread 5 "CPU 0/KVM" received signal SIGUSR1, User defined signal 1.
[Switching to Thread 0x7f5e50fd2700 (LWP 3590)]
0x00007f5e5a52648c in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
(gdb) c
Continuing.

Thread 6 "CPU 1/KVM" received signal SIGUSR1, User defined signal 1.
[Switching to Thread 0x7f5e43fff700 (LWP 3591)]
0x00007f5e5a52648c in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
(gdb) c
Continuing.
[Thread 0x7f5e52116700 (LWP 3626) exited]

Thread 5 "CPU 0/KVM" received signal SIGUSR1, User defined signal 1.
[Switching to Thread 0x7f5e50fd2700 (LWP 3590)]
0x00007f5e5a52648c in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
(gdb) c
Continuing.

Thread 6 "CPU 1/KVM" received signal SIGUSR1, User defined signal 1.
[Switching to Thread 0x7f5e43fff700 (LWP 3591)]
0x00007f5e5a52648c in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
(gdb) c
Continuing.
[New Thread 0x7f5e52116700 (LWP 3653)]
[New Thread 0x7f5decd92700 (LWP 3654)]
[New Thread 0x7f5ddffff700 (LWP 3655)]
[New Thread 0x7f5ddf7fe700 (LWP 3656)]
[New Thread 0x7f5ddeffd700 (LWP 3657)]
[New Thread 0x7f5dde7fc700 (LWP 3658)]
[New Thread 0x7f5dddffb700 (LWP 3659)]
[New Thread 0x7f5ddd7fa700 (LWP 3660)]

Thread 1 "qemu-kvm" received signal SIGABRT, Aborted.
[Switching to Thread 0x7f5e5f5a1f00 (LWP 3569)]
0x00007f5e5a18d70f in raise () from target:/lib64/libc.so.6
(gdb) t a a bt

Thread 18 (Thread 0x7f5ddd7fa700 (LWP 3660)):
#0  0x00007f5e5a529082 in do_futex_wait () from target:/lib64/libpthread.so.0
#1  0x00007f5e5a529193 in __new_sem_wait_slow () from target:/lib64/libpthread.so.0
#2  0x0000556008c9d4ff in qemu_sem_timedwait ()
#3  0x0000556008c98424 in worker_thread ()
#4  0x0000556008c9ced4 in qemu_thread_start ()
#5  0x00007f5e5a5202de in start_thread () from target:/lib64/libpthread.so.0
#6  0x00007f5e5a251e83 in clone () from target:/lib64/libc.so.6

Thread 17 (Thread 0x7f5dddffb700 (LWP 3659)):
#0  0x00007f5e5a529082 in do_futex_wait () from target:/lib64/libpthread.so.0
#1  0x00007f5e5a529193 in __new_sem_wait_slow () from target:/lib64/libpthread.so.0
#2  0x0000556008c9d4ff in qemu_sem_timedwait ()
#3  0x0000556008c98424 in worker_thread ()
#4  0x0000556008c9ced4 in qemu_thread_start ()
#5  0x00007f5e5a5202de in start_thread () from target:/lib64/libpthread.so.0
#6  0x00007f5e5a251e83 in clone () from target:/lib64/libc.so.6

Thread 16 (Thread 0x7f5dde7fc700 (LWP 3658)):
#0  0x00007f5e5a529082 in do_futex_wait () from target:/lib64/libpthread.so.0
#1  0x00007f5e5a529193 in __new_sem_wait_slow () from target:/lib64/libpthread.so.0
#2  0x0000556008c9d4ff in qemu_sem_timedwait ()
#3  0x0000556008c98424 in worker_thread ()
#4  0x0000556008c9ced4 in qemu_thread_start ()
#5  0x00007f5e5a5202de in start_thread () from target:/lib64/libpthread.so.0
#6  0x00007f5e5a251e83 in clone () from target:/lib64/libc.so.6

Thread 15 (Thread 0x7f5ddeffd700 (LWP 3657)):
#0  0x00007f5e5a529082 in do_futex_wait () from target:/lib64/libpthread.so.0
#1  0x00007f5e5a529193 in __new_sem_wait_slow () from target:/lib64/libpthread.so.0
#2  0x0000556008c9d4ff in qemu_sem_timedwait ()
#3  0x0000556008c98424 in worker_thread ()
#4  0x0000556008c9ced4 in qemu_thread_start ()
#5  0x00007f5e5a5202de in start_thread () from target:/lib64/libpthread.so.0
#6  0x00007f5e5a251e83 in clone () from target:/lib64/libc.so.6

Thread 14 (Thread 0x7f5ddf7fe700 (LWP 3656)):
#0  0x00007f5e5a529082 in do_futex_wait () from target:/lib64/libpthread.so.0
#1  0x00007f5e5a529193 in __new_sem_wait_slow () from target:/lib64/libpthread.so.0
#2  0x0000556008c9d4ff in qemu_sem_timedwait ()
#3  0x0000556008c98424 in worker_thread ()
#4  0x0000556008c9ced4 in qemu_thread_start ()
#5  0x00007f5e5a5202de in start_thread () from target:/lib64/libpthread.so.0
#6  0x00007f5e5a251e83 in clone () from target:/lib64/libc.so.6

Thread 13 (Thread 0x7f5ddffff700 (LWP 3655)):
#0  0x00007f5e5a529082 in do_futex_wait () from target:/lib64/libpthread.so.0
#1  0x00007f5e5a529193 in __new_sem_wait_slow () from target:/lib64/libpthread.so.0
#2  0x0000556008c9d4ff in qemu_sem_timedwait ()
#3  0x0000556008c98424 in worker_thread ()
#4  0x0000556008c9ced4 in qemu_thread_start ()
#5  0x00007f5e5a5202de in start_thread () from target:/lib64/libpthread.so.0
#6  0x00007f5e5a251e83 in clone () from target:/lib64/libc.so.6

Thread 12 (Thread 0x7f5decd92700 (LWP 3654)):
#0  0x00007f5e5a529082 in do_futex_wait () from target:/lib64/libpthread.so.0
#1  0x00007f5e5a529193 in __new_sem_wait_slow () from target:/lib64/libpthread.so.0
#2  0x0000556008c9d4ff in qemu_sem_timedwait ()
#3  0x0000556008c98424 in worker_thread ()
#4  0x0000556008c9ced4 in qemu_thread_start ()
#5  0x00007f5e5a5202de in start_thread () from target:/lib64/libpthread.so.0
#6  0x00007f5e5a251e83 in clone () from target:/lib64/libc.so.6

Thread 11 (Thread 0x7f5e52116700 (LWP 3653)):
#0  0x00007f5e5a529082 in do_futex_wait () from target:/lib64/libpthread.so.0
#1  0x00007f5e5a529193 in __new_sem_wait_slow () from target:/lib64/libpthread.so.0
#2  0x0000556008c9d4ff in qemu_sem_timedwait ()
#3  0x0000556008c98424 in worker_thread ()
#4  0x0000556008c9ced4 in qemu_thread_start ()
#5  0x00007f5e5a5202de in start_thread () from target:/lib64/libpthread.so.0
#6  0x00007f5e5a251e83 in clone () from target:/lib64/libc.so.6

Thread 8 (Thread 0x7f5e421ff700 (LWP 3594)):
#0  0x00007f5e5a52648c in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
#1  0x0000556008c9d28d in qemu_cond_wait_impl ()
#2  0x0000556008bc1f21 in vnc_worker_thread_loop ()
#3  0x0000556008bc2850 in vnc_worker_thread ()
#4  0x0000556008c9ced4 in qemu_thread_start ()
#5  0x00007f5e5a5202de in start_thread () from target:/lib64/libpthread.so.0
#6  0x00007f5e5a251e83 in clone () from target:/lib64/libc.so.6

Thread 7 (Thread 0x7f5e42a26700 (LWP 3593)):
#0  0x00007f5e5a246f21 in poll () from target:/lib64/libc.so.6
#1  0x00007f5e5ea109b6 in g_main_context_iterate.isra () from target:/lib64/libglib-2.0.so.0
#2  0x00007f5e5ea10d72 in g_main_loop_run () from target:/lib64/libglib-2.0.so.0
#3  0x00007f5e5c3f847b in red_worker_main () from target:/lib64/libspice-server.so.1
#4  0x00007f5e5a5202de in start_thread () from target:/lib64/libpthread.so.0
#5  0x00007f5e5a251e83 in clone () from target:/lib64/libc.so.6

Thread 6 (Thread 0x7f5e43fff700 (LWP 3591)):
#0  0x00007f5e5a24887b in ioctl () from target:/lib64/libc.so.6
#1  0x000055600898ec39 in kvm_vcpu_ioctl ()
#2  0x000055600898ecf9 in kvm_cpu_exec ()
#3  0x0000556008973d5e in qemu_kvm_cpu_thread_fn ()
#4  0x0000556008c9ced4 in qemu_thread_start ()
#5  0x00007f5e5a5202de in start_thread () from target:/lib64/libpthread.so.0
#6  0x00007f5e5a251e83 in clone () from target:/lib64/libc.so.6

Thread 5 (Thread 0x7f5e50fd2700 (LWP 3590)):
#0  0x00007f5e5a24887b in ioctl () from target:/lib64/libc.so.6
#1  0x000055600898ec39 in kvm_vcpu_ioctl ()
#2  0x000055600898ecf9 in kvm_cpu_exec ()
#3  0x0000556008973d5e in qemu_kvm_cpu_thread_fn ()
#4  0x0000556008c9ced4 in qemu_thread_start ()
#5  0x00007f5e5a5202de in start_thread () from target:/lib64/libpthread.so.0
#6  0x00007f5e5a251e83 in clone () from target:/lib64/libc.so.6

Thread 4 (Thread 0x7f5e517d3700 (LWP 3588)):
#0  0x00007f5e5a246f21 in poll () from target:/lib64/libc.so.6
#1  0x00007f5e5ea109b6 in g_main_context_iterate.isra () from target:/lib64/libglib-2.0.so.0
#2  0x00007f5e5ea10d72 in g_main_loop_run () from target:/lib64/libglib-2.0.so.0
#3  0x0000556008a76db1 in iothread_run ()
#4  0x0000556008c9ced4 in qemu_thread_start ()
#5  0x00007f5e5a5202de in start_thread () from target:/lib64/libpthread.so.0
#6  0x00007f5e5a251e83 in clone () from target:/lib64/libc.so.6

Thread 2 (Thread 0x7f5e5313d700 (LWP 3584)):
#0  0x00007f5e5a24c6ed in syscall () from target:/lib64/libc.so.6
#1  0x0000556008c9d73f in qemu_event_wait ()
#2  0x0000556008caf962 in call_rcu_thread ()
#3  0x0000556008c9ced4 in qemu_thread_start ()
#4  0x00007f5e5a5202de in start_thread () from target:/lib64/libpthread.so.0
#5  0x00007f5e5a251e83 in clone () from target:/lib64/libc.so.6

Thread 1 (Thread 0x7f5e5f5a1f00 (LWP 3569)):
#0  0x00007f5e5a18d70f in raise () from target:/lib64/libc.so.6
#1  0x00007f5e5a177b25 in abort () from target:/lib64/libc.so.6
#2  0x00007f5e5a1779f9 in __assert_fail_base.cold.0 () from target:/lib64/libc.so.6
#3  0x00007f5e5a185cc6 in __assert_fail () from target:/lib64/libc.so.6
#4  0x0000556008c15605 in bdrv_co_write_req_prepare.isra.14.part ()
#5  0x0000556008c19efe in bdrv_aligned_pwritev ()
#6  0x0000556008c1a45b in bdrv_co_pwritev_part ()
#7  0x0000556008c07016 in blk_co_pwritev_part ()
#8  0x0000556008c070c7 in blk_aio_write_entry ()
#9  0x0000556008cb0ed3 in coroutine_trampoline ()
#10 0x00007f5e5a1a32e0 in ?? () from target:/lib64/libc.so.6
#11 0x00007ffc6356de00 in ?? ()
#12 0x0000000000000000 in ?? ()
The qemu process is as follows:
(.libvirt-ci-venv-ci-runtest-AglsHs) [root@hp-dl385g8-01 tmp]# ps -ef | grep avocado-vt-vm1
qemu 2797 1 10 05:59 ? 00:00:00 /usr/libexec/qemu-kvm -name guest=avocado-vt-vm1,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-11-avocado-vt-vm1/master-key.aes -machine pc-q35-rhel8.1.0,accel=kvm,usb=off,dump-guest-core=off -cpu Opteron_G5,vme=on,x2apic=on,tsc-deadline=on,hypervisor=on,arat=on,tsc-adjust=on,bmi1=on,arch-capabilities=on,ssbd=on,mmxext=on,fxsr-opt=on,cmp-legacy=on,cr8legacy=on,osvw=on,perfctr-core=on,ibpb=on,amd-ssbd=on,virt-ssbd=on,rdctl-no=on,skip-l1dfl-vmentry=on,mds-no=on,svm=off,vmx=off -m 1024 -overcommit mem-lock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 5f4b696f-3fb4-48b9-8b58-6e1020064005 -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=40,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global ICH9-LPC.disable_s3=1 -global ICH9-LPC.disable_s4=1 -boot strict=on -device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 -device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 -device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 -device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 -device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 -device pcie-root-port,port=0x15,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 -device pcie-root-port,port=0x16,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x6 -device qemu-xhci,p2=15,p3=15,id=usb,bus=pci.2,addr=0x0 -device virtio-serial-pci,id=virtio-serial0,bus=pci.3,addr=0x0 -blockdev {"driver":"file","filename":"/var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-1-format","read-only":false,"driver":"qcow2","file":"libvirt-1-storage","backing":null} -device virtio-blk-pci,scsi=off,bus=pci.4,addr=0x0,drive=libvirt-1-format,id=virtio-disk0,bootindex=1 -netdev tap,fd=42,id=hostnet0,vhost=on,vhostfd=43 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:02:46:85,bus=pci.1,addr=0x0 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,fd=44,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 127.0.0.1:1 -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pcie.0,addr=0x1 -device virtio-balloon-pci,id=balloon0,bus=pci.5,addr=0x0 -object rng-random,id=objrng0,filename=/dev/urandom -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.6,addr=0x0 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on
Hi, when testing with a disk-only snapshot, I don't hit this issue.

Test Env:
qemu-kvm version: qemu-kvm-4.2.0-2.module+el8.2.0+5135+ed3b2489.x86_64
kernel version: 4.18.0-160.el8.x86_64

Test steps:
1. Start the guest with qemu cmds:
/usr/libexec/qemu-kvm \
 -name 'avocado-vt-vm1' \
 -sandbox on \
 -machine q35 \
 -nodefaults \
 -device VGA,bus=pcie.0,addr=0x1 \
 -m 7168 \
 -smp 4,maxcpus=4,cores=2,threads=1,dies=1,sockets=2 \
 -cpu 'Skylake-Client',+kvm_pv_unhalt \
 -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20191210-025743-Q0JzJpKT,server,nowait \
 -mon chardev=qmp_id_qmpmonitor1,mode=control \
 -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20191210-025743-Q0JzJpKT,server,nowait \
 -mon chardev=qmp_id_catch_monitor,mode=control \
 -device pvpanic,ioport=0x505,id=idDU2Q2E \
 -chardev socket,server,path=/var/tmp/serial-serial0-20191210-025743-Q0JzJpKT,id=chardev_serial0,nowait \
 -device isa-serial,id=serial0,chardev=chardev_serial0 \
 -chardev socket,id=seabioslog_id_20191210-025743-Q0JzJpKT,path=/var/tmp/seabios-20191210-025743-Q0JzJpKT,server,nowait \
 -device isa-debugcon,chardev=seabioslog_id_20191210-025743-Q0JzJpKT,iobase=0x402 \
 -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0,multifunction=on \
 -device qemu-xhci,id=usb1,bus=pcie.0-root-port-2,addr=0x0 \
 -object iothread,id=iothread0 \
 -object iothread,id=iothread1 \
 -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x2.0x1,bus=pcie.0 \
 -device virtio-scsi-pci,id=virtio_scsi_pci0,iothread=iothread0,bus=pcie.0-root-port-3,addr=0x0 \
 -blockdev node-name=file_image1,driver=file,aio=threads,filename=/home/kvm_autotest_root/images/rhel820-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
 -blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 \
 -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
 -device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x2.0x2,bus=pcie.0 \
 -blockdev node-name=file_data1,driver=file,aio=threads,filename=/home/data.qcow2,cache.direct=on,cache.no-flush=off \
 -blockdev node-name=drive_data1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_data1 \
 -device virtio-blk-pci,id=data1,drive=drive_data1,write-cache=on,bus=pcie.0-root-port-4,iothread=iothread1 \
 -device pcie-root-port,id=pcie.0-root-port-5,slot=5,chassis=5,addr=0x2.0x3,bus=pcie.0 \
 -device virtio-net-pci,mac=9a:9b:1d:13:61:86,id=idKg9AzR,netdev=idxDM2m8,bus=pcie.0-root-port-5,addr=0x0 \
 -netdev tap,id=idxDM2m8,vhost=on \
 -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
 -vnc :0 \
 -rtc base=utc,clock=host,driftfix=slew \
 -boot menu=off,order=cdn,once=c,strict=off \
 -enable-kvm \
 -monitor stdio \
 -device pcie-root-port,id=pcie_extra_root_port_0,slot=6,chassis=6,addr=0x2.0x4,bus=pcie.0 \
 -qmp tcp:0:3000,server,nowait \

2. Create the snapshot target:
{'execute':'blockdev-create','arguments':{'options': {'driver':'file','filename':'/root/sn1','size':21474836480},'job-id':'job1'}}
{'execute':'blockdev-add','arguments':{'driver':'file','node-name':'drive_sn1','filename':'/root/sn1'}}
{"return": {}}
{'execute':'blockdev-create','arguments':{'options': {'driver': 'qcow2','file':'drive_sn1','size':21474836480,'backing-file':'/home/kvm_autotest_root/images/rhel820-64-virtio-scsi.qcow2','backing-fmt':'qcow2'},'job-id':'job2'}}
{'execute':'blockdev-add','arguments':{'driver':'qcow2','node-name':'sn1','file':'drive_sn1','backing':null}}
{"return": {}}
{'execute':'job-dismiss','arguments':{'id':'job1'}}
{"return": {}}
{'execute':'job-dismiss','arguments':{'id':'job2'}}
{"return": {}}

3. Do the snapshot on drive_image1:
{"execute":"blockdev-snapshot","arguments":{"node":"drive_image1","overlay":"sn1"}}
{"return": {}}

After step 3, qemu did not crash.
Created attachment 1644258 [details]
Domain XMLs, ARGVs, libvirtd log, detailed backtrace

I think it is caused by the blockdev feature. Evidence: with the same steps, the crash is reproduced when blockdev is enabled and is not reproduced when blockdev is disabled.

Version:
libvirt-5.10.0-1.module+el8.2.0+5040+bd433686.x86_64
qemu-kvm-4.2.0-1.module+el8.2.0+4793+b09dd2fb.x86_64

Steps:
1. Start a VM
2. Create a memory & disk snapshot:
# virsh snapshot-create-as pc s1 --no-metadata --diskspec vda,file=/tmp/vda.s1 --memspec /tmp/mem.s1

Then the vm gets SIGABRT.
# abrt-cli ls
id 266e34a039bca1273d0c2a4e1af8db1a28fba6ff
reason:         _nl_load_domain.cold.0(): qemu-kvm killed by SIGABRT
time:           Wed 11 Dec 2019 10:38:37 PM EST
cmdline:        /usr/libexec/qemu-kvm -name guest=pc,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-pc/master-key.aes -machine pc-i440fx-rhel7.6.0,accel=kvm,usb=off,vmport=off,dump-guest-core=off -cpu Opteron_G3 -m 1024 -overcommit mem-lock=off -smp 1,sockets=1,cores=1,threads=1 -uuid c0fae7ee-7f4e-4cb0-aca4-25fa945a3f83 -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=37,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x5.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x5 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x5.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x5.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x6 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -blockdev '{\"driver\":\"file\",\"filename\":\"/var/lib/libvirt/images/RHEL-8.2-x86_64-latest.qcow2\",\"node-name\":\"libvirt-2-storage\",\"auto-read-only\":true,\"discard\":\"unmap\"}' -blockdev '{\"node-name\":\"libvirt-2-format\",\"read-only\":false,\"driver\":\"qcow2\",\"file\":\"libvirt-2-storage\",\"backing\":null}' -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x9,drive=libvirt-2-format,id=virtio-disk0,bootindex=1,serial=SB -blockdev '{\"driver\":\"file\",\"filename\":\"/tmp/copy.img1\",\"aio\":\"native\",\"node-name\":\"libvirt-1-storage\",\"cache\":{\"direct\":true,\"no-flush\":false},\"auto-read-only\":true,\"discard\":\"unmap\"}' -blockdev '{\"node-name\":\"libvirt-1-format\",\"read-only\":false,\"discard\":\"ignore\",\"cache\":{\"direct\":true,\"no-flush\":false},\"driver\":\"raw\",\"file\":\"libvirt-1-storage\"}' -device virtio-blk-pci,ioeventfd=on,scsi=off,bus=pci.0,addr=0xa,drive=libvirt-1-format,id=virtio-disk1,write-cache=off,werror=report,rerror=report -netdev tap,fd=39,id=hostnet0 -device e1000,netdev=hostnet0,id=net0,mac=52:54:00:09:f8:0c,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=13,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -device usb-tablet,id=input0,bus=usb.0,port=1 -spice port=5900,addr=0.0.0.0,disable-ticketing,image-compression=off,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on
package:        15:qemu-kvm-core-4.2.0-1.module+el8.2.0+4793+b09dd2fb
uid:            107 (qemu)
count:          1
Directory:      /var/spool/abrt/ccpp-2019-12-11-22:38:37-6076
Run 'abrt-cli report /var/spool/abrt/ccpp-2019-12-11-22:38:37-6076' for creating a case in Red Hat Customer Portal

Additional info:
Not reproduced with a memory-only snapshot or a disk-only snapshot.
Not reproduced when blockdev is disabled:
<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
...
  <qemu:capabilities>
    <qemu:add capability='drive'/>
    <qemu:del capability='blockdev'/>
  </qemu:capabilities>
</domain>
Not reproduced on RHEL8.1.1-AV (libvirt-5.6.0-10.module+el8.1.1+5131+a6fe889c.x86_64, qemu-kvm-4.1.0-18.module+el8.1.1+5150+45ce6c40.x86_64)
It can be reproduced on the q35 machine type.

qemu log:
/usr/libexec/qemu-kvm \
-name guest=q35,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-q35/master-key.aes \
-machine pc-q35-rhel8.1.0,accel=kvm,usb=off,vmport=off,dump-guest-core=off \
-cpu Opteron_G3 \
-m 1024 \
-overcommit mem-lock=off \
-smp 1,sockets=1,cores=1,threads=1 \
-uuid 516b83cb-bd68-47cf-b577-9ed07741601e \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=37,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc,driftfix=slew \
-global kvm-pit.lost_tick_policy=delay \
-no-hpet \
-no-shutdown \
-global ICH9-LPC.disable_s3=1 \
-global ICH9-LPC.disable_s4=1 \
-boot strict=on \
-device ich9-usb-ehci1,id=usb,bus=pcie.0,addr=0x5.0x7 \
-device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pcie.0,multifunction=on,addr=0x5 \
-device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pcie.0,addr=0x5.0x1 \
-device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pcie.0,addr=0x5.0x2 \
-device virtio-scsi-pci,id=scsi0,bus=pcie.0,addr=0x6 \
-device virtio-serial-pci,id=virtio-serial0,bus=pcie.0,addr=0x7 \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/images/RHEL-8.2-x86_64-latest-clone.qcow2","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"driver":"qcow2","file":"libvirt-1-storage","backing":null}' \
-device virtio-blk-pci,scsi=off,bus=pcie.0,addr=0x9,drive=libvirt-1-format,id=virtio-disk0,bootindex=1 \
-netdev tap,fd=39,id=hostnet0 \
-device e1000,netdev=hostnet0,id=net0,mac=52:54:00:87:15:bd,bus=pcie.0,addr=0x3 \
-chardev pty,id=charserial0 \
-device isa-serial,chardev=charserial0,id=serial0 \
-chardev spicevmc,id=charchannel0,name=vdagent \
-device virtserialport,bus=virtio-serial0.0,nr=13,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 \
-device usb-tablet,id=input0,bus=usb.0,port=1 \
-spice port=5900,addr=0.0.0.0,disable-ticketing,image-compression=off,seamless-migration=on \
-device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pcie.0,addr=0x2 \
-device intel-hda,id=sound0,bus=pcie.0,addr=0x4 \
-device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 \
-chardev spicevmc,id=charredir0,name=usbredir \
-device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 \
-chardev spicevmc,id=charredir1,name=usbredir \
-device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 \
-device virtio-balloon-pci,id=balloon0,bus=pcie.0,addr=0x8 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
char device redirected to /dev/pts/6 (label charserial0)
qemu-kvm: block/io.c:1879: bdrv_co_write_req_prepare: Assertion `child->perm & BLK_PERM_WRITE' failed.
2019-12-12 06:06:39.735+0000: shutting down, reason=crashed
(In reply to aihua liang from comment #4)
> Hi, when testing with a disk-only snapshot, I don't hit this issue.
>
> Test Env:
> qemu-kvm version: qemu-kvm-4.2.0-2.module+el8.2.0+5135+ed3b2489.x86_64
> kernel version: 4.18.0-160.el8.x86_64

Hi Aihua,
I found this crash happens after the snapshot, but within an uncertain period. Today when I tried to reproduce it, it took around 1 minute to crash, as follows:

[root@libvirt-rhel-8 ~]# virsh snapshot-create-as avocado-vt-vm1 s2 --memspec file=/var/lib/avocado/data/avocado-vt/images/avv-mem.s2
Domain snapshot s2 created
[root@libvirt-rhel-8 ~]# for i in {1..1000}; do sleep 1; echo Time:$i; virsh domstate avocado-vt-vm1; done
Time:1
running

Time:2
running

Time:3
running

...

Time:54
running

Time:55
shut off

...
[root@libvirt-rhel-8 ~]# cat /var/log/libvirt/qemu/avocado-vt-vm1.log
qemu-kvm: block/io.c:1879: bdrv_co_write_req_prepare: Assertion `child->perm & BLK_PERM_WRITE' failed.
2019-12-13 06:14:06.943+0000: shutting down, reason=crashed

Could you please double-check whether the qemu process terminates after a longer period?
(In reply to yisun from comment #7)
> Hi Aihua,
> I found this crash happens after the snapshot, but within an uncertain period. Today when I tried to reproduce it, it took around 1 minute to crash, as follows:
> [...]
> Could you please double-check whether the qemu process terminates after a longer period?

Hi, YiSun

Tested again and waited for 2 minutes; still can't reproduce it.
After the migration {"execute":"migrate","arguments":{"detach":true,"blk":false,"inc":false,"uri": "exec:gzip -c > STATEFILE.gz"},"id":"libvirt-16"} finished, the vm entered "paused (postmigrate)" status successfully.

BR,
Aliang
(In reply to aihua liang from comment #8)
> Tested again and waited for 2 minutes; still can't reproduce it.
> [...]

Thanks for the info.

Hi Admear,
Since the problem has not yet been reproduced from a pure qemu command line, could you please have a look at the backtrace or the libvirtd log to see whether any clue can be found? Thanks.
Hm, an idea, yes.

After migration completes, all block devices are inactive, i.e. ownership is released. However, if you do blockdev-add in this state, the new node will be active, which breaks the assumption that all nodes share the same active/inactive state. Taking a snapshot then means that we have an active overlay over an inactive backing file.

When doing 'cont', we recursively activate all image trees from their root. However, since the root is already active, we skip the activation of the whole subtree, so the backing file stays inactive and the next write to it causes an assertion failure.

With blockdev-snapshot-sync rather than blockdev-snapshot, the flags are copied from the old top layer, so bdrv_open() for the new overlay is called with BDRV_O_INACTIVE. I don't think bdrv_open() was ever supposed to be called with that flag, but it could explain why the case accidentally worked with libvirt in -drive mode.
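If that is right, the libvirt flow should reduce to a plain QMP sequence along these lines (an untested sketch of the scenario described above, not a verified reproducer; the node name "libvirt-1-format" is borrowed from the command lines earlier in this bug, and the overlay path is a placeholder):

{"execute": "migrate", "arguments": {"uri": "exec:gzip -c > /tmp/mem.snap"}}
    (wait for completion; all block nodes are now inactive)
{"execute": "blockdev-add", "arguments": {"driver": "qcow2", "node-name": "overlay1", "file": {"driver": "file", "filename": "/tmp/overlay.qcow2"}, "backing": null}}
    (the new node is created active, unlike the rest of the graph)
{"execute": "blockdev-snapshot", "arguments": {"node": "libvirt-1-format", "overlay": "overlay1"}}
    (now there is an active overlay over an inactive backing file)
{"execute": "cont"}
    (activation skips the already-active root, the backing file stays inactive, and a later write reaching it should trip the BLK_PERM_WRITE assertion)

This would also explain why the QMP reproducer in comment #4 did not crash: it runs blockdev-add and blockdev-snapshot against a running guest with no prior migration, so no node is inactive to begin with.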
For the record: this BZ prevents the whole virt module from passing gating without manual intervention. Please consider raising the priority of this.
I'll see what I can do, but please note that this bug isn't new in any way in QEMU. This is the first report of it and we don't have an upstream fix yet, so fixing it upstream and then backporting will take a few days. If it's really blocking your work, consider backing out whatever made it apparent. (Some libvirt update? Additional test cases?)
Thank you Kevin, I disabled the failing test. The side effect is that, due to how osci is designed, we might have to disable all perl-sys-virt tests (actually make them non-gating).
Encountered a QEMU crash when taking a VM snapshot with memory in oVirt.
QEMU has been recently split into sub-components and as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks
*** Bug 1798462 has been marked as a duplicate of this bug. ***
Can reproduce it on qemu-kvm-4.2.0-9.module+el8.2.0+5699+b5331ee5 (libvirt version: libvirt-6.0.0-6.module+el8.2.0+5821+109ee33c.x86_64):

# virsh create avocado-test_x86.xml
Domain avocado-vt-vm1 created from avocado-test_x86.xml

[root@hp-dl385g10-04 home]# virsh list
 Id   Name             State
--------------------------------
 2    avocado-vt-vm1   running

# virsh snapshot-create-as avocado-vt-vm1 s1 --no-metadata --diskspec vda,file=/tmp/vda.s1 --memspec /tmp/mem.s1
Domain snapshot s1 created

# virsh list
 Id   Name   State
--------------------

# cat /var/log/libvirt/qemu/avocado-vt-vm1.log
...
2020-02-24 10:50:16.143+0000: Domain id=2 is tainted: custom-hypervisor-feature
char device redirected to /dev/pts/6 (label charserial0)
2020-02-24T10:50:16.243809Z qemu-kvm: -device cirrus-vga,id=video0,bus=pci.0,addr=0x2: warning: 'cirrus-vga' is deprecated, please use a different VGA card instead
qemu-kvm: block/io.c:1879: bdrv_co_write_req_prepare: Assertion `child->perm & BLK_PERM_WRITE' failed.
2020-02-24 10:50:45.114+0000: shutting down, reason=crashed
Tested on qemu-kvm-4.2.0-11.module+el8.2.0+5837+4c1442ec (libvirt version: libvirt-6.0.0-6.module+el8.2.0+5821+109ee33c.x86_64); no longer hitting this issue.

# virsh create avocado-test_x86.xml
Domain avocado-vt-vm1 created from avocado-test_x86.xml

# virsh list
 Id   Name             State
--------------------------------
 1    avocado-vt-vm1   running

# virsh snapshot-create-as avocado-vt-vm1 s1 --no-metadata --diskspec vda,file=/tmp/vda.s1 --memspec /tmp/mem.s1
Domain snapshot s1 created

# virsh list
 Id   Name             State
--------------------------------
 1    avocado-vt-vm1   running

Setting the bug's status to "Verified".
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2017