Bug 1781637 - qemu crashed when doing mem and disk snapshot
Summary: qemu crashed when doing mem and disk snapshot
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: 8.2
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Assignee: Kevin Wolf
QA Contact: aihua liang
URL:
Whiteboard:
Depends On:
Blocks: 1798462
 
Reported: 2019-12-10 10:50 UTC by yisun
Modified: 2020-05-05 09:52 UTC
CC List: 16 users

Fixed In Version: qemu-kvm-4.2.0-10.module+el8.2.0+5740+c3dff59e
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-05 09:52:16 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
Domain XMLs, ARGVs, libvirtd log, detailed backtrace (16.12 KB, application/gzip)
2019-12-12 04:10 UTC, Han Han

Description yisun 2019-12-10 10:50:21 UTC
Description of problem:
qemu crashed when doing an external memory and disk snapshot

Version-Release number of selected component (if applicable):
qemu-kvm-4.2.0-2.module+el8.2.0+5135+ed3b2489.x86_64
libvirt-5.10.0-1.module+el8.2.0+5135+ed3b2489.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Have a test vm avocado-vt-vm1 running:
(.libvirt-ci-venv-ci-runtest-AglsHs) [root@hp-dl385g8-01 tmp]# virsh list
 Id   Name             State
--------------------------------
 1    gls              running
 10   avocado-vt-vm1   running

2. Prepare a snapshot xml as follows:
(.libvirt-ci-venv-ci-runtest-AglsHs) [root@hp-dl385g8-01 tmp]# cat snap.xml 
<domainsnapshot>
  <name>local-external</name>
  <memory snapshot='external' file='/tmp/test_api_snapshot.mem'/>
  <disk name='vda' snapshot='external'>
    <source file='/var/lib/libvirt/images/test_api_snapshot.disk'/>
  </disk>
</domainsnapshot>

3. Create the snapshot:
(.libvirt-ci-venv-ci-runtest-AglsHs) [root@hp-dl385g8-01 tmp]# virsh snapshot-create avocado-vt-vm1 --xmlfile snap.xml 
Domain snapshot local-external created from 'snap.xml'

4. Check the running vms (avocado-vt-vm1 will still be running for a while):
(.libvirt-ci-venv-ci-runtest-AglsHs) [root@hp-dl385g8-01 tmp]# virsh list
 Id   Name             State
--------------------------------
 1    gls              running
 10   avocado-vt-vm1   running

...
The crash happens after a few seconds:

(.libvirt-ci-venv-ci-runtest-AglsHs) [root@hp-dl385g8-01 tmp]# virsh list
 Id   Name   State
----------------------
 1    gls    running


5. The libvirt qemu log shows:
(.libvirt-ci-venv-ci-runtest-AglsHs) [root@hp-dl385g8-01 tmp]# cat /var/log/libvirt/qemu/avocado-vt-vm1.log 
...
2019-12-10 10:05:59.294+0000: 998: debug : virFileClose:113 : Closed fd 46
2019-12-10 10:05:59.294+0000: 998: debug : virFileClose:113 : Closed fd 47
2019-12-10 10:05:59.294+0000: 998: debug : virCommandHandshakeChild:412 : Handshake with parent is done
char device redirected to /dev/pts/1 (label charserial0)
qemu-kvm: block/io.c:1879: bdrv_co_write_req_prepare: Assertion `child->perm & BLK_PERM_WRITE' failed.
2019-12-10 10:06:33.876+0000: shutting down, reason=crashed

6. gdb backtrace info of the qemu process:
# gdb -p <pid_of_qemu_process>
...
Thread 6 "CPU 1/KVM" received signal SIGUSR1, User defined signal 1.
[Switching to Thread 0x7f5e43fff700 (LWP 3591)]
0x00007f5e5a5298ed in __lll_lock_wait () from target:/lib64/libpthread.so.0
(gdb) c
Continuing.

Thread 5 "CPU 0/KVM" received signal SIGUSR1, User defined signal 1.
[Switching to Thread 0x7f5e50fd2700 (LWP 3590)]
0x00007f5e5a5298ed in __lll_lock_wait () from target:/lib64/libpthread.so.0
(gdb) c
Continuing.

Thread 6 "CPU 1/KVM" received signal SIGUSR1, User defined signal 1.
[Switching to Thread 0x7f5e43fff700 (LWP 3591)]
0x00007f5e5a5298ed in __lll_lock_wait () from target:/lib64/libpthread.so.0
(gdb) c
Continuing.

Thread 5 "CPU 0/KVM" received signal SIGUSR1, User defined signal 1.
[Switching to Thread 0x7f5e50fd2700 (LWP 3590)]
0x00007f5e5a5298ed in __lll_lock_wait () from target:/lib64/libpthread.so.0
(gdb) c
Continuing.
[New Thread 0x7f5e52116700 (LWP 3623)]

Thread 5 "CPU 0/KVM" received signal SIGUSR1, User defined signal 1.
0x00007f5e5a52648c in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
(gdb) c
Continuing.

Thread 6 "CPU 1/KVM" received signal SIGUSR1, User defined signal 1.
[Switching to Thread 0x7f5e43fff700 (LWP 3591)]
0x00007f5e5a52648c in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
(gdb) c
Continuing.
[Thread 0x7f5e52116700 (LWP 3623) exited]
[New Thread 0x7f5e52116700 (LWP 3626)]

Thread 5 "CPU 0/KVM" received signal SIGUSR1, User defined signal 1.
[Switching to Thread 0x7f5e50fd2700 (LWP 3590)]
0x00007f5e5a52648c in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
(gdb) c
Continuing.

Thread 6 "CPU 1/KVM" received signal SIGUSR1, User defined signal 1.
[Switching to Thread 0x7f5e43fff700 (LWP 3591)]
0x00007f5e5a52648c in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
(gdb) c
Continuing.
[Thread 0x7f5e52116700 (LWP 3626) exited]

Thread 5 "CPU 0/KVM" received signal SIGUSR1, User defined signal 1.
[Switching to Thread 0x7f5e50fd2700 (LWP 3590)]
0x00007f5e5a52648c in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
(gdb) c
Continuing.

Thread 6 "CPU 1/KVM" received signal SIGUSR1, User defined signal 1.
[Switching to Thread 0x7f5e43fff700 (LWP 3591)]
0x00007f5e5a52648c in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
(gdb) c
Continuing.
[New Thread 0x7f5e52116700 (LWP 3653)]
[New Thread 0x7f5decd92700 (LWP 3654)]
[New Thread 0x7f5ddffff700 (LWP 3655)]
[New Thread 0x7f5ddf7fe700 (LWP 3656)]
[New Thread 0x7f5ddeffd700 (LWP 3657)]
[New Thread 0x7f5dde7fc700 (LWP 3658)]
[New Thread 0x7f5dddffb700 (LWP 3659)]
[New Thread 0x7f5ddd7fa700 (LWP 3660)]

Thread 1 "qemu-kvm" received signal SIGABRT, Aborted.
[Switching to Thread 0x7f5e5f5a1f00 (LWP 3569)]
0x00007f5e5a18d70f in raise () from target:/lib64/libc.so.6
(gdb) t a a bt

Thread 18 (Thread 0x7f5ddd7fa700 (LWP 3660)):
#0  0x00007f5e5a529082 in do_futex_wait () from target:/lib64/libpthread.so.0
#1  0x00007f5e5a529193 in __new_sem_wait_slow () from target:/lib64/libpthread.so.0
#2  0x0000556008c9d4ff in qemu_sem_timedwait ()
#3  0x0000556008c98424 in worker_thread ()
#4  0x0000556008c9ced4 in qemu_thread_start ()
#5  0x00007f5e5a5202de in start_thread () from target:/lib64/libpthread.so.0
#6  0x00007f5e5a251e83 in clone () from target:/lib64/libc.so.6

Thread 17 (Thread 0x7f5dddffb700 (LWP 3659)):
#0  0x00007f5e5a529082 in do_futex_wait () from target:/lib64/libpthread.so.0
#1  0x00007f5e5a529193 in __new_sem_wait_slow () from target:/lib64/libpthread.so.0
#2  0x0000556008c9d4ff in qemu_sem_timedwait ()
#3  0x0000556008c98424 in worker_thread ()
#4  0x0000556008c9ced4 in qemu_thread_start ()
#5  0x00007f5e5a5202de in start_thread () from target:/lib64/libpthread.so.0
#6  0x00007f5e5a251e83 in clone () from target:/lib64/libc.so.6

Thread 16 (Thread 0x7f5dde7fc700 (LWP 3658)):
#0  0x00007f5e5a529082 in do_futex_wait () from target:/lib64/libpthread.so.0
#1  0x00007f5e5a529193 in __new_sem_wait_slow () from target:/lib64/libpthread.so.0
#2  0x0000556008c9d4ff in qemu_sem_timedwait ()
#3  0x0000556008c98424 in worker_thread ()
#4  0x0000556008c9ced4 in qemu_thread_start ()
#5  0x00007f5e5a5202de in start_thread () from target:/lib64/libpthread.so.0
#6  0x00007f5e5a251e83 in clone () from target:/lib64/libc.so.6

Thread 15 (Thread 0x7f5ddeffd700 (LWP 3657)):
#0  0x00007f5e5a529082 in do_futex_wait () from target:/lib64/libpthread.so.0
#1  0x00007f5e5a529193 in __new_sem_wait_slow () from target:/lib64/libpthread.so.0
#2  0x0000556008c9d4ff in qemu_sem_timedwait ()
#3  0x0000556008c98424 in worker_thread ()
#4  0x0000556008c9ced4 in qemu_thread_start ()
#5  0x00007f5e5a5202de in start_thread () from target:/lib64/libpthread.so.0
#6  0x00007f5e5a251e83 in clone () from target:/lib64/libc.so.6

Thread 14 (Thread 0x7f5ddf7fe700 (LWP 3656)):
#0  0x00007f5e5a529082 in do_futex_wait () from target:/lib64/libpthread.so.0
#1  0x00007f5e5a529193 in __new_sem_wait_slow () from target:/lib64/libpthread.so.0
#2  0x0000556008c9d4ff in qemu_sem_timedwait ()
#3  0x0000556008c98424 in worker_thread ()
#4  0x0000556008c9ced4 in qemu_thread_start ()
#5  0x00007f5e5a5202de in start_thread () from target:/lib64/libpthread.so.0
#6  0x00007f5e5a251e83 in clone () from target:/lib64/libc.so.6

Thread 13 (Thread 0x7f5ddffff700 (LWP 3655)):
#0  0x00007f5e5a529082 in do_futex_wait () from target:/lib64/libpthread.so.0
#1  0x00007f5e5a529193 in __new_sem_wait_slow () from target:/lib64/libpthread.so.0
#2  0x0000556008c9d4ff in qemu_sem_timedwait ()
#3  0x0000556008c98424 in worker_thread ()
#4  0x0000556008c9ced4 in qemu_thread_start ()
#5  0x00007f5e5a5202de in start_thread () from target:/lib64/libpthread.so.0
#6  0x00007f5e5a251e83 in clone () from target:/lib64/libc.so.6
--Type <RET> for more, q to quit, c to continue without paging--

Thread 12 (Thread 0x7f5decd92700 (LWP 3654)):
#0  0x00007f5e5a529082 in do_futex_wait () from target:/lib64/libpthread.so.0
#1  0x00007f5e5a529193 in __new_sem_wait_slow () from target:/lib64/libpthread.so.0
#2  0x0000556008c9d4ff in qemu_sem_timedwait ()
#3  0x0000556008c98424 in worker_thread ()
#4  0x0000556008c9ced4 in qemu_thread_start ()
#5  0x00007f5e5a5202de in start_thread () from target:/lib64/libpthread.so.0
#6  0x00007f5e5a251e83 in clone () from target:/lib64/libc.so.6

Thread 11 (Thread 0x7f5e52116700 (LWP 3653)):
#0  0x00007f5e5a529082 in do_futex_wait () from target:/lib64/libpthread.so.0
#1  0x00007f5e5a529193 in __new_sem_wait_slow () from target:/lib64/libpthread.so.0
#2  0x0000556008c9d4ff in qemu_sem_timedwait ()
#3  0x0000556008c98424 in worker_thread ()
#4  0x0000556008c9ced4 in qemu_thread_start ()
#5  0x00007f5e5a5202de in start_thread () from target:/lib64/libpthread.so.0
#6  0x00007f5e5a251e83 in clone () from target:/lib64/libc.so.6

Thread 8 (Thread 0x7f5e421ff700 (LWP 3594)):
#0  0x00007f5e5a52648c in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
#1  0x0000556008c9d28d in qemu_cond_wait_impl ()
#2  0x0000556008bc1f21 in vnc_worker_thread_loop ()
#3  0x0000556008bc2850 in vnc_worker_thread ()
#4  0x0000556008c9ced4 in qemu_thread_start ()
#5  0x00007f5e5a5202de in start_thread () from target:/lib64/libpthread.so.0
#6  0x00007f5e5a251e83 in clone () from target:/lib64/libc.so.6

Thread 7 (Thread 0x7f5e42a26700 (LWP 3593)):
#0  0x00007f5e5a246f21 in poll () from target:/lib64/libc.so.6
#1  0x00007f5e5ea109b6 in g_main_context_iterate.isra () from target:/lib64/libglib-2.0.so.0
#2  0x00007f5e5ea10d72 in g_main_loop_run () from target:/lib64/libglib-2.0.so.0
#3  0x00007f5e5c3f847b in red_worker_main () from target:/lib64/libspice-server.so.1
#4  0x00007f5e5a5202de in start_thread () from target:/lib64/libpthread.so.0
#5  0x00007f5e5a251e83 in clone () from target:/lib64/libc.so.6

Thread 6 (Thread 0x7f5e43fff700 (LWP 3591)):
#0  0x00007f5e5a24887b in ioctl () from target:/lib64/libc.so.6
#1  0x000055600898ec39 in kvm_vcpu_ioctl ()
#2  0x000055600898ecf9 in kvm_cpu_exec ()
#3  0x0000556008973d5e in qemu_kvm_cpu_thread_fn ()
#4  0x0000556008c9ced4 in qemu_thread_start ()
#5  0x00007f5e5a5202de in start_thread () from target:/lib64/libpthread.so.0
#6  0x00007f5e5a251e83 in clone () from target:/lib64/libc.so.6

Thread 5 (Thread 0x7f5e50fd2700 (LWP 3590)):
#0  0x00007f5e5a24887b in ioctl () from target:/lib64/libc.so.6
#1  0x000055600898ec39 in kvm_vcpu_ioctl ()
#2  0x000055600898ecf9 in kvm_cpu_exec ()
#3  0x0000556008973d5e in qemu_kvm_cpu_thread_fn ()
#4  0x0000556008c9ced4 in qemu_thread_start ()
#5  0x00007f5e5a5202de in start_thread () from target:/lib64/libpthread.so.0
#6  0x00007f5e5a251e83 in clone () from target:/lib64/libc.so.6

--Type <RET> for more, q to quit, c to continue without paging--
Thread 4 (Thread 0x7f5e517d3700 (LWP 3588)):
#0  0x00007f5e5a246f21 in poll () from target:/lib64/libc.so.6
#1  0x00007f5e5ea109b6 in g_main_context_iterate.isra () from target:/lib64/libglib-2.0.so.0
#2  0x00007f5e5ea10d72 in g_main_loop_run () from target:/lib64/libglib-2.0.so.0
#3  0x0000556008a76db1 in iothread_run ()
#4  0x0000556008c9ced4 in qemu_thread_start ()
#5  0x00007f5e5a5202de in start_thread () from target:/lib64/libpthread.so.0
#6  0x00007f5e5a251e83 in clone () from target:/lib64/libc.so.6

Thread 2 (Thread 0x7f5e5313d700 (LWP 3584)):
#0  0x00007f5e5a24c6ed in syscall () from target:/lib64/libc.so.6
#1  0x0000556008c9d73f in qemu_event_wait ()
#2  0x0000556008caf962 in call_rcu_thread ()
#3  0x0000556008c9ced4 in qemu_thread_start ()
#4  0x00007f5e5a5202de in start_thread () from target:/lib64/libpthread.so.0
#5  0x00007f5e5a251e83 in clone () from target:/lib64/libc.so.6

Thread 1 (Thread 0x7f5e5f5a1f00 (LWP 3569)):
#0  0x00007f5e5a18d70f in raise () from target:/lib64/libc.so.6
#1  0x00007f5e5a177b25 in abort () from target:/lib64/libc.so.6
#2  0x00007f5e5a1779f9 in __assert_fail_base.cold.0 () from target:/lib64/libc.so.6
#3  0x00007f5e5a185cc6 in __assert_fail () from target:/lib64/libc.so.6
#4  0x0000556008c15605 in bdrv_co_write_req_prepare.isra.14.part ()
#5  0x0000556008c19efe in bdrv_aligned_pwritev ()
#6  0x0000556008c1a45b in bdrv_co_pwritev_part ()
#7  0x0000556008c07016 in blk_co_pwritev_part ()
#8  0x0000556008c070c7 in blk_aio_write_entry ()
#9  0x0000556008cb0ed3 in coroutine_trampoline ()
#10 0x00007f5e5a1a32e0 in ?? () from target:/lib64/libc.so.6
#11 0x00007ffc6356de00 in ?? ()
#12 0x0000000000000000 in ?? ()

Comment 2 yisun 2019-12-10 10:59:52 UTC
The qemu process command line is as follows:
(.libvirt-ci-venv-ci-runtest-AglsHs) [root@hp-dl385g8-01 tmp]# ps -ef | grep avocado-vt-vm1
qemu      2797     1 10 05:59 ?        00:00:00 /usr/libexec/qemu-kvm -name guest=avocado-vt-vm1,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-11-avocado-vt-vm1/master-key.aes -machine pc-q35-rhel8.1.0,accel=kvm,usb=off,dump-guest-core=off -cpu Opteron_G5,vme=on,x2apic=on,tsc-deadline=on,hypervisor=on,arat=on,tsc-adjust=on,bmi1=on,arch-capabilities=on,ssbd=on,mmxext=on,fxsr-opt=on,cmp-legacy=on,cr8legacy=on,osvw=on,perfctr-core=on,ibpb=on,amd-ssbd=on,virt-ssbd=on,rdctl-no=on,skip-l1dfl-vmentry=on,mds-no=on,svm=off,vmx=off -m 1024 -overcommit mem-lock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 5f4b696f-3fb4-48b9-8b58-6e1020064005 -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=40,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global ICH9-LPC.disable_s3=1 -global ICH9-LPC.disable_s4=1 -boot strict=on -device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 -device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 -device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 -device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 -device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 -device pcie-root-port,port=0x15,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 -device pcie-root-port,port=0x16,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x6 -device qemu-xhci,p2=15,p3=15,id=usb,bus=pci.2,addr=0x0 -device virtio-serial-pci,id=virtio-serial0,bus=pci.3,addr=0x0 -blockdev {"driver":"file","filename":"/var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-1-format","read-only":false,"driver":"qcow2","file":"libvirt-1-storage","backing":null} -device virtio-blk-pci,scsi=off,bus=pci.4,addr=0x0,drive=libvirt-1-format,id=virtio-disk0,bootindex=1 -netdev tap,fd=42,id=hostnet0,vhost=on,vhostfd=43 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:02:46:85,bus=pci.1,addr=0x0 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,fd=44,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 127.0.0.1:1 -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pcie.0,addr=0x1 -device virtio-balloon-pci,id=balloon0,bus=pci.5,addr=0x0 -object rng-random,id=objrng0,filename=/dev/urandom -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.6,addr=0x0 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on

Comment 4 aihua liang 2019-12-11 05:36:11 UTC
Hi, testing with a disk snapshot only, I don't hit this issue.

Test Env:
   qemu-kvm version:qemu-kvm-4.2.0-2.module+el8.2.0+5135+ed3b2489.x86_64
   kernel version:4.18.0-160.el8.x86_64

Test steps:
 1.Start guest with qemu cmds:
     /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox on  \
    -machine q35  \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x1 \
    -m 7168  \
    -smp 4,maxcpus=4,cores=2,threads=1,dies=1,sockets=2  \
    -cpu 'Skylake-Client',+kvm_pv_unhalt  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20191210-025743-Q0JzJpKT,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20191210-025743-Q0JzJpKT,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idDU2Q2E \
    -chardev socket,server,path=/var/tmp/serial-serial0-20191210-025743-Q0JzJpKT,id=chardev_serial0,nowait \
    -device isa-serial,id=serial0,chardev=chardev_serial0  \
    -chardev socket,id=seabioslog_id_20191210-025743-Q0JzJpKT,path=/var/tmp/seabios-20191210-025743-Q0JzJpKT,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20191210-025743-Q0JzJpKT,iobase=0x402 \
    -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0,multifunction=on \
    -device qemu-xhci,id=usb1,bus=pcie.0-root-port-2,addr=0x0 \
    -object iothread,id=iothread0 \
    -object iothread,id=iothread1 \
    -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x2.0x1,bus=pcie.0 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,iothread=iothread0,bus=pcie.0-root-port-3,addr=0x0 \
    -blockdev node-name=file_image1,driver=file,aio=threads,filename=/home/kvm_autotest_root/images/rhel820-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
    -device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x2.0x2,bus=pcie.0 \
    -blockdev node-name=file_data1,driver=file,aio=threads,filename=/home/data.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_data1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_data1 \
    -device virtio-blk-pci,id=data1,drive=drive_data1,write-cache=on,bus=pcie.0-root-port-4,iothread=iothread1 \
    -device pcie-root-port,id=pcie.0-root-port-5,slot=5,chassis=5,addr=0x2.0x3,bus=pcie.0 \
    -device virtio-net-pci,mac=9a:9b:1d:13:61:86,id=idKg9AzR,netdev=idxDM2m8,bus=pcie.0-root-port-5,addr=0x0  \
    -netdev tap,id=idxDM2m8,vhost=on \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot menu=off,order=cdn,once=c,strict=off \
    -enable-kvm \
    -monitor stdio \
    -device pcie-root-port,id=pcie_extra_root_port_0,slot=6,chassis=6,addr=0x2.0x4,bus=pcie.0 \
    -qmp tcp:0:3000,server,nowait \

2. Create the snapshot target:
{'execute':'blockdev-create','arguments':{'options': {'driver':'file','filename':'/root/sn1','size':21474836480},'job-id':'job1'}}
{'execute':'blockdev-add','arguments':{'driver':'file','node-name':'drive_sn1','filename':'/root/sn1'}}
{"return": {}}
{'execute':'blockdev-create','arguments':{'options': {'driver': 'qcow2','file':'drive_sn1','size':21474836480,'backing-file':'/home/kvm_autotest_root/images/rhel820-64-virtio-scsi.qcow2','backing-fmt':'qcow2'},'job-id':'job2'}}
{'execute':'blockdev-add','arguments':{'driver':'qcow2','node-name':'sn1','file':'drive_sn1','backing':null}}
{"return": {}}
{'execute':'job-dismiss','arguments':{'id':'job1'}}
{"return": {}}
{'execute':'job-dismiss','arguments':{'id':'job2'}}
{"return": {}}

3. Do snapshot on drive_image1:
{"execute":"blockdev-snapshot","arguments":{"node":"drive_image1","overlay":"sn1"}}
{"return": {}}

After step 3, qemu does not crash.
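
A note for completeness: the libvirt reproduction also saves guest memory (the external memory image) before resuming the guest, which the qemu-only steps above do not cover. A minimal sketch of the extra QMP steps that would mirror that part is given below; the state file path is only an assumption for illustration.

{'execute':'migrate','arguments':{'detach':true,'blk':false,'inc':false,'uri':'exec:gzip -c > /root/mem.s1.gz'}}
(poll {'execute':'query-migrate'} until "status" is "completed", then resume the guest)
{'execute':'cont'}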

Comment 5 Han Han 2019-12-12 04:10:25 UTC
Created attachment 1644258 [details]
Domain XMLs, ARGVs, libvirtd log, detailed backtrace

I think it is caused by the blockdev feature.
Evidence:
With the same steps, the crash is reproduced when blockdev is enabled and not reproduced when blockdev is disabled.

Version: libvirt-5.10.0-1.module+el8.2.0+5040+bd433686.x86_64 qemu-kvm-4.2.0-1.module+el8.2.0+4793+b09dd2fb.x86_64

Steps:
1. Start a VM
2. Create a memory & disk snapshot:
# virsh snapshot-create-as pc s1 --no-metadata --diskspec vda,file=/tmp/vda.s1 --memspec /tmp/mem.s1

Then the vm gets SIGABRT:
# abrt-cli ls
id 266e34a039bca1273d0c2a4e1af8db1a28fba6ff
reason:         _nl_load_domain.cold.0(): qemu-kvm killed by SIGABRT
time:           Wed 11 Dec 2019 10:38:37 PM EST
cmdline:        /usr/libexec/qemu-kvm -name guest=pc,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-pc/master-key.aes -machine pc-i440fx-rhel7.6.0,accel=kvm,usb=off,vmport=off,dump-guest-core=off -cpu Opteron_G3 -m 1024 -overcommit mem-lock=off -smp 1,sockets=1,cores=1,threads=1 -uuid c0fae7ee-7f4e-4cb0-aca4-25fa945a3f83 -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=37,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x5.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x5 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x5.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x5.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x6 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -blockdev '{\"driver\":\"file\",\"filename\":\"/var/lib/libvirt/images/RHEL-8.2-x86_64-latest.qcow2\",\"node-name\":\"libvirt-2-storage\",\"auto-read-only\":true,\"discard\":\"unmap\"}' -blockdev '{\"node-name\":\"libvirt-2-format\",\"read-only\":false,\"driver\":\"qcow2\",\"file\":\"libvirt-2-storage\",\"backing\":null}' -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x9,drive=libvirt-2-format,id=virtio-disk0,bootindex=1,serial=SB -blockdev '{\"driver\":\"file\",\"filename\":\"/tmp/copy.img1\",\"aio\":\"native\",\"node-name\":\"libvirt-1-storage\",\"cache\":{\"direct\":true,\"no-flush\":false},\"auto-read-only\":true,\"discard\":\"unmap\"}' -blockdev '{\"node-name\":\"libvirt-1-format\",\"read-only\":false,\"discard\":\"ignore\",\"cache\":{\"direct\":true,\"no-flush\":false},\"driver\":\"raw\",\"file\":\"libvirt-1-storage\"}' -device virtio-blk-pci,ioeventfd=on,scsi=off,bus=pci.0,addr=0xa,drive=libvirt-1-format,id=virtio-disk1,write-cache=off,werror=report,rerror=report -netdev tap,fd=39,id=hostnet0 -device e1000,netdev=hostnet0,id=net0,mac=52:54:00:09:f8:0c,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=13,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -device usb-tablet,id=input0,bus=usb.0,port=1 -spice port=5900,addr=0.0.0.0,disable-ticketing,image-compression=off,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on
package:        15:qemu-kvm-core-4.2.0-1.module+el8.2.0+4793+b09dd2fb
uid:            107 (qemu)
count:          1
Directory:      /var/spool/abrt/ccpp-2019-12-11-22:38:37-6076
Run 'abrt-cli report /var/spool/abrt/ccpp-2019-12-11-22:38:37-6076' for creating a case in Red Hat Customer Portal


Additional info:
Not reproduced with a memory-only snapshot or a disk-only snapshot.

Not reproduced when blockdev is disabled:
<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
...
  <qemu:capabilities>
    <qemu:add capability='drive'/>
    <qemu:del capability='blockdev'/>
  </qemu:capabilities>
</domain>
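
(One possible way to confirm the workaround is in effect after restarting the domain is to check that the generated qemu command line no longer contains -blockdev; the command below is only a suggestion, and should print 0 when the domain runs in -drive mode:)

# ps -ef | grep qemu-kvm | grep -c -- '-blockdev'
0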

Not reproduced on RHEL8.1.1-AV(libvirt-5.6.0-10.module+el8.1.1+5131+a6fe889c.x86_64 qemu-kvm-4.1.0-18.module+el8.1.1+5150+45ce6c40.x86_64)

Comment 6 Han Han 2019-12-12 06:09:33 UTC
It can also be reproduced with the q35 machine type:

qemu log:
/usr/libexec/qemu-kvm \
-name guest=q35,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-q35/master-key.aes \
-machine pc-q35-rhel8.1.0,accel=kvm,usb=off,vmport=off,dump-guest-core=off \
-cpu Opteron_G3 \
-m 1024 \
-overcommit mem-lock=off \
-smp 1,sockets=1,cores=1,threads=1 \
-uuid 516b83cb-bd68-47cf-b577-9ed07741601e \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=37,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc,driftfix=slew \
-global kvm-pit.lost_tick_policy=delay \
-no-hpet \
-no-shutdown \
-global ICH9-LPC.disable_s3=1 \
-global ICH9-LPC.disable_s4=1 \
-boot strict=on \
-device ich9-usb-ehci1,id=usb,bus=pcie.0,addr=0x5.0x7 \
-device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pcie.0,multifunction=on,addr=0x5 \
-device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pcie.0,addr=0x5.0x1 \
-device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pcie.0,addr=0x5.0x2 \
-device virtio-scsi-pci,id=scsi0,bus=pcie.0,addr=0x6 \
-device virtio-serial-pci,id=virtio-serial0,bus=pcie.0,addr=0x7 \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/images/RHEL-8.2-x86_64-latest-clone.qcow2","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"driver":"qcow2","file":"libvirt-1-storage","backing":null}' \
-device virtio-blk-pci,scsi=off,bus=pcie.0,addr=0x9,drive=libvirt-1-format,id=virtio-disk0,bootindex=1 \
-netdev tap,fd=39,id=hostnet0 \
-device e1000,netdev=hostnet0,id=net0,mac=52:54:00:87:15:bd,bus=pcie.0,addr=0x3 \
-chardev pty,id=charserial0 \
-device isa-serial,chardev=charserial0,id=serial0 \
-chardev spicevmc,id=charchannel0,name=vdagent \
-device virtserialport,bus=virtio-serial0.0,nr=13,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 \
-device usb-tablet,id=input0,bus=usb.0,port=1 \
-spice port=5900,addr=0.0.0.0,disable-ticketing,image-compression=off,seamless-migration=on \
-device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pcie.0,addr=0x2 \
-device intel-hda,id=sound0,bus=pcie.0,addr=0x4 \
-device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 \
-chardev spicevmc,id=charredir0,name=usbredir \
-device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 \
-chardev spicevmc,id=charredir1,name=usbredir \
-device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 \
-device virtio-balloon-pci,id=balloon0,bus=pcie.0,addr=0x8 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
char device redirected to /dev/pts/6 (label charserial0)
qemu-kvm: block/io.c:1879: bdrv_co_write_req_prepare: Assertion `child->perm & BLK_PERM_WRITE' failed.
2019-12-12 06:06:39.735+0000: shutting down, reason=crashed

Comment 7 yisun 2019-12-13 06:20:56 UTC
(In reply to aihua liang from comment #4)
> Hi, test with snapshot on disk, don't hit this issue.
> 
> Test Env:
>    qemu-kvm version:qemu-kvm-4.2.0-2.module+el8.2.0+5135+ed3b2489.x86_64
>    kernel version:4.18.0-160.el8.x86_64
> 


Hi Aihua,
I found this crash happens after the snapshot but within an uncertain period. Today, when I tried to reproduce it, it took around 1 minute to crash, as follows:

[root@libvirt-rhel-8 ~]# virsh snapshot-create-as avocado-vt-vm1 s2 --memspec file=/var/lib/avocado/data/avocado-vt/images/avv-mem.s2
Domain snapshot s2 created
[root@libvirt-rhel-8 ~]# for i in {1..1000}; do sleep 1; echo Time:$i; virsh domstate avocado-vt-vm1; done
Time:1
running

Time:2
running

Time:3
running

...

Time:54
running

Time:55
shut off
...
[root@libvirt-rhel-8 ~]# cat /var/log/libvirt/qemu/avocado-vt-vm1.log 
qemu-kvm: block/io.c:1879: bdrv_co_write_req_prepare: Assertion `child->perm & BLK_PERM_WRITE' failed.
2019-12-13 06:14:06.943+0000: shutting down, reason=crashed


Could you please double-check whether the qemu process terminates after a longer period?

Comment 8 aihua liang 2019-12-13 08:06:45 UTC
(In reply to yisun from comment #7)
> (In reply to aihua liang from comment #4)
> > Hi, test with snapshot on disk, don't hit this issue.
> > 
> > Test Env:
> >    qemu-kvm version:qemu-kvm-4.2.0-2.module+el8.2.0+5135+ed3b2489.x86_64
> >    kernel version:4.18.0-160.el8.x86_64
> > 
> 
> 
> Hi Aihua,
> I found this crash happens after the snapshot but within a uncertain period.
> Today when I try to reproduce it, it took around 1 minutes to crash, as
> follow:
> 
> [root@libvirt-rhel-8 ~]# virsh snapshot-create-as avocado-vt-vm1 s2
> --memspec file=/var/lib/avocado/data/avocado-vt/images/avv-mem.s2
> Domain snapshot s2 created
> [root@libvirt-rhel-8 ~]# for i in {1..1000}; do sleep 1; echo Time:$i; virsh
> domstate avocado-vt-vm1; done
> Time:1
> running
> 
> Time:2
> running
> 
> Time:3
> running
> 
> ...
> 
> Time:54
> running
> 
> Time:55
> shut off
> ...
> [root@libvirt-rhel-8 ~]# cat /var/log/libvirt/qemu/avocado-vt-vm1.log 
> qemu-kvm: block/io.c:1879: bdrv_co_write_req_prepare: Assertion `child->perm
> & BLK_PERM_WRITE' failed.
> 2019-12-13 06:14:06.943+0000: shutting down, reason=crashed
> 
> 
> Could you pls double check if the qemu process terminated with a longer
> period?


Hi, YiSun

  Tested again and waited 2 more minutes; still can't reproduce:
  After the migration {"execute":"migrate","arguments":{"detach":true,"blk":false,"inc":false,"uri":
"exec:gzip -c > STATEFILE.gz"},"id":"libvirt-16"} finished, the vm entered "paused (postmigrate)" status successfully.
  

BR,
Aliang

Comment 9 yisun 2019-12-13 08:17:21 UTC
(In reply to aihua liang from comment #8)
> (In reply to yisun from comment #7)
> > (In reply to aihua liang from comment #4)
> > > Hi, test with snapshot on disk, don't hit this issue.
> > > 
> > > Test Env:
> > >    qemu-kvm version:qemu-kvm-4.2.0-2.module+el8.2.0+5135+ed3b2489.x86_64
> > >    kernel version:4.18.0-160.el8.x86_64
> > > 
> > 
> > 
> > Hi Aihua,
> > I found this crash happens after the snapshot but within a uncertain period.
> > Today when I try to reproduce it, it took around 1 minutes to crash, as
> > follow:
> > 
> > [root@libvirt-rhel-8 ~]# virsh snapshot-create-as avocado-vt-vm1 s2
> > --memspec file=/var/lib/avocado/data/avocado-vt/images/avv-mem.s2
> > Domain snapshot s2 created
> > [root@libvirt-rhel-8 ~]# for i in {1..1000}; do sleep 1; echo Time:$i; virsh
> > domstate avocado-vt-vm1; done
> > Time:1
> > running
> > 
> > Time:2
> > running
> > 
> > Time:3
> > running
> > 
> > ...
> > 
> > Time:54
> > running
> > 
> > Time:55
> > shut off
> > ...
> > [root@libvirt-rhel-8 ~]# cat /var/log/libvirt/qemu/avocado-vt-vm1.log 
> > qemu-kvm: block/io.c:1879: bdrv_co_write_req_prepare: Assertion `child->perm
> > & BLK_PERM_WRITE' failed.
> > 2019-12-13 06:14:06.943+0000: shutting down, reason=crashed
> > 
> > 
> > Could you pls double check if the qemu process terminated with a longer
> > period?
> 
> 
> Hi, YiSun
> 
>   Test again, and wait for 2 minutes later, still can't reproduce:
>   After migration:
> {"execute":"migrate","arguments":{"detach":true,"blk":false,"inc":false,
> "uri":
> "exec:gzip -c > STATEFILE.gz"},"id":"libvirt-16"} finished, vm enter "paused
> (postmigrate)" status successfully.
>   
> 
> BR,
> Aliang

Thanks for the info.
Hi Ademar,
Since the problem has not been reproduced from the pure qemu command line yet, could you please have a look at the backtrace or the libvirtd log to see if any clue can be found? Thanks.

Comment 11 Kevin Wolf 2019-12-16 10:15:14 UTC
Hm, an idea, yes. After migration completes, all block devices are inactive, i.e. ownership is released. However, if you do blockdev-add in this state, the new node will be active, which breaks the assumption that all nodes are in the same active/inactive state. Taking a snapshot means that we now have an active overlay over an inactive backing file. When doing 'cont', we recursively activate all image trees from their root. However, since the root is already active, we skip the activation of the whole subtree, so the backing file stays inactive and the next write to it causes an assertion failure.

With blockdev-snapshot-sync rather than blockdev-snapshot, the flags are copied from the old top layer, so bdrv_open() for the new overlay is called with BDRV_O_INACTIVE. I don't think bdrv_open() was ever supposed to be called with that flag, but it could explain why the case accidentally worked with libvirt in -drive mode.
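
Putting that together, a rough QMP sketch of the failing sequence (reusing the commands already shown in comment 4 and comment 8; node names and file paths are illustrative only):

{'execute':'migrate','arguments':{'detach':true,'blk':false,'inc':false,'uri':'exec:gzip -c > /tmp/mem.s1.gz'}}
# migration completes -> all existing nodes become inactive
{'execute':'blockdev-add','arguments':{'driver':'qcow2','node-name':'sn1','file':'drive_sn1','backing':null}}
# (the file node drive_sn1 was added beforehand as in comment 4)
# the newly added overlay node is active, unlike the rest of the graph
{'execute':'blockdev-snapshot','arguments':{'node':'drive_image1','overlay':'sn1'}}
# an active overlay now sits on top of an inactive backing file
{'execute':'cont'}
# activation starts from the already-active root and skips the subtree,
# so the backing file stays inactive; the next write to it fails the
# assertion `child->perm & BLK_PERM_WRITE'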

Comment 12 Danilo de Paula 2019-12-16 12:53:47 UTC
For the record: this BZ prevents the whole virt module from passing gating without manual intervention.
Please consider raising the priority of this.

Comment 13 Kevin Wolf 2019-12-16 15:45:13 UTC
I'll see what I can do, but please note that this bug isn't new in any way in QEMU. This is the first report of it and we don't have an upstream fix yet, so fixing it upstream and then backporting will take a few days.

If it's really blocking your work, consider backing out whatever made it apparent. (Some libvirt update? Additional test cases?)

Comment 14 Danilo de Paula 2019-12-16 17:10:57 UTC
Thank you Kevin,

I disabled the failing test.
The side effect is that, due to how osci is designed, we might have to disable all perl-sys-virt tests (actually make them non-gating).

Comment 15 Eyal Shenitzky 2020-01-01 12:19:01 UTC
Encountered a QEMU crash when taking a VM snapshot with memory in oVirt.

Comment 16 Ademar Reis 2020-02-05 23:10:30 UTC
QEMU has been recently split into sub-components and as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks

Comment 19 Evelina Shames 2020-02-11 13:26:30 UTC
*** Bug 1798462 has been marked as a duplicate of this bug. ***

Comment 24 aihua liang 2020-02-24 10:53:34 UTC
Can reproduce it on qemu-kvm-4.2.0-9.module+el8.2.0+5699+b5331ee5 (libvirt version: libvirt-6.0.0-6.module+el8.2.0+5821+109ee33c.x86_64):
 # virsh create avocado-test_x86.xml 
Domain avocado-vt-vm1 created from avocado-test_x86.xml

[root@hp-dl385g10-04 home]# virsh list 
 Id   Name             State
--------------------------------
 2    avocado-vt-vm1   running

# virsh snapshot-create-as avocado-vt-vm1 s1 --no-metadata --diskspec vda,file=/tmp/vda.s1 --memspec /tmp/mem.s1
Domain snapshot s1 created
# virsh list 
 Id   Name   State
--------------------

#cat /var/log/libvirt/qemu/avocado-vt-vm1.log
...
2020-02-24 10:50:16.143+0000: Domain id=2 is tainted: custom-hypervisor-feature
char device redirected to /dev/pts/6 (label charserial0)
2020-02-24T10:50:16.243809Z qemu-kvm: -device cirrus-vga,id=video0,bus=pci.0,addr=0x2: warning: 'cirrus-vga' is deprecated, please use a different VGA card instead
qemu-kvm: block/io.c:1879: bdrv_co_write_req_prepare: Assertion `child->perm & BLK_PERM_WRITE' failed.
2020-02-24 10:50:45.114+0000: shutting down, reason=crashed

Comment 25 aihua liang 2020-02-24 11:02:02 UTC
Tested on qemu-kvm-4.2.0-11.module+el8.2.0+5837+4c1442ec (libvirt version: libvirt-6.0.0-6.module+el8.2.0+5821+109ee33c.x86_64); no longer hit this issue.

# virsh create avocado-test_x86.xml 
Domain avocado-vt-vm1 created from avocado-test_x86.xml

# virsh list 
 Id   Name             State
--------------------------------
 1    avocado-vt-vm1   running

# virsh snapshot-create-as avocado-vt-vm1 s1 --no-metadata --diskspec vda,file=/tmp/vda.s1 --memspec /tmp/mem.s1
Domain snapshot s1 created
# virsh list 
 Id   Name             State
--------------------------------
 1    avocado-vt-vm1   running


Setting the bug's status to "Verified".

Comment 27 errata-xmlrpc 2020-05-05 09:52:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2017

