Bug 1508799

Summary: qemu-kvm core dumped when doing 'savevm/loadvm/delvm' for the second time
Product: Red Hat Enterprise Linux 7 Reporter: yilzhang
Component: qemu-kvm-rhevAssignee: Dr. David Alan Gilbert <dgilbert>
Status: CLOSED ERRATA QA Contact: Ping Li <pingl>
Severity: high Docs Contact:
Priority: medium    
Version: 7.5CC: awilliam, chayang, cliao, coli, hhan, juzhang, knoel, michen, mrezanin, ngu, peterx, pingl, quintela, qzhang, virt-maint, yilzhang
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.10.0-5.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-04-11 00:46:47 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description yilzhang 2017-11-02 09:15:37 UTC
Description of problem:
qemu-kvm core dumped when doing 'savevm/loadvm/delvm' for the second time

Version-Release number of selected component (if applicable):
Host kernel:  3.10.0-760.el7.ppc64le
Guest kernel: 3.10.0-760.el7.ppc64le
qemu-kvm-rhev: qemu-kvm-rhev-2.10.0-3.el7

How reproducible: 100%


Steps to Reproduce:
1. Boot up a KVM guest with data-plane enable
2. Create, load and delete internal snapshot one by one
(qemu) savevm s1
(qemu) loadvm s1
(qemu) delvm s1

3. Check that the above snapshot is not existed now
(qemu) info snapshots
There is no snapshot available.
4. Repeat step2


Actual results:
qemu-kvm core dumped when doing "loadvm s1"
[root@ibm-p8-rhevm-18 dataplane]# sh 86137.sh 
QEMU 2.10.0 monitor - type 'help' for more information
(qemu) savevm s1
(qemu) loadvm s1
(qemu) delvm s1
(qemu) info snapshots
There is no snapshot available.
(qemu) 
(qemu) savevm s1
(qemu) loadvm s1
qemu-kvm: util/qemu-thread-posix.c:397: qemu_event_set: Assertion `ev->initialized' failed.
86137.sh: line 25: 73935 Aborted                 (core dumped) /usr/libexec/qemu-kvm -smp 8,sockets=2,cores=4,threads=1 -m 32768 -serial unix:/tmp/3dp-serial.log,server,nowait -nodefaults -rtc base=localtime,clock=host -boot menu=on -monitor stdio -monitor unix:/tmp/monitor1,server,nowait -qmp tcp:0:777,server,nowait -vnc :1 -device virtio-vga -device pci-bridge,id=bridge1,chassis_nr=1,bus=pci.0 -netdev tap,id=net0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,vhost=on -device virtio-net-pci,netdev=net0,id=nic0,mac=52:54:00:c3:e7:8a,bus=bridge1,addr=0x1e -object iothread,id=iothread0,poll-max-ns=16 -device virtio-scsi-pci,bus=bridge1,addr=0x1f,id=scsi0,iothread=iothread0 -drive file=/home/yilzhang/rhel7.5__3.10.0-760.el7.ppc64le.qcow2,media=disk,if=none,cache=none,id=drive_sysdisk,aio=native,format=qcow2,werror=stop,rerror=stop -device scsi-hd,drive=drive_sysdisk,bus=scsi0.0,id=sysdisk,bootindex=0 -drive file=/home/images/disk-image_1.qcow2,if=none,cache=none,id=drive_ddisk_2,format=qcow2,werror=stop,rerror=stop,readonly=on -device scsi-hd,drive=drive_ddisk_2,bus=scsi0.0,id=ddisk_2


Expected results:
No error



Additional info:
[New LWP 74050]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/libexec/qemu-kvm -smp 8,sockets=2,cores=4,threads=1 -m 32768 -serial unix:'.
Program terminated with signal 6, Aborted.
#0  0x00003fff9c49eff0 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.ppc64le cyrus-sasl-lib-2.1.26-21.el7.ppc64le elfutils-libelf-0.168-8.el7.ppc64le elfutils-libs-0.168-8.el7.ppc64le glib2-2.50.3-3.el7.ppc64le glibc-2.17-196.el7.ppc64le gmp-6.0.0-15.el7.ppc64le gnutls-3.3.26-9.el7.ppc64le gperftools-libs-2.4-8.el7.ppc64le keyutils-libs-1.5.8-3.el7.ppc64le krb5-libs-1.15.1-8.el7.ppc64le libaio-0.3.109-13.el7.ppc64le libattr-2.4.46-12.el7.ppc64le libcap-2.22-9.el7.ppc64le libcom_err-1.42.9-10.el7.ppc64le libcurl-7.29.0-42.el7.ppc64le libdb-5.3.21-20.el7.ppc64le libfdt-1.4.3-1.el7.ppc64le libffi-3.0.13-18.el7.ppc64le libgcc-4.8.5-16.el7.ppc64le libgcrypt-1.5.3-14.el7.ppc64le libgpg-error-1.12-3.el7.ppc64le libibverbs-13-7.el7.ppc64le libidn-1.28-4.el7.ppc64le libiscsi-1.9.0-7.el7.ppc64le libnl3-3.2.28-4.el7.ppc64le libpng-1.5.13-7.el7_2.ppc64le librdmacm-13-7.el7.ppc64le libseccomp-2.3.1-3.el7.ppc64le libselinux-2.5-11.el7.ppc64le libssh2-1.4.3-10.el7_2.1.ppc64le libstdc++-4.8.5-16.el7.ppc64le libtasn1-4.10-1.el7.ppc64le libusbx-1.0.20-1.el7.ppc64le lzo-2.06-8.el7.ppc64le nettle-2.7.1-8.el7.ppc64le nspr-4.13.1-1.0.el7_3.ppc64le nss-3.28.4-8.el7.ppc64le nss-softokn-freebl-3.28.3-6.el7.ppc64le nss-util-3.28.4-3.el7.ppc64le numactl-libs-2.0.9-6.el7_2.ppc64le openldap-2.4.44-5.el7.ppc64le openssl-libs-1.0.2k-8.el7.ppc64le p11-kit-0.23.5-3.el7.ppc64le pcre-8.32-17.el7.ppc64le pixman-0.34.0-1.el7.ppc64le snappy-1.1.0-3.el7.ppc64le systemd-libs-219-42.el7.ppc64le xz-libs-5.2.2-1.el7.ppc64le zlib-1.2.7-17.el7.ppc64le
(gdb) bt
#0  0x00003fff9c49eff0 in raise () from /lib64/libc.so.6
#1  0x00003fff9c4a136c in abort () from /lib64/libc.so.6
#2  0x00003fff9c494c44 in __assert_fail_base () from /lib64/libc.so.6
#3  0x00003fff9c494d34 in __assert_fail () from /lib64/libc.so.6
#4  0x000000004503e2a8 in qemu_event_set (ev=<optimized out>) at util/qemu-thread-posix.c:397
#5  0x0000000044ec1b9c in qemu_loadvm_state (f=0x1001aa49000) at migration/savevm.c:2035
#6  0x0000000044ec2904 in load_snapshot (name=<optimized out>, errp=0x3fffd3ab5560) at migration/savevm.c:2332
#7  0x0000000044e316c8 in hmp_loadvm (mon=<optimized out>, qdict=<optimized out>) at hmp.c:1321
#8  0x0000000044c7ebd8 in handle_hmp_command (mon=0x10019722f40, cmdline=0x10019fc0007 "s1") at /usr/src/debug/qemu-2.10.0/monitor.c:3137
#9  0x0000000044c80d7c in monitor_command_cb (opaque=0x10019722f40, cmdline=<optimized out>, readline_opaque=<optimized out>) at /usr/src/debug/qemu-2.10.0/monitor.c:3940
#10 0x0000000045053a2c in readline_handle_byte (rs=0x10019fc0000, ch=<optimized out>) at util/readline.c:393
#11 0x0000000044c7eeb8 in monitor_read (opaque=<optimized out>, buf=<optimized out>, size=<optimized out>) at /usr/src/debug/qemu-2.10.0/monitor.c:3923
#12 0x0000000044fb659c in qemu_chr_be_write_impl (len=<optimized out>, buf=<optimized out>, s=<optimized out>) at chardev/char.c:167
#13 qemu_chr_be_write (s=<optimized out>, buf=<optimized out>, len=<optimized out>) at chardev/char.c:179
#14 0x0000000044fb8838 in fd_chr_read (chan=0x100197d0840, cond=<optimized out>, opaque=0x10019626ea0) at chardev/char-fd.c:66
#15 0x0000000044fd5734 in qio_channel_fd_source_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at io/channel-watch.c:84
#16 0x00003fff9c9e3ab0 in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#17 0x0000000045039784 in glib_pollfds_poll () at util/main-loop.c:213
#18 os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:261
#19 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:515
#20 0x0000000044c18038 in main_loop () at vl.c:1917
#21 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4804
(gdb) bt full
#0  0x00003fff9c49eff0 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x00003fff9c4a136c in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x00003fff9c494c44 in __assert_fail_base () from /lib64/libc.so.6
No symbol table info available.
#3  0x00003fff9c494d34 in __assert_fail () from /lib64/libc.so.6
No symbol table info available.
#4  0x000000004503e2a8 in qemu_event_set (ev=<optimized out>) at util/qemu-thread-posix.c:397
        __PRETTY_FUNCTION__ = "qemu_event_set"
#5  0x0000000044ec1b9c in qemu_loadvm_state (f=0x1001aa49000) at migration/savevm.c:2035
        mis = 0x45285388 <mis_current.32409>
        local_err = 0x0
        v = <optimized out>
        ret = 0
#6  0x0000000044ec2904 in load_snapshot (name=<optimized out>, errp=0x3fffd3ab5560) at migration/savevm.c:2332
        bs = 0x0
        bs_vm_state = 0x10019770000
        sn = {id_str = "1", '\000' <repeats 126 times>, name = "s1", '\000' <repeats 253 times>, vm_state_size = 1663964227, date_sec = 1509613329, date_nsec = 818940000, vm_clock_nsec = 57216965684}
        ret = 0
        aio_context = 0x10019671a40
        mis = 0x45285388 <mis_current.32409>
        __func__ = "load_snapshot"
#7  0x0000000044e316c8 in hmp_loadvm (mon=<optimized out>, qdict=<optimized out>) at hmp.c:1321
        saved_vm_running = 1
        name = 0x1001959e1e8 "s1"
        err = 0x0
#8  0x0000000044c7ebd8 in handle_hmp_command (mon=0x10019722f40, cmdline=0x10019fc0007 "s1") at /usr/src/debug/qemu-2.10.0/monitor.c:3137
        qdict = 0x10019c1c600
        cmd = 0x452123d0 <mon_cmds+2688>
#9  0x0000000044c80d7c in monitor_command_cb (opaque=0x10019722f40, cmdline=<optimized out>, readline_opaque=<optimized out>) at /usr/src/debug/qemu-2.10.0/monitor.c:3940
        mon = 0x10019722f40
#10 0x0000000045053a2c in readline_handle_byte (rs=0x10019fc0000, ch=<optimized out>) at util/readline.c:393
No locals.
#11 0x0000000044c7eeb8 in monitor_read (opaque=<optimized out>, buf=<optimized out>, size=<optimized out>) at /usr/src/debug/qemu-2.10.0/monitor.c:3923
        old_mon = 0x0
        i = <optimized out>
#12 0x0000000044fb659c in qemu_chr_be_write_impl (len=<optimized out>, buf=<optimized out>, s=<optimized out>) at chardev/char.c:167
        be = <optimized out>
#13 qemu_chr_be_write (s=<optimized out>, buf=<optimized out>, len=<optimized out>) at chardev/char.c:179
No locals.
#14 0x0000000044fb8838 in fd_chr_read (chan=0x100197d0840, cond=<optimized out>, opaque=0x10019626ea0) at chardev/char-fd.c:66
        chr = 0x10019626ea0
        __func__ = "fd_chr_read"
        s = <optimized out>
        len = <optimized out>
        buf = "\r", '\000' <repeats 13 times>, "\370\000\000\000\000\000\000\000\000\000\377\377\377\377\000\000\000\004", '\000' <repeats 12 times>, "\377\377\377\377\000\000t\033\000\001\000\000\344\000\000\000\000\001\000\000\000\000b\032\000\001\000\000\000\000\000\000\001", '\000' <repeats 13 times>, " \000\000\000\000\000\000\000\000\000\000\001\000\000\002\000\000\000\000\000\000\000\200\000\000\000\000\000 \000\200\000\000\000\000\000\000\000\200\000\000\000\003\000\000\000\000\000!\000\000\000\000\000\000\000!\000\000\000\000\000\000\000\001", '\000' <repeats 20 times>, "\001\000\000\021\021\000\000\020\004\000\000\000\b", '\000' <repeats 15 times>...
        ret = <optimized out>
---Type <return> to continue, or q <return> to quit---
#15 0x0000000044fd5734 in qio_channel_fd_source_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at io/channel-watch.c:84
        func = <optimized out>
        ssource = <optimized out>
#16 0x00003fff9c9e3ab0 in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
No symbol table info available.
#17 0x0000000045039784 in glib_pollfds_poll () at util/main-loop.c:213
        context = 0x100197c0790
        pfds = <optimized out>
#18 os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:261
        context = 0x100197c0790
        ret = <optimized out>
        spin_counter = 0
#19 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:515
        ret = 1
        timeout = 4294967295
        timeout_ns = 2447055654
#20 0x0000000044c18038 in main_loop () at vl.c:1917
No locals.
#21 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4804
        i = <optimized out>
        snapshot = <optimized out>
        linux_boot = 0
        initrd_filename = <optimized out>
        kernel_filename = 0x0
        kernel_cmdline = <optimized out>
        boot_order = <optimized out>
        boot_once = <optimized out>
        cyls = 0
        heads = 0
        secs = 0
        translation = <optimized out>
        opts = <optimized out>
        machine_opts = <optimized out>
        hda_opts = <optimized out>
        icount_opts = <optimized out>
        accel_opts = <optimized out>
        olist = <optimized out>
        optind = 40
        optarg = 0x3fffd3abf3fc "scsi-hd,drive=drive_ddisk_2,bus=scsi0.0,id=ddisk_2"
        loadvm = <optimized out>
        machine_class = 0x0
        cpu_model = <optimized out>
        vga_model = 0x0
        qtest_chrdev = <optimized out>
        qtest_log = <optimized out>
        pid_file = <optimized out>
        incoming = 0x0
        defconfig = <optimized out>
        userconfig = <optimized out>
        nographic = <optimized out>
        display_type = <optimized out>
---Type <return> to continue, or q <return> to quit---
        display_remote = <optimized out>
        log_mask = <optimized out>
        log_file = <optimized out>
        trace_file = <optimized out>
        maxram_size = <optimized out>
        ram_slots = <optimized out>
        vmstate_dump_file = 0x0
        main_loop_err = 0x0
        err = 0x0
        list_data_dirs = <optimized out>
        bdo_queue = {sqh_first = 0x0, sqh_last = 0x3fffd3ab6b08}
        __func__ = "main"
        __FUNCTION__ = "main"
(gdb)

Comment 2 yilzhang 2017-11-02 09:18:55 UTC
1. This bug can be reproduced without data-plane
2. x86 also has this bug

Comment 3 Dr. David Alan Gilbert 2017-11-02 13:31:22 UTC
We need my upstream 5089e1862fe

Comment 4 Miroslav Rezanina 2017-11-08 17:59:19 UTC
Fix included in qemu-kvm-rhev-2.10.0-5.el7

Comment 6 Ping Li 2017-11-10 10:46:31 UTC
Do savevm, loadvm and delvm 50 times, Cannot reproduce the issue. Set the bug status as verified.

Packages tested:
kernel-3.10.0-776.el7
qemu-kvm-rhev-2.10.0-5.el7

Test steps:
1. Boot a guest with below cml
/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox off  \
    -machine pc  \
    -nodefaults  \
    -vga cirrus  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/pingl/monitor-qmpmonitor1-20171109-061915-yYUvh9mI,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/pingl/monitor-catch_monitor-20171109-061915-yYUvh9mI,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idTefwYH  \
    -chardev socket,id=serial_id_serial0,path=/var/tmp/pingl/serial-serial0-20171109-061915-yYUvh9mI,server,nowait \
    -device isa-serial,chardev=serial_id_serial0  \
    -chardev socket,id=seabioslog_id_20171109-061915-yYUvh9mI,path=/var/tmp/pingl/seabios-20171109-061915-yYUvh9mI,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20171109-061915-yYUvh9mI,iobase=0x402 \
    -device ich9-usb-ehci1,id=usb1,addr=0x1d.7,multifunction=on,bus=pci.0 \
    -device ich9-usb-uhci1,id=usb1.0,multifunction=on,masterbus=usb1.0,addr=0x1d.0,firstport=0,bus=pci.0 \
    -device ich9-usb-uhci2,id=usb1.1,multifunction=on,masterbus=usb1.0,addr=0x1d.2,firstport=2,bus=pci.0 \
    -device ich9-usb-uhci3,id=usb1.2,multifunction=on,masterbus=usb1.0,addr=0x1d.4,firstport=4,bus=pci.0 \
    -drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=/home/tests/diskfile/rhel75.qcow2 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=0x3 \
    -device virtio-net-pci,mac=9a:7f:80:81:82:83,id=idpY8FeM,vectors=4,netdev=idKS30aC,bus=pci.0,addr=0x4  \
    -netdev tap,id=idKS30aC \
    -m 8192  \
    -smp 8,maxcpus=8,cores=4,threads=1,sockets=2  \
    -cpu 'Opteron_G5',+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off,strict=off \
    -enable-kvm \
    -monitor stdio \
    -monitor unix:/var/tmp/pingl/monitor1,server,nowait \

2. run stress test for internal snapshot with below script.
# cat test.sh 
#!/bin/bash
GUEST_IP=10.73.75.186
error=0
for i in {1..50};do
    echo run time:$i
    echo savevm s1 | nc -U /var/tmp/pingl/monitor1
    ssh $GUEST_IP dd if=/dev/urandom of=~/$i bs=50M count=1 2>/dev/null
    echo loadvm s1 | nc -U /var/tmp/pingl/monitor1
    ssh $GUEST_IP cat /etc/hosts
    if [ $? -ne 0 ] ; then
         echo Error found >> ./err
         error=`expr $error + 1`
    fi
    echo delvm s1 | nc -U /var/tmp/pingl/monitor1
done
echo The error numbers is: $error

(qemu) run time:50
QEMU 2.10.0 monitor - type 'help' for more information
(qemu) savevm s1
(qemu) QEMU 2.10.0 monitor - type 'help' for more information
(qemu) loadvm s1
(qemu) 127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
QEMU 2.10.0 monitor - type 'help' for more information
(qemu) delvm s1
(qemu) The error numbers is: 0

Comment 7 Dr. David Alan Gilbert 2017-11-14 12:31:43 UTC
*** Bug 1507739 has been marked as a duplicate of this bug. ***

Comment 10 Adam Williamson 2018-03-22 20:05:19 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=1531048 is this same bug reported against Fedora. It's causing me some inconvenience in Fedora testing; can we please have it fixed for Fedora (26 / 27 / 28 / Rawhide) too? Thanks.

Comment 12 errata-xmlrpc 2018-04-11 00:46:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1104