Bug 1772774 - qemu-kvm core dump during migration+reboot ( Assertion `mem->dirty_bmap' failed )
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: 8.2
Hardware: x86_64
OS: All
Priority: high
Severity: high
Target Milestone: rc
Target Release: 8.0
Assignee: Dr. David Alan Gilbert
QA Contact: Li Xiaohui
Duplicates: 1771032
Blocks: 1677408, 1771032
 
Reported: 2019-11-15 06:51 UTC by FuXiangChun
Modified: 2020-05-05 09:51 UTC
CC List: 13 users

Fixed In Version: qemu-kvm-4.2.0-5.module+el8.2.0+5389+367d9739
Last Closed: 2020-05-05 09:50:55 UTC
Type: Bug


Attachments:
1-qemu-core-dump-when-guest-restarting (102.87 KB, image/png), 2019-11-18 02:19 UTC, Li Xiaohui
2-migration-finish-after-guest-restarting (199.70 KB, image/png), 2019-11-18 02:21 UTC, Li Xiaohui


Links:
Red Hat Product Errata RHBA-2020:2017, last updated 2020-05-05 09:51:38 UTC

Description FuXiangChun 2019-11-15 06:51:36 UTC
Description of problem:
qemu-kvm core dumps on the source host during local migration.

Version-Release number of selected component (if applicable):
qemu-kvm-4.1.0-14.module+el8.2.0+4673+ff4b3b61.x86_64
4.18.0-148.el8.x86_64

How reproducible:
always

Steps to Reproduce:
1. Boot a win2019 guest on the source host:

/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1' \
-machine q35 \
-nodefaults \
-device VGA,bus=pcie.0,addr=0x1 \
-chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/avocado_lwn8jbp7/monitor-qmpmonitor1-20191114-221426-twXb,server,nowait \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/avocado_lwn8jbp7/monitor-catch_monitor-20191114-22142nyqqXb,server,nowait \
-mon chardev=qmp_id_catch_monitor,mode=control \
-device pvpanic,ioport=0x505,id=idoGbZbG \
-chardev socket,path=/var/tmp/avocado_lwn8jbp7/serial-serial0-20191114-221426-twnyqqXb,nowait,server,id=chardev_serial0 \
-device isa-serial,id=serial0,chardev=chardev_serial0 \
-chardev socket,id=seabioslog_id_20191114-221426-twnyqqXb,path=/var/tmp/avocado_lwn8jbp7/seabios-20191114-26-twnyqqXb,server,nowait \
-device isa-debugcon,chardev=seabioslog_id_20191114-221426-twnyqqXb,iobase=0x402 \
-device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
-device qemu-xhci,id=usb1,bus=pcie.0-root-port-2,addr=0x0 \
-device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie.0-root-port-3,addr=0x0 \
-drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/win2019-64-virtio-scsi.qcow2 \
-device scsi-hd,id=image1,drive=drive_image1 \
-device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
-device virtio-net-pci,mac=9a:f8:a2:7c:cc:12,id=idhBfcJy,netdev=id2ubJgc,bus=pcie.0-root-port-4,addr=0x0  \
-netdev tap,id=id2ubJgc,vhost=on \
-m 7168  \
-smp 8,maxcpus=8,cores=4,threads=1,sockets=2 \
-cpu 'Skylake-Server',hv_stimer,hv_synic,hv_vpindex,hv_reset,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv-tlbflush,+kvm_pv_unhalt \
-drive id=drive_cd1,if=none,snapshot=off,aio=threads,cache=none,media=cdrom,file=/home/kvm_autotest_root/iso/windows/winutils.iso \
-device scsi-cd,id=cd1,drive=drive_cd1 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-vnc :0 \
-rtc base=localtime,clock=host,driftfix=slew \
-boot order=cdn,once=c,menu=off,strict=off \
-drive if=pflash,format=raw,readonly=on,file=/usr/share/OVMF/OVMF_CODE.secboot.fd \
-drive if=pflash,format=raw,file=/home/kvm_autotest_root/images/win2019-64-virtio-scsi.qcow2.fd \
-enable-kvm \
-device pcie-root-port,id=pcie_extra_root_port_0,slot=5,chassis=5,addr=0x5,bus=pcie.0 \
-monitor stdio \

2. Boot another qemu-kvm process with -incoming on the same host:
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1' \
-machine q35  \
-nodefaults \
-device VGA,bus=pcie.0,addr=0x1  \
-chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/avocado_lwn8jbp7/monitor-qmpmonitor1-20191114-221543-qzbW,server,nowait \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/avocado_lwn8jbp7/monitor-catch_monitor-20191114-22154NxOxbW,server,nowait \
-mon chardev=qmp_id_catch_monitor,mode=control \
-device pvpanic,ioport=0x505,id=idXFImXB \
-chardev socket,path=/var/tmp/avocado_lwn8jbp7/serial-serial0-20191114-221543-qzNxOxbW,nowait,server,id=chardev_serial0 \
-device isa-serial,id=serial0,chardev=chardev_serial0  \
-chardev socket,id=seabioslog_id_20191114-221543-qzNxOxbW,path=/var/tmp/avocado_lwn8jbp7/seabios-20191114-23-qzNxOxbW,server,nowait \
-device isa-debugcon,chardev=seabioslog_id_20191114-221543-qzNxOxbW,iobase=0x402 \
-device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
-device qemu-xhci,id=usb1,bus=pcie.0-root-port-2,addr=0x0 \
-device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie.0-root-port-3,addr=0x0 \
-drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/win2019-64-virtio-scsi.qcow2 \
-device scsi-hd,id=image1,drive=drive_image1 \
-device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
-device virtio-net-pci,mac=9a:f8:a2:7c:cc:12,id=id2VGia9,netdev=ideZfhND,bus=pcie.0-root-port-4,addr=0x0 \
-netdev tap,id=ideZfhND,vhost=on \
-m 7168  \
-smp 8,maxcpus=8,cores=4,threads=1,sockets=2 \
-cpu 'Skylake-Server',hv_stimer,hv_synic,hv_vpindex,hv_reset,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv-tlbflush,+kvm_pv_unhalt \
-drive id=drive_cd1,if=none,snapshot=off,aio=threads,cache=none,media=cdrom,file=/home/kvm_autotest_root/iso/windows/winutils.iso \
-device scsi-cd,id=cd1,drive=drive_cd1 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-vnc :1 \
-rtc base=localtime,clock=host,driftfix=slew \
-boot order=cdn,once=c,menu=off,strict=off \
-drive if=pflash,format=raw,readonly=on,file=/usr/share/OVMF/OVMF_CODE.secboot.fd \
-drive if=pflash,format=raw,file=/home/kvm_autotest_root/images/win2019-64-virtio-scsi.qcow2.fd \
-enable-kvm \
-device pcie-root-port,id=pcie_extra_root_port_0,slot=5,chassis=5,addr=0x5,bus=pcie.0 \
-incoming tcp:0:5984 \
-monitor stdio \


3. Send the QMP migrate command:
{"execute": "migrate", "arguments": {"uri": "tcp:localhost:5984", "blk": false, "inc": false}, "id": "xzWgX5cb"}

Actual results:
(qemu) qemu-kvm: /builddir/build/BUILD/qemu-4.1.0/accel/kvm/kvm-all.c:673: kvm_physical_log_clear: Assertion `mem->dirty_bmap' failed.
cli: line 38: 30616 Aborted                 (core dumped) /usr/libexec/qemu-kvm

(gdb) bt full
#0  0x00007fe95aa8782f in raise () at /lib64/libc.so.6
#1  0x00007fe95aa71c45 in abort () at /lib64/libc.so.6
#2  0x00007fe95aa71b19 in _nl_load_domain.cold.0 () at /lib64/libc.so.6
#3  0x00007fe95aa7fde6 in .annobin_assert.c_end () at /lib64/libc.so.6
#4  0x0000556a8e13fa68 in kvm_physical_log_clear
    (section=0x7fe7524a34f0, section=0x7fe7524a34f0, kml=0x556a8f8d1d98)
    at /usr/src/debug/qemu-kvm-4.1.0-14.module+el8.2.0+4673+ff4b3b61.x86_64/accel/kvm/kvm-all.c:673
        d = 
            {slot = 8, num_pages = 16, first_page = 0, {dirty_bitmap = 0x7fe74c080670, padding2 = 140631389767280}}
        end = <optimized out>
        start_delta = 0
        size = 16777216
        bmap_start = <optimized out>
        bmap_npages = 4096
        bmap_clear = 0x0
        ret = <optimized out>
        s = <optimized out>
        start = <optimized out>
        mem = 0x556a8f8d2000
        i = <optimized out>
        __PRETTY_FUNCTION__ = "kvm_physical_log_clear"
        __func__ = "kvm_physical_log_clear"
        kml = 0x556a8f8d1d98
        r = <optimized out>
        __func__ = "kvm_log_clear"
        print_once_ = false
#5  0x0000556a8e13fa68 in kvm_log_clear (listener=0x556a8f8d1d98, section=0x7fe7524a34f0)
    at /usr/src/debug/qemu-kvm-4.1.0-14.module+el8.2.0+4673+ff4b3b61.x86_64/accel/kvm/kvm-all.c:1064
        kml = 0x556a8f8d1d98
        r = <optimized out>
        __func__ = "kvm_log_clear"
        print_once_ = false
#6  0x0000556a8e134922 in memory_region_clear_dirty_bitmap (mr=0x556a903093d0, start=0, len=1073741824)
    at /usr/src/debug/qemu-kvm-4.1.0-14.module+el8.2.0+4673+ff4b3b61.x86_64/memory.c:2124
        mrs = 
          {mr = 0x556a903093d0, fv = 0x7fe92824afb0, offset_within_region = 0, size = 16777216, offset_within_address_space = 3221225472, readonly = false, nonvolatile = false}
        listener = 0x556a8f8d1d98
        as = <optimized out>
        view = 0x7fe92824afb0
        fr = 0x7fe9282d9950
        sec_start = <optimized out>
        sec_end = <optimized out>
#7  0x0000556a8e13b690 in migration_bitmap_clear_dirty (page=358, rb=<optimized out>, rs=0x7fe74c001c20)
    at /usr/src/debug/qemu-kvm-4.1.0-14.module+el8.2.0+4673+ff4b3b61.x86_64/migration/ram.c:1741
        shift = <optimized out>
        size = <optimized out>
        start = <optimized out>
        ret = <optimized out>
        __PRETTY_FUNCTION__ = "migration_bitmap_clear_dirty"
        tmppages = <optimized out>
        pages = 0
        pagesize_bits = <optimized out>
        pss = {block = <optimized out>, page = 358, complete_round = <optimized out>}
        pages = 0
        again = true
        found = <optimized out>
#8  0x0000556a8e13b690 in ram_save_host_page
    (last_stage=<optimized out>, pss=<synthetic pointer>, rs=0x7fe74c001c20)
    at /usr/src/debug/qemu-kvm-4.1.0-14.module+el8.2.0+4673+ff4b3b61.x86_64/migration/ram.c:2605
        tmppages = <optimized out>
        pages = 0
        pagesize_bits = <optimized out>
        pss = {block = <optimized out>, page = 358, complete_round = <optimized out>}
        pages = 0
        again = true
        found = <optimized out>
#9  0x0000556a8e13b690 in ram_find_and_save_block
    (rs=rs@entry=0x7fe74c001c20, last_stage=last_stage@entry=false)
    at /usr/src/debug/qemu-kvm-4.1.0-14.module+el8.2.0+4673+ff4b3b61.x86_64/migration/ram.c:2673
        pss = {block = <optimized out>, page = 358, complete_round = <optimized out>}
        pages = 0
        again = true
        found = <optimized out>
#10 0x0000556a8e13c1aa in ram_find_and_save_block (last_stage=false, rs=0x7fe74c001c20)
    at /usr/src/debug/qemu-kvm-4.1.0-14.module+el8.2.0+4673+ff4b3b61.x86_64/migration/ram.c:3532
        pages = 0
        pages = <optimized out>
        temp = <optimized out>
        rs = 0x7fe74c001c20
        ret = <optimized out>
        i = 669
        t0 = 88372047248604
        done = 0
#11 0x0000556a8e13c1aa in ram_save_iterate (f=0x556a8f876130, opaque=<optimized out>)
    at /usr/src/debug/qemu-kvm-4.1.0-14.module+el8.2.0+4673+ff4b3b61.x86_64/migration/ram.c:3540
        pages = <optimized out>
        temp = <optimized out>
        rs = 0x7fe74c001c20
        ret = <optimized out>
        i = 669
        t0 = 88372047248604
        done = 0
#12 0x0000556a8e30c5ff in qemu_savevm_state_iterate (f=0x556a8f876130, postcopy=false)
    at migration/savevm.c:1185
        se = 0x556a8f8e9f80
        ret = 1
#13 0x0000556a8e30868a in migration_thread (opaque=0x556a8f846400) at migration/migration.c:3121
        s = 0x556a8f846400
        setup_start = <optimized out>
        thr_error = <optimized out>
        urgent = <optimized out>
#14 0x0000556a8e4478d4 in qemu_thread_start (args=0x556a8fe047e0) at util/qemu-thread-posix.c:502
        __clframe = 
          {__cancel_routine = <optimized out>, __cancel_arg = 0x0, __do_it = 1, __cancel_type = <optimized out>}
        qemu_thread_args = 0x556a8fe047e0
        start_routine = 0x556a8e3084d0 <migration_thread>
        arg = 0x556a8f846400
        r = <optimized out>
#15 0x00007fe95ae1b2de in start_thread () at /lib64/libpthread.so.0
#16 0x00007fe95ab4be53 in clone () at /lib64/libc.so.6

Expected results:
Migration completes and qemu-kvm does not abort.

Additional info:
The migration command was sent while the guest was still booting.

Comment 1 Li Xiaohui 2019-11-15 09:51:47 UTC
Reproduced this bz on qemu-kvm-4.1.0-14.module+el8.1.0+4548+ed1300f4.x86_64, but could not reproduce it on qemu-kvm-4.1.0-13.module+el8.1.0+4313+ef76ec61.x86_64.

Comment 2 Li Xiaohui 2019-11-15 11:07:52 UTC
Hi all,
tried this again on qemu-kvm-4.1.0-13.module+el8.1.0+4313+ef76ec61.x86_64, but hit another issue:
(gdb) t a a bt full

Thread 40 (Thread 0x7f3379ffb700 (LWP 23432)):
#0  0x00007f35c405d4a7 in pread64 () at /lib64/libpthread.so.0
#1  0x00005649b979c45d in handle_aiocb_rw_linear ()
#2  0x00005649b979d2dc in handle_aiocb_rw ()
#3  0x00005649b9825bdc in worker_thread ()
#4  0x00005649b982a5e4 in qemu_thread_start ()
#5  0x00007f35c40532de in start_thread () at /lib64/libpthread.so.0
#6  0x00007f35c3d83e53 in clone () at /lib64/libc.so.6

Thread 39 (Thread 0x7f3399ffb700 (LWP 23406)):
#0  0x00007f35c405c072 in do_futex_wait () at /lib64/libpthread.so.0
#1  0x00007f35c405c183 in __new_sem_wait_slow () at /lib64/libpthread.so.0
#2  0x00005649b982abaf in qemu_sem_timedwait ()
#3  0x00005649b9825b54 in worker_thread ()
#4  0x00005649b982a5e4 in qemu_thread_start ()
#5  0x00007f35c40532de in start_thread () at /lib64/libpthread.so.0
#6  0x00007f35c3d83e53 in clone () at /lib64/libc.so.6

Thread 38 (Thread 0x7f335affd700 (LWP 23437)):
#0  0x00007f35c405c072 in do_futex_wait () at /lib64/libpthread.so.0
#1  0x00007f35c405c183 in __new_sem_wait_slow () at /lib64/libpthread.so.0
#2  0x00005649b982abaf in qemu_sem_timedwait ()
#3  0x00005649b9825b54 in worker_thread ()
#4  0x00005649b982a5e4 in qemu_thread_start ()
#5  0x00007f35c40532de in start_thread () at /lib64/libpthread.so.0
#6  0x00007f35c3d83e53 in clone () at /lib64/libc.so.6

Thread 37 (Thread 0x7f35b69ff700 (LWP 23083)):
#0  0x00007f35c3d7a84b in ioctl () at /lib64/libc.so.6
#1  0x00005649b9524cb9 in kvm_vcpu_ioctl ()
#2  0x00005649b9524d79 in kvm_cpu_exec ()
#3  0x00005649b9509fbe in qemu_kvm_cpu_thread_fn ()
#4  0x00005649b982a5e4 in qemu_thread_start ()
#5  0x00007f35c40532de in start_thread () at /lib64/libpthread.so.0
#6  0x00007f35c3d83e53 in clone () at /lib64/libc.so.6

Thread 36 (Thread 0x7f33baffd700 (LWP 23362)):
#0  0x00007f35c405c072 in do_futex_wait () at /lib64/libpthread.so.0
#1  0x00007f35c405c183 in __new_sem_wait_slow () at /lib64/libpthread.so.0
#2  0x00005649b982abaf in qemu_sem_timedwait ()
#3  0x00005649b9825b54 in worker_thread ()
#4  0x00005649b982a5e4 in qemu_thread_start ()
#5  0x00007f35c40532de in start_thread () at /lib64/libpthread.so.0
#6  0x00007f35c3d83e53 in clone () at /lib64/libc.so.6

Thread 35 (Thread 0x7f339b7fe700 (LWP 23395)):
#0  0x00007f35c405c072 in do_futex_wait () at /lib64/libpthread.so.0
#1  0x00007f35c405c183 in __new_sem_wait_slow () at /lib64/libpthread.so.0
#2  0x00005649b982abaf in qemu_sem_timedwait ()
#3  0x00005649b9825b54 in worker_thread ()
#4  0x00005649b982a5e4 in qemu_thread_start ()
#5  0x00007f35c40532de in start_thread () at /lib64/libpthread.so.0
#6  0x00007f35c3d83e53 in clone () at /lib64/libc.so.6

Thread 34 (Thread 0x7f335b7fe700 (LWP 23436)):
#0  0x00007f35c405c072 in do_futex_wait () at /lib64/libpthread.so.0
#1  0x00007f35c405c183 in __new_sem_wait_slow () at /lib64/libpthread.so.0
#2  0x00005649b982abaf in qemu_sem_timedwait ()
#3  0x00005649b9825b54 in worker_thread ()
#4  0x00005649b982a5e4 in qemu_thread_start ()
#5  0x00007f35c40532de in start_thread () at /lib64/libpthread.so.0
#6  0x00007f35c3d83e53 in clone () at /lib64/libc.so.6

Thread 33 (Thread 0x7f33ba7fc700 (LWP 23363)):
#0  0x00007f35c405c072 in do_futex_wait () at /lib64/libpthread.so.0
#1  0x00007f35c405c183 in __new_sem_wait_slow () at /lib64/libpthread.so.0
#2  0x00005649b982abaf in qemu_sem_timedwait ()
#3  0x00005649b9825b54 in worker_thread ()
#4  0x00005649b982a5e4 in qemu_thread_start ()
#5  0x00007f35c40532de in start_thread () at /lib64/libpthread.so.0
#6  0x00007f35c3d83e53 in clone () at /lib64/libc.so.6

Thread 32 (Thread 0x7f335a7fc700 (LWP 23438)):
#0  0x00007f35c405c072 in do_futex_wait () at /lib64/libpthread.so.0
#1  0x00007f35c405c183 in __new_sem_wait_slow () at /lib64/libpthread.so.0
#2  0x00005649b982abaf in qemu_sem_timedwait ()
#3  0x00005649b9825b54 in worker_thread ()
#4  0x00005649b982a5e4 in qemu_thread_start ()
#5  0x00007f35c40532de in start_thread () at /lib64/libpthread.so.0
#6  0x00007f35c3d83e53 in clone () at /lib64/libc.so.6

Thread 31 (Thread 0x7f33bb7fe700 (LWP 23361)):
#0  0x00007f35c405c072 in do_futex_wait () at /lib64/libpthread.so.0
#1  0x00007f35c405c183 in __new_sem_wait_slow () at /lib64/libpthread.so.0
#2  0x00005649b982abaf in qemu_sem_timedwait ()
#3  0x00005649b9825b54 in worker_thread ()
#4  0x00005649b982a5e4 in qemu_thread_start ()
#5  0x00007f35c40532de in start_thread () at /lib64/libpthread.so.0
#6  0x00007f35c3d83e53 in clone () at /lib64/libc.so.6

Thread 30 (Thread 0x7f337a7fc700 (LWP 23428)):
#0  0x00007f35c405c072 in do_futex_wait () at /lib64/libpthread.so.0
#1  0x00007f35c405c183 in __new_sem_wait_slow () at /lib64/libpthread.so.0
#2  0x00005649b982abaf in qemu_sem_timedwait ()
#3  0x00005649b9825b54 in worker_thread ()
#4  0x00005649b982a5e4 in qemu_thread_start ()
#5  0x00007f35c40532de in start_thread () at /lib64/libpthread.so.0
#6  0x00007f35c3d83e53 in clone () at /lib64/libc.so.6

Thread 29 (Thread 0x7f337bfff700 (LWP 23425)):
#0  0x00007f35c405c072 in do_futex_wait () at /lib64/libpthread.so.0
#1  0x00007f35c405c183 in __new_sem_wait_slow () at /lib64/libpthread.so.0
#2  0x00005649b982abaf in qemu_sem_timedwait ()
#3  0x00005649b9825b54 in worker_thread ()
#4  0x00005649b982a5e4 in qemu_thread_start ()
#5  0x00007f35c40532de in start_thread () at /lib64/libpthread.so.0
#6  0x00007f35c3d83e53 in clone () at /lib64/libc.so.6

Thread 28 (Thread 0x7f33997fa700 (LWP 23407)):
#0  0x00007f35c405c072 in do_futex_wait () at /lib64/libpthread.so.0
#1  0x00007f35c405c183 in __new_sem_wait_slow () at /lib64/libpthread.so.0
#2  0x00005649b982abaf in qemu_sem_timedwait ()
#3  0x00005649b9825b54 in worker_thread ()
#4  0x00005649b982a5e4 in qemu_thread_start ()
#5  0x00007f35c40532de in start_thread () at /lib64/libpthread.so.0
#6  0x00007f35c3d83e53 in clone () at /lib64/libc.so.6

Thread 27 (Thread 0x7f337affd700 (LWP 23427)):
#0  0x00007f35c405c072 in do_futex_wait () at /lib64/libpthread.so.0
#1  0x00007f35c405c183 in __new_sem_wait_slow () at /lib64/libpthread.so.0
#2  0x00005649b982abaf in qemu_sem_timedwait ()
#3  0x00005649b9825b54 in worker_thread ()
#4  0x00005649b982a5e4 in qemu_thread_start ()
#5  0x00007f35c40532de in start_thread () at /lib64/libpthread.so.0
#6  0x00007f35c3d83e53 in clone () at /lib64/libc.so.6

Thread 26 (Thread 0x7f33b9ffb700 (LWP 23391)):
#0  0x00007f35c405c072 in do_futex_wait () at /lib64/libpthread.so.0
#1  0x00007f35c405c183 in __new_sem_wait_slow () at /lib64/libpthread.so.0
#2  0x00005649b982abaf in qemu_sem_timedwait ()
#3  0x00005649b9825b54 in worker_thread ()
#4  0x00005649b982a5e4 in qemu_thread_start ()
#5  0x00007f35c40532de in start_thread () at /lib64/libpthread.so.0
#6  0x00007f35c3d83e53 in clone () at /lib64/libc.so.6

Thread 25 (Thread 0x7f335bfff700 (LWP 23435)):
#0  0x00007f35c405c072 in do_futex_wait () at /lib64/libpthread.so.0
#1  0x00007f35c405c183 in __new_sem_wait_slow () at /lib64/libpthread.so.0
#2  0x00005649b982abaf in qemu_sem_timedwait ()
#3  0x00005649b9825b54 in worker_thread ()
#4  0x00005649b982a5e4 in qemu_thread_start ()
#5  0x00007f35c40532de in start_thread () at /lib64/libpthread.so.0
#6  0x00007f35c3d83e53 in clone () at /lib64/libc.so.6

Thread 24 (Thread 0x7f33797fa700 (LWP 23433)):
#0  0x00007f35c405c072 in do_futex_wait () at /lib64/libpthread.so.0
#1  0x00007f35c405c183 in __new_sem_wait_slow () at /lib64/libpthread.so.0
#2  0x00005649b982abaf in qemu_sem_timedwait ()
#3  0x00005649b9825b54 in worker_thread ()
#4  0x00005649b982a5e4 in qemu_thread_start ()
#5  0x00007f35c40532de in start_thread () at /lib64/libpthread.so.0
#6  0x00007f35c3d83e53 in clone () at /lib64/libc.so.6

Thread 23 (Thread 0x7f339bfff700 (LWP 23394)):
#0  0x00007f35c405c072 in do_futex_wait () at /lib64/libpthread.so.0
#1  0x00007f35c405c183 in __new_sem_wait_slow () at /lib64/libpthread.so.0
#2  0x00005649b982abaf in qemu_sem_timedwait ()
#3  0x00005649b9825b54 in worker_thread ()
#4  0x00005649b982a5e4 in qemu_thread_start ()
#5  0x00007f35c40532de in start_thread () at /lib64/libpthread.so.0
#6  0x00007f35c3d83e53 in clone () at /lib64/libc.so.6

Thread 22 (Thread 0x7f3398ff9700 (LWP 23424)):
#0  0x00007f35c405c072 in do_futex_wait () at /lib64/libpthread.so.0
#1  0x00007f35c405c183 in __new_sem_wait_slow () at /lib64/libpthread.so.0
#2  0x00005649b982abaf in qemu_sem_timedwait ()
#3  0x00005649b9825b54 in worker_thread ()
#4  0x00005649b982a5e4 in qemu_thread_start ()
#5  0x00007f35c40532de in start_thread () at /lib64/libpthread.so.0
#6  0x00007f35c3d83e53 in clone () at /lib64/libc.so.6

Thread 21 (Thread 0x7f339affd700 (LWP 23396)):
#0  0x00007f35c405c072 in do_futex_wait () at /lib64/libpthread.so.0
#1  0x00007f35c405c183 in __new_sem_wait_slow () at /lib64/libpthread.so.0
#2  0x00005649b982abaf in qemu_sem_timedwait ()
#3  0x00005649b9825b54 in worker_thread ()
#4  0x00005649b982a5e4 in qemu_thread_start ()
#5  0x00007f35c40532de in start_thread () at /lib64/libpthread.so.0
#6  0x00007f35c3d83e53 in clone () at /lib64/libc.so.6

Thread 20 (Thread 0x7f33b97fa700 (LWP 23392)):
#0  0x00007f35c405c072 in do_futex_wait () at /lib64/libpthread.so.0
#1  0x00007f35c405c183 in __new_sem_wait_slow () at /lib64/libpthread.so.0
#2  0x00005649b982abaf in qemu_sem_timedwait ()
#3  0x00005649b9825b54 in worker_thread ()
#4  0x00005649b982a5e4 in qemu_thread_start ()
#5  0x00007f35c40532de in start_thread () at /lib64/libpthread.so.0
#6  0x00007f35c3d83e53 in clone () at /lib64/libc.so.6

Thread 19 (Thread 0x7f3359ffb700 (LWP 23439)):
#0  0x00007f35c405c072 in do_futex_wait () at /lib64/libpthread.so.0
#1  0x00007f35c405c183 in __new_sem_wait_slow () at /lib64/libpthread.so.0
#2  0x00005649b982abaf in qemu_sem_timedwait ()
#3  0x00005649b9825b54 in worker_thread ()
#4  0x00005649b982a5e4 in qemu_thread_start ()
#5  0x00007f35c40532de in start_thread () at /lib64/libpthread.so.0
#6  0x00007f35c3d83e53 in clone () at /lib64/libc.so.6

Thread 18 (Thread 0x7f33b8ff9700 (LWP 23393)):
#0  0x00007f35c405c072 in do_futex_wait () at /lib64/libpthread.so.0
#1  0x00007f35c405c183 in __new_sem_wait_slow () at /lib64/libpthread.so.0
#2  0x00005649b982abaf in qemu_sem_timedwait ()
#3  0x00005649b9825b54 in worker_thread ()
#4  0x00005649b982a5e4 in qemu_thread_start ()
#5  0x00007f35c40532de in start_thread () at /lib64/libpthread.so.0
#6  0x00007f35c3d83e53 in clone () at /lib64/libc.so.6

Thread 17 (Thread 0x7f3378ff9700 (LWP 23434)):
#0  0x00007f35c405c072 in do_futex_wait () at /lib64/libpthread.so.0
#1  0x00007f35c405c183 in __new_sem_wait_slow () at /lib64/libpthread.so.0
#2  0x00005649b982abaf in qemu_sem_timedwait ()
#3  0x00005649b9825b54 in worker_thread ()
#4  0x00005649b982a5e4 in qemu_thread_start ()
#5  0x00007f35c40532de in start_thread () at /lib64/libpthread.so.0
#6  0x00007f35c3d83e53 in clone () at /lib64/libc.so.6

Thread 16 (Thread 0x7f35c96cdf00 (LWP 23054)):
#0  0x00007f35c3d79026 in ppoll () at /lib64/libc.so.6
#1  0x00005649b9826505 in qemu_poll_ns ()
#2  0x00005649b9827405 in main_loop_wait ()
#3  0x00005649b9610419 in main_loop ()
#4  0x00005649b94be993 in main ()

Thread 15 (Thread 0x7f35b5dff700 (LWP 23084)):
#0  0x00007f35c3d7a84b in ioctl () at /lib64/libc.so.6
#1  0x00005649b9524cb9 in kvm_vcpu_ioctl ()
#2  0x00005649b9524d79 in kvm_cpu_exec ()
#3  0x00005649b9509fbe in qemu_kvm_cpu_thread_fn ()
#4  0x00005649b982a5e4 in qemu_thread_start ()
#5  0x00007f35c40532de in start_thread () at /lib64/libpthread.so.0
#6  0x00007f35c3d83e53 in clone () at /lib64/libc.so.6

Thread 14 (Thread 0x7f339a7fc700 (LWP 23405)):
#0  0x00007f35c405c072 in do_futex_wait () at /lib64/libpthread.so.0
#1  0x00007f35c405c183 in __new_sem_wait_slow () at /lib64/libpthread.so.0
#2  0x00005649b982abaf in qemu_sem_timedwait ()
#3  0x00005649b9825b54 in worker_thread ()
#4  0x00005649b982a5e4 in qemu_thread_start ()
#5  0x00007f35c40532de in start_thread () at /lib64/libpthread.so.0
#6  0x00007f35c3d83e53 in clone () at /lib64/libc.so.6

Thread 13 (Thread 0x7f35b7c3d700 (LWP 23080)):
#0  0x00007f35c3d78f31 in poll () at /lib64/libc.so.6
#1  0x00007f35c8b3d9b6 in g_main_context_iterate.isra ()
    at /lib64/libglib-2.0.so.0
#2  0x00007f35c8b3dd72 in g_main_loop_run () at /lib64/libglib-2.0.so.0
#3  0x00005649b960ade1 in iothread_run ()
#4  0x00005649b982a5e4 in qemu_thread_start ()
#5  0x00007f35c40532de in start_thread () at /lib64/libpthread.so.0
#6  0x00007f35c3d83e53 in clone () at /lib64/libc.so.6

Thread 12 (Thread 0x7f359bfff700 (LWP 23086)):
#0  0x00007f35c3d7a84b in ioctl () at /lib64/libc.so.6
#1  0x00005649b9524cb9 in kvm_vcpu_ioctl ()
#2  0x00005649b9524d79 in kvm_cpu_exec ()
#3  0x00005649b9509fbe in qemu_kvm_cpu_thread_fn ()
#4  0x00005649b982a5e4 in qemu_thread_start ()
#5  0x00007f35c40532de in start_thread () at /lib64/libpthread.so.0
#6  0x00007f35c3d83e53 in clone () at /lib64/libc.so.6

Thread 11 (Thread 0x7f337b7fe700 (LWP 23426)):
#0  0x00007f35c405c072 in do_futex_wait () at /lib64/libpthread.so.0
#1  0x00007f35c405c183 in __new_sem_wait_slow () at /lib64/libpthread.so.0
#2  0x00005649b982abaf in qemu_sem_timedwait ()
#3  0x00005649b9825b54 in worker_thread ()
#4  0x00005649b982a5e4 in qemu_thread_start ()
#5  0x00007f35c40532de in start_thread () at /lib64/libpthread.so.0
#6  0x00007f35c3d83e53 in clone () at /lib64/libc.so.6

Thread 10 (Thread 0x7f35bd296700 (LWP 23055)):
#0  0x00007f35c3d7e6bd in syscall () at /lib64/libc.so.6
#1  0x00005649b982ae0f in qemu_event_wait ()
#2  0x00005649b983ca02 in call_rcu_thread ()
#3  0x00005649b982a5e4 in qemu_thread_start ()
#4  0x00007f35c40532de in start_thread () at /lib64/libpthread.so.0
#5  0x00007f35c3d83e53 in clone () at /lib64/libc.so.6

Thread 9 (Thread 0x7f35b51ff700 (LWP 23085)):
#0  0x00007f35c3d7a84b in ioctl () at /lib64/libc.so.6
#1  0x00005649b9524cb9 in kvm_vcpu_ioctl ()
#2  0x00005649b9524d79 in kvm_cpu_exec ()
#3  0x00005649b9509fbe in qemu_kvm_cpu_thread_fn ()
#4  0x00005649b982a5e4 in qemu_thread_start ()
#5  0x00007f35c40532de in start_thread () at /lib64/libpthread.so.0
#6  0x00007f35c3d83e53 in clone () at /lib64/libc.so.6

Thread 8 (Thread 0x7f358adff700 (LWP 23091)):
#0  0x00007f35c405947c in pthread_cond_wait@@GLIBC_2.3.2 ()
    at /lib64/libpthread.so.0
#1  0x00005649b982a99d in qemu_cond_wait_impl ()
#2  0x00005649b9753fd1 in vnc_worker_thread_loop ()
#3  0x00005649b9754590 in vnc_worker_thread ()
#4  0x00005649b982a5e4 in qemu_thread_start ()
#5  0x00007f35c40532de in start_thread () at /lib64/libpthread.so.0
#6  0x00007f35c3d83e53 in clone () at /lib64/libc.so.6

Thread 7 (Thread 0x7f33bbfff700 (LWP 23360)):
#0  0x00007f35c405c072 in do_futex_wait () at /lib64/libpthread.so.0
#1  0x00007f35c405c183 in __new_sem_wait_slow () at /lib64/libpthread.so.0
#2  0x00005649b982abaf in qemu_sem_timedwait ()
#3  0x00005649b9825b54 in worker_thread ()
#4  0x00005649b982a5e4 in qemu_thread_start ()
#5  0x00007f35c40532de in start_thread () at /lib64/libpthread.so.0
#6  0x00007f35c3d83e53 in clone () at /lib64/libc.so.6

Thread 6 (Thread 0x7f3599fff700 (LWP 23089)):
#0  0x00007f35c3d7a84b in ioctl () at /lib64/libc.so.6
#1  0x00005649b9524cb9 in kvm_vcpu_ioctl ()
#2  0x00005649b9524d79 in kvm_cpu_exec ()
#3  0x00005649b9509fbe in qemu_kvm_cpu_thread_fn ()
#4  0x00005649b982a5e4 in qemu_thread_start ()
#5  0x00007f35c40532de in start_thread () at /lib64/libpthread.so.0
#6  0x00007f35c3d83e53 in clone () at /lib64/libc.so.6

Thread 5 (Thread 0x7f359abff700 (LWP 23088)):
#0  0x00007f35c3d7a84b in ioctl () at /lib64/libc.so.6
#1  0x00005649b9524cb9 in kvm_vcpu_ioctl ()
#2  0x00005649b9524d79 in kvm_cpu_exec ()
#3  0x00005649b9509fbe in qemu_kvm_cpu_thread_fn ()
#4  0x00005649b982a5e4 in qemu_thread_start ()
#5  0x00007f35c40532de in start_thread () at /lib64/libpthread.so.0
#6  0x00007f35c3d83e53 in clone () at /lib64/libc.so.6

Thread 4 (Thread 0x7f359b7fe700 (LWP 23087)):
#0  0x00007f35c3d7a84b in ioctl () at /lib64/libc.so.6
#1  0x00005649b9524cb9 in kvm_vcpu_ioctl ()
#2  0x00005649b9524d79 in kvm_cpu_exec ()
#3  0x00005649b9509fbe in qemu_kvm_cpu_thread_fn ()
#4  0x00005649b982a5e4 in qemu_thread_start ()
#5  0x00007f35c40532de in start_thread () at /lib64/libpthread.so.0
#6  0x00007f35c3d83e53 in clone () at /lib64/libc.so.6

Thread 3 (Thread 0x7f35bca95700 (LWP 23356)):
#0  0x00007f35c405c072 in do_futex_wait () at /lib64/libpthread.so.0
#1  0x00007f35c405c183 in __new_sem_wait_slow () at /lib64/libpthread.so.0
#2  0x00005649b982abaf in qemu_sem_timedwait ()
#3  0x00005649b9825b54 in worker_thread ()
#4  0x00005649b982a5e4 in qemu_thread_start ()
#5  0x00007f35c40532de in start_thread () at /lib64/libpthread.so.0
#6  0x00007f35c3d83e53 in clone () at /lib64/libc.so.6

Thread 2 (Thread 0x7f35b743c700 (LWP 23081)):
#0  0x00007f35c3d7a84b in ioctl () at /lib64/libc.so.6
#1  0x00005649b9524cb9 in kvm_vcpu_ioctl ()
#2  0x00005649b9524d79 in kvm_cpu_exec ()
#3  0x00005649b9509fbe in qemu_kvm_cpu_thread_fn ()
#4  0x00005649b982a5e4 in qemu_thread_start ()
#5  0x00007f35c40532de in start_thread () at /lib64/libpthread.so.0
#6  0x00007f35c3d83e53 in clone () at /lib64/libc.so.6

Thread 1 (Thread 0x7f3358ff9700 (LWP 24733)):
#0  0x00007f35c3cbf82f in raise () at /lib64/libc.so.6
#1  0x00007f35c3ca9c45 in abort () at /lib64/libc.so.6
#2  0x00007f35c3ca9b19 in _nl_load_domain.cold.0 () at /lib64/libc.so.6
#3  0x00007f35c3cb7de6 in .annobin_assert.c_end () at /lib64/libc.so.6
#4  0x00005649b9522948 in  ()
#5  0x00005649b9517802 in memory_region_clear_dirty_bitmap ()
#6  0x00005649b951e570 in ram_find_and_save_block.part ()
#7  0x00005649b951f08a in ram_save_iterate ()
#8  0x00005649b96ef34f in qemu_savevm_state_iterate ()
#9  0x00005649b96eb3da in migration_thread ()
#10 0x00005649b982a5e4 in qemu_thread_start ()
#11 0x00007f35c40532de in start_thread () at /lib64/libpthread.so.0
#12 0x00007f35c3d83e53 in clone () at /lib64/libc.so.6

Comment 3 Li Xiaohui 2019-11-15 14:42:14 UTC
Tried this bz on qemu-kvm-4.1.0-14.module+el8.2.0+4673+ff4b3b61.x86_64 and found some conclusions:
1. The bz does not reproduce every time, but it reproduces often; so far it only reproduces when migrating the guest during the early stage of boot.
2. The bz reproduces not only on q35+ovmf but also on q35+seabios.

Will gather more useful information tomorrow:
1. whether the bz reproduces at a later stage, i.e. migrating after the guest has finished loading;
2. confirm whether the bz reproduces on qemu-kvm-4.1.0-13.module+el8.1.0+4313, and, furthermore, which version the bz was introduced in.

Comment 4 Li Xiaohui 2019-11-18 00:51:19 UTC
Conclusions after some tests:
1. Reproduced the bz on qemu-kvm-4.1.0-1.module+el8.1.0+3966, qemu-kvm-4.1.0-13.module+el8.1.0+4313 and qemu-kvm-4.1.0-14.module+el8.1.0+4548, but did not reproduce it on qemu-kvm-4.0.0-6.module+el8.1.0+3736.
2. On qemu-kvm-4.1.0-1.module+el8.1.0+3966, did not reproduce the bz with a rhel8.1 guest (q35+seabios) after 30 tries, but can easily hit the issue with a win2019 guest (q35+seabios or q35+ovmf).
3. Can reproduce the bz easily with a win2019 guest (q35+seabios or q35+ovmf); the recurrence rate is greater than 90% when running the migrate.with_reboot.tcp script automatically in avocado.

migrate.with_reboot.tcp does a local migration (src host = dst host):
(1) boot a win2019 guest with "-S" on the src host;
(2) after the guest has started, reboot the guest;
(3) as soon as the guest starts to reboot, start a guest with "-incoming tcp:0:1234 -S" on the dst host and immediately migrate the guest from src to dst via "migrate -d tcp:localhost:1234";
(4) during the early stage of migration, qemu on the src host hits the core dump.

Comment 5 Li Xiaohui 2019-11-18 02:17:43 UTC
Manual test on different hosts:
boot a win2019 guest on the src host -> boot a guest with "-incoming ..." on the dst host -> after the guest has started, restart the guest on the src host.

While the Windows guest shows that it is restarting, the bz reproduces when migration is started, as in picture 1;
after the restart has finished, doing migration does not reproduce the bz, as in picture 2.

Comment 6 Li Xiaohui 2019-11-18 02:19:54 UTC
Created attachment 1637166 [details]
1-qemu-core-dump-when-guest-restarting

Comment 7 Li Xiaohui 2019-11-18 02:21:10 UTC
Created attachment 1637167 [details]
2-migration-finish-after-guest-restarting

Comment 8 Dr. David Alan Gilbert 2019-11-20 13:56:04 UTC
Hi,
  Looking at comment 4, can you try a rhel 8 guest using qemu-kvm 4.1.0-14 please.

Also, please don't use screenshots of text - just copy and paste the text.

Comment 9 Li Xiaohui 2019-11-21 06:39:27 UTC
(In reply to Dr. David Alan Gilbert from comment #8)
> Hi,
>   Looking at comment 4, can you try a rhel 8 guest using qemu-kvm 4.1.0-14
> please.
> 
> Also, please don't use screenshots of text - just copy and paste the text.

Hi,
Reproduced this bz with a rhel8.1 guest (q35+seabios) on qemu-img-4.1.0-14.module+el8.2.0+4673+ff4b3b61.x86_64; tried 30 times automatically but only hit the issue once (1/30). The core dump log is the same as in Comment 0.

Comment 10 Dr. David Alan Gilbert 2019-11-21 10:41:03 UTC
I had a look at one of the core files; it's a bit confusing, but the region the migration code is touching seems to be normal RAM:

#0  0x00007f58dd9f882f in raise () at /lib64/libc.so.6
#1  0x00007f58dd9e2c45 in abort () at /lib64/libc.so.6
#2  0x00007f58dd9e2b19 in _nl_load_domain.cold.0 () at /lib64/libc.so.6
#3  0x00007f58dd9f0de6 in .annobin_assert.c_end () at /lib64/libc.so.6
#4  0x000056407c711a68 in kvm_physical_log_clear (section=0x7f5680ad74f0, section=0x7f5680ad74f0, kml=0x56407daf5508)
    at /usr/src/debug/qemu-kvm-4.1.0-14.module+el8.2.0+4673+ff4b3b61.x86_64/accel/kvm/kvm-all.c:673
 
(gdb) p/x *mem
$3 = {start_addr = 0x100000, memory_size = 0x7ff00000, ram = 0x7f579ff00000, slot = 0x9, flags = 0x1, old_flags = 0x1, dirty_bmap = 0x0}
 
#5  0x000056407c711a68 in kvm_log_clear (listener=0x56407daf5508, section=0x7f5680ad74f0)
    at /usr/src/debug/qemu-kvm-4.1.0-14.module+el8.2.0+4673+ff4b3b61.x86_64/accel/kvm/kvm-all.c:1064
 
(gdb) p *section->mr
$5 = {parent_obj = {class = 0x56407da85330, free = 0x0, Python Exception <class 'gdb.error'> There is no member named keys.:
properties = 0x56407dc0e580, ref = 1, parent = 0x56407dac1d10}, romd_mode = true, ram = true,
  subpage = false, readonly = false, nonvolatile = false, rom_device = false, flush_coalesced_mmio = false, global_locking = true,
  dirty_log_mask = 0 '\000', is_iommu = false, ram_block = 0x56407dc57de0, owner = 0x0, ops = 0x56407cfe10e0 <unassigned_mem_ops>, opaque = 0x0,
  container = 0x0, size = 4294967296, addr = 0, destructor = 0x56407c701a30 <memory_region_destructor_ram>, align = 2097152, terminates = true,
  ram_device = false, enabled = true, warning_printed = false, vga_logging_count = 0 '\000', alias = 0x0, alias_offset = 0, priority = 0, subregions = {
    tqh_first = 0x0, tqh_circ = {tql_next = 0x0, tql_prev = 0x56407dc579f8}}, subregions_link = {tqe_next = 0x0, tqe_circ = {tql_next = 0x0,
      tql_prev = 0x0}}, coalesced = {tqh_first = 0x0, tqh_circ = {tql_next = 0x0, tql_prev = 0x56407dc57a18}}, name = 0x56407dc57c70 "pc.ram",
  ioeventfd_nb = 0, ioeventfds = 0x0}
 
#6  0x000056407c706922 in memory_region_clear_dirty_bitmap (mr=0x56407dc57950, start=1073741824, len=1073741824)
    at /usr/src/debug/qemu-kvm-4.1.0-14.module+el8.2.0+4673+ff4b3b61.x86_64/memory.c:2124
 
(gdb) p/x mrs
$8 = {mr = 0x56407dc57950, fv = 0x7f58c02b5aa0, offset_within_region = 0x40000000, size = 0x40000000, offset_within_address_space = 0x40000000,
  readonly = 0x0, nonvolatile = 0x0}
 
#7  0x000056407c70d690 in migration_bitmap_clear_dirty (page=262144, rb=<optimized out>, rs=0x7f5678001c20)
    at /usr/src/debug/qemu-kvm-4.1.0-14.module+el8.2.0+4673+ff4b3b61.x86_64/migration/ram.c:1741
#8  0x000056407c70d690 in ram_save_host_page (last_stage=<optimized out>, pss=<synthetic pointer>, rs=0x7f5678001c20)
    at /usr/src/debug/qemu-kvm-4.1.0-14.module+el8.2.0+4673+ff4b3b61.x86_64/migration/ram.c:2605
#9  0x000056407c70d690 in ram_find_and_save_block (rs=rs@entry=0x7f5678001c20, last_stage=last_stage@entry=false)
    at /usr/src/debug/qemu-kvm-4.1.0-14.module+el8.2.0+4673+ff4b3b61.x86_64/migration/ram.c:2673
#10 0x000056407c70e1aa in ram_find_and_save_block (last_stage=false, rs=0x7f5678001c20)
    at /usr/src/debug/qemu-kvm-4.1.0-14.module+el8.2.0+4673+ff4b3b61.x86_64/migration/ram.c:3532
#11 0x000056407c70e1aa in ram_save_iterate (f=0x56407dc58540, opaque=<optimized out>)
    at /usr/src/debug/qemu-kvm-4.1.0-14.module+el8.2.0+4673+ff4b3b61.x86_64/migration/ram.c:3540
#12 0x000056407c8de5ff in qemu_savevm_state_iterate (f=0x56407dc58540, postcopy=false) at migration/savevm.c:1185
#13 0x000056407c8da68a in migration_thread (opaque=0x56407daf5620) at migration/migration.c:3121
#14 0x000056407ca198d4 in qemu_thread_start (args=0x56407dc6b130) at util/qemu-thread-posix.c:502
#15 0x00007f58ddd8c2de in start_thread () at /lib64/libpthread.so.0
#16 0x00007f58ddabce53 in clone () at /lib64/libc.so.6

So it's apparently 'pc.ram', a 1GB chunk at a 1GB offset; I was expecting a more unusual RAMBlock to be the problem.
Oddly, I don't see the main thread in the core dump, but there's no apparent reason it would have quit:

(gdb) p reset_requested 
$4 = SHUTDOWN_CAUSE_NONE
(gdb) p shutdown_requested 
$5 = SHUTDOWN_CAUSE_NONE
(gdb) p powerdown_requested 
$6 = 0
(gdb) p shutdown_signal
$7 = 0

This is a 'with reboot' avocado test.

Comment 11 Dr. David Alan Gilbert 2019-11-21 12:28:45 UTC
OK, I think I understand what's going on.

During a reboot it's actually normal to remove/regenerate a lot of slots as the guest pokes at things like the PAM registers (from mch_update_pam)
and SMRAM (mch_update_smram), which causes calls to kvm_set_phys_mem.  When kvm_set_phys_mem recreates a slot, it does a sync on the old slot and then frees the dirty_bmap
- so if a PAM register or the like is touched between the point where we do the sync and the point where we come to do the clear, we'll probably hit this assert.


I think we can probably just drop the assert in kvm_log_clear_one_slot and make it exit early; since
kvm_set_phys_mem does a sync before freeing the old bmap, I think that will be safe.
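
A minimal sketch of that early-exit idea against the 4.1 code (the assert at accel/kvm/kvm-all.c:673 in the backtrace); the function body is elided and the signature is copied from the source tree as I read it, so treat this as illustrative rather than the final patch:

static int kvm_log_clear_one_slot(KVMSlot *mem, int as_id,
                                  uint64_t start, uint64_t size)
{
    /* ... existing setup ... */

    /*
     * Was: assert(mem->dirty_bmap);
     * The slot may have been recreated (PAM/SMRAM update during reboot)
     * after the last sync, which freed its dirty bitmap.  kvm_set_phys_mem
     * synced the old slot before freeing the bitmap, so there is nothing
     * stale left to clear here; just return instead of aborting.
     */
    if (!mem->dirty_bmap) {
        return 0;
    }

    /* ... compute the range and issue KVM_CLEAR_DIRTY_LOG ... */
}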

Comment 12 Dr. David Alan Gilbert 2019-11-21 13:58:25 UTC
I can trigger this under a running Linux guest by prodding at the q35 smram registers constantly during migration.
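
A minimal sketch of that kind of prodding, run as root inside a disposable Linux guest; the SMRAM register offset (0x9d on host bridge 00:00.0) and the G_SMRAME bit are assumptions taken from the Q35/MCH documentation, and a set D_LCK bit may block the writes:

#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

int main(void)
{
    /* Q35 MCH host bridge is PCI 00:00.0; SMRAM is config offset 0x9d. */
    int fd = open("/sys/bus/pci/devices/0000:00:00.0/config", O_RDWR);
    uint8_t smram;

    if (fd < 0)
        return 1;
    for (;;) {
        pread(fd, &smram, 1, 0x9d);
        smram ^= 0x08;   /* toggle G_SMRAME: forces mch_update_smram() and
                            memslot delete/recreate in QEMU */
        pwrite(fd, &smram, 1, 0x9d);
    }
}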

Comment 13 Dr. David Alan Gilbert 2019-11-21 16:54:36 UTC
Please test to see if this fixes it:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=24874372

Comment 14 Dr. David Alan Gilbert 2019-11-21 16:58:12 UTC
Posted upstream:
kvm: Reallocate dirty_bmap when we change a slot
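
For reference, a rough sketch of the direction that patch takes: allocate a slot's dirty bitmap whenever the slot is (re)created with dirty logging enabled, so mem->dirty_bmap can no longer be NULL by the time the clear path runs. Names are inferred from the KVMSlot/dirty_bmap fields visible in the backtraces; the exact upstream helper name and call sites may differ.

/* Sketch only; ALIGN is the round-up macro already defined in kvm-all.c. */
static void kvm_memslot_init_dirty_bitmap(KVMSlot *mem)
{
    /*
     * One bit per target page, rounded up to 64-bit words, matching
     * what the KVM_GET_DIRTY_LOG / KVM_CLEAR_DIRTY_LOG ioctls expect.
     */
    hwaddr bitmap_size = ALIGN(mem->memory_size >> TARGET_PAGE_BITS, 64) / 8;

    mem->dirty_bmap = g_malloc0(bitmap_size);
}

/*
 * Called from kvm_set_phys_mem() when the new slot is registered with
 * KVM_MEM_LOG_DIRTY_PAGES, instead of only lazily on the first sync.
 */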

Comment 15 Li Xiaohui 2019-11-22 08:45:09 UTC
(In reply to Dr. David Alan Gilbert from comment #13)
> Please test to see if this fixes it:
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=24874372

Used that qemu build to run the auto test on the same host as in Comment 9.
So far I only have the win guest result: tried 30 times automatically, no qemu core dump, all passed.

The auto test with a rhel8.1 guest is still running for 100 iterations; I will update the result here after the tests.

Comment 16 Li Xiaohui 2019-11-25 01:34:07 UTC
Used the build and ran migrate.with_reboot.tcp 100 times each for a win2019 (q35+ovmf) guest and a rhel8.1 (q35+seabios) guest; all passed.

Comment 17 Dr. David Alan Gilbert 2019-11-25 10:54:56 UTC
(In reply to Li Xiaohui from comment #16)
> Used the build and ran migrate.with_reboot.tcp 100 times each for a
> win2019 (q35+ovmf) guest and a rhel8.1 (q35+seabios) guest; all passed.

OK thanks for the test; I'll wait for that patch to go upstream and then backport.

Comment 19 Dr. David Alan Gilbert 2020-01-02 12:08:27 UTC
*** Bug 1771032 has been marked as a duplicate of this bug. ***

Comment 21 Li Xiaohui 2020-01-06 11:14:04 UTC
Reproduced this issue on rhel8.2.0-av when running the auto case rhel7_10052_win; waiting for the fixed version. Thanks.

*************************
[root@hp-dl385g10-14 ipa]# python3 Start2Run.py --test_requirement=rhel7_47114_win_blockdev --test_case=rhel7_10052_win --src_host_ip=10.73.130.69 --dst_host_ip=10.73.130.67 --share_images_dir=/mnt/nfs --sys_image_name=win2019-64-virtio-scsi-2.qcow2 --cpu_model=EPYC
*************************

Comment 24 Li Xiaohui 2020-01-13 01:57:45 UTC
Verified this bz on hosts running kernel-4.18.0-167.el8.x86_64 and qemu-kvm-4.2.0-5.module+el8.2.0+5389+367d9739.x86_64:
1. Ran migrate.with_reboot.tcp 100 times for a win2019 (q35+ovmf) guest: pass.
The auto result is at this link: http://10.66.86.2/kvm_autotest_job_log/?jobid=4006275
2. Ran migrate.with_reboot.tcp 100 times for a rhel8.1 (q35+seabios) guest: pass.
One run failed, but that failure is not related to this bz, so the issue can still be verified.
The auto result is at this link: http://10.66.86.2/kvm_autotest_job_log/?jobid=4005795
3. Ran rhel7_10052_win 100 times for a win2019 (pc+seabios) guest: pass.
[root@hp-dl385g10-14 latest]# grep -i "PASS: RHEL7-10052-WIN" rhel7_10052_win-2020-01-10-0*/short_debug.log* | wc -l
100

Based on the above test results, this bz is verified.

Comment 25 Ademar Reis 2020-02-05 23:08:21 UTC
QEMU has been recently split into sub-components and as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks

Comment 27 errata-xmlrpc 2020-05-05 09:50:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2017

