Bug 1794692 - Mirror block job stops making progress
Summary: Mirror block job stops making progress
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: 8.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Kevin Wolf
QA Contact: aihua liang
URL:
Whiteboard:
Depends On:
Blocks: 1771318
 
Reported: 2020-01-24 11:03 UTC by Kevin Wolf
Modified: 2020-05-05 09:56 UTC
CC List: 17 users

Fixed In Version: qemu-kvm-4.2.0-17.module+el8.2.0+6129+b14d477b
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1655367
Environment:
Last Closed: 2020-05-05 09:55:54 UTC
Type: Bug
Target Upstream Version:
Embargoed:




Links:
Red Hat Product Errata RHBA-2020:2017 (last updated 2020-05-05 09:56:49 UTC)

Description Kevin Wolf 2020-01-24 11:03:05 UTC
Two different bugs were mixed in a single bug. Splitting out the part that belongs to a mirror job making no progress.

+++ This bug was initially created as a clone of Bug #1655367 +++

[...]

--- Additional comment from aihua liang on 2019-01-11 10:13:38 CET ---

QEMU also hangs when buf-size is set while doing block mirror.

 kernel version: 4.18.0-57.el8.x86_64
 qemu-kvm version: qemu-kvm-3.1.0-3.module+el8+2614+d714d2bb.x86_64

Test Steps:
  1. Start guest with cmds:
       /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox off  \
    -machine pc  \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x2  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20190107-223041-G541jCra,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20190107-223041-G541jCra,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=id0S5usU  \
    -chardev socket,id=serial_id_serial0,path=/var/tmp/serial-serial0-20190107-223041-G541jCra,server,nowait \
    -device isa-serial,chardev=serial_id_serial0  \
    -chardev socket,id=seabioslog_id_20190107-223041-G541jCra,path=/var/tmp/seabios-20190107-223041-G541jCra,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20190107-223041-G541jCra,iobase=0x402 \
    -device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x3 \
    -device virtio-net-pci,mac=9a:a0:a1:a2:a3:a4,id=idawKNIF,vectors=4,netdev=idIaOUAY,bus=pci.0,addr=0x5  \
    -netdev tap,id=idIaOUAY,vhost=on \
    -m 7168  \
    -smp 8,maxcpus=8,cores=4,threads=1,sockets=2  \
    -cpu host,+kvm_pv_unhalt,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time \
    -object secret,id=sec0,data=backing \
    -blockdev node-name=my_drive,driver=iscsi,cache.direct=off,cache.no-flush=on,transport=tcp,portal=10.73.194.81,target=iqn.2019-01.com.example:aliang,lun=1 \
    -blockdev driver=luks,node-name=drive_image1,file=my_drive,key-secret=sec0 \
    -device ide-hd,drive=drive_image1,id=image1,bus=ide.0,unit=0 \
    -drive id=drive_winutils,if=none,snapshot=off,aio=threads,cache=none,media=cdrom,file=/home/kvm_autotest_root/iso/windows/winutils.iso \
    -device ide-cd,id=winutils,drive=drive_winutils,bus=ide.0,unit=1 \
    -drive id=drive_unattended,if=none,snapshot=off,aio=threads,cache=none,media=cdrom,file=/home/kvm_autotest_root/iso/windows/virtio-win.iso.el8 \
    -device ide-cd,id=unattended,drive=drive_unattended,bus=ide.1,unit=0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
    -rtc base=localtime,clock=host,driftfix=slew  \
    -boot menu=off,strict=off,order=cdn,once=d \
    -enable-kvm \
    -monitor stdio \

2. Create the mirror target image in advance:
    {'execute':'blockdev-create','arguments':{'options': {'driver':'file','filename':'/root/sn1','size':32212254720},'job-id':'job1'}}
    {'execute':'blockdev-add','arguments':{'driver':'file','node-name':'drive_sn1','filename':'/root/sn1'}}
    {'execute':'blockdev-create','arguments':{'options': {'driver': 'qcow2','file':'drive_sn1','size':32212254720},'job-id':'job2'}}
    {'execute':'blockdev-add','arguments':{'driver':'qcow2','node-name':'sn1','file':'drive_sn1'}}
    {'execute':'job-dismiss','arguments':{'id':'job1'}}
    {'execute':'job-dismiss','arguments':{'id':'job2'}}

3. Do block mirror with buf-size setting:
    {'execute':'blockdev-mirror','arguments':{'device':'drive_image1','target':'sn1','sync':'full','granularity':65536,'buf-size':655360,'job-id':'j1'}}

4. Check block job status during mirroring.
    {"execute":"query-block-jobs"}
    {"return": [{"auto-finalize": true, "io-status": "ok", "device": "j1", "auto-dismiss": true, "busy": true, "len": 32212975616, "offset": 4499046400, "status": "running", "paused": false, "speed": 0, "ready": false, "type": "mirror"}]}

    {"execute":"query-block-jobs"}
    {"return": [{"auto-finalize": true, "io-status": "ok", "device": "j1", "auto-dismiss": true, "busy": true, "len": 32282705920, "offset": 13958512640, "status": "running", "paused": false, "speed": 0, "ready": false, "type": "mirror"}]}

    {"execute":"query-block-jobs"}
    {"return": [{"auto-finalize": true, "io-status": "ok", "device": "j1", "auto-dismiss": true, "busy": true, "len": 32287846400, "offset": 32211759104, "status": "running", "paused": false, "speed": 0, "ready": false, "type": "mirror"}]}

    {"execute":"query-block-jobs"}
    {"return": [{"auto-finalize": true, "io-status": "ok", "device": "j1", "auto-dismiss": true, "busy": true, "len": 32287846400, "offset": 32211759104, "status": "running", "paused": false, "speed": 0, "ready": false, "type": "mirror"}]}

    {"execute":"query-block-jobs"}
    {"return": [{"auto-finalize": true, "io-status": "ok", "device": "j1", "auto-dismiss": true, "busy": true, "len": 32287846400, "offset": 32211759104, "status": "running", "paused": false, "speed": 0, "ready": false, "type": "mirror"}]}

5. Check gdb info
    #gdb -p 17388
    (gdb) bt full
#0  0x00007f057eb9d45b in ioctl () at ../sysdeps/unix/syscall-template.S:78
#1  0x00005595aeb623e6 in kvm_vm_ioctl (s=s@entry=0x5595af85bed0, type=<optimized out>) at /usr/src/debug/qemu-kvm-3.1.0-3.module+el8+2614+d714d2bb.x86_64/accel/kvm/kvm-all.c:2138
        ret = <optimized out>
        arg = 0x7fff0040c650
        ap = {{gp_offset = 16, fp_offset = 21909, overflow_arg_area = 0x7fff0040c650, reg_save_area = 0x7fff0040c610}}
#2  0x00005595aeb63109 in kvm_set_irq (s=0x5595af85bed0, irq=irq@entry=8, level=level@entry=1)
    at /usr/src/debug/qemu-kvm-3.1.0-3.module+el8+2614+d714d2bb.x86_64/accel/kvm/kvm-all.c:958
        event = {{irq = 1, status = 1}, level = 1}
        ret = <optimized out>
        __PRETTY_FUNCTION__ = "kvm_set_irq"
#3  0x00005595aebc8937 in kvm_pic_set_irq (opaque=<optimized out>, irq=8, level=1) at /usr/src/debug/qemu-kvm-3.1.0-3.module+el8+2614+d714d2bb.x86_64/hw/i386/kvm/i8259.c:115
        delivered = <optimized out>
#4  0x00005595aeb97806 in qemu_irq_raise (irq=<optimized out>) at /usr/src/debug/qemu-kvm-3.1.0-3.module+el8+2614+d714d2bb.x86_64/include/hw/irq.h:16
#5  0x00005595aeb97806 in rtc_policy_slew_deliver_irq (s=<optimized out>) at /usr/src/debug/qemu-kvm-3.1.0-3.module+el8+2614+d714d2bb.x86_64/hw/timer/mc146818rtc.c:149
#6  0x00005595aeb978d2 in rtc_periodic_timer (opaque=0x5595af840450) at /usr/src/debug/qemu-kvm-3.1.0-3.module+el8+2614+d714d2bb.x86_64/hw/timer/mc146818rtc.c:279
        s = 0x5595af840450
#7  0x00005595aee44027 in timerlist_run_timers (timer_list=0x5595af84ca10) at util/qemu-timer.c:574
        ts = <optimized out>
        current_time = 1547196543718171000
        progress = <optimized out>
        cb = 0x5595aeb97870 <rtc_periodic_timer>
        opaque = <optimized out>
        need_replay_checkpoint = <optimized out>
        timer_list = 0x5595af84ca10
#8  0x00005595aee442ae in qemu_clock_run_timers (type=<optimized out>) at util/qemu-timer.c:708
        progress = false
        type = <optimized out>
#9  0x00005595aee442ae in qemu_clock_run_all_timers () at util/qemu-timer.c:708
        progress = false
        type = <optimized out>
#10 0x00005595aee447a5 in main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:503
        timeout = 4294967295
        timeout_ns = <optimized out>
#11 0x00005595aec37e99 in main_loop () at vl.c:1910
#12 0x00005595aeaf83f4 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4681
        i = <optimized out>
        snapshot = 0
        linux_boot = <optimized out>
        initrd_filename = 0x0
        kernel_filename = 0x0
        kernel_cmdline = <optimized out>
        boot_order = 0x5595af7e0b80 "cdn"
        boot_once = <optimized out>
        ds = <optimized out>
        opts = <optimized out>
        machine_opts = <optimized out>
        icount_opts = <optimized out>
--Type <RET> for more, q to quit, c to continue without paging--
        accel_opts = 0x0
        olist = <optimized out>
        optind = 67
        optarg = 0x7fff0040f566 "stdio"
        loadvm = 0x0
        machine_class = <optimized out>
        cpu_model = 0x7fff0040f196 "host,+kvm_pv_unhalt,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time"
        vga_model = 0x0
        qtest_chrdev = 0x0
        qtest_log = 0x0
        incoming = 0x0
        userconfig = <optimized out>
        nographic = false
        display_remote = <optimized out>
        log_mask = <optimized out>
        log_file = <optimized out>
        trace_file = <optimized out>
        maxram_size = 7516192768
        ram_slots = 0
        vmstate_dump_file = 0x0
        main_loop_err = 0x0
        err = 0x0
        list_data_dirs = <optimized out>
        dir = <optimized out>
        bdo_queue = {sqh_first = 0x0, sqh_last = 0x7fff0040c880}
        __func__ = "main"

6. Check pstack info
   # pstack 17388
Thread 11 (Thread 0x7f054bfff700 (LWP 17453)):
#0  0x00007f057ee7c48c in futex_wait_cancelable (private=0, expected=0, futex_word=0x5595b07e3798) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  0x00007f057ee7c48c in __pthread_cond_wait_common (abstime=0x0, mutex=0x5595b07e37a8, cond=0x5595b07e3770) at pthread_cond_wait.c:502
#2  0x00007f057ee7c48c in __pthread_cond_wait (cond=0x5595b07e3770, mutex=mutex@entry=0x5595b07e37a8) at pthread_cond_wait.c:655
#3  0x00005595aee4791d in qemu_cond_wait_impl (cond=<optimized out>, mutex=0x5595b07e37a8, file=0x5595aefab4c7 "ui/vnc-jobs.c", line=214) at util/qemu-thread-posix.c:161
#4  0x00005595aed7b691 in vnc_worker_thread_loop (queue=queue@entry=0x5595b07e3770) at ui/vnc-jobs.c:214
#5  0x00005595aed7bc50 in vnc_worker_thread (arg=0x5595b07e3770) at ui/vnc-jobs.c:324
#6  0x00005595aee47564 in qemu_thread_start (args=0x5595b0826ae0) at util/qemu-thread-posix.c:498
#7  0x00007f057ee762de in start_thread (arg=<optimized out>) at pthread_create.c:486
#8  0x00007f057eba6a63 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 10 (Thread 0x7f055e7fc700 (LWP 17451)):
#0  0x00007f057eb9d45b in ioctl () at ../sysdeps/unix/syscall-template.S:78
#1  0x00005595aeb64749 in kvm_vcpu_ioctl (cpu=cpu@entry=0x5595af9ca170, type=type@entry=44672) at /usr/src/debug/qemu-kvm-3.1.0-3.module+el8+2614+d714d2bb.x86_64/accel/kvm/kvm-all.c:2156
#2  0x00005595aeb6480a in kvm_cpu_exec (cpu=<optimized out>) at /usr/src/debug/qemu-kvm-3.1.0-3.module+el8+2614+d714d2bb.x86_64/accel/kvm/kvm-all.c:1993
#3  0x00005595aeb3e066 in qemu_kvm_cpu_thread_fn (arg=0x5595af9ca170) at /usr/src/debug/qemu-kvm-3.1.0-3.module+el8+2614+d714d2bb.x86_64/cpus.c:1281
#4  0x00005595aee47564 in qemu_thread_start (args=0x5595af9ec4d0) at util/qemu-thread-posix.c:498
#5  0x00007f057ee762de in start_thread (arg=<optimized out>) at pthread_create.c:486
#6  0x00007f057eba6a63 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 9 (Thread 0x7f055effd700 (LWP 17450)):
#0  0x00007f057eb9d45b in ioctl () at ../sysdeps/unix/syscall-template.S:78
#1  0x00005595aeb64749 in kvm_vcpu_ioctl (cpu=cpu@entry=0x5595af9a6f80, type=type@entry=44672) at /usr/src/debug/qemu-kvm-3.1.0-3.module+el8+2614+d714d2bb.x86_64/accel/kvm/kvm-all.c:2156
#2  0x00005595aeb6480a in kvm_cpu_exec (cpu=<optimized out>) at /usr/src/debug/qemu-kvm-3.1.0-3.module+el8+2614+d714d2bb.x86_64/accel/kvm/kvm-all.c:1993
#3  0x00005595aeb3e066 in qemu_kvm_cpu_thread_fn (arg=0x5595af9a6f80) at /usr/src/debug/qemu-kvm-3.1.0-3.module+el8+2614+d714d2bb.x86_64/cpus.c:1281
#4  0x00005595aee47564 in qemu_thread_start (args=0x5595af9c92e0) at util/qemu-thread-posix.c:498
#5  0x00007f057ee762de in start_thread (arg=<optimized out>) at pthread_create.c:486
#6  0x00007f057eba6a63 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 8 (Thread 0x7f055f7fe700 (LWP 17449)):
#0  0x00007f057eb9d45b in ioctl () at ../sysdeps/unix/syscall-template.S:78
#1  0x00005595aeb64749 in kvm_vcpu_ioctl (cpu=cpu@entry=0x5595af9843b0, type=type@entry=44672) at /usr/src/debug/qemu-kvm-3.1.0-3.module+el8+2614+d714d2bb.x86_64/accel/kvm/kvm-all.c:2156
#2  0x00005595aeb6480a in kvm_cpu_exec (cpu=<optimized out>) at /usr/src/debug/qemu-kvm-3.1.0-3.module+el8+2614+d714d2bb.x86_64/accel/kvm/kvm-all.c:1993
#3  0x00005595aeb3e066 in qemu_kvm_cpu_thread_fn (arg=0x5595af9843b0) at /usr/src/debug/qemu-kvm-3.1.0-3.module+el8+2614+d714d2bb.x86_64/cpus.c:1281
#4  0x00005595aee47564 in qemu_thread_start (args=0x5595af9a6710) at util/qemu-thread-posix.c:498
#5  0x00007f057ee762de in start_thread (arg=<optimized out>) at pthread_create.c:486
#6  0x00007f057eba6a63 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 7 (Thread 0x7f055ffff700 (LWP 17448)):
#0  0x00007f057eb9d45b in ioctl () at ../sysdeps/unix/syscall-template.S:78
#1  0x00005595aeb64749 in kvm_vcpu_ioctl (cpu=cpu@entry=0x5595af9617e0, type=type@entry=44672) at /usr/src/debug/qemu-kvm-3.1.0-3.module+el8+2614+d714d2bb.x86_64/accel/kvm/kvm-all.c:2156
#2  0x00005595aeb6480a in kvm_cpu_exec (cpu=<optimized out>) at /usr/src/debug/qemu-kvm-3.1.0-3.module+el8+2614+d714d2bb.x86_64/accel/kvm/kvm-all.c:1993
#3  0x00005595aeb3e066 in qemu_kvm_cpu_thread_fn (arg=0x5595af9617e0) at /usr/src/debug/qemu-kvm-3.1.0-3.module+el8+2614+d714d2bb.x86_64/cpus.c:1281
#4  0x00005595aee47564 in qemu_thread_start (args=0x5595af983b40) at util/qemu-thread-posix.c:498
#5  0x00007f057ee762de in start_thread (arg=<optimized out>) at pthread_create.c:486
#6  0x00007f057eba6a63 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 6 (Thread 0x7f0574bff700 (LWP 17447)):
#0  0x00007f057eb9d45b in ioctl () at ../sysdeps/unix/syscall-template.S:78
#1  0x00005595aeb64749 in kvm_vcpu_ioctl (cpu=cpu@entry=0x5595af93e9e0, type=type@entry=44672) at /usr/src/debug/qemu-kvm-3.1.0-3.module+el8+2614+d714d2bb.x86_64/accel/kvm/kvm-all.c:2156
#2  0x00005595aeb6480a in kvm_cpu_exec (cpu=<optimized out>) at /usr/src/debug/qemu-kvm-3.1.0-3.module+el8+2614+d714d2bb.x86_64/accel/kvm/kvm-all.c:1993
#3  0x00005595aeb3e066 in qemu_kvm_cpu_thread_fn (arg=0x5595af93e9e0) at /usr/src/debug/qemu-kvm-3.1.0-3.module+el8+2614+d714d2bb.x86_64/cpus.c:1281
#4  0x00005595aee47564 in qemu_thread_start (args=0x5595af960fa0) at util/qemu-thread-posix.c:498
#5  0x00007f057ee762de in start_thread (arg=<optimized out>) at pthread_create.c:486
#6  0x00007f057eba6a63 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 5 (Thread 0x7f0575400700 (LWP 17446)):
#0  0x00007f057eb9d45b in ioctl () at ../sysdeps/unix/syscall-template.S:78
#1  0x00005595aeb64749 in kvm_vcpu_ioctl (cpu=cpu@entry=0x5595af91bdc0, type=type@entry=44672) at /usr/src/debug/qemu-kvm-3.1.0-3.module+el8+2614+d714d2bb.x86_64/accel/kvm/kvm-all.c:2156
#2  0x00005595aeb6480a in kvm_cpu_exec (cpu=<optimized out>) at /usr/src/debug/qemu-kvm-3.1.0-3.module+el8+2614+d714d2bb.x86_64/accel/kvm/kvm-all.c:1993
#3  0x00005595aeb3e066 in qemu_kvm_cpu_thread_fn (arg=0x5595af91bdc0) at /usr/src/debug/qemu-kvm-3.1.0-3.module+el8+2614+d714d2bb.x86_64/cpus.c:1281
#4  0x00005595aee47564 in qemu_thread_start (args=0x5595af93e170) at util/qemu-thread-posix.c:498
#5  0x00007f057ee762de in start_thread (arg=<optimized out>) at pthread_create.c:486
#6  0x00007f057eba6a63 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 4 (Thread 0x7f0575c01700 (LWP 17445)):
#0  0x00007f057eb9d45b in ioctl () at ../sysdeps/unix/syscall-template.S:78
#1  0x00005595aeb64749 in kvm_vcpu_ioctl (cpu=cpu@entry=0x5595af8f7690, type=type@entry=44672) at /usr/src/debug/qemu-kvm-3.1.0-3.module+el8+2614+d714d2bb.x86_64/accel/kvm/kvm-all.c:2156
#2  0x00005595aeb6480a in kvm_cpu_exec (cpu=<optimized out>) at /usr/src/debug/qemu-kvm-3.1.0-3.module+el8+2614+d714d2bb.x86_64/accel/kvm/kvm-all.c:1993
#3  0x00005595aeb3e066 in qemu_kvm_cpu_thread_fn (arg=0x5595af8f7690) at /usr/src/debug/qemu-kvm-3.1.0-3.module+el8+2614+d714d2bb.x86_64/cpus.c:1281
#4  0x00005595aee47564 in qemu_thread_start (args=0x5595af91b550) at util/qemu-thread-posix.c:498
#5  0x00007f057ee762de in start_thread (arg=<optimized out>) at pthread_create.c:486
#6  0x00007f057eba6a63 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 3 (Thread 0x7f0576402700 (LWP 17444)):
#0  0x00007f057eb9d45b in ioctl () at ../sysdeps/unix/syscall-template.S:78
#1  0x00005595aeb64749 in kvm_vcpu_ioctl (cpu=cpu@entry=0x5595af8ab770, type=type@entry=44672) at /usr/src/debug/qemu-kvm-3.1.0-3.module+el8+2614+d714d2bb.x86_64/accel/kvm/kvm-all.c:2156
#2  0x00005595aeb6480a in kvm_cpu_exec (cpu=<optimized out>) at /usr/src/debug/qemu-kvm-3.1.0-3.module+el8+2614+d714d2bb.x86_64/accel/kvm/kvm-all.c:1993
#3  0x00005595aeb3e066 in qemu_kvm_cpu_thread_fn (arg=0x5595af8ab770) at /usr/src/debug/qemu-kvm-3.1.0-3.module+el8+2614+d714d2bb.x86_64/cpus.c:1281
#4  0x00005595aee47564 in qemu_thread_start (args=0x5595af8cd0a0) at util/qemu-thread-posix.c:498
#5  0x00007f057ee762de in start_thread (arg=<optimized out>) at pthread_create.c:486
#6  0x00007f057eba6a63 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 2 (Thread 0x7f0577b40700 (LWP 17389)):
#0  0x00007f057eba12bd in syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x00005595aee47d8f in qemu_futex_wait (val=<optimized out>, f=<optimized out>) at util/qemu-thread-posix.c:438
#2  0x00005595aee47d8f in qemu_event_wait (ev=ev@entry=0x5595af6ab8e8 <rcu_call_ready_event>) at util/qemu-thread-posix.c:442
#3  0x00005595aee594c2 in call_rcu_thread (opaque=<optimized out>) at util/rcu.c:261
#4  0x00005595aee47564 in qemu_thread_start (args=0x5595af793260) at util/qemu-thread-posix.c:498
#5  0x00007f057ee762de in start_thread (arg=<optimized out>) at pthread_create.c:486
#6  0x00007f057eba6a63 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 1 (Thread 0x7f0583a8cec0 (LWP 17388)):
#0  0x00007f057eb9b8d6 in __GI_ppoll (fds=0x5595af7b24f0, nfds=12, timeout=<optimized out>, timeout@entry=0x7fff0040c6f0, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1  0x00005595aee43765 in ppoll (__ss=0x0, __timeout=0x7fff0040c6f0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  0x00005595aee43765 in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=455741) at util/qemu-timer.c:334
#3  0x00005595aee44735 in os_host_main_loop_wait (timeout=455741) at util/main-loop.c:233
#4  0x00005595aee44735 in main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:497
#5  0x00005595aec37e99 in main_loop () at vl.c:1910
#6  0x00005595aeaf83f4 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4681

7. Check block job status:
   {"execute":"query-block-jobs"}
    ---> no response

QEMU and the guest hang...

--- Additional comment from aihua liang on 2019-03-28 09:57:49 CET ---

When running test case "block_stream_negative" with dataplane enabled on qemu-kvm-3.1.0-20.module+el8+2904+e658c755.x86_64, I also hit this issue.

{'execute': 'block-stream', 'arguments': {'device': 'drive_image1', 'speed': 10485760}, 'id': 'bYJuNEbM'}

The block job makes no progress, with this info:
 {'execute': 'query-block-jobs', 'id': 'n1wS5W7l'}
05:49:35 DEBUG| (monitor avocado-vt-vm1.qmpmonitor1) Response to 'query-block-jobs' (re-formated)
05:49:35 DEBUG| (monitor avocado-vt-vm1.qmpmonitor1)        auto-finalize: True
05:49:35 DEBUG| (monitor avocado-vt-vm1.qmpmonitor1)        io-status: ok
05:49:35 DEBUG| (monitor avocado-vt-vm1.qmpmonitor1)        device: drive_image1
05:49:35 DEBUG| (monitor avocado-vt-vm1.qmpmonitor1)        auto-dismiss: True
05:49:35 DEBUG| (monitor avocado-vt-vm1.qmpmonitor1)        busy: True
05:49:35 DEBUG| (monitor avocado-vt-vm1.qmpmonitor1)        len: 21474836480
05:49:35 DEBUG| (monitor avocado-vt-vm1.qmpmonitor1)        offset: 3927113728
05:49:35 DEBUG| (monitor avocado-vt-vm1.qmpmonitor1)        status: running
05:49:35 DEBUG| (monitor avocado-vt-vm1.qmpmonitor1)        paused: False
05:49:35 DEBUG| (monitor avocado-vt-vm1.qmpmonitor1)        speed: 10485760
05:49:35 DEBUG| (monitor avocado-vt-vm1.qmpmonitor1)        ready: False
05:49:35 DEBUG| (monitor avocado-vt-vm1.qmpmonitor1)        type: stream

Will try to reproduce it manually later; if possible, I will provide detailed info.

--- Additional comment from aihua liang on 2019-10-14 11:03:31 CEST ---

Also hit this issue in qemu-kvm-rhev-2.12.0-37.el7.x86_64.

--- Additional comment from Ademar Reis on 2019-11-20 17:35:29 CET ---

(In reply to aihua liang from comment #9)
> [...]

Kevin: does it ring a bell? Can you please try reproducing this? I see that both -blockdev and -drive are being used, perhaps we should find a simpler reproducer?

--- Additional comment from John Snow on 2020-01-23 23:54:16 CET ---

Hi, doing triage on my bugs again: aliang, we can't reproduce this "manually", only with avocado still, right? I'm less familiar with Avocado and need to consult with e.g. Cleber Rosa to see if I can reproduce it. This looks like a pretty straightforward mirror operation and I don't have any ready-made hunches or suspects right now.

Looks like it's not a recent regression; if it happens on 2.12 and up through 3.1, do we know if it happens on 4.2?

Next, it looks like the job is choking at 99%; possibly either:

(1) The query command is racing with job cleanup, which creates complications, or
(2) The job is failing all on its own.

Something isn't quiescing/cleaning up correctly at the end. I wonder if the query command is truly necessary to replicate the hang? Could the query be an important piece?


So, aliang:

- Is the only way to reproduce this inside avocado for now?

--- Additional comment from Kevin Wolf on 2020-01-24 10:44:38 CET ---

(In reply to Ademar Reis from comment #12)
> Kevin: does it ring a bell? Can you please try reproducing this? I see that
> both -blockdev and -drive are being used, perhaps we should find a simpler
> reproducer?

First of all, this should be a separate BZ because it has an entirely different backtrace than the original report. Not every case where QEMU hangs is the same bug.

The pattern of the originally reported bug:
[...]

The pattern of comment 9:
Nobody is trying to get a lock or anything. Everything is just sitting there idle. The problem in step 4 seems to be that the block job doesn't move forward any more immediately before it should finish. Getting a coroutine backtrace of the job would be helpful there. Step 7 says that at some point, the monitor doesn't respond any more. Unfortunately the backtrace seems to be from before the hang because a monitor hang should be visible there.
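
For reference, such a coroutine backtrace can be taken with QEMU's gdb helper script; this is the same "qemu coroutine" command used in comment 3 below. The script path and the coroutine address here are placeholders, assuming a QEMU source tree is available:

    (gdb) source scripts/qemu-gdb.py
    (gdb) qemu coroutine <address of the job coroutine>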

If I understand right, the original bug hasn't been reproduced outside of Avocado, but the unrelated bug from comment 9 looks more like manual testing.

Comment 1 Kevin Wolf 2020-01-24 11:13:26 UTC
So I can reproduce a mirror hang, even in upstream, with a much simpler setup (just a local qcow2 file on tmpfs, no iscsi or luks involved).

I'm not entirely sure if it's exactly the same problem, but the symptom is that query-block-jobs doesn't increase 'offset' or 'len' any more even though the guest continues to write. Trying to exit QEMU with Ctrl-C makes the whole process hang because it tries to cancel the job synchronously, but the job just won't move.

The job coroutine at that point looks like this:

#0  0x0000561c1624e530 in qemu_coroutine_switch (from_=from_@entry=0x561c17454100, to_=to_@entry=0x7f2343609538, action=action@entry=COROUTINE_YIELD)
    at util/coroutine-ucontext.c:218
#1  0x0000561c1624d251 in qemu_coroutine_yield () at util/qemu-coroutine.c:193
#2  0x0000561c1624d8b5 in qemu_co_queue_wait_impl (queue=0x561c1745f990, lock=lock@entry=0x0) at util/qemu-coroutine-lock.c:56
#3  0x0000561c16193bb4 in mirror_wait_for_any_operation (active=false, s=0x561c173f5550) at /home/kwolf/source/qemu/include/qemu/lockable.h:46
#4  0x0000561c16193bb4 in mirror_wait_for_free_in_flight_slot (s=0x561c173f5550) at block/mirror.c:307
#5  0x0000561c16193bb4 in mirror_run (job=0x561c173f5550, errp=<optimized out>) at block/mirror.c:974
#6  0x0000561c1614432f in job_co_entry (opaque=0x561c173f5550) at job.c:878
#7  0x0000561c1624e5ab in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:115
#8  0x00007f234da19250 in __start_context () at /lib64/libc.so.6

Some mirroring operation is in flight and doesn't complete. Will have to continue debugging to find out which operation and where it is hanging.

Comment 3 Kevin Wolf 2020-01-28 14:45:15 UTC
Turns out there is a mirror operation that actually waits for itself to complete:

(gdb) qemu coroutine 0x562a17afc3b0
#0  0x0000562a1656cde0 in qemu_coroutine_switch (from_=from_@entry=0x562a17afc3b0, to_=to_@entry=0x7f7891344538, action=action@entry=COROUTINE_YIELD) at util/coroutine-ucontext.c:218
#1  0x0000562a1656bb01 in qemu_coroutine_yield () at util/qemu-coroutine.c:193
#2  0x0000562a1656c165 in qemu_co_queue_wait_impl (queue=0x562a1868aed0, lock=lock@entry=0x0) at util/qemu-coroutine-lock.c:56
#3  0x0000562a164b168c in mirror_wait_for_any_operation (active=false, s=0x562a18a72e00) at /home/kwolf/source/qemu/include/qemu/lockable.h:46
#4  0x0000562a164b168c in mirror_wait_for_free_in_flight_slot (s=0x562a18a72e00) at block/mirror.c:308
#5  0x0000562a164b168c in mirror_co_read (opaque=0x562a1868ae80) at block/mirror.c:351
#6  0x0000562a1656ce5b in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:115
#7  0x00007f789b754250 in __start_context () at /lib64/libc.so.6
#8  0x00007ffedcbebdf0 in  ()
#9  0x0000000000000000 in  ()
(gdb) macro define offsetof(_type, _memb) \
      ((long)(&((_type *)0)->_memb))
(gdb) macro define container_of(_ptr, _type, _memb) \
      ((_type *)((void *)(_ptr) - offsetof(_type, _memb)))
(gdb) p container_of ((CoQueue *)0x562a1868aed0, MirrorOp, waiting_requests)
$12 = (MirrorOp *) 0x562a1868ae80
(gdb) p *$12
$13 = {s = 0x562a18a72e00, qiov = {iov = 0x0, niov = 0, {{nalloc = 0, local_iov = {iov_base = 0x0, iov_len = 0}}, {__pad = '\000' <repeats 11 times>, size = 0}}}, offset = 32768000, 
  bytes = 655360, bytes_handled = 0x7f785877ee80, is_pseudo_op = false, is_active_write = false, waiting_requests = {entries = {sqh_first = 0x562a17afc3b0, sqh_last = 0x562a18e1e638}}, 
  co = 0x562a17afc3b0, next = {tqe_next = 0x0, tqe_circ = {tql_next = 0x0, tql_prev = 0x562a18a73070}}}

The last print shows that the MirrorOp that owns the CoQueue we're waiting for is running in the same coroutine that is waiting, so it's waiting for itself.

Making sure that mirror_wait_for_any_operation() picks a different operation than its caller to wait for seems to fix the problem.
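
As a rough sketch only (not the actual upstream patch): the idea is for mirror_wait_for_any_operation() to skip the operation running in the current coroutine. The MirrorOp fields used below (co, waiting_requests, is_pseudo_op, is_active_write) come from the gdb output above; the ops_in_flight list name and the exact signature are assumptions, and the code relies on the types and helpers from QEMU's block/mirror.c:

    static void coroutine_fn mirror_wait_for_any_operation(bool active,
                                                            MirrorBlockJob *s)
    {
        MirrorOp *op;

        QTAILQ_FOREACH(op, &s->ops_in_flight, next) {
            /* Never wait for the operation running in this coroutine: it
             * cannot complete while we are blocked here, so waiting on its
             * waiting_requests queue would deadlock (the bug seen above). */
            if (op->co == qemu_coroutine_self()) {
                continue;
            }
            if (!op->is_pseudo_op && op->is_active_write == active) {
                qemu_co_queue_wait(&op->waiting_requests, NULL);
                return;
            }
        }
        /* Sketch assumes some other real operation is always in flight
         * when this is called. */
        abort();
    }

With a check like this, the mirror_co_read() caller from the backtrace picks some other in-flight request to wait for rather than its own.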

Comment 4 aihua liang 2020-02-03 10:00:48 UTC
Yes, Kevin, the bug was found by automation at first, but it can be reproduced easily by manual testing.

Tested on qemu-kvm-4.2.0-8.module+el8.2.0+5607+dc756904 with a localfs backend; when doing block mirror with buf-size set, the job still hangs.

Reproduction steps:
 1. Start guest with qemu cmds:
     /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox on  \
    -machine q35  \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x1 \
    -m 14336  \
    -smp 16,maxcpus=16,cores=8,threads=1,dies=1,sockets=2  \
    -cpu 'EPYC',+kvm_pv_unhalt  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20200203-033416-61dmcn92,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20200203-033416-61dmcn92,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idy8YPXp \
    -chardev socket,path=/var/tmp/serial-serial0-20200203-033416-61dmcn92,server,nowait,id=chardev_serial0 \
    -device isa-serial,id=serial0,chardev=chardev_serial0  \
    -chardev socket,id=seabioslog_id_20200203-033416-61dmcn92,path=/var/tmp/seabios-20200203-033416-61dmcn92,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20200203-033416-61dmcn92,iobase=0x402 \
    -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
    -device qemu-xhci,id=usb1,bus=pcie.0-root-port-2,addr=0x0 \
    -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
    -blockdev node-name=file_image1,driver=file,aio=threads,filename=/home/kvm_autotest_root/images/rhel820-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,write-cache=on,bus=pcie.0-root-port-3 \
    -device pcie-root-port,id=pcie.0-root-port-6,slot=6,chassis=6,addr=0x6,bus=pcie.0 \
    -blockdev node-name=file_data1,driver=file,aio=threads,filename=/home/data.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_data1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_data1 \
    -device virtio-blk-pci,id=data1,drive=drive_data1,write-cache=on,bus=pcie.0-root-port-6 \
    -device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
    -device virtio-net-pci,mac=9a:6c:ca:b7:36:85,id=idz4QyVp,netdev=idNnpx5D,bus=pcie.0-root-port-4,addr=0x0  \
    -netdev tap,id=idNnpx5D,vhost=on \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot menu=off,order=cdn,once=c,strict=off \
    -enable-kvm \
    -device pcie-root-port,id=pcie_extra_root_port_0,slot=5,chassis=5,addr=0x5,bus=pcie.0 \
    -monitor stdio \
    -qmp tcp:0:3000,server,nowait \

 2. Create mirror target.
      {'execute':'blockdev-create','arguments':{'options': {'driver':'file','filename':'/root/sn$i','size':21474836480},'job-id':'job1'}}
      {'execute':'blockdev-add','arguments':{'driver':'file','node-name':'drive_sn$i','filename':'/root/sn$i'}}
      {'execute':'blockdev-create','arguments':{'options': {'driver': 'qcow2','file':'drive_sn$i','size':21474836480},'job-id':'job2'}}
      {'execute':'blockdev-add','arguments':{'driver':'qcow2','node-name':'sn$i','file':'drive_sn$i'}}
      {'execute':'job-dismiss','arguments':{'id':'job1'}}
      {'execute':'job-dismiss','arguments':{'id':'job2'}}

3. Do block mirror with buf-size setting
      {'execute':'blockdev-mirror','arguments':{'device':'drive_image1','target':'sn1','sync':'full','granularity':65536,'buf-size':655360,'job-id':'j1'}}

4. Check block job status during mirroring
      {"execute":"query-block-jobs"}
{"return": [{"auto-finalize": true, "io-status": "ok", "device": "j1", "auto-dismiss": true, "busy": true, "len": 21463105536, "offset": 3997696, "status": "running", "paused": false, "speed": 0, "ready": false, "type": "mirror"}]}

{"execute":"query-block-jobs"}
{"return": [{"auto-finalize": true, "io-status": "ok", "device": "j1", "auto-dismiss": true, "busy": true, "len": 21463105536, "offset": 3997696, "status": "running", "paused": false, "speed": 0, "ready": false, "type": "mirror"}]}

{"execute":"query-block-jobs"}
{"return": [{"auto-finalize": true, "io-status": "ok", "device": "j1", "auto-dismiss": true, "busy": true, "len": 21463105536, "offset": 3997696, "status": "running", "paused": false, "speed": 0, "ready": false, "type": "mirror"}]}

{"execute":"query-block-jobs"}
{"return": [{"auto-finalize": true, "io-status": "ok", "device": "j1", "auto-dismiss": true, "busy": true, "len": 21463105536, "offset": 3997696, "status": "running", "paused": false, "speed": 0, "ready": false, "type": "mirror"}]}

{"execute":"query-block-jobs"}
{"return": [{"auto-finalize": true, "io-status": "ok", "device": "j1", "auto-dismiss": true, "busy": true, "len": 21463105536, "offset": 3997696, "status": "running", "paused": false, "speed": 0, "ready": false, "type": "mirror"}]}

5. Wait for some minutes, check block job status.
{"execute":"query-block-jobs"}

After step 5, QEMU hangs.

(gdb) bt full
#0  0x00007fdef4b71016 in __GI_ppoll (fds=0x56417ed39b40, nfds=11, timeout=<optimized out>, timeout@entry=0x7ffd93b0b600, sigmask=sigmask@entry=0x0)
    at ../sysdeps/unix/sysv/linux/ppoll.c:39
        resultvar = 18446744073709551102
        sc_cancel_oldtype = 0
        tval = {tv_sec = 1, tv_nsec = 84896464}
#1  0x000056417c9fd2a5 in ppoll (__ss=0x0, __timeout=0x7ffd93b0b600, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
        ts = {tv_sec = 2, tv_nsec = 46228379}
        tvsec = <optimized out>
#2  0x000056417c9fd2a5 in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=2046228379) at util/qemu-timer.c:348
        ts = {tv_sec = 2, tv_nsec = 46228379}
        tvsec = <optimized out>
#3  0x000056417c9fe145 in os_host_main_loop_wait (timeout=2046228379) at util/main-loop.c:237
        context = 0x56417edfce40
        ret = <optimized out>
        mlpoll = {state = 0, timeout = 4294967295, pollfds = 0x56417edfbe00}
        ret = <optimized out>
        timeout_ns = <optimized out>
#4  0x000056417c9fe145 in main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:518
        mlpoll = {state = 0, timeout = 4294967295, pollfds = 0x56417edfbe00}
        ret = <optimized out>
        timeout_ns = <optimized out>
#5  0x000056417c7e0841 in main_loop () at vl.c:1828
#6  0x000056417c68cc72 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4504
        i = <optimized out>
        snapshot = 0
        linux_boot = <optimized out>
        initrd_filename = 0x0
        kernel_filename = 0x0
        kernel_cmdline = <optimized out>
        boot_order = 0x56417ed77650 "cdn"
        boot_once = <optimized out>
        ds = <optimized out>
        opts = <optimized out>
        machine_opts = <optimized out>
        icount_opts = <optimized out>
        accel_opts = 0x0
        olist = <optimized out>
        optind = 75
        optarg = 0x7ffd93b0d47b "tcp:0:3000,server,nowait"
        loadvm = 0x0
--Type <RET> for more, q to quit, c to continue without paging--
        machine_class = 0x56417eda95e0
        cpu_option = 0x7ffd93b0cc03 "EPYC,+kvm_pv_unhalt"
        vga_model = 0x0
        qtest_chrdev = 0x0
        qtest_log = 0x0
        incoming = 0x0
        userconfig = <optimized out>
        nographic = false
        display_remote = <optimized out>
        log_mask = <optimized out>
        log_file = <optimized out>
        trace_file = <optimized out>
        maxram_size = <optimized out>
        ram_slots = 0
        vmstate_dump_file = 0x0
        main_loop_err = 0x0
        err = 0x0
        list_data_dirs = false
        dir = <optimized out>
        bdo_queue = {sqh_first = 0x0, sqh_last = 0x7ffd93b0b7a0}
        plugin_list = {tqh_first = 0x0, tqh_circ = {tql_next = 0x0, tql_prev = 0x7ffd93b0b7b0}}
        __func__ = "main"

Comment 5 aihua liang 2020-02-03 10:18:55 UTC
Correction to step 5 and the test result in comment 4:
  5. Wait for some minutes, then quit the VM:
     (qemu) quit

  After step 5, QEMU hangs, with gdb info:
    (gdb) bt full
#0  0x00007fb14126b016 in __GI_ppoll (fds=0x559551a9c230, nfds=1, timeout=<optimized out>, timeout@entry=0x7ffdb4c5b6e0, sigmask=sigmask@entry=0x0)
    at ../sysdeps/unix/sysv/linux/ppoll.c:39
        resultvar = 18446744073709551102
        sc_cancel_oldtype = 0
        tval = {tv_sec = 423, tv_nsec = 208227472}
#1  0x000055955000c2a5 in ppoll (__ss=0x0, __timeout=0x7ffdb4c5b6e0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
        ts = {tv_sec = 426, tv_nsec = 404369607}
        tvsec = <optimized out>
#2  0x000055955000c2a5 in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at util/qemu-timer.c:348
        ts = {tv_sec = 426, tv_nsec = 404369607}
        tvsec = <optimized out>
#3  0x000055955000e2a4 in aio_poll (ctx=0x559551b1f9d0, blocking=blocking@entry=true) at util/aio-posix.c:669
        node = <optimized out>
        i = <optimized out>
        ret = 0
        progress = <optimized out>
        timeout = 426404369607
        start = 0
        __PRETTY_FUNCTION__ = "aio_poll"
#4  0x000055954ff89b75 in bdrv_drain_all_begin () at block/io.c:630
        waited_ = <optimized out>
        ctx_ = 0x0
        bs = <optimized out>
        __PRETTY_FUNCTION__ = "bdrv_drain_all_begin"
        __PRETTY_FUNCTION__ = "bdrv_drain_all_begin"
#5  0x000055954ff89b75 in bdrv_drain_all_begin () at block/io.c:595
        __PRETTY_FUNCTION__ = "bdrv_drain_all_begin"
#6  0x000055954fc9bc81 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4523
        i = <optimized out>
        snapshot = 0
        linux_boot = <optimized out>
        initrd_filename = 0x0
        kernel_filename = 0x0
        kernel_cmdline = <optimized out>
        boot_order = 0x559551a9b650 "cdn"
        boot_once = <optimized out>
        ds = <optimized out>
        opts = <optimized out>
        machine_opts = <optimized out>
        icount_opts = <optimized out>
--Type <RET> for more, q to quit, c to continue without paging--
        accel_opts = 0x0
        olist = <optimized out>
        optind = 75
        optarg = 0x7ffdb4c5e422 "tcp:0:3000,server,nowait"
        loadvm = 0x0
        machine_class = 0x559551acd5e0
        cpu_option = 0x7ffdb4c5dbaa "EPYC,+kvm_pv_unhalt"
        vga_model = 0x0
        qtest_chrdev = 0x0
        qtest_log = 0x0
        incoming = 0x0
        userconfig = <optimized out>
        nographic = false
        display_remote = <optimized out>
        log_mask = <optimized out>
        log_file = <optimized out>
        trace_file = <optimized out>
        maxram_size = <optimized out>
        ram_slots = 0
        vmstate_dump_file = 0x0
        main_loop_err = 0x0
        err = 0x0
        list_data_dirs = false
        dir = <optimized out>
        bdo_queue = {sqh_first = 0x0, sqh_last = 0x7ffdb4c5b8c0}
        plugin_list = {tqh_first = 0x0, tqh_circ = {tql_next = 0x0, tql_prev = 0x7ffdb4c5b8d0}}
        __func__ = "main"

Comment 6 Ademar Reis 2020-02-05 23:13:55 UTC
QEMU has been recently split into sub-components and as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks

Comment 11 aihua liang 2020-03-02 03:01:53 UTC
Tested on qemu-kvm-4.2.0-13.module+el8.2.0+5898+fb4bceae, still hit this issue.
  1. Start guest with qemu cmds:
     /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox on  \
    -machine q35  \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x1 \
    -m 14336  \
    -smp 16,maxcpus=16,cores=8,threads=1,dies=1,sockets=2  \
    -cpu 'EPYC',+kvm_pv_unhalt  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20200203-033416-61dmcn92,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20200203-033416-61dmcn92,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idy8YPXp \
    -chardev socket,path=/var/tmp/serial-serial0-20200203-033416-61dmcn92,server,nowait,id=chardev_serial0 \
    -device isa-serial,id=serial0,chardev=chardev_serial0  \
    -chardev socket,id=seabioslog_id_20200203-033416-61dmcn92,path=/var/tmp/seabios-20200203-033416-61dmcn92,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20200203-033416-61dmcn92,iobase=0x402 \
    -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
    -device qemu-xhci,id=usb1,bus=pcie.0-root-port-2,addr=0x0 \
    -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
    -blockdev node-name=file_image1,driver=file,aio=threads,filename=/home/kvm_autotest_root/images/rhel820-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,write-cache=on,bus=pcie.0-root-port-3 \
    -device pcie-root-port,id=pcie.0-root-port-6,slot=6,chassis=6,addr=0x6,bus=pcie.0 \
    -blockdev node-name=file_data1,driver=file,aio=threads,filename=/home/data.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_data1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_data1 \
    -device virtio-blk-pci,id=data1,drive=drive_data1,write-cache=on,bus=pcie.0-root-port-6 \
    -device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
    -device virtio-net-pci,mac=9a:6c:ca:b7:36:85,id=idz4QyVp,netdev=idNnpx5D,bus=pcie.0-root-port-4,addr=0x0  \
    -netdev tap,id=idNnpx5D,vhost=on \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot menu=off,order=cdn,once=c,strict=off \
    -enable-kvm \
    -device pcie-root-port,id=pcie_extra_root_port_0,slot=5,chassis=5,addr=0x5,bus=pcie.0 \
    -monitor stdio \
    -qmp tcp:0:3000,server,nowait \

 2. Create mirror target.
      {'execute':'blockdev-create','arguments':{'options': {'driver':'file','filename':'/root/sn$i','size':21474836480},'job-id':'job1'}}
      {'execute':'blockdev-add','arguments':{'driver':'file','node-name':'drive_sn$i','filename':'/root/sn$i'}}
      {'execute':'blockdev-create','arguments':{'options': {'driver': 'qcow2','file':'drive_sn$i','size':21474836480},'job-id':'job2'}}
      {'execute':'blockdev-add','arguments':{'driver':'qcow2','node-name':'sn$i','file':'drive_sn$i'}}
      {'execute':'job-dismiss','arguments':{'id':'job1'}}
      {'execute':'job-dismiss','arguments':{'id':'job2'}}

3. Do block mirror with buf-size setting
      {'execute':'blockdev-mirror','arguments':{'device':'drive_image1','target':'sn1','sync':'full','granularity':65536,'buf-size':655360,'job-id':'j1'}}
      {"timestamp": {"seconds": 1583116751, "microseconds": 587039}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "j1"}}
{"timestamp": {"seconds": 1583116751, "microseconds": 587152}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "j1"}}
{"return": {}}

4. Check block jobs every 5 minutes
     {"execute":"query-block-jobs"}
{"return": [{"auto-finalize": true, "io-status": "ok", "device": "j1", "auto-dismiss": true, "busy": true, "len": 21469528064, "offset": 2555904, "status": "running", "paused": false, "speed": 0, "ready": false, "type": "mirror"}]}


{"execute":"query-block-jobs"}
{"return": [{"auto-finalize": true, "io-status": "ok", "device": "j1", "auto-dismiss": true, "busy": true, "len": 21469528064, "offset": 2555904, "status": "running", "paused": false, "speed": 0, "ready": false, "type": "mirror"}]}

{"execute":"query-block-jobs"}
{"return": [{"auto-finalize": true, "io-status": "ok", "device": "j1", "auto-dismiss": true, "busy": true, "len": 21469528064, "offset": 2555904, "status": "running", "paused": false, "speed": 0, "ready": false, "type": "mirror"}]}




{"execute":"query-block-jobs"}
{"return": [{"auto-finalize": true, "io-status": "ok", "device": "j1", "auto-dismiss": true, "busy": true, "len": 21469528064, "offset": 2555904, "status": "running", "paused": false, "speed": 0, "ready": false, "type": "mirror"}]}

After step 4, no progress is made.

Comment 13 aihua liang 2020-03-02 03:04:38 UTC
Hi, Kevin
 
  Could you please check whether the patches have been merged into qemu-kvm-4.2.0-13.module+el8.2.0+5898+fb4bceae?

BR,
Aliang

Comment 14 Kevin Wolf 2020-03-02 09:38:17 UTC
Can you still successfully exit QEMU? In the case of the bug that I reproduced (and fixed), trying to quit QEMU hangs. You then have to send SIGKILL to get rid of the process. Maybe you actually hit a different bug?

I can have another try reproducing the bug, though my old test case successfully moved forward while the guest kept writing new data. I let it run for maybe half an hour. Without my fix, it would stop moving after a few seconds.

Do you do anything specific in the guest or is it just sitting there idle?

Comment 15 aihua liang 2020-03-02 09:57:05 UTC
(In reply to Kevin Wolf from comment #14)
> Can you still successfully exit QEMU? In the case of the bug that I
> reproduced (and fixed), trying to quit QEMU hangs. You then have to send
> SIGKILL to get rid of the process. Maybe you actually hit a different bug?
> 
> I can have another try reproducing the bug, though my old test case
> successfully moved forward while the guest kept writing new data. I let it
> run for maybe half an hour. Without my fix, it would stop moving after a few
> seconds.
> 
> Do you do anything specific in the guest or is it just sitting there idle?

Hi, Kevin

  When quitting the VM, QEMU still hangs.

BR,
Aliang

Comment 18 Kevin Wolf 2020-03-24 17:19:48 UTC
I could finally reproduce the problem even with the patch applied; disabling KVM did the trick for me.

Essentially it's still the same problem as before, except that it's not an operation waiting for itself, but two operations waiting for each other. This means that we need a different approach to fix the problem.

I'm afraid the problem is unlikely to be fixed in 8.2.0 now.

Comment 21 aihua liang 2020-03-31 03:21:22 UTC
Tested on qemu-kvm-4.2.0-17.module+el8.2.0+6129+b14d477b, and I don't hit this issue any more.

Test Steps:
 1. Start guest with qemu cmds:
     /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox on  \
    -machine q35  \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x1 \
    -m 14336  \
    -smp 16,maxcpus=16,cores=8,threads=1,dies=1,sockets=2  \
    -cpu 'EPYC',+kvm_pv_unhalt  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20200203-033416-61dmcn92,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20200203-033416-61dmcn92,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idy8YPXp \
    -chardev socket,path=/var/tmp/serial-serial0-20200203-033416-61dmcn92,server,nowait,id=chardev_serial0 \
    -device isa-serial,id=serial0,chardev=chardev_serial0  \
    -chardev socket,id=seabioslog_id_20200203-033416-61dmcn92,path=/var/tmp/seabios-20200203-033416-61dmcn92,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20200203-033416-61dmcn92,iobase=0x402 \
    -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
    -device qemu-xhci,id=usb1,bus=pcie.0-root-port-2,addr=0x0 \
    -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
    -blockdev node-name=file_image1,driver=file,aio=threads,filename=/home/kvm_autotest_root/images/rhel820-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,write-cache=on,bus=pcie.0-root-port-3 \
    -device pcie-root-port,id=pcie.0-root-port-6,slot=6,chassis=6,addr=0x6,bus=pcie.0 \
    -blockdev node-name=file_data1,driver=file,aio=threads,filename=/home/data.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_data1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_data1 \
    -device virtio-blk-pci,id=data1,drive=drive_data1,write-cache=on,bus=pcie.0-root-port-6 \
    -device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
    -device virtio-net-pci,mac=9a:6c:ca:b7:36:85,id=idz4QyVp,netdev=idNnpx5D,bus=pcie.0-root-port-4,addr=0x0  \
    -netdev tap,id=idNnpx5D,vhost=on \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot menu=off,order=cdn,once=c,strict=off \
    -enable-kvm \
    -device pcie-root-port,id=pcie_extra_root_port_0,slot=5,chassis=5,addr=0x5,bus=pcie.0 \
    -monitor stdio \
    -qmp tcp:0:3000,server,nowait \

 2. Create mirror target.
      {'execute':'blockdev-create','arguments':{'options': {'driver':'file','filename':'/root/sn$i','size':21474836480},'job-id':'job1'}}
      {'execute':'blockdev-add','arguments':{'driver':'file','node-name':'drive_sn$i','filename':'/root/sn$i'}}
      {'execute':'blockdev-create','arguments':{'options': {'driver': 'qcow2','file':'drive_sn$i','size':21474836480},'job-id':'job2'}}
      {'execute':'blockdev-add','arguments':{'driver':'qcow2','node-name':'sn$i','file':'drive_sn$i'}}
      {'execute':'job-dismiss','arguments':{'id':'job1'}}
      {'execute':'job-dismiss','arguments':{'id':'job2'}}

 3. Do block mirror with buf-size setting
      {'execute':'blockdev-mirror','arguments':{'device':'drive_image1','target':'sn1','sync':'full','granularity':65536,'buf-size':655360,'job-id':'j1'}}
{"timestamp": {"seconds": 1585624372, "microseconds": 991040}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "j1"}}
{"timestamp": {"seconds": 1585624372, "microseconds": 991170}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "j1"}}
{"return": {}}

 4. Query block job status
   {"execute":"query-block-jobs"}
{"return": [{"auto-finalize": true, "io-status": "ok", "device": "j1", "auto-dismiss": true, "busy": true, "len": 20958085120, "offset": 5651824640, "status": "running", "paused": false, "speed": 0, "ready": false, "type": "mirror"}]}

 5. After some minutes, query block job status.
   {"execute":"query-block-jobs"}
{"return": [{"auto-finalize": true, "io-status": "ok", "device": "j1", "auto-dismiss": true, "busy": true, "len": 21432827904, "offset": 17471045632, "status": "running", "paused": false, "speed": 0, "ready": false, "type": "mirror"}]}

{"timestamp": {"seconds": 1585624544, "microseconds": 116114}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "j1"}}
{"timestamp": {"seconds": 1585624544, "microseconds": 116211}, "event": "BLOCK_JOB_READY", "data": {"device": "j1", "len": 21475229696, "offset": 21475229696, "speed": 0, "type": "mirror"}}

 6. Complete the block job.
    { "execute": "block-job-complete", "arguments": { "device": "j1" } }
{"return": {}}
{"timestamp": {"seconds": 1585624557, "microseconds": 787230}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "j1"}}
{"timestamp": {"seconds": 1585624557, "microseconds": 787336}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "j1"}}
{"timestamp": {"seconds": 1585624557, "microseconds": 787504}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "j1", "len": 21475229696, "offset": 21475229696, "speed": 0, "type": "mirror"}}
{"timestamp": {"seconds": 1585624557, "microseconds": 787589}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "j1"}}
{"timestamp": {"seconds": 1585624557, "microseconds": 787642}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "j1"}}

 7. Check block info
   (qemu) info block
sn1: /root/sn1 (qcow2)
    Attached to:      /machine/peripheral/image1/virtio-backend
    Cache mode:       writeback

drive_data1: /home/data.qcow2 (qcow2)
    Attached to:      /machine/peripheral/data1/virtio-backend
    Cache mode:       writeback, direct

Will set the bug's status to "Verified" after it changes to "ON_QA".

Comment 23 Danilo de Paula 2020-04-06 20:41:17 UTC
Done.

Comment 24 CongLi 2020-04-07 01:09:22 UTC
Thanks Danilo.

Based on comment 21, set this bug to 'VERIFIED'.

Thanks.

Comment 26 errata-xmlrpc 2020-05-05 09:55:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2017

