Bug 1794692
| Summary: | Mirror block job stops making progress | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Advanced Virtualization | Reporter: | Kevin Wolf <kwolf> |
| Component: | qemu-kvm | Assignee: | Kevin Wolf <kwolf> |
| qemu-kvm sub component: | Block Jobs | QA Contact: | aihua liang <aliang> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | | |
| Priority: | high | CC: | aliang, coli, ddepaula, dyuan, hhan, jsnow, juzhang, kanderso, knoel, kwolf, matt.s, mtessun, rbalakri, virt-maint, xuzhang, yduan, yisun |
| Version: | 8.0 | Keywords: | Automation, Regression, TestBlocker |
| Target Milestone: | rc | Flags: | pm-rhel: mirror+ |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | qemu-kvm-4.2.0-17.module+el8.2.0+6129+b14d477b | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1655367 | Environment: | |
| Last Closed: | 2020-05-05 09:55:54 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1771318 | | |
Description
Kevin Wolf
2020-01-24 11:03:05 UTC
So I can reproduce a mirror hang, even in upstream, with a much simpler setup (just a local qcow2 file on tmpfs, no iSCSI or LUKS involved).
I'm not entirely sure if it's exactly the same problem, but the symptom is that query-block-jobs doesn't increase 'offset' or 'len' any more even though the guest continues to write. Trying to exit QEMU with Ctrl-C makes the whole process hang because it tries to cancel the job synchronously, but the job just won't move.
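For reference, the stall can be watched for from the outside with a simple QMP polling loop. This is only a sketch: it assumes the QMP monitor is exposed on TCP port 3000 (as with -qmp tcp:0:3000,server,nowait in the reproducer below). A stalled job keeps returning the same 'offset' while the guest continues to write:
# Sketch: negotiate QMP capabilities, then poll job progress once a minute.
( echo '{"execute":"qmp_capabilities"}'
  while sleep 60; do echo '{"execute":"query-block-jobs"}'; done ) | nc localhost 3000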
The job coroutine at that point looks like this:
#0 0x0000561c1624e530 in qemu_coroutine_switch (from_=from_@entry=0x561c17454100, to_=to_@entry=0x7f2343609538, action=action@entry=COROUTINE_YIELD)
at util/coroutine-ucontext.c:218
#1 0x0000561c1624d251 in qemu_coroutine_yield () at util/qemu-coroutine.c:193
#2 0x0000561c1624d8b5 in qemu_co_queue_wait_impl (queue=0x561c1745f990, lock=lock@entry=0x0) at util/qemu-coroutine-lock.c:56
#3 0x0000561c16193bb4 in mirror_wait_for_any_operation (active=false, s=0x561c173f5550) at /home/kwolf/source/qemu/include/qemu/lockable.h:46
#4 0x0000561c16193bb4 in mirror_wait_for_free_in_flight_slot (s=0x561c173f5550) at block/mirror.c:307
#5 0x0000561c16193bb4 in mirror_run (job=0x561c173f5550, errp=<optimized out>) at block/mirror.c:974
#6 0x0000561c1614432f in job_co_entry (opaque=0x561c173f5550) at job.c:878
#7 0x0000561c1624e5ab in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:115
#8 0x00007f234da19250 in __start_context () at /lib64/libc.so.6
Some mirroring operation is in flight and doesn't complete. Will have to continue debugging to find out which operation and where it is hanging.
Turns out there is a mirror operation that actually waits for itself to complete:
(gdb) qemu coroutine 0x562a17afc3b0
#0 0x0000562a1656cde0 in qemu_coroutine_switch (from_=from_@entry=0x562a17afc3b0, to_=to_@entry=0x7f7891344538, action=action@entry=COROUTINE_YIELD) at util/coroutine-ucontext.c:218
#1 0x0000562a1656bb01 in qemu_coroutine_yield () at util/qemu-coroutine.c:193
#2 0x0000562a1656c165 in qemu_co_queue_wait_impl (queue=0x562a1868aed0, lock=lock@entry=0x0) at util/qemu-coroutine-lock.c:56
#3 0x0000562a164b168c in mirror_wait_for_any_operation (active=false, s=0x562a18a72e00) at /home/kwolf/source/qemu/include/qemu/lockable.h:46
#4 0x0000562a164b168c in mirror_wait_for_free_in_flight_slot (s=0x562a18a72e00) at block/mirror.c:308
#5 0x0000562a164b168c in mirror_co_read (opaque=0x562a1868ae80) at block/mirror.c:351
#6 0x0000562a1656ce5b in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:115
#7 0x00007f789b754250 in __start_context () at /lib64/libc.so.6
#8 0x00007ffedcbebdf0 in ()
#9 0x0000000000000000 in ()
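To find out which MirrorOp owns the CoQueue from frame #2, container_of can be rebuilt as gdb macros and applied to the queue address via the waiting_requests member: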
(gdb) macro define offsetof(_type, _memb) \
((long)(&((_type *)0)->_memb))
(gdb) macro define container_of(_ptr, _type, _memb) \
((_type *)((void *)(_ptr) - offsetof(_type, _memb)))
(gdb) p container_of ((CoQueue *)0x562a1868aed0, MirrorOp, waiting_requests)
$12 = (MirrorOp *) 0x562a1868ae80
(gdb) p *$12
$13 = {s = 0x562a18a72e00, qiov = {iov = 0x0, niov = 0, {{nalloc = 0, local_iov = {iov_base = 0x0, iov_len = 0}}, {__pad = '\000' <repeats 11 times>, size = 0}}}, offset = 32768000,
bytes = 655360, bytes_handled = 0x7f785877ee80, is_pseudo_op = false, is_active_write = false, waiting_requests = {entries = {sqh_first = 0x562a17afc3b0, sqh_last = 0x562a18e1e638}},
co = 0x562a17afc3b0, next = {tqe_next = 0x0, tqe_circ = {tql_next = 0x0, tql_prev = 0x562a18a73070}}}
The last print shows that the MirrorOp that owns the CoQueue we're waiting for is running in the same coroutine that is waiting, so it's waiting for itself.
Making sure that mirror_wait_for_any_operation() picks a different operation than its caller to wait for seems to fix the problem.
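A minimal sketch of that fix, reusing the MirrorOp fields visible in the gdb dump above (the extra 'self' parameter and the ops_in_flight list name are assumptions here; the actual upstream patch may differ in detail):
/* Sketch: never wait on our own CoQueue (op == self), and skip pseudo ops;
 * only a *different* real operation can make progress and eventually wake
 * us via its waiting_requests queue. */
static void coroutine_fn
mirror_wait_for_any_operation(MirrorBlockJob *s, MirrorOp *self, bool active)
{
    MirrorOp *op;

    QTAILQ_FOREACH(op, &s->ops_in_flight, next) {
        if (op == self || op->is_pseudo_op) {
            continue;
        }
        if (op->is_active_write == active) {
            qemu_co_queue_wait(&op->waiting_requests, NULL);
            return;
        }
    }
    abort(); /* Some other operation must be in flight, or we would not wait. */
}
Callers such as mirror_wait_for_free_in_flight_slot() would then pass in their own op (or NULL from mirror_run(), which runs outside any MirrorOp), so mirror_co_read() can no longer end up queued on its own waiting_requests.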
Yes, Kevin, the bug was found by automation at first, but it can easily be reproduced manually.
Tested on qemu-kvm-4.2.0-8.module+el8.2.0+5607+dc756904 with backend localfs: doing a block mirror with the buf-size setting, the job still hangs.
Reproduction steps:
1. Start guest with qemu cmds:
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1' \
-sandbox on \
-machine q35 \
-nodefaults \
-device VGA,bus=pcie.0,addr=0x1 \
-m 14336 \
-smp 16,maxcpus=16,cores=8,threads=1,dies=1,sockets=2 \
-cpu 'EPYC',+kvm_pv_unhalt \
-chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20200203-033416-61dmcn92,server,nowait \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20200203-033416-61dmcn92,server,nowait \
-mon chardev=qmp_id_catch_monitor,mode=control \
-device pvpanic,ioport=0x505,id=idy8YPXp \
-chardev socket,path=/var/tmp/serial-serial0-20200203-033416-61dmcn92,server,nowait,id=chardev_serial0 \
-device isa-serial,id=serial0,chardev=chardev_serial0 \
-chardev socket,id=seabioslog_id_20200203-033416-61dmcn92,path=/var/tmp/seabios-20200203-033416-61dmcn92,server,nowait \
-device isa-debugcon,chardev=seabioslog_id_20200203-033416-61dmcn92,iobase=0x402 \
-device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
-device qemu-xhci,id=usb1,bus=pcie.0-root-port-2,addr=0x0 \
-device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
-blockdev node-name=file_image1,driver=file,aio=threads,filename=/home/kvm_autotest_root/images/rhel820-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 \
-device virtio-blk-pci,id=image1,drive=drive_image1,write-cache=on,bus=pcie.0-root-port-3 \
-device pcie-root-port,id=pcie.0-root-port-6,slot=6,chassis=6,addr=0x6,bus=pcie.0 \
-blockdev node-name=file_data1,driver=file,aio=threads,filename=/home/data.qcow2,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_data1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_data1 \
-device virtio-blk-pci,id=data1,drive=drive_data1,write-cache=on,bus=pcie.0-root-port-6 \
-device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
-device virtio-net-pci,mac=9a:6c:ca:b7:36:85,id=idz4QyVp,netdev=idNnpx5D,bus=pcie.0-root-port-4,addr=0x0 \
-netdev tap,id=idNnpx5D,vhost=on \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-vnc :0 \
-rtc base=utc,clock=host,driftfix=slew \
-boot menu=off,order=cdn,once=c,strict=off \
-enable-kvm \
-device pcie-root-port,id=pcie_extra_root_port_0,slot=5,chassis=5,addr=0x5,bus=pcie.0 \
-monitor stdio \
-qmp tcp:0:3000,server,nowait \
2. Create mirror target.
{'execute':'blockdev-create','arguments':{'options': {'driver':'file','filename':'/root/sn$i','size':21474836480},'job-id':'job1'}}
{'execute':'blockdev-add','arguments':{'driver':'file','node-name':'drive_sn$i','filename':'/root/sn$i'}}
{'execute':'blockdev-create','arguments':{'options': {'driver': 'qcow2','file':'drive_sn$i','size':21474836480},'job-id':'job2'}}
{'execute':'blockdev-add','arguments':{'driver':'qcow2','node-name':'sn$i','file':'drive_sn$i'}}
{'execute':'job-dismiss','arguments':{'id':'job1'}}
{'execute':'job-dismiss','arguments':{'id':'job2'}}
3. Do block mirror with buf-size setting
{'execute':'blockdev-mirror','arguments':{'device':'drive_image1','target':'sn1','sync':'full','granularity':65536,'buf-size':655360,'job-id':'j1'}}
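(As a side note on why this configuration reproduces the hang readily: buf-size bounds the copy buffer, so with granularity 65536 the job has at most 655360 / 65536 = 10 copy operations in flight at once, and the free-slot wait in mirror_wait_for_free_in_flight_slot() is presumably hit very often.)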
4. Check block job status during mirroring
{"execute":"query-block-jobs"}
{"return": [{"auto-finalize": true, "io-status": "ok", "device": "j1", "auto-dismiss": true, "busy": true, "len": 21463105536, "offset": 3997696, "status": "running", "paused": false, "speed": 0, "ready": false, "type": "mirror"}]}
{"execute":"query-block-jobs"}
{"return": [{"auto-finalize": true, "io-status": "ok", "device": "j1", "auto-dismiss": true, "busy": true, "len": 21463105536, "offset": 3997696, "status": "running", "paused": false, "speed": 0, "ready": false, "type": "mirror"}]}
{"execute":"query-block-jobs"}
{"return": [{"auto-finalize": true, "io-status": "ok", "device": "j1", "auto-dismiss": true, "busy": true, "len": 21463105536, "offset": 3997696, "status": "running", "paused": false, "speed": 0, "ready": false, "type": "mirror"}]}
{"execute":"query-block-jobs"}
{"return": [{"auto-finalize": true, "io-status": "ok", "device": "j1", "auto-dismiss": true, "busy": true, "len": 21463105536, "offset": 3997696, "status": "running", "paused": false, "speed": 0, "ready": false, "type": "mirror"}]}
{"execute":"query-block-jobs"}
{"return": [{"auto-finalize": true, "io-status": "ok", "device": "j1", "auto-dismiss": true, "busy": true, "len": 21463105536, "offset": 3997696, "status": "running", "paused": false, "speed": 0, "ready": false, "type": "mirror"}]}
5. Wait for some minutes, check block job status.
{"execute":"query-block-jobs"}
After step 5, QEMU hangs.
(gdb) bt full
#0 0x00007fdef4b71016 in __GI_ppoll (fds=0x56417ed39b40, nfds=11, timeout=<optimized out>, timeout@entry=0x7ffd93b0b600, sigmask=sigmask@entry=0x0)
at ../sysdeps/unix/sysv/linux/ppoll.c:39
resultvar = 18446744073709551102
sc_cancel_oldtype = 0
tval = {tv_sec = 1, tv_nsec = 84896464}
#1 0x000056417c9fd2a5 in ppoll (__ss=0x0, __timeout=0x7ffd93b0b600, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
ts = {tv_sec = 2, tv_nsec = 46228379}
tvsec = <optimized out>
#2 0x000056417c9fd2a5 in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=2046228379) at util/qemu-timer.c:348
ts = {tv_sec = 2, tv_nsec = 46228379}
tvsec = <optimized out>
#3 0x000056417c9fe145 in os_host_main_loop_wait (timeout=2046228379) at util/main-loop.c:237
context = 0x56417edfce40
ret = <optimized out>
mlpoll = {state = 0, timeout = 4294967295, pollfds = 0x56417edfbe00}
ret = <optimized out>
timeout_ns = <optimized out>
#4 0x000056417c9fe145 in main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:518
mlpoll = {state = 0, timeout = 4294967295, pollfds = 0x56417edfbe00}
ret = <optimized out>
timeout_ns = <optimized out>
#5 0x000056417c7e0841 in main_loop () at vl.c:1828
#6 0x000056417c68cc72 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4504
i = <optimized out>
snapshot = 0
linux_boot = <optimized out>
initrd_filename = 0x0
kernel_filename = 0x0
kernel_cmdline = <optimized out>
boot_order = 0x56417ed77650 "cdn"
boot_once = <optimized out>
ds = <optimized out>
opts = <optimized out>
machine_opts = <optimized out>
icount_opts = <optimized out>
accel_opts = 0x0
olist = <optimized out>
optind = 75
optarg = 0x7ffd93b0d47b "tcp:0:3000,server,nowait"
loadvm = 0x0
machine_class = 0x56417eda95e0
cpu_option = 0x7ffd93b0cc03 "EPYC,+kvm_pv_unhalt"
vga_model = 0x0
qtest_chrdev = 0x0
qtest_log = 0x0
incoming = 0x0
userconfig = <optimized out>
nographic = false
display_remote = <optimized out>
log_mask = <optimized out>
log_file = <optimized out>
trace_file = <optimized out>
maxram_size = <optimized out>
ram_slots = 0
vmstate_dump_file = 0x0
main_loop_err = 0x0
err = 0x0
list_data_dirs = false
dir = <optimized out>
bdo_queue = {sqh_first = 0x0, sqh_last = 0x7ffd93b0b7a0}
plugin_list = {tqh_first = 0x0, tqh_circ = {tql_next = 0x0, tql_prev = 0x7ffd93b0b7b0}}
__func__ = "main"
Correction to step 5 and the test result in comment 4:
5. Wait for some minutes, then quit the VM:
(qemu) quit
After step 5, QEMU hangs: the quit path is stuck in bdrv_drain_all_begin() (frames #4/#5 below), which never finishes because the deadlocked mirror operation stays in flight. gdb info:
(gdb) bt full
#0  0x00007fb14126b016 in __GI_ppoll (fds=0x559551a9c230, nfds=1, timeout=<optimized out>, timeout@entry=0x7ffdb4c5b6e0, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:39
        resultvar = 18446744073709551102
        sc_cancel_oldtype = 0
        tval = {tv_sec = 423, tv_nsec = 208227472}
#1  0x000055955000c2a5 in ppoll (__ss=0x0, __timeout=0x7ffdb4c5b6e0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
        ts = {tv_sec = 426, tv_nsec = 404369607}
        tvsec = <optimized out>
#2  0x000055955000c2a5 in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at util/qemu-timer.c:348
        ts = {tv_sec = 426, tv_nsec = 404369607}
        tvsec = <optimized out>
#3  0x000055955000e2a4 in aio_poll (ctx=0x559551b1f9d0, blocking=blocking@entry=true) at util/aio-posix.c:669
        node = <optimized out>
        i = <optimized out>
        ret = 0
        progress = <optimized out>
        timeout = 426404369607
        start = 0
        __PRETTY_FUNCTION__ = "aio_poll"
#4  0x000055954ff89b75 in bdrv_drain_all_begin () at block/io.c:630
        waited_ = <optimized out>
        ctx_ = 0x0
        bs = <optimized out>
        __PRETTY_FUNCTION__ = "bdrv_drain_all_begin"
        __PRETTY_FUNCTION__ = "bdrv_drain_all_begin"
#5  0x000055954ff89b75 in bdrv_drain_all_begin () at block/io.c:595
        __PRETTY_FUNCTION__ = "bdrv_drain_all_begin"
#6  0x000055954fc9bc81 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4523
        i = <optimized out>
        snapshot = 0
        linux_boot = <optimized out>
        initrd_filename = 0x0
        kernel_filename = 0x0
        kernel_cmdline = <optimized out>
        boot_order = 0x559551a9b650 "cdn"
        boot_once = <optimized out>
        ds = <optimized out>
        opts = <optimized out>
        machine_opts = <optimized out>
        icount_opts = <optimized out>
        accel_opts = 0x0
        olist = <optimized out>
        optind = 75
        optarg = 0x7ffdb4c5e422 "tcp:0:3000,server,nowait"
        loadvm = 0x0
        machine_class = 0x559551acd5e0
        cpu_option = 0x7ffdb4c5dbaa "EPYC,+kvm_pv_unhalt"
        vga_model = 0x0
        qtest_chrdev = 0x0
        qtest_log = 0x0
        incoming = 0x0
        userconfig = <optimized out>
        nographic = false
        display_remote = <optimized out>
        log_mask = <optimized out>
        log_file = <optimized out>
        trace_file = <optimized out>
        maxram_size = <optimized out>
        ram_slots = 0
        vmstate_dump_file = 0x0
        main_loop_err = 0x0
        err = 0x0
        list_data_dirs = false
        dir = <optimized out>
        bdo_queue = {sqh_first = 0x0, sqh_last = 0x7ffdb4c5b8c0}
        plugin_list = {tqh_first = 0x0, tqh_circ = {tql_next = 0x0, tql_prev = 0x7ffdb4c5b8d0}}
        __func__ = "main"
QEMU has been recently split into sub-components, and as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks.
Tested on qemu-kvm-4.2.0-13.module+el8.2.0+5898+fb4bceae, still hitting this issue.
1. Start guest with qemu cmds:
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1' \
-sandbox on \
-machine q35 \
-nodefaults \
-device VGA,bus=pcie.0,addr=0x1 \
-m 14336 \
-smp 16,maxcpus=16,cores=8,threads=1,dies=1,sockets=2 \
-cpu 'EPYC',+kvm_pv_unhalt \
-chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20200203-033416-61dmcn92,server,nowait \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20200203-033416-61dmcn92,server,nowait \
-mon chardev=qmp_id_catch_monitor,mode=control \
-device pvpanic,ioport=0x505,id=idy8YPXp \
-chardev socket,path=/var/tmp/serial-serial0-20200203-033416-61dmcn92,server,nowait,id=chardev_serial0 \
-device isa-serial,id=serial0,chardev=chardev_serial0 \
-chardev socket,id=seabioslog_id_20200203-033416-61dmcn92,path=/var/tmp/seabios-20200203-033416-61dmcn92,server,nowait \
-device isa-debugcon,chardev=seabioslog_id_20200203-033416-61dmcn92,iobase=0x402 \
-device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
-device qemu-xhci,id=usb1,bus=pcie.0-root-port-2,addr=0x0 \
-device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
-blockdev node-name=file_image1,driver=file,aio=threads,filename=/home/kvm_autotest_root/images/rhel820-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 \
-device virtio-blk-pci,id=image1,drive=drive_image1,write-cache=on,bus=pcie.0-root-port-3 \
-device pcie-root-port,id=pcie.0-root-port-6,slot=6,chassis=6,addr=0x6,bus=pcie.0 \
-blockdev node-name=file_data1,driver=file,aio=threads,filename=/home/data.qcow2,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_data1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_data1 \
-device virtio-blk-pci,id=data1,drive=drive_data1,write-cache=on,bus=pcie.0-root-port-6 \
-device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
-device virtio-net-pci,mac=9a:6c:ca:b7:36:85,id=idz4QyVp,netdev=idNnpx5D,bus=pcie.0-root-port-4,addr=0x0 \
-netdev tap,id=idNnpx5D,vhost=on \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-vnc :0 \
-rtc base=utc,clock=host,driftfix=slew \
-boot menu=off,order=cdn,once=c,strict=off \
-enable-kvm \
-device pcie-root-port,id=pcie_extra_root_port_0,slot=5,chassis=5,addr=0x5,bus=pcie.0 \
-monitor stdio \
-qmp tcp:0:3000,server,nowait \
2. Create mirror target.
{'execute':'blockdev-create','arguments':{'options': {'driver':'file','filename':'/root/sn$i','size':21474836480},'job-id':'job1'}}
{'execute':'blockdev-add','arguments':{'driver':'file','node-name':'drive_sn$i','filename':'/root/sn$i'}}
{'execute':'blockdev-create','arguments':{'options': {'driver': 'qcow2','file':'drive_sn$i','size':21474836480},'job-id':'job2'}}
{'execute':'blockdev-add','arguments':{'driver':'qcow2','node-name':'sn$i','file':'drive_sn$i'}}
{'execute':'job-dismiss','arguments':{'id':'job1'}}
{'execute':'job-dismiss','arguments':{'id':'job2'}}
3. Do block mirror with buf-size setting
{'execute':'blockdev-mirror','arguments':{'device':'drive_image1','target':'sn1','sync':'full','granularity':65536,'buf-size':655360,'job-id':'j1'}}
{"timestamp": {"seconds": 1583116751, "microseconds": 587039}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "j1"}}
{"timestamp": {"seconds": 1583116751, "microseconds": 587152}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "j1"}}
{"return": {}}
4. Check block jobs every 5 minutes
{"execute":"query-block-jobs"}
{"return": [{"auto-finalize": true, "io-status": "ok", "device": "j1", "auto-dismiss": true, "busy": true, "len": 21469528064, "offset": 2555904, "status": "running", "paused": false, "speed": 0, "ready": false, "type": "mirror"}]}
{"execute":"query-block-jobs"}
{"return": [{"auto-finalize": true, "io-status": "ok", "device": "j1", "auto-dismiss": true, "busy": true, "len": 21469528064, "offset": 2555904, "status": "running", "paused": false, "speed": 0, "ready": false, "type": "mirror"}]}
{"execute":"query-block-jobs"}
{"return": [{"auto-finalize": true, "io-status": "ok", "device": "j1", "auto-dismiss": true, "busy": true, "len": 21469528064, "offset": 2555904, "status": "running", "paused": false, "speed": 0, "ready": false, "type": "mirror"}]}
{"execute":"query-block-jobs"}
{"return": [{"auto-finalize": true, "io-status": "ok", "device": "j1", "auto-dismiss": true, "busy": true, "len": 21469528064, "offset": 2555904, "status": "running", "paused": false, "speed": 0, "ready": false, "type": "mirror"}]}
After step 4, no progress is made.
Hi Kevin,
Please help to check whether the patches have been merged into qemu-kvm-4.2.0-13.module+el8.2.0+5898+fb4bceae?
BR, Aliang
Can you still successfully exit QEMU? In the case of the bug that I reproduced (and fixed), trying to quit QEMU hangs. You then have to send SIGKILL to get rid of the process. Maybe you actually hit a different bug?
I can have another try at reproducing the bug, though my old test case successfully moved forward while the guest kept writing new data. I let it run for maybe half an hour. Without my fix, it would stop moving after a few seconds.
Do you do anything specific in the guest or is it just sitting there idle?
(In reply to Kevin Wolf from comment #14)
> Can you still successfully exit QEMU? In the case of the bug that I
> reproduced (and fixed), trying to quit QEMU hangs. You then have to send
> SIGKILL to get rid of the process. Maybe you actually hit a different bug?
>
> I can have another try at reproducing the bug, though my old test case
> successfully moved forward while the guest kept writing new data. I let it
> run for maybe half an hour. Without my fix, it would stop moving after a
> few seconds.
>
> Do you do anything specific in the guest or is it just sitting there idle?
Hi Kevin,
When quitting the VM, QEMU still hangs.
BR, Aliang
I could finally reproduce the problem even with the patch applied; disabling KVM did the trick for me. Essentially it's still the same problem as before, except that it's not one operation waiting for itself, but two operations waiting for each other. This means that we need a different approach to fix the problem. I'm afraid the problem is unlikely to be fixed in 8.2.0 now.
Tested on qemu-kvm-4.2.0-17.module+el8.2.0+6129+b14d477b; this issue is not hit any more.
Test Steps:
1. Start guest with qemu cmds:
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1' \
-sandbox on \
-machine q35 \
-nodefaults \
-device VGA,bus=pcie.0,addr=0x1 \
-m 14336 \
-smp 16,maxcpus=16,cores=8,threads=1,dies=1,sockets=2 \
-cpu 'EPYC',+kvm_pv_unhalt \
-chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20200203-033416-61dmcn92,server,nowait \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20200203-033416-61dmcn92,server,nowait \
-mon chardev=qmp_id_catch_monitor,mode=control \
-device pvpanic,ioport=0x505,id=idy8YPXp \
-chardev socket,path=/var/tmp/serial-serial0-20200203-033416-61dmcn92,server,nowait,id=chardev_serial0 \
-device isa-serial,id=serial0,chardev=chardev_serial0 \
-chardev socket,id=seabioslog_id_20200203-033416-61dmcn92,path=/var/tmp/seabios-20200203-033416-61dmcn92,server,nowait \
-device isa-debugcon,chardev=seabioslog_id_20200203-033416-61dmcn92,iobase=0x402 \
-device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
-device qemu-xhci,id=usb1,bus=pcie.0-root-port-2,addr=0x0 \
-device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
-blockdev node-name=file_image1,driver=file,aio=threads,filename=/home/kvm_autotest_root/images/rhel820-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 \
-device virtio-blk-pci,id=image1,drive=drive_image1,write-cache=on,bus=pcie.0-root-port-3 \
-device pcie-root-port,id=pcie.0-root-port-6,slot=6,chassis=6,addr=0x6,bus=pcie.0 \
-blockdev node-name=file_data1,driver=file,aio=threads,filename=/home/data.qcow2,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_data1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_data1 \
-device virtio-blk-pci,id=data1,drive=drive_data1,write-cache=on,bus=pcie.0-root-port-6 \
-device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
-device virtio-net-pci,mac=9a:6c:ca:b7:36:85,id=idz4QyVp,netdev=idNnpx5D,bus=pcie.0-root-port-4,addr=0x0 \
-netdev tap,id=idNnpx5D,vhost=on \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-vnc :0 \
-rtc base=utc,clock=host,driftfix=slew \
-boot menu=off,order=cdn,once=c,strict=off \
-enable-kvm \
-device pcie-root-port,id=pcie_extra_root_port_0,slot=5,chassis=5,addr=0x5,bus=pcie.0 \
-monitor stdio \
-qmp tcp:0:3000,server,nowait \
2. Create mirror target.
{'execute':'blockdev-create','arguments':{'options': {'driver':'file','filename':'/root/sn$i','size':21474836480},'job-id':'job1'}}
{'execute':'blockdev-add','arguments':{'driver':'file','node-name':'drive_sn$i','filename':'/root/sn$i'}}
{'execute':'blockdev-create','arguments':{'options': {'driver': 'qcow2','file':'drive_sn$i','size':21474836480},'job-id':'job2'}}
{'execute':'blockdev-add','arguments':{'driver':'qcow2','node-name':'sn$i','file':'drive_sn$i'}}
{'execute':'job-dismiss','arguments':{'id':'job1'}}
{'execute':'job-dismiss','arguments':{'id':'job2'}}
3. Do block mirror with buf-size setting
{'execute':'blockdev-mirror','arguments':{'device':'drive_image1','target':'sn1','sync':'full','granularity':65536,'buf-size':655360,'job-id':'j1'}}
{"timestamp": {"seconds": 1585624372, "microseconds": 991040}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "j1"}}
{"timestamp": {"seconds": 1585624372, "microseconds": 991170}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "j1"}}
{"return": {}}
4. Query block job status
{"execute":"query-block-jobs"}
{"return": [{"auto-finalize": true, "io-status": "ok", "device": "j1", "auto-dismiss": true, "busy": true, "len": 20958085120, "offset": 5651824640, "status": "running", "paused": false, "speed": 0, "ready": false, "type": "mirror"}]}
5. After some minutes, query block job status.
{"execute":"query-block-jobs"}
{"return": [{"auto-finalize": true, "io-status": "ok", "device": "j1", "auto-dismiss": true, "busy": true, "len": 21432827904, "offset": 17471045632, "status": "running", "paused": false, "speed": 0, "ready": false, "type": "mirror"}]}
{"timestamp": {"seconds": 1585624544, "microseconds": 116114}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "j1"}}
{"timestamp": {"seconds": 1585624544, "microseconds": 116211}, "event": "BLOCK_JOB_READY", "data": {"device": "j1", "len": 21475229696, "offset": 21475229696, "speed": 0, "type": "mirror"}}
6. Complete the block job.
{ "execute": "block-job-complete", "arguments": { "device": "j1" } }
{"return": {}}
{"timestamp": {"seconds": 1585624557, "microseconds": 787230}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "j1"}}
{"timestamp": {"seconds": 1585624557, "microseconds": 787336}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "j1"}}
{"timestamp": {"seconds": 1585624557, "microseconds": 787504}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "j1", "len": 21475229696, "offset": 21475229696, "speed": 0, "type": "mirror"}}
{"timestamp": {"seconds": 1585624557, "microseconds": 787589}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "j1"}}
{"timestamp": {"seconds": 1585624557, "microseconds": 787642}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "j1"}}
7. Check block info
(qemu) info block
sn1: /root/sn1 (qcow2)
Attached to: /machine/peripheral/image1/virtio-backend
Cache mode: writeback
drive_data1: /home/data.qcow2 (qcow2)
Attached to: /machine/peripheral/data1/virtio-backend
Cache mode: writeback, direct
Will set the bug's status to "Verified" after it changes to "ON_QA".
Done. Thanks, Danilo.
Based on comment 21, setting this bug to 'VERIFIED'. Thanks.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2020:2017