Bug 1553234

Summary:	RFE: synchronous mirror to prevent a long-running block-job-complete (qemu)
Product:	Red Hat Enterprise Linux 7	Reporter:	Roman Hodain <rhodain>
Component:	qemu-kvm-rhev	Assignee:	John Snow <jsnow>
Status:	CLOSED DEFERRED	QA Contact:	aihua liang <aliang>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	7.4	CC:	chayang, coli, eblake, fjin, jsnow, juzhang, knoel, michen, mrezanin, mtessun, pkrempa, sirao, virt-maint, xfu
Target Milestone:	rc	Keywords:	FutureFeature
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:
Clones:	1585320 1657369 1657983		Environment:
Last Closed:	2019-05-01 19:33:22 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1657369, 1657983, 1677293

Description Roman Hodain 2018-03-08 14:50:33 UTC

Description of problem:
When an active commit is running and the job is not yet ready to pivot, an attempt to abort with VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT fails, as expected. However, once the volumes have been synchronised, the BLOCK_JOB_READY event has already been emitted. If intensive I/O then drives the volumes out of sync again, blockJobAbort with the same flags gets stuck until the volumes are back in sync and the pivot happens.

Version-Release number of selected component (if applicable):
libvirt-python-3.2.0-3.el7_4.1.x86_64
libvirt-daemon-3.2.0-14.el7_4.7.x86_64
qemu-kvm-rhev-2.9.0-16.el7_4.13.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Start volume sync 
2. Wait until the volumes are in sync
3. Run I/O-intensive operations in the VM and wait until the volumes are out of sync again.
4. Try to abort with VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT

Actual results:
The operation hangs and blocks the qemu monitor

Expected results:
The operation fails with VIR_ERR_BLOCK_COPY_ACTIVE

Additional info:

Comment 1 Peter Krempa 2018-03-08 15:54:02 UTC

Libvirt really can't do much here, since once the mirror has been READY, qemu never treats it as not READY again. We'd need a way to make the mirror not ready again to fix this.

I'm afraid that it will not be easy or even possible.

Moving to qemu.

Comment 2 Eric Blake 2018-03-08 16:53:10 UTC

Active-sync mirroring may be what we have to use, although I'm not sure whether it will make it in before the 2.12 soft freeze.
https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg07183.html

Comment 3 Eric Blake 2018-03-08 20:45:15 UTC

There's also a patch adding forced block-job-abort; if that lands, libvirt could also be taught to expose that:
https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg06492.html

Comment 4 Qianqian Zhu 2018-03-12 07:54:42 UTC

Tested from the qemu side: if "block-job-complete" is issued while the guest is under I/O load, qemu does not emit the "BLOCK_JOB_COMPLETED" event immediately. It keeps synchronizing the data, but the QMP monitor stays responsive and the guest status is "running".

From my understanding, the "Actual results:" description in comment 0 means that libvirt hangs because it never receives "BLOCK_JOB_COMPLETED". If so, this issue is reproduced.

And I suppose the expectation in comment 0 is for the job to move back to an unsteady (not ready) state while qemu still has data to synchronize. Alternatively, comment 3 proposes a forced block-job-abort. If I understand correctly, either way would be a FutureFeature, so I will add the keyword. Please correct me if I am wrong.

Version:
qemu-kvm-rhev-2.9.0-16.el7_4.1.x86_64.rpm
kernel-3.10.0-823.el7.x86_64

Steps:
1. Launch guest:
/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -machine pc  \
    -vga cirrus  \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x3 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/rhel75-64-virtio-scsi.qcow2 \
    -device scsi-hd,id=image1,drive=drive_image1 \
    -device virtio-net-pci,mac=9a:ea:eb:ec:ed:ee,id=id8ROQ4i,vectors=4,netdev=idhDEVsz,bus=pci.0,addr=0x4  \
    -netdev tap,id=idhDEVsz \
    -m 2048  \
    -smp 4,cores=2,threads=1,sockets=2  \
    -cpu 'SandyBridge',+kvm_pv_unhalt \
    -vnc :0  \
    -qmp stdio

2. Block mirror image1:
{"execute": "qmp_capabilities"}
{"execute": "drive-mirror", "arguments": {"mode": "absolute-paths", "format": "qcow2", "device": "drive_image1", "speed": 0, "sync": "full", "target": "/home/kvm_autotest_root/images/target1.qcow2"}}

3. Wait for the steady-state (BLOCK_JOB_READY) event:
{"timestamp": {"seconds": 1520838758, "microseconds": 367557}, "event": "BLOCK_JOB_READY", "data": {"device": "drive_image1", "len": 4307812352, "offset": 4307812352, "speed": 0, "type": "mirror"}}

4. Execute dd task inside guest:
# dd if=/dev/urandom of=/home/test bs=128k count=1000000000

5. Make sure the "offset" does not match "len", which indicates the blocks are out of sync again:
{"return": [{"io-status": "ok", "device": "drive_image1", "busy": true, "len": 4560781312, "offset": 4341563392, "paused": false, "speed": 0, "ready": true, "type": "mirror"}]}
{"execute": "query-block-jobs"}
{"return": [{"io-status": "ok", "device": "drive_image1", "busy": true, "len": 4560781312, "offset": 4341563392, "paused": false, "speed": 0, "ready": true, "type": "mirror"}]}
{"execute": "query-block-jobs"}
{"return": [{"io-status": "ok", "device": "drive_image1", "busy": true, "len": 4673372160, "offset": 4352049152, "paused": false, "speed": 0, "ready": true, "type": "mirror"}]}

5. Perform block-job-complete:
{"execute": "block-job-complete", "arguments": {"device": "drive_image1"}}
{"return": {}}

6. Check the sync status again; the synchronization has still not completed:
{"execute": "query-block-jobs"}
{"return": [{"io-status": "ok", "device": "drive_image1", "busy": true, "len": 5056757760, "offset": 4612620288, "paused": false, "speed": 0, "ready": true, "type": "mirror"}]}
{"execute": "query-block-jobs"}
{"return": [{"io-status": "ok", "device": "drive_image1", "busy": true, "len": 5065146368, "offset": 4623695872, "paused": false, "speed": 0, "ready": true, "type": "mirror"}]}
{"execute": "query-block-jobs"}
{"return": [{"io-status": "ok", "device": "drive_image1", "busy": true, "len": 5084282880, "offset": 4639883264, "paused": false, "speed": 0, "ready": true, "type": "mirror"}]}
{"execute": "query-status"}
{"return": {"status": "running", "singlestep": false, "running": true}}

7. Stop the dd task inside the guest and wait for the synchronization to finish:
{"return": [{"io-status": "ok", "device": "drive_image1", "busy": true, "len": 7137787904, "offset": 7077625856, "paused": false, "speed": 0, "ready": true, "type": "mirror"}]}
{"execute": "query-block-jobs"}
{"return": [{"io-status": "ok", "device": "drive_image1", "busy": true, "len": 7137787904, "offset": 7109083136, "paused": false, "speed": 0, "ready": true, "type": "mirror"}]}
{"execute": "query-block-jobs"}
{"return": [{"io-status": "ok", "device": "drive_image1", "busy": true, "len": 7137787904, "offset": 7119568896, "paused": false, "speed": 0, "ready": true, "type": "mirror"}]}
{"timestamp": {"seconds": 1520839186, "microseconds": 151863}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive_image1", "len": 7137787904, "offset": 7137787904, "speed": 0, "type": "mirror"}}
{"execute": "query-block-jobs"}
{"return": []}

Result:
After step 5 (block-job-complete), qemu did not emit the "BLOCK_JOB_COMPLETED" event immediately but continued to synchronize the data, while the QMP monitor remained responsive. So I suppose the "Actual results:" description in comment 0 means that libvirt hangs because it never receives "BLOCK_JOB_COMPLETED".
After step 7, once qemu had synchronized all the data, the block job finished with "BLOCK_JOB_COMPLETED".
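The manual checks in the steps above (is the job still `ready` while `offset` lags behind `len`, and has a given event arrived yet) can be scripted against the QMP output. The following is a minimal Python sketch. It assumes that each QMP reply or event is available as one JSON line, as printed by `-qmp stdio` above. The helper names `lagging_jobs` and `first_event` are hypothetical and are not part of QEMU or libvirt.

```python
import json

def lagging_jobs(reply_line):
    """Map device -> bytes still to copy, for jobs that report ready=true
    even though offset has fallen behind len (the step 5 condition)."""
    reply = json.loads(reply_line)
    return {
        job["device"]: job["len"] - job["offset"]
        for job in reply.get("return", [])
        if job.get("ready") and job["offset"] < job["len"]
    }

def first_event(lines, name, device):
    """Return the data of the first QMP event called `name` for `device`,
    or None if the lines run out first (what steps 3 and 7 wait for)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        msg = json.loads(line)
        # Commands carry "execute", replies carry "return", events carry "event".
        if msg.get("event") == name and msg.get("data", {}).get("device") == device:
            return msg["data"]
    return None

# Last query-block-jobs reply from step 5: still ready=true, but lagging.
step5 = ('{"return": [{"io-status": "ok", "device": "drive_image1", "busy": true, '
         '"len": 4673372160, "offset": 4352049152, "paused": false, "speed": 0, '
         '"ready": true, "type": "mirror"}]}')
print(lagging_jobs(step5))

# Tail of the step 7 log.
step7 = [
    '{"execute": "query-block-jobs"}',
    '{"timestamp": {"seconds": 1520839186, "microseconds": 151863}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive_image1", "len": 7137787904, "offset": 7137787904, "speed": 0, "type": "mirror"}}',
    '{"execute": "query-block-jobs"}',
    '{"return": []}',
]
done = first_event(step7, "BLOCK_JOB_COMPLETED", "drive_image1")
print(done["offset"] == done["len"])
```

The first check shows why `ready: true` alone cannot be trusted after heavy I/O: the flag stays set while more than 300 MB are still pending.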

Comment 5 Roman Hodain 2018-03-13 08:57:28 UTC

> From my understanding, the description in the "Actual results:" from comment 0  means that libvirt hang there for not getting "BLOCK_JOB_COMPLETED", if so, then this issue is reproduced.

Based on the libvirt API documentation 

    https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockJobAbort

The problem is rather the inconsistent behaviour. BlockJobAbort with the VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT flag fails with VIR_ERR_BLOCK_COPY_ACTIVE until the volumes are synced. Once they are, the same call performs the pivot. If the volumes then fall out of sync again, I would expect VIR_ERR_BLOCK_COPY_ACTIVE again instead of waiting for BLOCK_JOB_COMPLETED.
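The behaviour the reporter expects can be expressed as a check on the job state before pivoting: refuse the pivot whenever the mirror has not caught up, even if it was READY earlier. The following is a minimal Python sketch of that decision only. It assumes the job is a dict shaped like one `query-block-jobs` entry. `BlockCopyActive` and `check_pivot` are hypothetical names that stand in for the VIR_ERR_BLOCK_COPY_ACTIVE path. This is not libvirt code.

```python
class BlockCopyActive(Exception):
    """Hypothetical stand-in for libvirt's VIR_ERR_BLOCK_COPY_ACTIVE."""

def check_pivot(job):
    """Allow a pivot only if the job is ready AND currently in sync.

    `job` is one entry of a query-block-jobs reply, as a dict.
    """
    if not job.get("ready"):
        raise BlockCopyActive("block copy not ready yet")
    remaining = job["len"] - job["offset"]
    if remaining > 0:
        # The job was READY earlier, but guest I/O has pushed it out of sync.
        raise BlockCopyActive(f"{remaining} bytes still to copy")
    return True

# State from step 6 of comment 4: ready=true, yet behind by several hundred MB.
try:
    check_pivot({"device": "drive_image1", "ready": True,
                 "len": 5056757760, "offset": 4612620288})
except BlockCopyActive as err:
    print("refused:", err)
```

Note that `offset == len` is only a snapshot. Guest writes can land between the check and the pivot, and this is the gap a synchronous mirror is meant to close.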

Comment 7 Ademar Reis 2018-05-09 19:11:14 UTC

(In reply to Eric Blake from comment #3)
> There's also a patch adding forced block-job-abort; if that lands, libvirt
> could also be taught to expose that:
> https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg06492.html

This one did land in QEMU-2.12 (b76e4458). Are there any plans for libvirt to use it?

And BTW, looks like we need this follow-up patch from Max backported:

commit eb36639f7bbc16055e551593b81365e8ae3b0b05
Author: Max Reitz <mreitz>
Date:   Wed May 2 00:05:08 2018 +0200

    block/mirror: Make cancel always cancel pre-READY
    
    Commit b76e4458b1eb3c32e9824fe6aa51f67d2b251748 made the mirror block
    job respect block-job-cancel's @force flag: With that flag set, it would
    now always really cancel, even post-READY.

Comment 9 Peter Krempa 2018-05-10 07:28:38 UTC

(In reply to Ademar Reis from comment #7)
> (In reply to Eric Blake from comment #3)
> > There's also a patch adding forced block-job-abort; if that lands, libvirt
> > could also be taught to expose that:
> > https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg06492.html
> 
> This one did land in QEMU-2.12 (b76e4458). Are there any plans for libvirt
> to use it?

Currently no. But I don't see how that is relevant to this bug. This bug is about the block-job-complete command, which does the opposite of block-job-cancel.

> 
> And BTW, looks like we need this follow-up patch from Max backported:
> 
> commit eb36639f7bbc16055e551593b81365e8ae3b0b05
> Author: Max Reitz <mreitz>
> Date:   Wed May 2 00:05:08 2018 +0200
> 
>     block/mirror: Make cancel always cancel pre-READY
>     
>     Commit b76e4458b1eb3c32e9824fe6aa51f67d2b251748 made the mirror block
>     job respect block-job-cancel's @force flag: With that flag set, it would
>     now always really cancel, even post-READY.

Comment 10 Eric Blake 2018-05-10 12:52:05 UTC

(In reply to Peter Krempa from comment #9)
> Currently no. But I don't see how that is relevant to this bug. This bug has
> problem with the block-job-complete command which does the opposite of
> block-job-cancel.

block-job-complete is potentially long-running until a synchronous mirror lands (qemu 2.13).  As long as that is the case, a user may desire to immediately cancel a long-running block-job-complete.  To do that, they have to use the new block-job-cancel with "force":true added in qemu 2.12 (and with the additional patch backported to make it not regress block-job-cancel before the job is ready).  So the question is if libvirt should expose the instant cancel option, as long as qemu does not have any other way to prevent a long-running block-job-complete.
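The forced cancel mentioned here is the existing `block-job-cancel` QMP command with its `force` boolean set. According to the QAPI documentation, once the job has emitted BLOCK_JOB_READY, `force` makes QEMU abandon the job immediately instead of waiting for the final synchronization. Commit b76e4458 made the mirror job honour this. The following small Python sketch only builds the command text. Sending it over the QMP socket is left out, and the helper name is hypothetical.

```python
import json

def block_job_cancel(device, force=False):
    """Build a block-job-cancel QMP command for `device`.

    With force=True, a mirror that has already emitted BLOCK_JOB_READY is
    abandoned immediately (no pivot, no final sync), so the target is not
    guaranteed to be a consistent copy.
    """
    args = {"device": device}
    if force:
        args["force"] = True
    return json.dumps({"execute": "block-job-cancel", "arguments": args})

print(block_job_cancel("drive_image1", force=True))
```

This is the "we do not want consistency" escape hatch: it bounds the time to stop the job, at the cost of the target's contents.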

Meanwhile, independent of whether we stop block-job-complete from being long-running, any build of qemu 2.12 needs the additional backport of Max's patch that avoids the regression of block-job-cancel called before the job is ready (as libvirt DOES use that already). Perhaps we need 3 BZs total:
qemu: implement synchronous mirror (2.13)
qemu: avoid block-job-cancel regression (2.12)
libvirt: use block-job-cancel to stop long-running block-job-complete

Comment 12 Ademar Reis 2018-06-01 21:03:58 UTC

(In reply to Eric Blake from comment #10)
> ... Perhaps we need 3 BZs total:
> qemu: implement synchronous mirror (2.13)

This BZ.

> qemu: avoid block-job-cancel regression (2.12)

Already filed: bug 1572856

> libvirt: use block-job cancel to stop long-running block-job-complete

Bug 1585320

Comment 14 John Snow 2018-12-07 20:53:45 UTC

Hi, it is upstream. I'll see if it's cumbersome to backport to qemu 2.12.0 or not.

It seems like there are two things:

(1) We want synchronous mode for mirror for guaranteed response times during the complete phase, and
(2) We want the ability to cancel quickly at any time if we do not want consistency.

(1) is provided by synchronous mirror,
(2) is provided by cancel --force.

Libvirt needs support for one-or-the-other, or both, as appropriate for the condition. I will backport the synchronous mirror for QEMU for this bug.
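Upstream, the synchronous mode from (1) is exposed as the `copy-mode` argument of `drive-mirror`/`blockdev-mirror`, with the value `"write-blocking"`; the default is `"background"`. It shipped in QEMU 3.0, the release earlier referred to as 2.13. Whether the RHEL backport exposes it under the same name is an assumption here. The following Python sketch rebuilds the step 2 command from comment 4 with that argument added. The helper name is hypothetical.

```python
import json

def drive_mirror(device, target, write_blocking=False):
    """Build the drive-mirror command used in step 2 of comment 4.

    write_blocking=True adds copy-mode "write-blocking" (QEMU 3.0+, assumed
    available in the build being used): guest writes are also written to the
    target before they complete, so heavy guest I/O should no longer leave
    block-job-complete with an ever-growing backlog to drain.
    """
    args = {
        "mode": "absolute-paths",
        "format": "qcow2",
        "device": device,
        "speed": 0,
        "sync": "full",
        "target": target,
    }
    if write_blocking:
        args["copy-mode"] = "write-blocking"
    return json.dumps({"execute": "drive-mirror", "arguments": args})

print(drive_mirror("drive_image1",
                   "/home/kvm_autotest_root/images/target1.qcow2",
                   write_blocking=True))
```

The trade-off is guest write latency: each write now waits for the target as well, in exchange for a bounded completion phase.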

Comment 17 Ademar Reis 2019-05-01 19:33:22 UTC

The libvirt part has already been postponed to RHEL8-AV, and there the qemu part is tracked by bug 1644988, so closing.