Bug 1657369 - RFE: synchronous mirror to prevent a long-running block-job-complete (libvirt)
Summary: RFE: synchronous mirror to prevent a long-running block-job-complete (libvirt)
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Peter Krempa
QA Contact: Han Han
URL:
Whiteboard:
Depends On: 1553234 1657983
Blocks: 1677293
Reported: 2018-12-07 19:58 UTC by Ademar Reis
Modified: 2019-04-24 12:29 UTC
CC List: 13 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of: 1553234
: 1677293
Environment:
Last Closed: 2019-04-24 12:29:04 UTC
Target Upstream Version:



Links:
Red Hat Knowledge Base (Solution) 3375691 (last updated 2018-12-07 19:58:17 UTC)

Comment 3 Peter Krempa 2019-02-14 09:40:05 UTC
Fishing out useful info from the private description:

Add the possibility to use the synchronous mirror job with libvirt. Force-abort is tracked by bug 1585320.

+++ This bug was initially created as a clone of Bug #1553234 +++

Description of problem:
When an active commit is running and the job is not yet ready to pivot, an attempt to abort it with VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT fails, as expected. However, once the volumes have been synchronised, the BLOCK_JOB_READY event has already been emitted; if intensive guest I/O then pushes the volumes out of sync again, blockJobAbort with the same flag gets stuck until the volumes are back in sync and the pivot happens.
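The failing sequence can be illustrated with the libvirt Python bindings. The following is a hedged, client-side sketch, not a fix: it assumes `dom` is a `libvirt.virDomain` (or any object with the same `blockJobInfo(disk, flags)` and `blockJobAbort(disk, flags)` methods), and the helper name `try_pivot` is hypothetical. It only requests the pivot while `blockJobInfo` reports `cur == end`; because the guest can dirty the disk again between the check and the abort call, this narrows the window described above but cannot rule out the hang.

```python
try:
    import libvirt
    ABORT_PIVOT = libvirt.VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT
except ImportError:
    # Fallback value assumed from libvirt's public header,
    # where VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT is defined as 1 << 1.
    ABORT_PIVOT = 2


def try_pivot(dom, disk):
    """Hypothetical helper: pivot only if the job currently reports cur == end.

    Returns True if the pivot was requested, False if the mirror looked
    out of sync (the caller may retry later). The check is racy: the guest
    can dirty the disk again right after blockJobInfo returns.
    """
    info = dom.blockJobInfo(disk, 0)
    if not info:
        # The Python binding returns an empty dict when no job is active.
        raise RuntimeError("no active block job on %s" % disk)
    if info["cur"] != info["end"]:
        # Visibly out of sync: a synchronous pivot request would block.
        return False
    dom.blockJobAbort(disk, ABORT_PIVOT)
    return True
```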

Version-Release number of selected component (if applicable):
libvirt-python-3.2.0-3.el7_4.1.x86_64
libvirt-daemon-3.2.0-14.el7_4.7.x86_64
qemu-kvm-rhev-2.9.0-16.el7_4.13.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Start the volume sync (active commit).
2. Wait until the volumes are in sync.
3. Run I/O-intensive operations in the VM and wait until the volumes are out of sync again.
4. Try to abort the job with VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT.

Actual results:
The operation hangs and blocks the qemu monitor

Expected results:
The operation fails with VIR_ERR_BLOCK_COPY_ACTIVE

Additional info:

--- Additional comment from Peter Krempa on 2018-03-08 12:54:02 BRT ---

Libvirt really can't do much here, since once the mirror has become READY, qemu keeps treating it as READY even if it falls out of sync again. We'd need a way to make the mirror not ready again to fix this.

I'm afraid that it will not be easy or even possible.

Moving to qemu.

--- Additional comment from Eric Blake on 2018-03-08 13:53:10 BRT ---

Active-sync mirroring may be what we have to use, although I'm not sure whether it will make it in time for the 2.12 soft freeze.
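Why a synchronous mirror helps can be shown with a deliberately simplified, hypothetical model (a toy, not QEMU's implementation). In background mode, guest writes only mark blocks dirty, so a job that became READY stays READY while its backlog grows, and a completion requested at READY must wait for the backlog to drain. In an active-sync (write-blocking) mode, writes are copied to the target as they happen, so no new dirty blocks appear. The function name, tick granularity, and numbers are made up for illustration; the model also assumes the guest is idle until READY and that the pivot is requested on the READY tick.

```python
def simulate(initial_dirty, copy_rate, writes_after_ready, write_blocking=False):
    """Toy mirror model; returns (ready_tick, completed_tick).

    writes_after_ready[i] is how many blocks the guest writes on the i-th
    tick after READY; once the list runs out the guest is idle, so the loop
    always terminates for copy_rate > 0.
    """
    if copy_rate <= 0:
        raise ValueError("copy_rate must be positive")
    dirty = initial_dirty
    tick = 0
    ready_tick = None
    while True:
        tick += 1
        if ready_tick is not None:
            i = tick - ready_tick - 1
            writes = writes_after_ready[i] if i < len(writes_after_ready) else 0
            if not write_blocking:
                # Background mode: guest writes only mark blocks dirty.
                dirty += writes
            # Write-blocking mode: writes reach the target synchronously,
            # so they add nothing to the backlog.
        dirty = max(0, dirty - copy_rate)  # one background copy pass
        if ready_tick is None:
            if dirty == 0:
                ready_tick = tick  # READY is sticky from here on
        elif dirty == 0:
            return ready_tick, tick  # pending completion finally succeeds


# Usage: compare both modes under the same post-READY write burst.
burst = [8, 8, 8, 8]
print(simulate(10, 5, burst))                       # (2, 9)
print(simulate(10, 5, burst, write_blocking=True))  # (2, 3)
```

With these numbers (10 dirty blocks, 5 copied per tick, a 4-tick burst of 8 writes per tick after READY), the background job completes 7 ticks after READY, while the write-blocking job completes on the next tick.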
https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg07183.html


--- Additional comment from Eric Blake on 2018-05-10 09:52:05 BRT ---

(In reply to Peter Krempa from comment #9)
> Currently no. But I don't see how that is relevant to this bug. This bug has
> problem with the block-job-complete command which does the opposite of
> block-job-cancel.

block-job-complete is potentially long-running until a synchronous mirror lands (qemu 2.13). As long as that is the case, a user may want to cancel a long-running block-job-complete immediately. To do that, they have to use the new block-job-cancel with "force":true added in qemu 2.12 (with the additional patch backported so that block-job-cancel does not regress before the job is ready). So the question is whether libvirt should expose the instant-cancel option, as long as qemu has no other way to prevent a long-running block-job-complete.
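For reference, the QMP commands discussed in this thread can be written out as JSON messages. This is a sketch under stated assumptions: the helper `qmp()` and the identifiers `job0`, `drive0`, and `target0` are hypothetical; `force` is the `block-job-cancel` argument whose meaning for READY mirror jobs (abandon immediately instead of waiting for final synchronization) dates from QEMU 2.12; and `"copy-mode": "write-blocking"` on `blockdev-mirror` is how upstream QEMU exposed the synchronous mirror (it shipped in QEMU 3.0, the release that had been planned as 2.13). Check the QAPI schema of the QEMU build in use before relying on any of these arguments.

```python
import json


def qmp(execute, arguments=None):
    """Hypothetical helper: serialize one QMP command message."""
    msg = {"execute": execute}
    if arguments:
        msg["arguments"] = arguments
    return json.dumps(msg)


# Synchronous ("active") mirror: guest writes are copied to the target
# before they complete, so the job can converge even under heavy I/O.
mirror = qmp("blockdev-mirror", {
    "job-id": "job0",        # hypothetical job name
    "device": "drive0",      # hypothetical source node name
    "target": "target0",     # hypothetical target node name
    "sync": "full",
    "copy-mode": "write-blocking",
})

# Pivot request; with a background-mode mirror this can run for a long
# time after READY if the guest keeps dirtying the disk.
complete = qmp("block-job-complete", {"device": "job0"})

# Abandon a READY mirror immediately instead of waiting for final sync.
force_cancel = qmp("block-job-cancel", {"device": "job0", "force": True})

print(force_cancel)
```

On a libvirt-managed domain these would normally not be sent by hand; issuing them directly (for example with `virsh qemu-monitor-command`) bypasses libvirt's own block-job tracking.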

Meanwhile, independently of whether we stop block-job-complete from being long-running, any build of qemu 2.12 needs the additional backport of Max's patch that avoids the regression in block-job-cancel when it is called before the job is ready (libvirt DOES already use that). Perhaps we need 3 BZs total:
qemu: implement synchronous mirror (2.13)
qemu: avoid block-job-cancel regression (2.12)
libvirt: use block-job-cancel to stop long-running block-job-complete


Comment 4 Jaroslav Suchanek 2019-04-24 12:29:04 UTC
This bug is going to be addressed in the next major release within the existing cloned bug.

