Fishing out useful info from the private description:
Add possibility to use the synchronous mirror job with libvirt. Force-abort is tracked by 1585320.
+++ This bug was initially created as a clone of Bug #1553234 +++
Description of problem:
When an active commit is running and the pivot is not ready yet, an attempt to abort with VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT fails, as expected. However, if the volumes were already synchronised (so the BLOCK_JOB_READY event has already been emitted) and they then get out of sync again due to intensive IO operations, blockJobAbort with the same flag gets stuck until the volumes are in sync again and the pivot happens.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Start volume sync
2. Wait until the volumes are in sync
3. Run IO intensive operations on the VM and wait until the volumes are out of sync again.
4. Try to abort with VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT
Actual results:
The operation hangs and blocks the qemu monitor
Expected results:
The operation fails with VIR_ERR_BLOCK_COPY_ACTIVE
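For illustration only, here is a minimal sketch of how a management application might handle the expected behaviour (a prompt failure instead of a hang). The helper name pivot_or_report and its return strings are hypothetical. The flag and error code are passed in as parameters so the snippet does not depend on libvirt being importable; with libvirt-python they would be libvirt.VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT and libvirt.VIR_ERR_BLOCK_COPY_ACTIVE. It does not reproduce or work around the hang itself.

```python
def pivot_or_report(dom, disk, pivot_flag, copy_active_code):
    """Try to pivot an active commit; report if the job is not ready.

    dom is assumed to behave like a libvirt-python virDomain, i.e. it has
    blockJobAbort(disk, flags) and raises an exception exposing
    get_error_code(), as libvirt.libvirtError does.
    """
    try:
        dom.blockJobAbort(disk, pivot_flag)
        return "pivoted"
    except Exception as err:
        # Only treat the "block copy still active" error as a soft failure;
        # anything else is re-raised unchanged.
        get_code = getattr(err, "get_error_code", None)
        if get_code is not None and get_code() == copy_active_code:
            return "not-ready"
        raise
```

A caller could retry later on "not-ready", which is only possible if the call returns promptly as described under the expected results.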
--- Additional comment from Peter Krempa on 2018-03-08 12:54:02 BRT ---
Libvirt really can't do much here since qemu will not treat the mirror as READY after it is ready at one time. We'd need a way to make the mirror not ready again to fix this.
I'm afraid that it will not be easy or even possible.
Moving to qemu.
--- Additional comment from Eric Blake on 2018-03-08 13:53:10 BRT ---
Active-sync mirroring may be what we have to use, although I'm not sure whether it will make it for 2.12 softfreeze.
--- Additional comment from Eric Blake on 2018-05-10 09:52:05 BRT ---
(In reply to Peter Krempa from comment #9)
> Currently no. But I don't see how that is relevant to this bug. This bug has
> problem with the block-job-complete command which does the opposite of
block-job-complete is potentially long-running until a synchronous mirror lands (qemu 2.13). As long as that is the case, a user may desire to immediately cancel a long-running block-job-complete. To do that, they have to use the new block-job-cancel with "force":true added in qemu 2.12 (and with the additional patch backported to make it not regress block-job-cancel before the job is ready). So the question is if libvirt should expose the instant cancel option, as long as qemu does not have any other way to prevent a long-running block-job-complete.
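As a rough sketch of the QMP message discussed above, the following builds a block-job-cancel command with the optional "force" argument. The helper name and the device name used in the usage note are hypothetical; only the command name and the "device" and "force" arguments come from the discussion.

```python
import json


def qmp_block_job_cancel(device, force=False):
    """Return a QMP block-job-cancel command as a JSON string.

    force=True adds "force": true, the qemu 2.12 option mentioned above
    for cancelling a job immediately instead of waiting for it.
    """
    arguments = {"device": device}
    if force:
        arguments["force"] = True
    return json.dumps({"execute": "block-job-cancel", "arguments": arguments})
```

For example, qmp_block_job_cancel("drive-virtio-disk0", force=True) yields a message that could be sent over the monitor socket; whether libvirt should expose this is exactly the open question here.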
Meanwhile, independent of whether we fix block-job-complete from being long-running, any build of qemu 2.12 needs the additional backport of Max' patch that avoids the regression of block-job-cancel called prior to the job being ready (as libvirt DOES use that already). Perhaps we need 3 BZs total:
qemu: implement synchronous mirror (2.13)
qemu: avoid block-job-cancel regression (2.12)
libvirt: use block-job cancel to stop long-running block-job-complete
This bug is going to be addressed in next major release within existing cloned bug.