Bug 1572856

Summary: 'block-job-cancel' can not cancel a "drive-mirror" job
Product: Red Hat Enterprise Linux 7 Reporter: Yanan Fu <yfu>
Component: qemu-kvm-rhevAssignee: Hanna Czenczek <hreitz>
Status: CLOSED ERRATA QA Contact: Gu Nini <ngu>
Severity: high Docs Contact:
Priority: high    
Version: 7.6CC: chayang, juzhang, knoel, michen, ngu, qizhu, qzhang, virt-maint, xianwang, xuma, yfu
Target Milestone: rcKeywords: Regression
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.12.0-5.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1596991 (view as bug list) Environment:
Last Closed: 2018-11-01 11:07:33 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1518738, 1585320, 1596991    

Description Yanan Fu 2018-04-28 07:53:59 UTC
Description of problem:
'block-job-cancel' can not cancel a "drive-mirror" job.
The job will keep running and get "BLOCK_JOB_READY" and "BLOCK_JOB_COMPLETED"
at last.

And this is a regression issue for RHEL7.6

Version-Release number of selected component (if applicable):
qemu: qemu-kvm-rhev-2.12.0-1.el7.x86_64
kernel: kernel-3.10.0-877.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot vm.
2. Execute "driver-mirror" with qmp.
{"execute": "drive-mirror", "arguments": {"mode": "absolute-paths", "format": "qcow2", "device": "drive_image1", "speed": 0, "sync": "full", "target": "/home/kvm_autotest_root/images/target1.qcow2"}}

3.Execute "block-job-cancel" with qmp.
{"execute": "block-job-cancel", "arguments": {"device": "drive_image1"}}

qmp command log:
{"execute": "drive-mirror", "arguments": {"mode": "absolute-paths", "format": 
           ^^^^^^^^^^^^^^^^^
"qcow2", "device": "drive_image1", "speed": 0, "sync": "full", "target": "/home/kvm_autotest_root/images/target1.qcow2"}}
{"return": {}}
{"execute": "query-block-jobs"}
{"return": [{"auto-finalize": true, "io-status": "ok", "device": "drive_image1", "auto-dismiss": true, "busy": true, "len": 4331208704, "offset": 417923072, "status": "running", "paused": false, "speed": 0, "ready": false, "type": "mirror"}]}
{"execute": "block-job-cancel", "arguments": {"device": "drive_image1"}}
           ^^^^^^^^^^^^^^^^^^^^
{"return": {}}
{"execute": "query-block-jobs"}
{"return": [{"auto-finalize": true, "io-status": "ok", "device": "drive_image1", "auto-dismiss": true, "busy": true, "len": 4331208704, "offset": 1353318400, 
                                                                ^^^^^^^^^^^^ 
"status": "running", "paused": false, "speed": 0, "ready": false, "type": "mirror"}]}
{"execute": "query-block-jobs"}
{"return": [{"auto-finalize": true, "io-status": "ok", "device": "drive_image1", "auto-dismiss": true, "busy": true, "len": 4331208704, "offset": 1566965760, 
                                                                ^^^^^^^^^^^^^ 
"status": "running", "paused": false, "speed": 0, "ready": false, "type": "mirror"}]}
{"execute": "query-block-jobs"}
{"return": [{"auto-finalize": true, "io-status": "ok", "device": "drive_image1", "auto-dismiss": true, "busy": true, "len": 4331208704, "offset": 3257335808, "status": "running", "paused": false, "speed": 0, "ready": false, "type": "mirror"}]}
{"timestamp": {"seconds": 1524901760, "microseconds": 253202}, "event": "BLOCK_JOB_READY", "data": {"device": "drive_image1", "len": 4331208704, 
^^^^^^^^^^^^^^^^^^
"offset": 4331208704, "speed": 0, "type": "mirror"}}
{"timestamp": {"seconds": 1524901760, "microseconds": 253509}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive_image1", "len": 4331208704, 
^^^^^^^^^^^^^^^^^^^^^
"offset": 4331208704, "speed": 0, "type": "mirror"}}





Actual results:
can not cancel the block job.

Expected results:
job should be cancelled, and get "BLOCK_JOB_CANCELLED" event.

Additional info:
qemu command line:
/usr/libexec/qemu-kvm \
    -S  \
    -name 'avocado-vt-vm1'  \
    -sandbox off  \
    -machine pc  \
    -nodefaults  \
    -vga qxl \
    -device pci-bridge,id=pci_bridge,bus=pci.0,addr=0x3,chassis_nr=1 \
    -device intel-hda,bus=pci.0,addr=0x4 \
    -device hda-duplex  \
    -chardev socket,id=qmp_id_qmp1,path=/var/tmp/avocado_azbh2D/monitor-qmp1-20180427-082517-cs9RLzLn,server,nowait \
    -mon chardev=qmp_id_qmp1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/avocado_azbh2D/monitor-catch_monitor-20180427-082517-cs9RLzLn,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idJw3ilG  \
    -chardev socket,id=serial_id_serial0,path=/var/tmp/avocado_azbh2D/serial-serial0-20180427-082517-cs9RLzLn,server,nowait \
    -device isa-serial,chardev=serial_id_serial0 \
    -device virtio-serial-pci,id=virtio_serial_pci0,bus=pci.0,addr=0x5 \
    -chardev socket,path=/var/tmp/avocado_azbh2D/virtio_port-vs-20180427-082517-cs9RLzLn,nowait,id=idyoBE6X,server \
    -device virtserialport,id=idgzmqQo,name=vs,bus=virtio_serial_pci0.0,chardev=idyoBE6X \
    -object rng-random,filename=/dev/random,id=passthrough-9bzWyTnW \
    -device virtio-rng-pci,id=virtio-rng-pci-MUWns1SF,rng=passthrough-9bzWyTnW,bus=pci.0,addr=0x6  \
    -chardev socket,id=seabioslog_id_20180427-082517-cs9RLzLn,path=/var/tmp/avocado_azbh2D/seabios-20180427-082517-cs9RLzLn,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20180427-082517-cs9RLzLn,iobase=0x402 \
    -device ich9-usb-ehci1,id=usb1,addr=0x1d.7,multifunction=on,bus=pci.0 \
    -device ich9-usb-uhci1,id=usb1.0,multifunction=on,masterbus=usb1.0,addr=0x1d.0,firstport=0,bus=pci.0 \
    -device ich9-usb-uhci2,id=usb1.1,multifunction=on,masterbus=usb1.0,addr=0x1d.2,firstport=2,bus=pci.0 \
    -device ich9-usb-uhci3,id=usb1.2,multifunction=on,masterbus=usb1.0,addr=0x1d.4,firstport=4,bus=pci.0 \
    -device nec-usb-xhci,id=usb2,bus=pci.0,addr=0x7 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x8 \
    -drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/rhel75-64-virtio-scsi.qcow2 \
    -device scsi-hd,id=image1,drive=drive_image1 \
    -device virtio-net-pci,mac=9a:8b:8c:8d:8e:8f,id=idcutDVH,vectors=4,netdev=idS6OpFS,bus=pci.0,addr=0x9  \
    -netdev tap,id=idS6OpFS,vhost=on \
    -m 4096  \
    -smp 8,cores=4,threads=1,sockets=2  \
    -cpu 'SandyBridge',+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb2.0,port=1  \
    -vnc :0 \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off,strict=off  \
    -no-hpet \
    -enable-kvm  \
    -watchdog i6300esb \
    -watchdog-action reset \
    -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xa \
    -monitor stdio \

Comment 3 Yanan Fu 2018-04-28 08:01:55 UTC
One simpler bz during RHEL7.3 bz1404137, and didn't hit this problem during rhel7.5.

Comment 4 Hanna Czenczek 2018-05-01 19:58:41 UTC
Broken (?) since commit b76e4458b1eb3c32e9824fe6aa51f67d2b251748.  I’m not sure yet whether that was intentional, but in any case this behavior is not reflected by the description of block-job-cancel.

The description now states that you have to set the @force flag post-READY so that the job is cancelled – without, the job will complete normally.  (Before that patch, setting the flag did not have any effect in this case.)

But now you have to always set @force to cancel the job, independently of whether that was before READY or not.  I think that is not intended, not least because it is an incompatible change.  (And it is not documented, so that’s even more reason to believe it wasn’t intentional.)

So I will send a patch to revert the pre-READY behavior and we’ll see.

Max

Comment 5 Hanna Czenczek 2018-05-01 22:21:21 UTC
Sent an upstream series: http://lists.nongnu.org/archive/html/qemu-devel/2018-05/msg00143.html

Comment 6 Qunfang Zhang 2018-05-28 02:34:44 UTC
Same issue happened in kvm acceptance test on Power:

kernel-3.10.0-887.el7.ppc64le
qemu-kvm-rhev-2.12.0-2.el7.ppc64le

Comment 7 Ademar Reis 2018-06-01 21:01:05 UTC
Merged upstream, pending backport:

commit eb36639f7bbc16055e551593b81365e8ae3b0b05
Author: Max Reitz <mreitz>
Date:   Wed May 2 00:05:08 2018 +0200

    block/mirror: Make cancel always cancel pre-READY
    
    Commit b76e4458b1eb3c32e9824fe6aa51f67d2b251748 made the mirror block
    job respect block-job-cancel's @force flag: With that flag set, it would
    now always really cancel, even post-READY.
    
    Unfortunately, it had a side effect: Without that flag set, it would now
    never cancel, not even before READY.  Considering that is an
    incompatible change and not noted anywhere in the commit or the
    description of block-job-cancel's @force parameter, this seems
    unintentional and we should revert to the previous behavior, which is to
    immediately cancel the job when block-job-cancel is called before source
    and target are in sync (i.e. before the READY event).
    
    Cc: qemu-stable
    Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1572856
    Reported-by: Yanan Fu <yfu>
    Signed-off-by: Max Reitz <mreitz>
    Reviewed-by: Eric Blake <eblake>
    Message-id: 20180501220509.14152-2-mreitz
    Reviewed-by: Jeff Cody <jcody>
    Signed-off-by: Jeff Cody <jcody>

Comment 9 Miroslav Rezanina 2018-06-25 14:14:46 UTC
Fix included in qemu-kvm-rhev-2.12.0-5.el7

Comment 11 Xujun Ma 2018-06-27 03:22:51 UTC
The bug didn't appear in the qemu-kvm-rhev-2.12.0-5.el7 acceptance testing on ppc.

Comment 12 Yanan Fu 2018-06-27 03:48:07 UTC
qemu-kvm-rhev-2.12.0-5.el7 acceptance test on x86_64 didn't hit this issue too.

Comment 13 Gu Nini 2018-06-28 07:06:31 UTC
Verify the bug on following sw versions:
Host kernel: 3.10.0-915.el7.x86_64
qemu-kvm-rhev: qemu-kvm-rhev-2.12.0-5.el7.x86_64


# nc -U /var/tmp/avocado1
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 12, "major": 2}, "package": "qemu-kvm-rhev-2.12.0-5.el7"}, "capabilities": []}}
{"execute":"qmp_capabilities"}
{"return": {}}
{ "execute": "drive-mirror", "arguments": { "device": "drive_image1", "target": "mirror1", "format": "qcow2", "mode": "absolute-paths", "sync": "full" } }
{"return": {}}
{"execute": "block-job-cancel", "arguments": {"device": "drive_image1"}}
{"return": {}}
{"timestamp": {"seconds": 1530158982, "microseconds": 498950}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "drive_image1", "len": 15066529792, "offset": 1113980928, "speed": 0, "type": "mirror"}}

......
{ "execute": "drive-mirror", "arguments": { "device": "drive_image1", "target": "mirror1", "format": "qcow2", "mode": "absolute-paths", "sync": "full" } }
{"return": {}}
{"execute": "query-block-jobs"} 
{"return": [{"auto-finalize": true, "io-status": "ok", "device": "drive_image1", "auto-dismiss": true, "busy": true, "len": 15275786240, "offset": 2719809536, "status": "running", "paused": false, "speed": 0, "ready": false, "type": "mirror"}]}
{"timestamp": {"seconds": 1530168785, "microseconds": 973943}, "event": "BLOCK_JOB_READY", "data": {"device": "drive_image1", "len": 15277293568, "offset": 15277293568, "speed": 0, "type": "mirror"}}
{"execute": "block-job-complete", "arguments": { "device": "drive_image1"} }
{"return": {}}
{"execute": "query-block-jobs"}
{"return": [{"auto-finalize": true, "io-status": "ok", "device": "drive_image1", "auto-dismiss": true, "busy": true, "len": 18223529984, "offset": 16170024960, "status": "ready", "paused": false, "speed": 0, "ready": true, "type": "mirror"}]}
{"execute": "query-block-jobs"}
{"return": [{"auto-finalize": true, "io-status": "ok", "device": "drive_image1", "auto-dismiss": true, "busy": true, "len": 18840616960, "offset": 16245391360, "status": "ready", "paused": false, "speed": 0, "ready": true, "type": "mirror"}]}
{"execute":"block-job-cancel","arguments":{"device":"drive_image1"}}
{"return": {}}
{"execute": "query-block-jobs"}
{"return": [{"auto-finalize": true, "io-status": "ok", "device": "drive_image1", "auto-dismiss": true, "busy": true, "len": 22204579840, "offset": 16752508928, "status": "ready", "paused": false, "speed": 0, "ready": true, "type": "mirror"}]}
{"execute": "query-block-jobs"}
{"return": [{"auto-finalize": true, "io-status": "ok", "device": "drive_image1", "auto-dismiss": true, "busy": true, "len": 22683320320, "offset": 16811229184, "status": "ready", "paused": false, "speed": 0, "ready": true, "type": "mirror"}]}
{"execute":"block-job-cancel","arguments":{"device":"drive_image1","force":true}}
{"return": {}}
{"timestamp": {"seconds": 1530169235, "microseconds": 839978}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "drive_image1", "len": 24414584832, "offset": 17069113344, "speed": 0, "type": "mirror"}}

Comment 14 errata-xmlrpc 2018-11-01 11:07:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3443