Bug 1032873

Summary: block-job-cancel can not cancel current job when drive-mirror to a no enough space libiscsi disk
Product: Red Hat Enterprise Linux 7
Reporter: Jun Li <juli>
Component: qemu-kvm-rhev
Assignee: Jeff Cody <jcody>
Status: CLOSED ERRATA
QA Contact: Qianqian Zhu <qizhu>
Severity: medium
Docs Contact:
Priority: medium
Version: 7.0
CC: areis, hhuang, juzhang, michen, mrezanin, rbalakri, virt-maint, xfu, xuhan
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: qemu v2.8.0
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1206107
Environment:
Last Closed: 2017-08-01 23:27:12 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1206107

Description Jun Li 2013-11-21 05:38:58 UTC
Description of problem:
Drive-mirror to a libiscsi disk that does not have enough space. When the "BLOCK_JOB_ERROR" event is raised, execute block-job-cancel to cancel the job. block-job-cancel cannot cancel the job and instead returns the error "The block job for device 'drive-scsi0-0-0' is currently paused".

Version-Release number of selected component (if applicable):
libiscsi-1.9.0-3.el7.x86_64
qemu-kvm-rhev-1.5.3-19.el7.x86_64
Guest kernel:
2.6.32-430.el6.i686
Host kernel:
3.10.0-48.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot the guest with the following command line:
# /usr/libexec/qemu-kvm -S -M pc-i440fx-rhel7.0.0 -cpu SandyBridge -enable-kvm -m 4G -smp 4,sockets=2,cores=2,threads=1 -name juli -uuid 355a2475-4e03-4cdd-bf7b-5d6a59edaa68 -rtc base=localtime,clock=host,driftfix=slew \
-device virtio-scsi-pci,bus=pci.0,addr=0x5,id=scsi0 -drive file=iscsi://10.66.6.82:3260/iqn.2013-11.com.example:storage.disk1.juli.xyz/1,if=none,id=drive-scsi0-0-0,media=disk,cache=none,format=qcow2,werror=stop,rerror=stop,aio=native  -device scsi-hd,drive=drive-scsi0-0-0,bus=scsi0.0,scsi-id=0,lun=0,id=juli,bootindex=0 \
-drive file=iscsi://10.66.6.82:3260/iqn.2013-11.com.example:storage.disk1.juli.xyz/3,if=none,id=drive-virtio0-0-0,media=disk,werror=stop,rerror=stop,cache=none,format=qcow2 -device virtio-blk-pci,bus=pci.0,drive=drive-virtio0-0-0,id=virtio0-0-0 \
-drive file=/home/ISO/en_windows_8.1_preview_x86_dvd_2358833.iso,if=none,media=cdrom,format=raw,aio=native,id=drive-ide1-0-0 -device ide-drive,drive=drive-ide1-0-0,id=ide1-0-0,bus=ide.0,unit=0,bootindex=4 \
-device virtio-balloon-pci,id=ballooning \
-global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 \
-k en-us -boot menu=on,reboot-timeout=-1,strict=on -qmp tcp:0:4477,server,nowait -serial unix:/tmp/ttyS0,server,nowait -vnc :3 -spice port=5939,disable-ticketing  -vga qxl -global qxl-vga.revision=3 -monitor stdio -monitor tcp:0:7777,server,nowait -monitor unix:/tmp/monitor1,server,nowait \
-netdev tap,id=tap1,vhost=on,queues=4,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,ifname=tap-juli -device virtio-net-pci,netdev=tap1,id=nic1,mq=on,vectors=17,mac=1a:59:0a:4b:aa:94
2. Create an empty qcow2 v3 libiscsi disk that is too small (e.g. 100M) to serve as the drive-mirror target.
# qemu-img create -f qcow2 -o compat=1.1 iscsi://10.66.6.82:3260/iqn.2013-11.com.example:storage.disk1.juli.xyz/4 100M

3. Execute drive-mirror via QMP to mirror the guest system image to this empty libiscsi disk.
{ "execute": "drive-mirror", "arguments": { "device": "drive-scsi0-0-0", "target": "iscsi://10.66.6.82:3260/iqn.2013-11.com.example:storage.disk1.juli.xyz/4", "format": "qcow2", "mode": "existing", "sync": "full", "speed": 1000000000, "on-source-error": "stop", "on-target-error": "stop" } }
4. After step 3, QMP emits a "BLOCK_JOB_ERROR" event. Then execute block-job-cancel.
{"timestamp": {"seconds": 1385011057, "microseconds": 593340}, "event": "BLOCK_JOB_ERROR", "data": {"device": "drive-scsi0-0-0", "operation": "write", "action": "stop"}}
{ "execute": "block-job-cancel", "arguments": { "device": "drive-scsi0-0-0" } }

Actual results:
After step 4, QMP returns an error:
{"error": {"class": "GenericError", "desc": "The block job for device 'drive-scsi0-0-0' is currently paused"}}

Expected results:
After step 4, the job can be cancelled.

Additional info:
After step 4, the following operations were also performed:
{ "execute" : "query-block-jobs", "arguments" : {} }
{"return": [{"io-status": "failed", "device": "drive-scsi0-0-0", "busy": false, "len": 32212254720, "offset": 26715226112, "paused": true, "speed": 1000000000, "type": "mirror"}]}
{"execute": "block-job-resume", "arguments": { "device": "drive-scsi0-0-0"} }
{"return": {}}
{"timestamp": {"seconds": 1385012058, "microseconds": 741003}, "event": "BLOCK_JOB_ERROR", "data": {"device": "drive-scsi0-0-0", "operation": "write", "action": "stop"}}
{"timestamp": {"seconds": 1385012058, "microseconds": 748857}, "event": "BLOCK_JOB_ERROR", "data": {"device": "drive-scsi0-0-0", "operation": "write", "action": "stop"}}
{"timestamp": {"seconds": 1385012058, "microseconds": 907579}, "event": "BLOCK_JOB_ERROR", "data": {"device": "drive-scsi0-0-0", "operation": "write", "action": "stop"}}

As the operations above show, the job is in the paused state and can be neither resumed nor cancelled.
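
The paused state reported by query-block-jobs can also be detected programmatically. The following is a minimal Python sketch, not part of the original report, that parses the reply shown above. The helper names paused_jobs and progress_percent are hypothetical, and the reply text is copied from the output in this description.

```python
import json

# Reply copied from the query-block-jobs output in this report.
REPLY = (
    '{"return": [{"io-status": "failed", "device": "drive-scsi0-0-0", '
    '"busy": false, "len": 32212254720, "offset": 26715226112, '
    '"paused": true, "speed": 1000000000, "type": "mirror"}]}'
)


def paused_jobs(reply_text):
    """Return the jobs in a query-block-jobs reply whose "paused" field is true."""
    reply = json.loads(reply_text)
    return [job for job in reply.get("return", []) if job.get("paused")]


def progress_percent(job):
    """Percentage of the job completed, computed from "offset" and "len"."""
    return 100.0 * job["offset"] / job["len"] if job["len"] else 0.0


for job in paused_jobs(REPLY):
    # Print the device, its io-status, and how far the mirror got.
    print(job["device"], job["io-status"], round(progress_percent(job), 1))
```

A job that is listed here with "paused": true and "busy": false is the one that block-job-cancel refuses to cancel in this report.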

Comment 9 Jeff Cody 2016-04-28 18:49:42 UTC
*** Bug 1206107 has been marked as a duplicate of this bug. ***

Comment 10 Jeff Cody 2017-01-26 22:43:58 UTC
The block job can be forcibly cancelled by adding the option "force":true, e.g.:


{ "execute": "block-job-cancel", 
      "arguments": { 
           "device": "drive-scsi0-0-0",
            "force":true
       }
}


I'm not sure if this was tried in the originally reported version of qemu or not, and perhaps the option was broken; either way, it works with the current upstream QEMU and with v2.8.0 if you pass in "force":true. 
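
As a rough illustration of this workaround, the Python sketch below queries the job first and adds "force": true only when the job is paused. It is an assumption-laden sketch, not QEMU or libvirt code: cancel_block_job is a hypothetical helper, and send is a hypothetical callable that takes a QMP request as a dict and returns the decoded reply as a dict.

```python
def cancel_block_job(send, device):
    """Cancel the block job on `device`, forcing the cancel if it is paused.

    `send` is a hypothetical callable: it takes a QMP request (a dict) and
    returns the decoded reply (a dict). How it talks to QEMU is left open.
    Returns the block-job-cancel reply, or None if `device` has no job.
    """
    jobs = send({"execute": "query-block-jobs"}).get("return", [])
    job = next((j for j in jobs if j.get("device") == device), None)
    if job is None:
        return None
    arguments = {"device": device}
    if job.get("paused"):
        # As described in this comment, a paused job needs "force": true.
        arguments["force"] = True
    return send({"execute": "block-job-cancel", "arguments": arguments})
```

Note that the job state can change between the query and the cancel, so a caller would still need to handle an error reply from block-job-cancel.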

Verified working on upstream QEMU v2.8.0 and master. Here are the relevant QMP commands and responses:

{ "execute": "qmp_capabilities" }
{"return": {}}

{
    "arguments": {
        "device": "virtio0",
        "format": "qcow2",
        "mode": "absolute-paths",
        "sync": "full",
        "on-source-error": "stop",
        "on-target-error": "stop",
        "target": "iscsi://192.168.15.180/iqn.2017-01.com.quasiquark:for.all/2"
    },
    "execute": "drive-mirror"
}

Formatting 'iscsi://192.168.15.180/iqn.2017-01.com.quasiquark:for.all/2', fmt=qcow2 size=272730423296 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
{"return": {}}
qemu-system-x86_64: iSCSI Failure: SENSE KEY:ILLEGAL_REQUEST(5) ASCQ:LBA_OUT_OF_RANGE(0x2100)
{"timestamp": {"seconds": 1485469318, "microseconds": 327399}, "event": "BLOCK_JOB_ERROR", "data": {"device": "virtio0", "operation": "write", "action": "stop"}}

{ "execute" : "query-block-jobs" }
{"return": [{"io-status": "nospace", "device": "virtio0", "busy": false, "len": 4023386112, "offset": 9961472, "paused": true, "speed": 0, "ready": false, "type": "mirror"}]}


First without the force option:
{ "execute": "block-job-cancel", "arguments": { "device": "virtio0" } }
{"error": {"class": "GenericError", "desc": "The block job for device 'virtio0' is currently paused"}}


Now with the force option:
{ "execute": "block-job-cancel", "arguments": { "device": "virtio0", "force":true } }
{"return": {}}
{"timestamp": {"seconds": 1485470262, "microseconds": 134118}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "virtio0", "len": 4023386112, "offset": 9961472, "speed": 0, "type": "mirror"}}

{ "execute" : "query-block-jobs" }
{"return": []}



I am moving this to POST with a fixed-in-version of 2.8.0 to be safe, so that it can be retested by QE.
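
For retesting, the exchange above can be scripted. The following Python sketch is one possible way to do it and is not taken from QEMU's own tooling: the class name QMPClient is hypothetical, it assumes the monitor sends one JSON message per line (no pretty-printing), and it has no timeouts or error handling.

```python
import json
import socket


class QMPClient:
    """Very small QMP client sketch over an already-connected socket."""

    def __init__(self, sock):
        self._file = sock.makefile("rw", encoding="utf-8", newline="\n")
        self.events = []  # asynchronous events seen while waiting for replies
        # The server speaks first: read the greeting, then leave
        # capabilities negotiation mode as in the transcript above.
        self.greeting = self._read_message()
        self.command("qmp_capabilities")

    def _read_message(self):
        line = self._file.readline()
        if not line:
            raise ConnectionError("QMP connection closed")
        return json.loads(line)

    def command(self, name, arguments=None):
        """Send one command and return its reply ("return" or "error")."""
        request = {"execute": name}
        if arguments:
            request["arguments"] = arguments
        self._file.write(json.dumps(request) + "\n")
        self._file.flush()
        while True:
            message = self._read_message()
            if "event" in message:
                # Events such as BLOCK_JOB_ERROR may arrive before the reply.
                self.events.append(message)
                continue
            return message
```

With a guest started with a TCP QMP socket, such as the -qmp tcp:0:4477,server,nowait option from the original description, a client could be created with QMPClient(socket.create_connection(("localhost", 4477))) and then used as client.command("block-job-cancel", {"device": "virtio0", "force": True}).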

Comment 12 Qianqian Zhu 2017-03-09 09:27:17 UTC
Verified with:
qemu-kvm-rhev-2.8.0-5.el7.x86_64
kernel-3.10.0-566.el7.x86_64

Currently, the mechanism of the block job is a bit different: it does not cancel the block job when it hits a not-enough-space error. The job is simply paused, QMP reports errors, and the job waits for the space to be extended. So this bug actually does not exist now. I have also tried cancelling the block job with the "force=true" option after it is paused, and it works.

Steps and result:
1. Launch guest with:
 /usr/libexec/qemu-kvm -cpu SandyBridge  -m 4G -device virtio-scsi-pci,bus=pci.0,addr=0x5,id=scsi0 -drive file=iscsi://10.66.8.116:3260/iqn.2014-09.org.openstack:my-iscsi-volume/1,if=none,id=drive-scsi0-0-0,media=disk,cache=none,format=raw,werror=stop,rerror=stop,aio=native -qmp tcp:0:5555,server,nowait   -spice port=5939,disable-ticketing  -vga qxl -global qxl-vga.revision=3 -monitor stdio
2. Drive-mirror to a small iSCSI disk:
{ "execute": "drive-mirror", "arguments": { "device": "drive-scsi0-0-0", "target": "iscsi://10.66.8.116:3260/iqn.2014-09.org.openstack:my-iscsi-volume/2", "format": "raw", "mode": "existing", "sync": "full", "speed": 1000000000, "on-source-error": "stop", "on-target-error": "stop" } }
3. Cancel the block job after QMP reports the error:
1) Without option "force=true":
{ "execute": "block-job-cancel", "arguments": { "device": "drive-scsi0-0-0"} }
2) With option "force=true":
{ "execute": "block-job-cancel", "arguments": { "device": "drive-scsi0-0-0", "force":true } }

Result:
After step 2:
QMP reports an error, and the block job is paused.
{"timestamp": {"seconds": 1489051314, "microseconds": 833099}, "event": "BLOCK_JOB_ERROR", "data": {"device": "drive-scsi0-0-0", "operation": "write", "action": "stop"}}

After step 3.1:
The block job cannot be cancelled:
{"error": {"class": "GenericError", "desc": "The block job for device 'drive-scsi0-0-0' is currently paused"}}

After step 3.2:
Block job is cancelled successfully:
{"timestamp": {"seconds": 1489050854, "microseconds": 825732}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "drive-scsi0-0-0", "len": 21474836480, "offset": 524288000, "speed": 1000000000, "type": "mirror"}}

In conclusion, this bug is fixed, moving to verified.
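
The verification result above can also be checked from a captured QMP log. The Python sketch below is illustrative only: last_job_event is a hypothetical helper, and the log lines are copied from the results in this comment. It reports the last BLOCK_JOB_* event seen for a device and ignores command replies.

```python
import json

# Messages copied from the results in this comment, in the order listed.
LOG = """\
{"timestamp": {"seconds": 1489051314, "microseconds": 833099}, "event": "BLOCK_JOB_ERROR", "data": {"device": "drive-scsi0-0-0", "operation": "write", "action": "stop"}}
{"error": {"class": "GenericError", "desc": "The block job for device 'drive-scsi0-0-0' is currently paused"}}
{"timestamp": {"seconds": 1489050854, "microseconds": 825732}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "drive-scsi0-0-0", "len": 21474836480, "offset": 524288000, "speed": 1000000000, "type": "mirror"}}
"""


def last_job_event(lines, device):
    """Return the name of the last BLOCK_JOB_* event for `device`, or None."""
    last = None
    for line in lines:
        message = json.loads(line)
        name = message.get("event", "")  # replies have no "event" key
        if name.startswith("BLOCK_JOB_") and message.get("data", {}).get("device") == device:
            last = name
    return last


print(last_job_event(LOG.splitlines(), "drive-scsi0-0-0"))
```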

Comment 14 errata-xmlrpc 2017-08-01 23:27:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392
