Bug 1388058

Summary: [ppc64le] Block mirror with firewall: after executing the "block-job-cancel" command, the job is cancelled immediately
Product: Red Hat Enterprise Linux 7 Reporter: xianwang <xianwang>
Component: qemu-kvm-rhev Assignee: David Gibson <dgibson>
Status: CLOSED NOTABUG QA Contact: Virtualization Bugs <virt-bugs>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.3 CC: knoel, kwolf, qzhang, virt-maint, xianwang, zhengtli
Target Milestone: rc   
Target Release: ---   
Hardware: ppc64le   
OS: Linux   
Last Closed: 2016-11-24 04:45:15 UTC Type: Bug

Description xianwang 2016-10-24 10:43:12 UTC
Description of problem:
Boot a guest on host1 and set up an NFS server on host2. Launch a block-mirror job that mirrors the disk to the NFS server on host2 (remote storage). After starting the block-mirror job, execute "iptables -A INPUT -p tcp --sport 2049 -j REJECT" on host2, then cancel the block-mirror job. QMP nevertheless reports "BLOCK_JOB_CANCELLED" right away.

Version-Release number of selected component (if applicable):
Host install tree: RHEL7.3-20161012.0
kernel: kernel-3.10.0-513.el7
qemu: qemu-kvm-rhev-2.6.0-27.el7
SLOF: SLOF-20160223-6.gitdbbfda4.el7

Guest: RHEL6.8 BE guest
driveformat: virtio_blk
nicmodel: spapr-vlan
mem: 16G
vcpu: 16

How reproducible:
3/3

Steps to Reproduce:
1. Boot the guest with a local system disk.
2. Mirror the disk to remote storage with the command "{ "execute": "drive-mirror", "arguments": { "device": "drive_data2", "target": "/root/host2/d2-blk1.qcow2", "format": "qcow2", "mode": "absolute-paths", "sync": "full" } }"
3. During mirroring, use the firewall (on host1) to stop the job, with the command
# iptables -A INPUT -p tcp --sport 2049 -j REJECT
4. Then cancel the mirror job with the command "{ "execute": "block-job-cancel", "arguments": { "device": "drive_data2" } }"
5. Stop the firewall
# iptables -F
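
The two QMP commands in steps 2 and 4 can be generated rather than typed by hand. The following is a minimal sketch, assuming Python is available on the host; the helper names (qmp_command, drive_mirror, block_job_cancel) are hypothetical and only build the JSON lines, they do not talk to qemu.

```python
import json


def qmp_command(name, **arguments):
    """Serialize one QMP command as a single JSON line (hypothetical helper)."""
    msg = {"execute": name}
    if arguments:
        msg["arguments"] = arguments
    return json.dumps(msg)


def drive_mirror(device, target):
    """Build the drive-mirror command from step 2 with the same arguments."""
    return qmp_command(
        "drive-mirror",
        device=device,
        target=target,
        format="qcow2",
        mode="absolute-paths",
        sync="full",
    )


def block_job_cancel(device):
    """Build the block-job-cancel command from step 4."""
    return qmp_command("block-job-cancel", device=device)


# Device name and target path are the ones used in this report.
steps = [
    drive_mirror("drive_data2", "/root/host2/d2-blk1.qcow2"),
    block_job_cancel("drive_data2"),
]

for line in steps:
    print(line)
```

When these lines are sent over a real QMP socket, the "qmp_capabilities" command has to be issued first; the firewall steps (3 and 5) still have to be run separately as root.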

Actual results:
4. The job is cancelled immediately:
{ "execute": "block-job-cancel", "arguments": { "device": "drive_data2" } }
{"return": {}}
{"timestamp": {"seconds": 1477300868, "microseconds": 103676}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "drive_data2", "len": 4294967296, "offset": 1614807040, "speed": 0, "type": "mirror"}}
5. The job had already been stopped before this step:
{"execute":"query-block-jobs","arguments":{}}
{"return": []}


Expected results:

4. Cancel the mirror job. The job should not be cancelled immediately:
{"return": {}}
5. After executing "# iptables -F", the job is cancelled correctly:
{"timestamp": {"seconds": 1477300868, "microseconds": 103676}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "drive_data2", "len": 4294967296, "offset": 1614807040, "speed": 0, "type": "mirror"}}

Additional info:
x86_64 behaves correctly; the problem does not occur there.

Comment 1 xianwang 2016-10-24 10:49:05 UTC
Steps to Reproduce:
1. Boot the guest with a local system disk.
2. Mirror the disk to remote storage with the command
{ "execute": "drive-mirror", "arguments": { "device": "blk1", "target": "/root/host2/m1-blk1.qcow2", "format": "qcow2", "mode": "absolute-paths", "sync": "full" } }
3. During mirroring, use the firewall (on host1) to stop the job, with the command
# iptables -A INPUT -p tcp --sport 2049 -j REJECT
4. Check the job status.

Expected results:
4. The job stops.

Actual results:
4. The job continues, and the "offset" value keeps increasing:
{ "execute": "drive-mirror", "arguments": { "device": "blk1", "target": "/root/host2/m1-blk1.qcow2", "format": "qcow2", "mode": "absolute-paths", "sync": "full" } }
{"return": {}}
{"execute":"query-block-jobs","arguments":{}}
{"return": [{"io-status": "ok", "device": "blk1", "busy": true, "len": 5436014592, "offset": 1053425664, "paused": false, "speed": 0, "ready": false, "type": "mirror"}]}
{"execute":"query-block-jobs","arguments":{}}
{"return": [{"io-status": "ok", "device": "blk1", "busy": true, "len": 5436014592, "offset": 1145765888, "paused": false, "speed": 0, "ready": false, "type": "mirror"}]}
{"execute":"query-block-jobs","arguments":{}}
{"return": [{"io-status": "ok", "device": "blk1", "busy": true, "len": 5436014592, "offset": 2453602304, "paused": false, "speed": 0, "ready": false, "type": "mirror"}]}
{"execute":"query-block-jobs","arguments":{}}
{"return": [{"io-status": "ok", "device": "blk1", "busy": true, "len": 5436014592, "offset": 3274702848, "paused": false, "speed": 0, "ready": false, "type": "mirror"}]}
{"execute":"query-block-jobs","arguments":{}}
{"return": [{"io-status": "ok", "device": "blk1", "busy": true, "len": 5436342272, "offset": 3513974784, "paused": false, "speed": 0, "ready": false, "type": "mirror"}]}
{"execute":"query-block-jobs","arguments":{}}
{"return": [{"io-status": "ok", "device": "blk1", "busy": true, "len": 5436342272, "offset": 3919642624, "paused": false, "speed": 0, "ready": false, "type": "mirror"}]}
{"execute":"query-block-jobs","arguments":{}}
{"return": [{"io-status": "ok", "device": "blk1", "busy": true, "len": 5436342272, "offset": 5007736832, "paused": false, "speed": 0, "ready": false, "type": "mirror"}]}
{"timestamp": {"seconds": 1477300220, "microseconds": 403085}, "event": "BLOCK_JOB_READY", "data": {"device": "blk1", "len": 5436342272, "offset": 5436342272, "speed": 0, "type": "mirror"}}
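
The claim that the job keeps advancing can be checked mechanically from the "query-block-jobs" replies. This is a minimal sketch, assuming the replies have been saved as JSON lines; the function names (offsets_for, still_advancing) are hypothetical, and the sample data is the first three replies copied from the log above.

```python
import json

# First three query-block-jobs replies from the log above, one JSON line each.
replies = [
    '{"return": [{"io-status": "ok", "device": "blk1", "busy": true, "len": 5436014592, "offset": 1053425664, "paused": false, "speed": 0, "ready": false, "type": "mirror"}]}',
    '{"return": [{"io-status": "ok", "device": "blk1", "busy": true, "len": 5436014592, "offset": 1145765888, "paused": false, "speed": 0, "ready": false, "type": "mirror"}]}',
    '{"return": [{"io-status": "ok", "device": "blk1", "busy": true, "len": 5436014592, "offset": 2453602304, "paused": false, "speed": 0, "ready": false, "type": "mirror"}]}',
]


def offsets_for(reply_lines, device):
    """Collect the 'offset' of the given device from each reply, in order."""
    offsets = []
    for line in reply_lines:
        for job in json.loads(line)["return"]:
            if job["device"] == device:
                offsets.append(job["offset"])
    return offsets


def still_advancing(offsets):
    """True if every sampled offset is strictly larger than the previous one."""
    return all(later > earlier for earlier, later in zip(offsets, offsets[1:]))


offsets = offsets_for(replies, "blk1")
print(offsets, still_advancing(offsets))
```

A stalled job would show the same offset in consecutive replies, in which case still_advancing returns False.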

Comment 3 David Gibson 2016-10-30 20:48:25 UTC
I don't really understand the problem.

Why is the fact the job is cancelled quickly a problem?

Comment 4 xianwang 2016-10-31 02:46:58 UTC
(In reply to David Gibson from comment #3)
> I don't really understand the problem.
> 
> Why is the fact the job is cancelled quickly a problem?

Hi David,
During the mirror job, step 3 sets up the firewall with the command "iptables -A INPUT -p tcp --sport 2049 -j REJECT". After the job is then cancelled (step 4), it should not actually be cancelled until the firewall is stopped.

Comment 5 David Gibson 2016-11-18 01:35:19 UTC
Hi Kevin,

Sorry to bother you, but I don't really know the block layer well enough to understand this bug report.  Why is it expected that the block job is not immediately cancelled?  Is there a reason that it's bad for the job not to be immediately cancelled?

Comment 6 Kevin Wolf 2016-11-18 09:00:19 UTC
I don't think so, this claim doesn't make sense to me. If there are still
requests in flight, the mirror block job will wait for them to complete before
it actually cancels, but if it happens that no requests are in flight or they
complete immediately, then it will cancel immediately. I also don't understand
why it would be desirable to have a longer time between the cancel command and
the actual cancellation, shorter is better in my book...

Anyway, the feeling that I get is that we're really just testing the behaviour
of the NFS driver in the kernel when the connection breaks down. There's little
that qemu could do either way.
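
The cancellation behaviour described above can be pictured with a toy model. This is only a sketch under stated assumptions: the class ToyMirrorJob and its methods are hypothetical, it is not qemu's implementation, and it models nothing beyond "cancel completes once no requests are in flight".

```python
class ToyMirrorJob:
    """Toy model of a mirror job; NOT qemu code, for illustration only."""

    def __init__(self, in_flight=0):
        self.in_flight = in_flight      # requests issued but not yet completed
        self.cancel_requested = False
        self.events = []                # events emitted by the toy job

    def cancel(self):
        """Request cancellation; it takes effect once nothing is in flight."""
        self.cancel_requested = True
        self._maybe_finish()

    def complete_request(self):
        """Simulate one in-flight request completing (or failing)."""
        if self.in_flight > 0:
            self.in_flight -= 1
        self._maybe_finish()

    def _maybe_finish(self):
        done = "BLOCK_JOB_CANCELLED" in self.events
        if self.cancel_requested and self.in_flight == 0 and not done:
            self.events.append("BLOCK_JOB_CANCELLED")


# No requests in flight: the cancel event follows the command immediately.
idle = ToyMirrorJob(in_flight=0)
idle.cancel()

# Two requests in flight: the event only appears after both have completed.
busy = ToyMirrorJob(in_flight=2)
busy.cancel()
events_before = list(busy.events)
busy.complete_request()
busy.complete_request()

print(idle.events, events_before, busy.events)
```

In this picture, how long the cancel takes depends entirely on when the outstanding requests return, which in the reported setup is decided by the NFS client in the kernel rather than by qemu.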

Comment 7 David Gibson 2016-11-24 04:45:15 UTC
Noted, thanks.