Bug 1453093

Summary: "block-job-set-speed" does not take effect when setting it to 1 first and then a normal value like 100000000
Product: Red Hat Enterprise Linux 7 Reporter: Qianqian Zhu <qizhu>
Component: qemu-kvm-rhevAssignee: John Snow <jsnow>
Status: CLOSED ERRATA QA Contact: Gu Nini <ngu>
Severity: medium Docs Contact:
Priority: medium    
Version: 7.4CC: chayang, jen, jsnow, juzhang, knoel, lolyu, mrezanin, virt-maint, xfu
Target Milestone: rcKeywords: Regression
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.12.0-1.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-11-01 11:01:10 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1473046, 1558351    

Description Qianqian Zhu 2017-05-22 07:00:36 UTC
Description of problem:
Setting speed does not take effect when setting it to 1 first and then a normal value like 100000000

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.9.0-5.el7.x86_64
kernel-3.10.0-640.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Launch guest:
/usr/libexec/qemu-kvm -name 'avocado-vt-vm1' \    
    -vga cirrus \    
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=03 \
    -drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=raw,file=/home/kvm_autotest_root/images/rhel74-64-virtio-scsi.raw \
    -device scsi-hd,id=image1,drive=drive_image1 \
    -device virtio-net-pci,mac=9a:ac:ad:ae:af:b0,id=idqKTtyC,vectors=4,netdev=idBAxKKY,bus=pci.0,addr=04 \
    -netdev tap,id=idBAxKKY,vhost=on \
    -m 4096 \
    -smp 4,cores=2,threads=1,sockets=2 \
    -cpu 'SandyBridge',+kvm_pv_unhalt \
    -vnc :0 \
    -rtc base=utc,clock=host,driftfix=slew \
    -boot order=cdn,once=c,menu=off,strict=off \
    -monitor stdio \
    -qmp tcp::5555,server,nowait
2. Live snapshot:
{ "execute": "blockdev-snapshot-sync", "arguments": { "device": "drive_image1","snapshot-file": "/mnt/iscsi/sn1", "format": "qcow2", "mode": "absolute-paths" } }
{"return": {}}
{ "execute": "blockdev-snapshot-sync", "arguments": { "device": "drive_image1","snapshot-file": "/mnt/iscsi/sn2", "format": "qcow2", "mode": "absolute-paths" } }
{"return": {}}
{ "execute": "blockdev-snapshot-sync", "arguments": { "device": "drive_image1","snapshot-file": "/mnt/iscsi/sn3", "format": "qcow2", "mode": "absolute-paths" } }
{"return": {}}

3. Live commit:
{ "execute": "block-commit", "arguments": { "device": "drive_image1", "base": "/mnt/iscsi/sn1", "top": "/mnt/iscsi/sn3"}}

4. Set speed to 1:
{ "execute": "block-job-set-speed", "arguments": { "device": "drive_image1","speed":1}}

5. Set speed to 100000000:
{ "execute": "block-job-set-speed", "arguments": { "device": "drive_image1","speed":100000000}}

6. Query block job:
{ "execute": "query-block-jobs"}

Actual results:
Block job stop increasing for more than half hour:
{ "execute": "query-block-jobs"}
{"return": [{"io-status": "ok", "device": "drive_image1", "busy": false, "len": 2188443648, "offset": 578355200, "paused": false, "speed": 100000000, "ready": false, "type": "commit"}]}
{ "execute": "query-block-jobs"}
{"return": [{"io-status": "ok", "device": "drive_image1", "busy": false, "len": 2188443648, "offset": 578355200, "paused": false, "speed": 100000000, "ready": false, "type": "commit"}]}
{ "execute": "query-block-jobs"}
{"return": [{"io-status": "ok", "device": "drive_image1", "busy": false, "len": 2188443648, "offset": 578355200, "paused": false, "speed": 100000000, "ready": false, "type": "commit"}]}
{ "execute": "query-block-jobs"}
{"return": [{"io-status": "ok", "device": "drive_image1", "busy": false, "len": 2188443648, "offset": 578355200, "paused": false, "speed": 100000000, "ready": false, "type": "commit"}]}
{ "execute": "query-block-jobs"}
{"return": [{"io-status": "ok", "device": "drive_image1", "busy": false, "len": 2188443648, "offset": 578355200, "paused": false, "speed": 100000000, "ready": false, "type": "commit"}]}
{ "execute": "query-block-jobs"}
{"return": [{"io-status": "ok", "device": "drive_image1", "busy": false, "len": 2188443648, "offset": 578355200, "paused": false, "speed": 100000000, "ready": false, "type": "commit"}]}
{ "execute": "query-block-jobs"}
{"return": [{"io-status": "ok", "device": "drive_image1", "busy": false, "len": 2188443648, "offset": 578355200, "paused": false, "speed": 100000000, "ready": false, "type": "commit"}]}
{ "execute": "query-block-jobs"}
{"return": [{"io-status": "ok", "device": "drive_image1", "busy": false, "len": 2188443648, "offset": 578355200, "paused": false, "speed": 100000000, "ready": false, "type": "commit"}]}
{ "execute": "query-block-jobs"}
{"return": [{"io-status": "ok", "device": "drive_image1", "busy": false, "len": 2188443648, "offset": 578355200, "paused": false, "speed": 100000000, "ready": false, "type": "commit"}]}


Expected results:
Block job speed 100000000 takes effect.

Additional info:

Comment 2 John Snow 2017-05-25 23:35:30 UTC
The problem is that when speed is set to '1', QEMU transmits a block of data and then realizes it has gone over quota, and so goes to sleep until the average rate/sec would come back under quota, in this case, 20-30 minutes in the future.

When setting the new throttle, we don't recalculate this delay. I'll fix upstream, but I don't believe this to be a regression.

Comment 3 Qianqian Zhu 2017-05-26 06:45:58 UTC
qemu-kvm-rhev-2.6.0-27.el7.x86_64 has no this issue, so this is a regression.

Comment 6 Longxiang Lyu 2017-09-13 07:38:19 UTC
reproduced in block mirror
version info of selected component:
Kernel-3.10.0-709.el7
qemu-kvm-rhev-2.9.0-16.el7_4.6
AMD Opteron(tm) Processor 6344
guest: RHEL7.4

steps:
1. boot up a guest
...
-drive file=/home/test/test.qcow2,format=qcow2,if=none,cache=none,werror=stop,rerror=stop,id=img0 \
-device virtio-blk-pci,bus=pci.0,addr=0x2,drive=img0,id=virtio-disk0,bootindex=1 \
...

2. do block mirror with initial speed set to 100
{ "execute": "drive-mirror", "arguments": { "device": "img0", "target": "/home/test/mirror1.qcow2", "format": "qcow2", "mode": "absolute-paths", "sync": "full", "on-source-error": "stop", "on-target-error": "stop", "speed": 100 } }

3. increase speed to 1000000000
 { "execute": "block-job-set-speed", "arguments": { "device": "img0", "speed": 1000000000} }

4. query block job status
(qemu) info block-jobs

Actual result:
qemu) info block-jobs 
Type mirror, device img0: Completed 16842752 of 4562747392 bytes, speed limit 1000000000 bytes/s
(qemu) info block-jobs 
Type mirror, device img0: Completed 16842752 of 4562747392 bytes, speed limit 1000000000 bytes/s
(qemu) info block-jobs 
Type mirror, device img0: Completed 16842752 of 4562747392 bytes, speed limit 1000000000 bytes/s
(qemu) info block-jobs 
Type mirror, device img0: Completed 16842752 of 4562747392 bytes, speed limit 1000000000 bytes/s
...

The offset paused for a certain period.

Comment 8 John Snow 2017-12-12 00:26:02 UTC
Due to changes in the way block job sleep is implemented upstream in 2.11, this problem is now pretty easy to fix, so I've posted a patch upstream.

[subtext: this fix will require late 2.11-cycle fixes to backport.]

https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg01899.html

Comment 9 John Snow 2018-03-15 20:15:41 UTC
Fixed upstream:

aa9ef2e65bed6a8f1709e0523fc856ec08beb657 blockjob: kick jobs on set-speed

and amended in

https://lists.gnu.org/archive/html/qemu-devel/2018-03/msg03956.html

which will be in 2.12.

--js

Comment 10 John Snow 2018-06-12 23:14:27 UTC
Fixed in 2.12

Comment 14 John Snow 2018-06-20 19:32:01 UTC
There was indeed a regression upstream that made the mirror job in particular ignore ratelimits. It has been backported in a proposed fix for an otherwise unrelated BZ:

https://bugzilla.redhat.com/show_bug.cgi?id=1572856

The patch in question is:

https://lists.gnu.org/archive/html/qemu-devel/2018-04/msg03865.html

Comment 18 errata-xmlrpc 2018-11-01 11:01:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3443