Bug 1532542 - Blockcopy failed when using --bandwidth option with the values in the normal range
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.5
Hardware: x86_64
OS: Linux
Target Milestone: rc
Assignee: Peter Krempa
QA Contact: yisun
Reported: 2018-01-09 09:45 UTC by Meina Li
Modified: 2018-10-30 09:52 UTC (History)

Fixed In Version: libvirt-4.3.0-1.el7
Last Closed: 2018-10-30 09:52:20 UTC

Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2018:3113 None None None 2018-10-30 09:52:59 UTC

Description Meina Li 2018-01-09 09:45:49 UTC
Description of problem:
Blockcopy fails when the --bandwidth option is given a value within the normal range.

Version-Release number of selected component (if applicable):
libvirt-3.9.0-7.el7.x86_64
qemu-kvm-rhev-2.10.0-15.el7.x86_64

How reproducible:
100%

Steps to reproduce:
1. Prepare a transient guest with a 'file' type disk.
# virsh dumpxml rhel7 | grep disk -a12
 ...  <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/rhel7.img'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </disk>
...

2. Run blockcopy with a --bandwidth value that is out of range.
# virsh blockcopy rhel7 vda /tmp/test.copy --transient-job --verbose --wait 175921860444158
error: bandwidth must be less than 17592186044415

3. Run blockcopy with a --bandwidth value within the normal range.
# virsh blockcopy lmn vda /tmp/test.copy --transient-job --verbose --wait 17592186044410
error: internal error: argument key 'speed' must not be negative

Actual results:
As in step 3 above.

Expected results:
The blockcopy succeeds.

Comment 2 Peter Krempa 2018-02-19 15:41:37 UTC
The qemu driver has different limits for bandwidth. The following patch reports the actual limit rather than the dubious 'negative speed' error:

commit 8f5133f99e6c5de2b1ef513b325d71e1fbd815c0 
Author: Peter Krempa <pkrempa@redhat.com>
Date:   Mon Feb 19 09:21:29 2018 +0100

    qemu: blockcopy: Add check for bandwidth
    
    QEMU code does not work well with too big numbers on the JSON monitor so
    our monitor code supports sending only numbers up to LLONG_MAX. Avoid a
    weird error message by limiting the size of the 'bandwidth' parameter
    for block copy.
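
The arithmetic behind the original 'negative speed' message can be sketched as follows. This is a minimal Python illustration, not libvirt's code: it assumes the MiB/s value is scaled to bytes/s in 64-bit arithmetic and then read as a signed 64-bit number, which would explain the error in step 3 of the report. The names `as_int64` and `check_bandwidth_bytes` are hypothetical.

```python
LLONG_MAX = 2**63 - 1   # largest signed 64-bit value
MIB = 2**20             # bytes per MiB


def as_int64(value):
    """Reinterpret the low 64 bits of value as a signed two's-complement number."""
    value &= 2**64 - 1
    return value - 2**64 if value >= 2**63 else value


def check_bandwidth_bytes(bandwidth_bytes):
    """Hypothetical up-front check in the spirit of the commit message."""
    if bandwidth_bytes > LLONG_MAX:
        raise ValueError(
            "bandwidth must be less than '%d' bytes/s (%d MiB/s)"
            % (LLONG_MAX, LLONG_MAX // MIB)
        )
    return bandwidth_bytes


# Value from step 3 of the report, given in MiB/s and scaled to bytes/s.
requested = 17592186044410 * MIB
print(as_int64(requested))  # prints -6291456
```

Rejecting the value before it reaches the monitor lets the error name the real limit instead of a sign problem.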

Comment 4 Meina Li 2018-06-22 06:10:38 UTC
Hi Peter,

I found that the size-limit error message differs depending on whether --transient-job is used. For example:

# virsh blockcopy test vda /tmp/test.copy --verbose --wait --transient-job 175921860444158
error: bandwidth must be less than 17592186044415

# virsh blockcopy test vda /tmp/test.copy --verbose --wait 175921860444158
error: numerical overflow: bandwidth must be less than 8796093022207

And when I set the value to 17592186044414, which is less than 17592186044415:
# virsh blockcopy test vda /tmp/test.copy --verbose --wait --transient-job 17592186044414
error: invalid argument: bandwidth must be less than '9223372036854775807' bytes/s (8796093022207 MiB/s)    

# virsh blockcopy test vda /tmp/test.copy --verbose --wait 17592186044414
error: numerical overflow: bandwidth must be less than 8796093022207

Could you help me check this? Thank you very much.

Comment 5 yisun 2018-08-17 08:16:41 UTC
Due to comment 4, setting this one back to ASSIGNED for now. Please help to confirm, thanks.

Comment 6 Peter Krempa 2018-09-10 12:34:39 UTC
Note that all cases you've found are not relevant to the original bug report.

The difference in the values is caused by the logic in virsh. If you add '--transient-job', the new API is invoked, which uses typed parameters for arguments.

The typed parameter is an unsigned long long and is specified in bytes. In the first case above, the error is reported when the user-selected value cannot be transported in the RPC protocol. This is because the user supplies the value in MiB/s, which has to be converted to bytes/s.
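
The figure 17592186044415 in the first error message follows from that conversion. The sketch below is a hedged illustration in Python, not the virsh source; `mib_to_bytes` is a hypothetical name, and the exact handling of the boundary value itself is an assumption.

```python
ULLONG_MAX = 2**64 - 1  # largest unsigned long long
MIB = 2**20             # bytes per MiB

# Largest MiB/s value whose byte count still fits in an unsigned long long.
MAX_RPC_MIB = ULLONG_MAX // MIB


def mib_to_bytes(bandwidth_mib):
    """Convert MiB/s to bytes/s, refusing values that overflow 64 bits."""
    if bandwidth_mib > MAX_RPC_MIB:
        raise OverflowError("bandwidth must be less than %d" % MAX_RPC_MIB)
    return bandwidth_mib * MIB


print(MAX_RPC_MIB)  # prints 17592186044415
```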

In the second case the error is correct, but it is missing the unit. The value is not misleading, though, since the default unit is MiB/s. With the second API we don't do any checking, since the value is transported in MiB/s, so the qemu driver can apply the limit.

In the third case the error is reported by the qemu driver, which has a stricter limit: the value must fit in a signed long long to be passed to qemu. This lowers the maximum to the same value as in case 2. This is correct, and both values are printed.

The fourth case is the same as the second case.
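
The four cases can be summarised with a small model. This is only a sketch in Python of the behaviour described in this comment, assuming the virsh check applies only with '--transient-job' and the driver check applies on both paths; `rejecting_layer` is a hypothetical helper, not a libvirt function.

```python
MIB = 2**20
ULLONG_MAX = 2**64 - 1
LLONG_MAX = 2**63 - 1


def rejecting_layer(bandwidth_mib, transient_job):
    """Return the layer that rejects the value, or None if it is accepted."""
    if transient_job:
        # New API: virsh converts MiB/s to bytes/s for the typed parameter.
        if bandwidth_mib > ULLONG_MAX // MIB:
            return "virsh"
        # The qemu driver then needs the byte count to fit a signed long long.
        if bandwidth_mib * MIB > LLONG_MAX:
            return "qemu driver"
        return None
    # Other API: the value travels in MiB/s, so only the driver limit applies.
    if bandwidth_mib > LLONG_MAX // MIB:
        return "qemu driver"
    return None


# The four cases from comment 4, in order.
for mib, transient in [(175921860444158, True), (175921860444158, False),
                       (17592186044414, True), (17592186044414, False)]:
    print(mib, transient, rejecting_layer(mib, transient))
```

Under this model 8796093022207 MiB/s is the largest value accepted on either path.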

None of the above are bugs. The fact that two separate limits are applied, one at the RPC layer and one at the qemu level, can't be avoided. If we decreased the RPC limit, hypervisors without any internal limitation would not be able to use the full range.

Moving back to ON_QA.

Comment 7 yisun 2018-09-18 10:24:18 UTC
Thx Peter,
Verified on:
qemu-kvm-rhev-2.12.0-13.el7.x86_64
libvirt-4.5.0-9.virtcov.el7.x86_64

[root@ibm-x3250m5-04 ~]# virsh domblklist vm2
Target     Source
------------------------------------------------
vda        /var/lib/libvirt/images/vm2-1.qcow2
sdb        /var/lib/libvirt/images/test.qcow2


[root@ibm-x3250m5-04 ~]# virsh blockcopy vm2 sdb /tmp/test.copy --transient-job --verbose --wait 17592186044410
error: invalid argument: bandwidth must be less than '9223372036854775807' bytes/s (8796093022207 MiB/s)


[root@ibm-x3250m5-04 ~]# virsh blockcopy vm2 sdb /tmp/test.copy --transient-job --verbose --wait 8796093022208
error: invalid argument: bandwidth must be less than '9223372036854775807' bytes/s (8796093022207 MiB/s)

[root@ibm-x3250m5-04 ~]# virsh blockcopy vm2 sdb /tmp/test.copy --transient-job --verbose --wait 8796093022207
Block Copy: [100 %]
Now in mirroring phase


Comment 9 errata-xmlrpc 2018-10-30 09:52:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3113

