1532542 – Blockcopy failed when using --bandwidth option with the values in the normal range

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	libvirt
Sub Component:
Version:	7.5
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	unspecified
Target Milestone:	rc
Target Release:	---
Assignee:	Peter Krempa
QA Contact:	yisun
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-01-09 09:45 UTC by Meina Li
Modified:	2018-10-30 09:52 UTC
CC List:	7 users
Fixed In Version:	libvirt-4.3.0-1.el7
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-10-30 09:52:20 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2018:3113	0	None	None	None	2018-10-30 09:52:59 UTC

Description Meina Li 2018-01-09 09:45:49 UTC

Description of problem:
Blockcopy failed when using the --bandwidth option with values in the normal range.

Version-Release number of selected component (if applicable):
libvirt-3.9.0-7.el7.x86_64
qemu-kvm-rhev-2.10.0-15.el7.x86_64

How reproducible:
100%

Steps to reproduce:
1. Prepare a transient guest with a 'file' type disk.
# virsh dumpxml rhel7 | grep disk -a12
 ...  <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/rhel7.img'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </disk>
...

2. Run blockcopy with a --bandwidth value that is out of range.
# virsh blockcopy rhel7 vda /tmp/test.copy --transient-job --verbose --wait 175921860444158
error: bandwidth must be less than 17592186044415

3. Run blockcopy with a --bandwidth value in the normal range.
# virsh blockcopy lmn vda /tmp/test.copy --transient-job --verbose --wait 17592186044410
error: internal error: argument key 'speed' must not be negative

Actual results:
See step 3 above.

Expected results:
The blockcopy succeeds.

Comment 2 Peter Krempa 2018-02-19 15:41:37 UTC

The qemu driver has different limits for bandwidth. The following patch reports the actual limit rather than the dubious 'negative speed' error:

commit 8f5133f99e6c5de2b1ef513b325d71e1fbd815c0 
Author: Peter Krempa <pkrempa>
Date:   Mon Feb 19 09:21:29 2018 +0100

    qemu: blockcopy: Add check for bandwidth
    
    QEMU code does not work well with too big numbers on the JSON monitor so
    our monitor code supports sending only numbers up to LLONG_MAX. Avoid a
    weird error message by limiting the size of the 'bandwidth' parameter
    for block copy.
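
The 'negative speed' error from the original report is consistent with the converted value no longer fitting a signed 64-bit integer. The sketch below is a Python illustration only, not libvirt's C code. It assumes the bandwidth is given in MiB/s and scaled to bytes/s by 2^20; the helper names `as_signed_64` and `check_blockcopy_bandwidth` are hypothetical.

```python
MIB = 1 << 20                 # bytes per MiB
LLONG_MAX = (1 << 63) - 1     # largest signed 64-bit value


def as_signed_64(value: int) -> int:
    """Reinterpret the low 64 bits of value as a signed 64-bit integer."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value > LLONG_MAX else value


def check_blockcopy_bandwidth(bytes_per_sec: int) -> None:
    """Hypothetical stand-in for the check the commit describes:
    reject anything above LLONG_MAX before it reaches the monitor."""
    if bytes_per_sec > LLONG_MAX:
        raise ValueError(
            f"bandwidth must be less than '{LLONG_MAX}' bytes/s "
            f"({LLONG_MAX // MIB} MiB/s)"
        )


# Value from step 3 of the report, given in MiB/s.
bytes_per_sec = 17592186044410 * MIB
print(as_signed_64(bytes_per_sec))  # -6291456, i.e. negative when read as signed
```

With a check of this kind in place, the same value would be rejected up front with a message that names the real limit instead of reaching the monitor code.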

Comment 4 Meina Li 2018-06-22 06:10:38 UTC

Hi, Peter

I found that the error message about the size limit differs depending on whether --transient-job is used. For example:

# virsh blockcopy test vda /tmp/test.copy --verbose --wait --transient-job 175921860444158
error: bandwidth must be less than 17592186044415

# virsh blockcopy test vda /tmp/test.copy --verbose --wait 175921860444158
error: numerical overflow: bandwidth must be less than 8796093022207

And when I set the value to 17592186044414, which is less than 17592186044415:
# virsh blockcopy test vda /tmp/test.copy --verbose --wait --transient-job 17592186044414
error: invalid argument: bandwidth must be less than '9223372036854775807' bytes/s (8796093022207 MiB/s)

# virsh blockcopy test vda /tmp/test.copy --verbose --wait 17592186044414
error: numerical overflow: bandwidth must be less than 8796093022207

Could you help check this? Thank you very much.

Comment 5 yisun 2018-08-17 08:16:41 UTC

Due to comment 4, setting this one back to ASSIGNED for now. Please help to confirm, thanks.

Comment 6 Peter Krempa 2018-09-10 12:34:39 UTC

Note that none of the cases you've found are relevant to the original bug report.

The difference in the values is caused by the logic in virsh. If you add '--transient-job', the new API is invoked, which uses typed parameters for arguments.

The typed parameter is an unsigned long long and is specified in bytes. In the first case above, the error message is reported when the user-selected value cannot be transported in the RPC protocol. This is because the user specifies the value in MiB/s, which needs to be converted to bytes/s.
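
As a rough illustration of where the first limit comes from, the Python sketch below works through the unit conversion. It assumes a 64-bit unsigned long long; the constant and function names are chosen for this example and are not libvirt identifiers.

```python
MIB = 1 << 20                  # bytes per MiB
ULLONG_MAX = (1 << 64) - 1     # assumes a 64-bit unsigned long long


def fits_typed_param(mib_per_sec: int) -> bool:
    """True if the MiB/s value still fits after conversion to bytes/s."""
    return mib_per_sec * MIB <= ULLONG_MAX


print(ULLONG_MAX // MIB)                  # 17592186044415
print(fits_typed_param(175921860444158))  # False
print(fits_typed_param(17592186044414))   # True
```

The first printed number matches the limit quoted in the virsh error message for the '--transient-job' case.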

In the second case the error is correct, but it is missing the unit. The value is not misleading, though, since the default unit is MiB/s. With the second API we don't do any checking, since the value is transported in MiB/s, so the qemu driver can apply the limit.

In the third case the error is reported by the qemu driver, which has a stricter limit: the value must fit in a signed long long to be passed to qemu. This further limits the maximum to the same value as in case 2. This is correct, and both values are printed.

The fourth case is the same as the second case.

None of the above are bugs. The fact that two separate limits are applied, one on the RPC layer and one on the qemu level, can't be avoided. If we decreased the RPC limit, hypervisors which don't have any internal limitation would not be able to use the full potential.
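
The two-layer behaviour described above can be modelled in a few lines. This is a simplified Python sketch rather than libvirt code: the function name and the returned labels are made up for the example, and the exact boundary comparison for the API used without '--transient-job' is an assumption.

```python
from typing import Optional

MIB = 1 << 20
RPC_MAX_MIB = ((1 << 64) - 1) // MIB    # 17592186044415
QEMU_MAX_MIB = ((1 << 63) - 1) // MIB   # 8796093022207


def rejecting_layer(mib_per_sec: int, transient_job: bool) -> Optional[str]:
    """Return which layer would refuse the value, or None if it is accepted."""
    # Only the typed-parameter API converts to bytes/s before transport,
    # so only that path can hit the unsigned 64-bit limit first.
    if transient_job and mib_per_sec > RPC_MAX_MIB:
        return "rpc"
    # The qemu driver needs the byte value to fit a signed 64-bit integer.
    if mib_per_sec > QEMU_MAX_MIB:
        return "qemu"
    return None


# The four cases from comment 4, in order.
for value, transient in [
    (175921860444158, True),
    (175921860444158, False),
    (17592186044414, True),
    (17592186044414, False),
]:
    print(value, transient, rejecting_layer(value, transient))
```

Under these assumptions only the first case is stopped at the RPC conversion; the other three are rejected by the qemu-level limit, which is the split the four error messages show.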

Moving back to ON_QA.

Comment 7 yisun 2018-09-18 10:24:18 UTC

Thanks Peter,
Verified on:
qemu-kvm-rhev-2.12.0-13.el7.x86_64
libvirt-4.5.0-9.virtcov.el7.x86_64

[root@ibm-x3250m5-04 ~]# virsh domblklist vm2
Target     Source
------------------------------------------------
vda        /var/lib/libvirt/images/vm2-1.qcow2
sdb        /var/lib/libvirt/images/test.qcow2


[root@ibm-x3250m5-04 ~]# virsh blockcopy vm2 sdb /tmp/test.copy --transient-job --verbose --wait 17592186044410
error: invalid argument: bandwidth must be less than '9223372036854775807' bytes/s (8796093022207 MiB/s)


[root@ibm-x3250m5-04 ~]# virsh blockcopy vm2 sdb /tmp/test.copy --transient-job --verbose --wait 8796093022208
error: invalid argument: bandwidth must be less than '9223372036854775807' bytes/s (8796093022207 MiB/s)

[root@ibm-x3250m5-04 ~]# virsh blockcopy vm2 sdb /tmp/test.copy --transient-job --verbose --wait 8796093022207
Block Copy: [100 %]
Now in mirroring phase

Comment 9 errata-xmlrpc 2018-10-30 09:52:20 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3113
