Bug 1532542
| Summary: | Blockcopy failed when using --bandwidth option with the values in the normal range | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Meina Li <meili> |
| Component: | libvirt | Assignee: | Peter Krempa <pkrempa> |
| Status: | CLOSED ERRATA | QA Contact: | yisun |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 7.5 | CC: | hhan, jdenemar, jiyan, lmen, pkrempa, xuzhang, yisun |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | libvirt-4.3.0-1.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-10-30 09:52:20 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
The qemu driver has different limits for bandwidth. The following patch reports the actual limit rather than the dubious 'negative speed' error:
commit 8f5133f99e6c5de2b1ef513b325d71e1fbd815c0
Author: Peter Krempa <pkrempa>
Date: Mon Feb 19 09:21:29 2018 +0100
qemu: blockcopy: Add check for bandwidth
QEMU code does not work well with too big numbers on the JSON monitor so
our monitor code supports sending only numbers up to LLONG_MAX. Avoid a
weird error message by limiting the size of the 'bandwidth' parameter
for block copy.
Hi Peter,

I found that the error message for the size limit differs depending on whether I use --transient-job. For example:

    # virsh blockcopy test vda /tmp/test.copy --verbose --wait --transient-job 175921860444158
    error: bandwidth must be less than 17592186044415
    # virsh blockcopy test vda /tmp/test.copy --verbose --wait 175921860444158
    error: numerical overflow: bandwidth must be less than 8796093022207

And when I set the size to 17592186044414, which is less than 17592186044415:

    # virsh blockcopy test vda /tmp/test.copy --verbose --wait --transient-job 17592186044414
    error: invalid argument: bandwidth must be less than '9223372036854775807' bytes/s (8796093022207 MiB/s)
    # virsh blockcopy test vda /tmp/test.copy --verbose --wait 17592186044414
    error: numerical overflow: bandwidth must be less than 8796093022207

Can you help me check it? Thank you very much.

Due to comment 4, setting this one back to ASSIGNED for now. Please help to confirm, thanks.

Note that none of the cases you've found are relevant to the original bug report.

The difference in the values is caused by the logic in virsh. If you add '--transient-job', the new API is invoked, which uses typed parameters for its arguments.

The typed parameter is an unsigned long long and is specified in bytes. In the first case above, the error is reported because the user-selected value cannot be transported in the RPC protocol: the user specifies MiB/s, which must be converted to bytes/s.

In the second case the error is correct, but it's missing the unit. The value is not misleading though, since the default unit is MiB/s. With the second API we don't do any checking, since the value is transported in MiB/s, so the qemu driver can apply the limit.

In the third case the error is reported by the qemu driver, which has a stricter limitation: the value must fit in a signed long long to be passed to qemu. This further limits the maximum number to the same value as in case 2. This is correct, and both values are printed.
Fourth case is the same as the second case.

None of the above are bugs. Separate limits applied at the RPC layer and at the qemu level can't be avoided. If we decreased the RPC limit, hypervisors that don't have any internal limitation would not be able to use their full potential.

Moving back to ON_QA.

(In reply to Peter Krempa from comment #6)

Thanks Peter. Verified on qemu-kvm-rhev-2.12.0-13.el7.x86_64 and libvirt-4.5.0-9.virtcov.el7.x86_64:

    [root@ibm-x3250m5-04 ~]# virsh domblklist vm2
    Target     Source
    ------------------------------------------------
    vda        /var/lib/libvirt/images/vm2-1.qcow2
    sdb        /var/lib/libvirt/images/test.qcow2

    [root@ibm-x3250m5-04 ~]# virsh blockcopy vm2 sdb /tmp/test.copy --transient-job --verbose --wait 17592186044410
    error: invalid argument: bandwidth must be less than '9223372036854775807' bytes/s (8796093022207 MiB/s)

    [root@ibm-x3250m5-04 ~]# virsh blockcopy vm2 sdb /tmp/test.copy --transient-job --verbose --wait 8796093022208
    error: invalid argument: bandwidth must be less than '9223372036854775807' bytes/s (8796093022207 MiB/s)

    [root@ibm-x3250m5-04 ~]# virsh blockcopy vm2 sdb /tmp/test.copy --transient-job --verbose --wait 8796093022207
    Block Copy: [100 %]
    Now in mirroring phase

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3113
Description of problem:
Blockcopy fails when using the --bandwidth option with values in the normal range.

Version-Release number of selected component (if applicable):
libvirt-3.9.0-7.el7.x86_64
qemu-kvm-rhev-2.10.0-15.el7.x86_64

How reproducible:
100%

Steps to reproduce:
1. Prepare a transient guest with a 'file' type disk.

        # virsh dumpxml rhel7 | grep disk -a12
        ...
        <disk type='file' device='disk'>
          <driver name='qemu' type='qcow2'/>
          <source file='/var/lib/libvirt/images/rhel7.img'/>
          <backingStore/>
          <target dev='vda' bus='virtio'/>
          <alias name='virtio-disk0'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
        </disk>
        ...

2. Run blockcopy with a --bandwidth value that is out of range.

        # virsh blockcopy rhel7 vda /tmp/test.copy --transient-job --verbose --wait 175921860444158
        error: bandwidth must be less than 17592186044415

3. Run blockcopy with a --bandwidth value in the normal range.

        # virsh blockcopy lmn vda /tmp/test.copy --transient-job --verbose --wait 17592186044410
        error: internal error: argument key 'speed' must not be negative

Actual results:
As in step 3.

Expected results:
Blockcopy completes successfully.