Bug 1443493
| Summary: | Improve live block device job status reporting via virDomainBlockJobInfo() | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Nir Soffer <nsoffer> |
| Component: | libvirt | Assignee: | Michal Privoznik <mprivozn> |
| Status: | CLOSED ERRATA | QA Contact: | Han Han <hhan> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 7.3 | CC: | amureini, chhu, dyuan, jdenemar, kchamart, libvirt-maint, lmen, mprivozn, nsoffer, pkrempa, rbalakri, xuzhang |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | libvirt-3.2.0-1.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1372613 | Environment: | |
| Last Closed: | 2017-08-02 00:05:54 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1372613 | | |
| Bug Blocks: | 1442266 | | |
| Attachments: | | | |
Description
Nir Soffer
2017-04-19 11:02:10 UTC
This will allow RHV to correctly handle blockJobInfo returning cur=0 and end=0. Currently RHV assumes that the block job has completed and invokes blockJobAbort too early. I discussed this bug with Michal on IRC, and he thinks we can backport the fix to 7.3.

Han Han, comment #4:
Hi Michal, I tried to reproduce this bug on libvirt-2.0.0-10.el7_3.9.x86_64 and qemu-kvm-rhev-2.6.0-28.el7_3.10.x86_64, but failed.
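My code, blockjob.c:
```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>             /* sleep() */
#include <libvirt/libvirt.h>

/* Usage: ./blockjob DOM_NAME TARGET_DISK DEST_IMAGE */
int main(int argc, char *argv[])
{
    virConnectPtr conn;
    virDomainPtr dom;
    virDomainBlockJobInfo info;
    int ret, status = EXIT_FAILURE;

    if (argc != 4) {
        fprintf(stderr, "Usage: %s DOM_NAME TARGET_DISK DEST_IMAGE\n", argv[0]);
        return EXIT_FAILURE;
    }
    const char *domName = argv[1];
    const char *disk = argv[2];
    const char *dest = argv[3];

    conn = virConnectOpen("qemu:///system");
    if (conn == NULL) {
        fprintf(stderr, "Failed to open connection to qemu:///system\n");
        return EXIT_FAILURE;
    }
    dom = virDomainLookupByName(conn, domName);
    if (dom == NULL) {
        fprintf(stderr, "Domain %s not found\n", domName);
        virConnectClose(conn);
        return EXIT_FAILURE;
    }

    /* Start a shallow block copy into the pre-created destination image. */
    if (virDomainBlockRebase(dom, disk, dest, 0,
                             VIR_DOMAIN_BLOCK_REBASE_COPY |
                             VIR_DOMAIN_BLOCK_REBASE_SHALLOW |
                             VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT) < 0) {
        fprintf(stderr, "virDomainBlockRebase failed\n");
        goto cleanup;
    }

    /* Poll like RHV does: cur == end is taken to mean the copy is done. */
    while (1) {
        ret = virDomainGetBlockJobInfo(dom, disk, &info, 0);
        if (ret <= 0)           /* -1: error, 0: no job on this disk */
            break;
        printf("blockjob info: bw %lu, cur %llu, end %llu\n",
               info.bandwidth, info.cur, info.end);
        if (info.cur == info.end)
            break;
        sleep(1);
    }

    virDomainBlockJobAbort(dom, disk, 0);
    printf("The end\n");
    status = EXIT_SUCCESS;

 cleanup:
    virDomainFree(dom);
    virConnectClose(conn);
    return status;
}
```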
Compile the code:
# gcc blockjob.c -o blockjob `pkg-config libvirt --libs` -g
Start the VM:
# virsh list
Id Name State
----------------------------------------------------
7 V running
# virsh domblklist V
Target Source
------------------------------------------------
vda /exports/nfs.s1
vdb /exports/vdb
# qemu-img info /exports/nfs.s1
image: /exports/nfs.s1
file format: qcow2
virtual size: 10G (10737418240 bytes)
disk size: 1.1M
cluster_size: 65536
backing file: /exports/nfs.1497518937
backing file format: qcow2
Format specific information:
compat: 1.1
lazy refcounts: false
refcount bits: 16
corrupt: false
# qemu-img create -f qcow2 /exports/nfs.s3 10G
Formatting '/exports/nfs.s3', fmt=qcow2 size=10737418240 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
Then run the code:
# ./blockjob V vda /exports/nfs.s3
blockjob info: bw 0, cur 0, end 786432
blockjob info: bw 0, cur 786432, end 786432
Check /exports/nfs.s3:
# qemu-img info /exports/nfs.s3
image: /exports/nfs.s3
file format: qcow2
virtual size: 10G (10737418240 bytes)
disk size: 1.1M
cluster_size: 65536
Format specific information:
compat: 1.1
lazy refcounts: false
refcount bits: 16
corrupt: false
I didn't hit 'cur == end == 0', and neither BlockRebase nor BlockJobAbort failed.
Could you give me some ideas about hitting the corner case?
(In reply to Han Han from comment #4)
> I didn't get any event like 'cur == end == 0' or BlockRebase failed or
> BlockJobAbort failed.
> Could you give some ideas abort hitting the corner case?

While your program is running, you need to abort the job from a different terminal. The JobAbort() call in your code is made only after the job has finished.

Reproduced on libvirt-2.0.0-10.el7_3.9.x86_64 and qemu-kvm-rhev-2.6.0-28.el7_3.10.x86_64.
To hit the corner case, call GetBlockJobInfo() and BlockJobAbort() in parallel and check whether cur == end == 0.
1. Compile the test code
# gcc blockjob.c -o blockjob `pkg-config libvirt --libs` -g
2. Prepare a running VM with a large disk, which makes the corner case easier to hit.
# virsh list
Id Name State
----------------------------------------------------
12 V running
# virsh domblklist V
Target Source
------------------------------------------------
vda /exports/nfs.qcow2
# qemu-img info /exports/nfs.qcow2
image: /exports/nfs.qcow2
file format: qcow2
virtual size: 10G (10737418240 bytes)
disk size: 8.0G
cluster_size: 65536
Format specific information:
compat: 1.1
lazy refcounts: false
refcount bits: 16
corrupt: false
3. Create an image with the same virtual size as the guest image, to be reused as the copy destination.
# qemu-img create -f qcow2 /exports/nfs.s3 10G
Formatting '/exports/nfs.s3', fmt=qcow2 size=10737418240 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
4. Run the test program in this format:
./blockjob DOM_NAME TARGET_DISK REUSE_DISK
# ./blockjob V vda /exports/nfs.s3
blockjob info: bw 0, cur 0, end 0
Abort start
Abort finished
The rebase job has started but is not ready, yet it reports cur == end == 0. The corner case is hit.
Verified on libvirt-3.2.0-10.el7.x86_64 and qemu-kvm-rhev-2.9.0-10.el7.x86_64:
Redo steps 1-4, then run the test program:
# ./blockjob test vda /tmp/aaa
blockjob info: bw 0, cur 0, end 1
blockjob info: bw 0, cur 0, end 6115360768
Abort start
blockjob info: bw 0, cur 6356992, end 6115360768
Abort finished
blockjob info: bw 0, cur 0, end 0
As the patch says, when the rebase job has started but is not ready yet, GetBlockJobInfo() should report cur == 0, end == 1.
This is the expected result; the bug is fixed.
Created attachment 1288237
Testing program source code
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:1846