Bug 1443493
| Field | Value | Field | Value |
|---|---|---|---|
| Summary: | Improve live block device job status reporting via virDomainBlockJobInfo() | | |
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Nir Soffer <nsoffer> |
| Component: | libvirt | Assignee: | Michal Privoznik <mprivozn> |
| Status: | CLOSED ERRATA | QA Contact: | Han Han <hhan> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 7.3 | CC: | amureini, chhu, dyuan, jdenemar, kchamart, libvirt-maint, lmen, mprivozn, nsoffer, pkrempa, rbalakri, xuzhang |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | libvirt-3.2.0-1.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1372613 | Environment: | |
| Last Closed: | 2017-08-02 00:05:54 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1372613 | | |
| Bug Blocks: | 1442266 | | |
| Attachments: | | | |
Description
Nir Soffer
2017-04-19 11:02:10 UTC
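This will allow RHV to correctly handle blockJobInfo returning cur=0 and end=0. Currently RHV assumes that the block job has completed when it sees these values, and invokes blockJobAbort too early.

I discussed this bug with Michal on IRC, and he thinks we can backport the fix to 7.3.

Hi Michal,

I tried to reproduce this bug on libvirt-2.0.0-10.el7_3.9.x86_64 and qemu-kvm-rhev-2.6.0-28.el7_3.10.x86_64, but could not. My code, blockjob.c:

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>               /* sleep() */
#include <libvirt/libvirt.h>

int main(int argc, char *argv[])
{
    virConnectPtr conn;
    virDomainPtr dom;
    virDomainBlockJobInfo info;
    const char *domName, *disk;

    if (argc != 4) {
        fprintf(stderr, "usage: %s DOM_NAME TARGET_DISK REUSE_DISK\n", argv[0]);
        return 1;
    }
    domName = argv[1];
    disk = argv[2];

    conn = virConnectOpen("qemu:///system");
    if (conn == NULL) {
        fprintf(stderr, "Failed to open connection to qemu:///system\n");
        return 1;
    }
    dom = virDomainLookupByName(conn, domName);
    if (dom == NULL) {
        fprintf(stderr, "Failed to find domain %s\n", domName);
        virConnectClose(conn);
        return 1;
    }

    /* Start a shallow block copy of DISK into the existing image argv[3]. */
    if (virDomainBlockRebase(dom, disk, argv[3], 0,
                             VIR_DOMAIN_BLOCK_REBASE_COPY |
                             VIR_DOMAIN_BLOCK_REBASE_SHALLOW |
                             VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT) < 0) {
        fprintf(stderr, "BlockRebase failed\n");
    } else {
        /* Poll once a second while a job is reported (return value 1);
         * stop when cur == end, the completion check this bug is about. */
        while (virDomainGetBlockJobInfo(dom, disk, &info, 0) == 1) {
            printf("blockjob info: bw %lu, cur %llu, end %llu\n",
                   info.bandwidth, info.cur, info.end);
            if (info.cur == info.end)
                break;
            sleep(1);
        }
        /* Cancel the copy job without pivoting to the new image. */
        virDomainBlockJobAbort(dom, disk, 0);
        printf("The end\n");
    }

    virDomainFree(dom);
    virConnectClose(conn);
    return 0;
}
```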
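The polling loop above stops as soon as `info.cur == info.end`, which is exactly the check that misfires when libvirt returns cur=0 and end=0 for a copy job that has started but is not yet ready. A minimal sketch of a stricter caller-side check, assuming a job counts as ready only when `end` is non-zero and `cur` has reached it. `struct job_progress`, `block_job_ready()`, `wait_for_ready()`, the `fetch_fn` callback and the `scripted_fetch()` test double are all hypothetical stand-ins for `virDomainBlockJobInfo` and `virDomainGetBlockJobInfo()`, so the logic runs without a live domain.

```c
#include <stdbool.h>

/* Hypothetical mirror of the two progress fields of virDomainBlockJobInfo. */
struct job_progress {
    unsigned long long cur;
    unsigned long long end;
};

/* Hypothetical fetch callback, using the return convention of
 * virDomainGetBlockJobInfo(): 1 = job found, 0 = no job, -1 = error. */
typedef int (*fetch_fn)(void *ctx, struct job_progress *out);

/* Ready only when the job reported a non-zero end and cur has reached it;
 * cur == end == 0 means "started but not ready yet", not "done". */
static bool block_job_ready(const struct job_progress *p)
{
    return p->end != 0 && p->cur == p->end;
}

/* Poll at most max_polls times. Returns 1 once the job is ready, 0 if the
 * job went away or polling gave up, -1 if fetching failed. */
static int wait_for_ready(fetch_fn fetch, void *ctx, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        struct job_progress p = {0, 0};
        int ret = fetch(ctx, &p);

        if (ret < 0)
            return -1;
        if (ret == 0)
            return 0;           /* no active job: aborted or already gone */
        if (block_job_ready(&p))
            return 1;
        /* A real caller would sleep here, as the reproducer does with sleep(1). */
    }
    return 0;
}

/* Test double: replays a fixed list of reports, then reports "no job". */
struct script {
    const struct job_progress *reports;
    int n;
    int pos;
};

static int scripted_fetch(void *ctx, struct job_progress *out)
{
    struct script *s = ctx;

    if (s->pos >= s->n)
        return 0;
    *out = s->reports[s->pos++];
    return 1;
}
```

With libvirt, `fetch` would wrap `virDomainGetBlockJobInfo(dom, disk, &info, 0)` and copy `info.cur` and `info.end` into `out`. Requiring `end != 0` keeps the check correct both for the 0/0 report of older libvirt and for the 0/1 report shown in the verification run of this bug.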
Compile the code:

```
# gcc blockjob.c -o blockjob `pkg-config libvirt --libs` -g
```

Start the VM:

```
# virsh list
 Id    Name                           State
----------------------------------------------------
 7     V                              running

# virsh domblklist V
Target     Source
------------------------------------------------
vda        /exports/nfs.s1
vdb        /exports/vdb

# qemu-img info /exports/nfs.s1
image: /exports/nfs.s1
file format: qcow2
virtual size: 10G (10737418240 bytes)
disk size: 1.1M
cluster_size: 65536
backing file: /exports/nfs.1497518937
backing file format: qcow2
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

# qemu-img create -f qcow2 /exports/nfs.s3 10G
Formatting '/exports/nfs.s3', fmt=qcow2 size=10737418240 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
```

Then run the code:

```
# /blockjob V vda /exports/nfs.s3
blockjob info: bw 0, cur 0, end 786432
blockjob info: bw 0, cur 786432, end 786432
```

Check /exports/nfs.s3:

```
# qemu-img info /exports/nfs.s3
image: /exports/nfs.s3
file format: qcow2
virtual size: 10G (10737418240 bytes)
disk size: 1.1M
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
```

I didn't get any event like 'cur == end == 0' or BlockRebase failed or BlockJobAbort failed. Could you give some ideas about hitting the corner case?

(In reply to Han Han from comment #4)
> I didn't get any event like 'cur == end == 0' or BlockRebase failed or
> BlockJobAbort failed.
> Could you give some ideas about hitting the corner case?

While your program is running, you need to abort the job from a different terminal. The JobAbort() call in your code is made only after the job has finished.

Reproduced it on libvirt-2.0.0-10.el7_3.9.x86_64 and qemu-kvm-rhev-2.6.0-28.el7_3.10.x86_64.

To hit the corner case, call BlockGetJobInfo() and BlockJobAbort() in parallel and check whether cur==end==0.

1. Compile the test code:

   ```
   # gcc blockjob.c -o blockjob `pkg-config libvirt --libs` -g
   ```

2. Prepare a running VM with a large disk, which makes the corner case easier to hit:

   ```
   # virsh list
    Id    Name                           State
   ----------------------------------------------------
    12    V                              running

   # virsh domblklist V
   Target     Source
   ------------------------------------------------
   vda        /exports/nfs.qcow2

   # qemu-img info /exports/nfs.qcow2
   image: /exports/nfs.qcow2
   file format: qcow2
   virtual size: 10G (10737418240 bytes)
   disk size: 8.0G
   cluster_size: 65536
   Format specific information:
       compat: 1.1
       lazy refcounts: false
       refcount bits: 16
       corrupt: false
   ```

3. Create an image with the same virtual size as the guest image, to be reused as the copy destination:

   ```
   # qemu-img create -f qcow2 /exports/nfs.s3 10G
   Formatting '/exports/nfs.s3', fmt=qcow2 size=10737418240 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
   ```

4. Run the test program in this form: `./blockjob DOM_NAME TARGET_DISK REUSE_DISK`

   ```
   # ./blockjob V vda /exports/nfs.s3
   blockjob info: bw 0, cur 0, end 0
   Abort start
   Abort finished
   ```

The rebase job has started but is not yet ready, and cur==end==0. The corner case is hit.

Verified on libvirt-3.2.0-10.el7.x86_64 and qemu-kvm-rhev-2.9.0-10.el7.x86_64. Redo steps 1 to 4, then run the test program:

```
# ./blockjob test vda /tmp/aaa
blockjob info: bw 0, cur 0, end 1
blockjob info: bw 0, cur 0, end 6115360768
Abort start
blockjob info: bw 0, cur 6356992, end 6115360768
Abort finished
blockjob info: bw 0, cur 0, end 0
```

As the patch says, when the rebase job has started but is not yet ready, BlockJobGetInfo now reports cur==0, end==1 instead of cur==end==0. This is the expected result; the bug is fixed.

Created attachment 1288237
Testing program src code
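The verification output shows the observable effect of the fix: where libvirt 2.0.0 reported cur 0, end 0 for a copy job that had started but was not yet ready, libvirt 3.2.0 reports cur 0, end 1. A minimal sketch of that adjustment, assuming it applies only while a job is actually reported (the final 0/0 line appears after "Abort finished", presumably once no job is left). `struct job_progress`, `normalize_progress()` and `naive_done()` are hypothetical names for illustration, not libvirt's internal code.

```c
#include <stdbool.h>

/* Hypothetical mirror of the two progress fields of virDomainBlockJobInfo. */
struct job_progress {
    unsigned long long cur;
    unsigned long long end;
};

/* Sketch of the adjustment visible in the verification output: an active job
 * that reports no progress yet (cur == end == 0) is returned as cur 0, end 1,
 * so cur == end cannot hold before the job actually becomes ready.
 * Any other cur/end pair is passed through unchanged. */
static struct job_progress normalize_progress(struct job_progress raw)
{
    if (raw.cur == 0 && raw.end == 0)
        raw.end = 1;
    return raw;
}

/* The completion test used by the reproducer loop: cur == end. */
static bool naive_done(struct job_progress p)
{
    return p.cur == p.end;
}
```

With this adjustment in place, a caller that only tests `cur == end` no longer mistakes the not-yet-ready state for completion, which is the RHV behavior the description asks to fix.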
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:1846