Bug 811683
Summary: | deal with change from RHEL 6.2 sync block_job_cancel to RHEL 6.3 async block-job-cancel
---|---
Product: | Red Hat Enterprise Linux 6
Reporter: | Eric Blake <eblake>
Component: | libvirt
Assignee: | Eric Blake <eblake>
Status: | CLOSED ERRATA
QA Contact: | Virtualization Bugs <virt-bugs>
Severity: | medium
Priority: | medium
Version: | 6.3
CC: | abaron, acathrow, ajia, areis, bazulay, berrange, bili, bsarathy, bugproxy, djuran, dyasny, dyuan, eblake, fosborne, gcosta, gsun, iheim, juzhang, kirbyzhou, michen, mkenneth, mzhan, pbonzini, rwu, shu, syeghiay, tburke, veillard, virt-maint, weizhan, whuang, yupzhang
Target Milestone: | rc
Target Release: | 6.2
Hardware: | All
OS: | Linux
Doc Type: | Bug Fix
Clone Of: | 582475
Clones: | 815791
Last Closed: | 2012-06-20 06:54:20 UTC
Bug Depends On: | 582475, 812085, 813953, 814080
Bug Blocks: | 525307, 580954, 638506, 638508, 638509, 748534, 756082, 769496, 786141, 799055, 802284, 806280, 806432, 815791, 830861, 831532, 835344, 835345, 835722, 865384
Comment 1
Eric Blake
2012-04-11 17:42:01 UTC
Upstream raised another issue where a semantic difference would be desirable: https://lists.gnu.org/archive/html/qemu-devel/2012-04/msg02273.html

If upstream indeed makes block-job-set-speed callable at any time, and not just while a block job is active, then this semantic change from block_job_set_speed is another thing that libvirt would like to differentiate on, based on the spelling of the monitor command. I'm not sure whether to clone this into another libvirt BZ, but depending on whether qemu 1.1 gets the semantics fixed in time, this is something that libvirt should be aware of. (I suppose that, if we cannot rely on the spelling of the command to tell the difference, libvirt could mitigate by blindly trying to set the speed in advance and falling back to setting it after the job starts.)

pkgs:
libvirt-0.9.10-13.el6.x86_64
qemu-kvm-0.12.1.2-2.275.el6.x86_64
kernel-2.6.32-251.el6.x86_64

Prepare a running domain with a qed image:
# qemu-img info /var/lib/libvirt/images/libvirt_test_api
image: /var/lib/libvirt/images/libvirt_test_api
file format: qed
virtual size: 10G (10737418240 bytes)
disk size: 1.2G
cluster_size: 65536

1. Run blockpull:
# virsh blockpull libvirt_test_api vda 1
Block Pull started

2. Check the block job info:
# virsh blockjob libvirt_test_api vda --info
Block Pull: [ 14 %]
Bandwidth limit: 1 MB/s

3. Abort the block job with --async:
# virsh blockjob libvirt_test_api vda --abort --async
It returned immediately. Checking libvirtd.log:
2012-04-19 06:58:53.632+0000: 10291: debug : virJSONValueToString:1102 : result={"execute":"block_job_cancel","arguments":{"device":"drive-virtio-disk0"},"id":"libvirt-8"}
2012-04-19 06:58:53.632+0000: 10291: debug : virEventPollUpdateHandle:151 : EVENT_POLL_UPDATE_HANDLE: watch=8 events=15
2012-04-19 06:58:53.632+0000: 10291: debug : virEventPollInterruptLocked:706 : Interrupting
2012-04-19 06:58:53.632+0000: 10291: debug : qemuMonitorSend:823 : QEMU_MONITOR_SEND_MSG: mon=0x7f59b0000cf0 msg={"execute":"block_job_cancel","arguments":{"device":"drive-virtio-disk0"},"id":"libvirt-8"}^M fd=-1

With --async, libvirt still passes the block_job_cancel command to qemu. This is not desired; it should send block-job-cancel, right? (P.S. The test shows no difference when using only --abort.)

(In reply to comment #9)
> pkgs:
> libvirt-0.9.10-13.el6.x86_64
> qemu-kvm-0.12.1.2-2.275.el6.x86_64
> kernel-2.6.32-251.el6.x86_64

There's your problem. According to bug 812085, RHEV didn't supply the name block-job-cancel until qemu-kvm-rhev-0.12.1.2-2.278.el6.

> With --async, libvirt still passes the block_job_cancel command to qemu. This is
> not desired; it should send block-job-cancel, right? (P.S. The test shows no
> difference when using only --abort.)

You are seeing that libvirt _correctly_ detected the spelling provided by the build; however, as qemu-kvm*.275 has an asynchronous cancel but only the older synchronous name, you would also notice that libvirt ends up emitting double events from qemu (one synthesized by libvirt, since libvirt thinks qemu-kvm won't emit the event due to the wrong spelling, and one directly from the qemu event).

Also, Alex Jia found a bug detected by valgrind in 'virsh blockpull --wait ...', so I'm moving this back to ASSIGNED.

(In reply to comment #10)
>
> There's your problem.
> According to bug 812085, RHEV didn't supply the name
> block-job-cancel until qemu-kvm-rhev-0.12.1.2-2.278.el6
>

My fault. After updating to qemu-kvm-rhev-0.12.1.2-2.282.el6.x86_64, I retested and checked the log:
2012-04-20 03:03:39.448+0000: 2157: debug : virJSONValueToString:1102 : result={"execute":"block-job-cancel","arguments":{"device":"drive-virtio-disk0"},"id":"libvirt-10"}
2012-04-20 03:03:39.448+0000: 2157: debug : virEventPollUpdateHandle:151 : EVENT_POLL_UPDATE_HANDLE: watch=16 events=15
2012-04-20 03:03:39.448+0000: 2157: debug : virEventPollInterruptLocked:706 : Interrupting
2012-04-20 03:03:39.448+0000: 2157: debug : qemuMonitorSend:823 : QEMU_MONITOR_SEND_MSG: mon=0x7ff57c007aa0 msg={"execute":"block-job-cancel","arguments":{"device":"drive-virtio-disk0"},"id":"libvirt-10"}^M fd=-1

It works as expected.

> >
> > With --async, libvirt still passes the block_job_cancel command to qemu. This is
> > not desired; it should send block-job-cancel, right? (P.S. The test shows no
> > difference when using only --abort.)
>
> You are seeing that libvirt _correctly_ detected the spelling provided by the
> build; however, as qemu-kvm*.275 has an asynchronous cancel but only the older
> synchronous name, you would also notice that libvirt ends up emitting double
> events from qemu (one synthesized by libvirt, since libvirt thinks qemu-kvm
> won't emit the event due to the wrong spelling, and one directly from the qemu
> event).

Thanks for explaining this.

-----
Other steps:
1. Test --wait with blockpull:
# valgrind -v virsh blockpull libvirt_test_api vda --wait

2. Partial blockpull:
# virsh blockpull libvirt_test_api vda --base /var/lib/libvirt/images/qed1.img
Block Pull started

Bug 813593 may require one further patch for this BZ.

*** Bug 814080 has been marked as a duplicate of this bug. ***

Back to ASSIGNED while we wait on 813593; the memory leak has been split off to bug 814080.

Does the new version only support "block-job-cancel", but not support "block_job_cancel"?
Because I have tested without '--async' option, and got the same result as with '--async'.

The versions I used:
# rpm -qa libvirt qemu-kvm-rhev kernel
kernel-2.6.32-262.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.282.el6.x86_64
libvirt-0.9.10-13.el6.x86_64

# virsh blockpull qed /var/lib/libvirt/images/qed_backing.img 1
Block Pull started
# virsh blockjob qed /var/lib/libvirt/images/qed_backing.img --abort
It returned immediately.

And the log in libvirtd.log:
2012-04-24 08:16:00.380+0000: 1393: debug : qemuDomainObjBeginJobInternal:753 : Starting job: modify (async=none)
2012-04-24 08:16:00.416+0000: 1393: debug : qemuMonitorRef:201 : QEMU_MONITOR_REF: mon=0x7f6f70002960 refs=3
2012-04-24 08:16:00.416+0000: 1393: debug : qemuMonitorBlockJob:2745 : mon=0x7f6f70002960, device=drive-virtio-disk0, base=(null), bandwidth=0, info=(nil), mode=0, async=1
2012-04-24 08:16:00.416+0000: 1393: debug : qemuMonitorSend:823 : QEMU_MONITOR_SEND_MSG: mon=0x7f6f70002960 msg={"execute":"block-job-cancel","arguments":{"device":"drive-virtio-disk0"},"id":"libvirt-28"}^M fd=-1
2012-04-24 08:16:00.416+0000: 1392: debug : qemuMonitorRef:201 : QEMU_MONITOR_REF: mon=0x7f6f70002960 refs=4
2012-04-24 08:16:00.416+0000: 1392: debug : qemuMonitorIOWrite:432 : QEMU_MONITOR_IO_WRITE: mon=0x7f6f70002960 buf={"execute":"block-job-cancel","arguments":{"device":"drive-virtio-disk0"},"id":"libvirt-28"}^M len=94 ret=94 errno=11
2012-04-24 08:16:00.416+0000: 1392: debug : qemuMonitorUnref:210 : QEMU_MONITOR_UNREF: mon=0x7f6f70002960 refs=3

> Does the new version only support "block-job-cancel", but not support
> "block_job_cancel"?
Correct.
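Because the newer qemu-kvm-rhev builds accept only the dashed spelling, the name a build advertises is the signal that decides how the whole block job is driven. The Python sketch below illustrates that decision under stated assumptions; it is not libvirt's actual C code. `BlockJobCaps` and `detect_block_job_caps` are hypothetical names, the input is assumed to be the list of command names already fetched from qemu (for example the `name` fields of a QMP `query-commands` reply), and the underscore spelling `block_stream` is an assumption, since this bug only names `block_job_cancel` and `block_job_set_speed`.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class BlockJobCaps:
    """Hypothetical summary of how to drive block jobs on one qemu build."""
    stream_cmd: str        # starts a block pull
    speed_cmd: str         # sets the bandwidth limit
    cancel_cmd: str        # aborts a running block job
    cancel_is_async: bool  # True if qemu itself emits the abort event


def detect_block_job_caps(commands: Iterable[str]) -> Optional[BlockJobCaps]:
    """Pick monitor command spellings from the names qemu advertises.

    `commands` is assumed to be a list of command names obtained elsewhere
    (e.g. from a query-commands reply); this sketch does no I/O.
    """
    names = set(commands)
    if "block-job-cancel" in names:
        # RHEL 6.3 / upstream spelling: cancel is asynchronous and qemu
        # reports the end of the job with its own event.
        return BlockJobCaps("block-stream", "block-job-set-speed",
                            "block-job-cancel", True)
    if "block_job_cancel" in names:
        # RHEL 6.2 spelling: cancel is treated as synchronous, so the caller
        # must synthesize the abort event itself. "block_stream" is assumed.
        return BlockJobCaps("block_stream", "block_job_set_speed",
                            "block_job_cancel", False)
    return None  # this qemu build has no block job support at all
```

Inferring `cancel_is_async` from the spelling alone is exactly the assumption that breaks with qemu-kvm-0.12.1.2-2.275.el6: as the reply to comment #9 explains, that build has an asynchronous cancel under the old name, so libvirt synthesizes an event that qemu also sends.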
(In reply to comment #17)
> Does the new version only support "block-job-cancel", but not support
> "block_job_cancel"?
> Because I have tested without '--async' option, and got the same result as with
> '--async'.

In practice, the window where async matters is very small. But the general idea is that with RHEL 6.2, libvirt will issue 'block_job_cancel' in isolation, regardless of the --async flag; in RHEL 6.3, libvirt will issue 'block-job-cancel' in isolation with the --async flag, but without the --async flag libvirt will issue 'block-job-cancel' followed by one or more 'query-block-jobs' in succession (the first query-block-jobs will be as soon as possible; any additional calls will be at 500ms intervals).

Another thing to test is how many block job events are issued. With libvirt from RHEL 6.2, you would not get an event on a block job abort from either 6.2 or 6.3 qemu. With the new libvirt semantics, you should now get exactly one event on block job abort; and that event will either come from qemu (if you are using RHEV 6.3 qemu with block-job-cancel) or be synthesized by libvirt (if you are using RHEL 6.2 qemu with block_job_cancel). If you ever get double events from libvirt, that's a sign of an impedance mismatch between libvirt and qemu.

Furthermore, if you are testing the RHEL 6.2 interface, remember that you have to test with QED images, as RHEL 6.2 didn't support block pull on qcow2.

Moving this back to ON_QA; any remaining changes that depend on the outcome of bug 813953 will be split into a new patch, and we can already test that libvirt targets the names 'block-stream', 'block-job-set-speed', and 'block-job-cancel' when testing against qemu-kvm-rhev-0.12.1.2-2.282.el6.x86_64 or newer.

(In reply to comment #19)
> (In reply to comment #17)
> > Does the new version only support "block-job-cancel", but not support
> > "block_job_cancel"?
> > Because I have tested without '--async' option, and got the same result as with
> > '--async'.
>
> In practice, the window where async matters is very small. But the general
> idea is that with RHEL 6.2, libvirt will issue 'block_job_cancel' in isolation,
> regardless of the --async flag; in RHEL 6.3, libvirt will issue
> 'block-job-cancel' in isolation with the --async flag, but without the --async
> flag libvirt will issue 'block-job-cancel' followed by one or more
> 'query-block-jobs' in succession (the first query-block-jobs will be as soon as
> possible; any additional calls will be at 500ms intervals).

Thanks for explaining.

> Another thing to test is how many block job events are issued. With libvirt
> from RHEL 6.2, you would not get an event on a block job abort from either 6.2
> or 6.3 qemu. With the new libvirt semantics, you should now get exactly one
> event on block job abort; and that event will either come from qemu (if you are
> using RHEV 6.3 qemu with block-job-cancel) or be synthesized by libvirt (if you
> are using RHEL 6.2 qemu with block_job_cancel). If you ever get double events
> from libvirt, that's a sign of an impedance mismatch between libvirt and qemu.
> Furthermore, if you are testing the RHEL 6.2 interface, remember that you have
> to test with QED images, as RHEL 6.2 didn't support block pull on qcow2.

pkgs:
libvirt-0.9.10-14.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.285.el6.x86_64
kernel-2.6.32-262.el6.x86_64

1. Prepare a domain with a qed image disk:
# virsh dumpxml dom
...
<disk type='file' device='disk'>
  <driver name='qemu' type='qed' cache='none'/>
  <source file='/var/lib/libvirt/images/qed.img'/>
  <target dev='vda' bus='virtio'/>
  <alias name='virtio-disk0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</disk>
...

2. Create a new qed image, qed1.img, with qed.img as its backing file:
# qemu-img create -f qed -b /var/lib/libvirt/images/qed.img /var/lib/libvirt/images/qed1.img
Formatting '/var/lib/libvirt/images/qed1.img', fmt=qed size=8388608000 backing_file='/var/lib/libvirt/images/qed.img' cluster_size=0 table_size=0

3. Edit the domain disk to use qed1.img (which is backed by qed.img):
# virsh edit dom
...
<disk type='file' device='disk'>
  <driver name='qemu' type='qed' cache='none'/>
  <source file='/var/lib/libvirt/images/qed1.img'/>
  <target dev='vda' bus='virtio'/>
  <alias name='virtio-disk0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</disk>
...

4. Start the domain:
# virsh start dom

5. Check with and without --async.

5.1 Check without --async:
# virsh blockpull dom vda 1
Block Pull started
# virsh blockjob dom vda --abort
Check the log:
2012-04-25 05:14:49.601+0000: 2594: debug : virDomainBlockJobAbort:17883 : dom=0x7fe960054e20, (VM: name=dom, uuid=e0027b60-e4ed-8f4e-ee17-27a3159cd8f3), disk=vda, flags=0
2012-04-25 05:14:49.601+0000: 2594: debug : qemuDomainObjBeginJobInternal:753 : Starting job: modify (async=none)
2012-04-25 05:14:49.681+0000: 2594: debug : qemuMonitorRef:201 : QEMU_MONITOR_REF: mon=0x7fe960002e30 refs=3
2012-04-25 05:14:49.681+0000: 2594: debug : qemuMonitorBlockJob:2782 : mon=0x7fe960002e30, device=drive-virtio-disk0, base=(null), bandwidth=0, info=(nil), mode=0, async=1
...
2012-04-25 05:14:49.690+0000: 2594: debug : qemuMonitorSend:823 : QEMU_MONITOR_SEND_MSG: mon=0x7fe960002e30 msg={"execute":"block-job-cancel","arguments":{"device":"drive-virtio-disk0"},"id":"libvirt-9"}^M fd=-1
...
2012-04-25 05:14:49.690+0000: 2592: debug : qemuMonitorRef:201 : QEMU_MONITOR_REF: mon=0x7fe960002e30 refs=4
2012-04-25 05:14:49.691+0000: 2592: debug : qemuMonitorIOWrite:432 : QEMU_MONITOR_IO_WRITE: mon=0x7fe960002e30 buf={"execute":"block-job-cancel","arguments":{"device":"drive-virtio-disk0"},"id":"libvirt-9"}^M len=93 ret=93 errno=11
...
2012-04-25 05:14:49.717+0000: 2594: debug : virJSONValueToString:1105 : result={"execute":"query-block-jobs","id":"libvirt-10"}
2012-04-25 05:14:49.717+0000: 2594: debug : virEventPollUpdateHandle:151 : EVENT_POLL_UPDATE_HANDLE: watch=10 events=15
2012-04-25 05:14:49.717+0000: 2594: debug : virEventPollInterruptLocked:706 : Interrupting
2012-04-25 05:14:49.717+0000: 2594: debug : qemuMonitorSend:823 : QEMU_MONITOR_SEND_MSG: mon=0x7fe960002e30 msg={"execute":"query-block-jobs","id":"libvirt-10"}^M fd=-1
...
2012-04-25 05:14:49.718+0000: 2592: debug : qemuMonitorRef:201 : QEMU_MONITOR_REF: mon=0x7fe960002e30 refs=4
2012-04-25 05:14:49.718+0000: 2592: debug : qemuMonitorIOWrite:432 : QEMU_MONITOR_IO_WRITE: mon=0x7fe960002e30 buf={"execute":"query-block-jobs","id":"libvirt-10"}^M len=50 ret=50 errno=11

One query-block-jobs command followed block-job-cancel; there was only one event on block job abort.

5.2 Check with --async:
# virsh blockpull dom vda 1
Block Pull started
# virsh blockjob dom vda --abort --async
Check libvirtd.log: as in comment 12, only one block-job-cancel event was found, with no query-block-jobs following it.

6. Test on 6.2.
pkgs:
libvirt-0.9.4-23.el6.x86_64
qemu-kvm-0.12.1.2-2.209.el6.x86_64
# virsh blockpull dom vda 1
# virsh blockjob dom vda --abort
Check in libvirtd.log:
14:28:20.856: 4272: debug : virDomainBlockJobAbort:16260 : dom=0x7fdf78009db0, (VM: name=dom, uuid=5b5fee7b-5e4d-ff9c-6a54-df6a51f75572), path=0x7fdf78008d00, flags=0
14:28:20.856: 4272: debug : qemuMonitorBlockJob:2562 : mon=0x7fdf74005de0, device=0x7fdf78001940, bandwidth=0, info=(nil), mode=0
14:28:20.871: 4272: debug : virDomainFree:2153 : dom=0x7fdf78009db0, (VM: name=dom, uuid=5b5fee7b-5e4d-ff9c-6a54-df6a51f75572),
14:28:20.872: 4267: debug : virConnectClose:1323 : conn=0x7fdf7c0962d0
14:28:20.873: 4267: debug : qemuProcessAutoDestroyRun:3706 : conn=0x7fdf7c0962d0
No event on block job abort was found (neither block-job-cancel nor block_job_cancel).

7. Test with libvirt from 6.2 but qemu from 6.3.
pkgs:
libvirt-0.9.4-23.el6.x86_64
qemu-kvm-0.12.1.2-2.275.el6.x86_64
# virsh blockpull dom vda 1
# virsh blockjob dom vda --abort
Check in libvirtd.log:
15:30:26.441: 6367: debug : virDomainBlockJobAbort:16260 : dom=0x7f0d38000b20, (VM: name=dom, uuid=5b5fee7b-5e4d-ff9c-6a54-df6a51f75572), path=0x7f0d38000970, flags=0
15:30:26.441: 6367: debug : qemuMonitorBlockJob:2562 : mon=0x7f0d40000ce0, device=0x7f0d380009b0, bandwidth=0, info=(nil), mode=0
15:30:26.442: 6367: debug : virDomainFree:2153 : dom=0x7f0d38000b20, (VM: name=dom, uuid=5b5fee7b-5e4d-ff9c-6a54-df6a51f75572),
15:30:26.443: 6363: debug : virConnectClose:1323 : conn=0x7f0d30000a90
15:30:26.444: 6363: debug : qemuProcessAutoDestroyRun:3706 : conn=0x7f0d30000a90
Again, no event on block job abort was found (neither block-job-cancel nor block_job_cancel).

It works as expected. So, are these enough to verify this bug?

Yes, I think you've verified it.

Thanks, marking as verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHSA-2012-0748.html
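The abort semantics described in the reply to comment #17, which the verification steps above exercise, can be modelled as a small decision tree. The Python sketch below is an illustration under stated assumptions, not libvirt's implementation: `abort_block_job`, `FakeMonitor`, and the `emit_event` callback are hypothetical names, and `FakeMonitor` only records the commands it is given instead of talking to a real qemu monitor. The 0.5 s interval and the "first query as soon as possible" rule come from that reply.

```python
import time

POLL_INTERVAL = 0.5  # seconds between follow-up query-block-jobs calls


class FakeMonitor:
    """Hypothetical stand-in for a qemu monitor that records every command.

    query-block-jobs reports the job as still active for `polls_until_gone`
    calls and then reports no jobs at all.
    """

    def __init__(self, device, polls_until_gone=0):
        self.device = device
        self.sent = []
        self._remaining = polls_until_gone

    def send(self, command, **arguments):
        self.sent.append(command)
        if command == "query-block-jobs":
            if self._remaining > 0:
                self._remaining -= 1
                return [{"device": self.device}]
            return []
        return {}


def abort_block_job(monitor, device, cancel_cmd, cancel_is_async, wait,
                    emit_event, sleep=time.sleep):
    """Sketch of the abort sequence for one block job.

    cancel_is_async: True for the 'block-job-cancel' spelling (RHEL 6.3),
                     False for 'block_job_cancel' (RHEL 6.2).
    wait:            False models 'virsh blockjob --abort --async'.
    """
    monitor.send(cancel_cmd, device=device)

    if not cancel_is_async:
        # Old synchronous cancel: qemu sends no event, so exactly one is
        # synthesized here, whether or not --async was given.
        emit_event(device, "cancelled")
        return

    if not wait:
        # --async: return at once; qemu itself delivers the single event.
        return

    # Without --async: poll until the job disappears. The first query goes
    # out immediately, later ones POLL_INTERVAL apart.
    first = True
    while True:
        if not first:
            sleep(POLL_INTERVAL)
        first = False
        jobs = monitor.send("query-block-jobs")
        if not any(job.get("device") == device for job in jobs):
            return
```

For example, `FakeMonitor("drive-virtio-disk0")` with `wait=True` and the dashed spelling records `["block-job-cancel", "query-block-jobs"]`, the same sequence as the step 5.1 log, while `wait=False` records only `["block-job-cancel"]`, as in step 5.2. Injecting `sleep` keeps the polling loop testable without real delays.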