Bug 811683 - deal with change from RHEL 6.2 sync block_job_cancel to RHEL 6.3 async block-job-cancel
Summary: deal with change from RHEL 6.2 sync block_job_cancel to RHEL 6.3 async block-...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt
Version: 6.3
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 6.2
Assignee: Eric Blake
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On: 582475 812085 813953 814080
Blocks: 525307 580954 638506 638508 638509 748534 756082 769496 786141 799055 802284 806280 806432 815791 830861 831532 835344 835345 835722 865384
 
Reported: 2012-04-11 17:33 UTC by Eric Blake
Modified: 2013-01-10 00:51 UTC (History)
32 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 582475
Clones: 815791 (view as bug list)
Environment:
Last Closed: 2012-06-20 06:54:20 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2012:0748 0 normal SHIPPED_LIVE Low: libvirt security, bug fix, and enhancement update 2012-06-19 19:31:38 UTC

Comment 1 Eric Blake 2012-04-11 17:42:01 UTC
The initial patches were done under the auspices of bug 638506, but getting async block_job_cancel to work correctly with libvirt is important whether or not we also get live block migration working.

Comment 8 Eric Blake 2012-04-18 17:05:08 UTC
Upstream raised another issue where a semantic difference would be desirable:

https://lists.gnu.org/archive/html/qemu-devel/2012-04/msg02273.html

If upstream indeed goes with block-job-set-speed being callable at any time, and not just when a block job is active, then this semantic change from block_job_set_speed would be another thing that libvirt would like to differentiate on based on the spelling of the monitor command.  I'm not sure whether to clone this into another libvirt BZ, but depending on whether qemu 1.1 gets the semantics fixed in time, this is something that libvirt should be aware of.  (I suppose that libvirt could blindly try to set speed in advance, and fall back to setting it after the job, as a mitigation if we cannot rely on the spelling of the command to tell the difference).
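
The mitigation mentioned at the end of this comment can be sketched roughly as follows. This is an illustrative Python sketch only, not libvirt code; the monitor object, its send() method, and MonitorError are hypothetical stand-ins for the QMP plumbing:

```python
# Hypothetical sketch of the "set speed in advance, fall back afterwards"
# mitigation: try block-job-set-speed before the job starts (the proposed
# newer semantics allow it at any time); if the monitor rejects the early
# call (older semantics: only valid while a job is active), start the job
# first and set the speed afterwards.

class MonitorError(Exception):
    """Raised by the (mock) monitor when a command is rejected."""

def start_pull_with_speed(mon, device, bandwidth):
    """Start a block pull on `device`, applying `bandwidth` whichever way works."""
    preset = True
    try:
        mon.send("block-job-set-speed", device=device, speed=bandwidth)
    except MonitorError:
        preset = False  # early call rejected: old semantics
    mon.send("block-stream", device=device)
    if not preset:
        mon.send("block-job-set-speed", device=device, speed=bandwidth)
```

The point of the sketch is that the caller never needs to know which semantics qemu implements; the rejection of the early call is itself the probe.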

Comment 9 Wayne Sun 2012-04-19 07:21:22 UTC
pkgs:
libvirt-0.9.10-13.el6.x86_64
qemu-kvm-0.12.1.2-2.275.el6.x86_64
kernel-2.6.32-251.el6.x86_64

prepare a running domain with qed img
# qemu-img info /var/lib/libvirt/images/libvirt_test_api 
image: /var/lib/libvirt/images/libvirt_test_api
file format: qed
virtual size: 10G (10737418240 bytes)
disk size: 1.2G
cluster_size: 65536

1. run blockpull
# virsh blockpull libvirt_test_api vda 1
Block Pull started

2. check blockjob info
# virsh blockjob libvirt_test_api vda --info
Block Pull: [ 14 %]    Bandwidth limit: 1 MB/s

3. abort block job with --async
# virsh blockjob libvirt_test_api vda --abort --async
returned immediately

check in libvirtd.log:
2012-04-19 06:58:53.632+0000: 10291: debug : virJSONValueToString:1102 : result={"execute":"block_job_cancel","arguments":{"device":"drive-virtio-disk0"},"id":"libvirt-8"}
2012-04-19 06:58:53.632+0000: 10291: debug : virEventPollUpdateHandle:151 : EVENT_POLL_UPDATE_HANDLE: watch=8 events=15
2012-04-19 06:58:53.632+0000: 10291: debug : virEventPollInterruptLocked:706 : Interrupting
2012-04-19 06:58:53.632+0000: 10291: debug : qemuMonitorSend:823 : QEMU_MONITOR_SEND_MSG: mon=0x7f59b0000cf0 msg={"execute":"block_job_cancel","arguments":{"device":"drive-virtio-disk0"},"id":"libvirt-8"}^M
 fd=-1

With --async, libvirt still passes the block_job_cancel command to qemu. This is not desired; it should send block-job-cancel, right? (P.S. The test shows no difference when using only --abort.)

Comment 10 Eric Blake 2012-04-19 14:33:43 UTC
(In reply to comment #9)
> pkgs:
> libvirt-0.9.10-13.el6.x86_64
> qemu-kvm-0.12.1.2-2.275.el6.x86_64
> kernel-2.6.32-251.el6.x86_64

There's your problem. According to bug 812085, RHEV didn't supply the name block-job-cancel until qemu-kvm-rhev-0.12.1.2-2.278.el6

> 
> with --async libvirt still pass block_job_cancel command to qemu, this is not
> desired, should send block-job-cancel, right? (p.s. the test shows no
> difference with only use --abort)

You are seeing that libvirt _correctly_ detected the spelling provided by the build; however, as qemu-kvm*.275 has an asynchronous cancel but only the older synchronous name, you would also notice that libvirt ends up emitting double events from qemu (one synthesized by libvirt, since libvirt thinks qemu-kvm won't emit the event due to the wrong spelling, and one directly from the qemu event).
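
The double-event mismatch described here can be sketched as follows. This is an illustrative Python sketch (the monitor object and emit_event callback are hypothetical, not libvirt's API): libvirt keys its behavior off the command spelling the build advertises, and with the legacy spelling it synthesizes the cancel event itself. A build like qemu-kvm*.275, which has the legacy name but already-asynchronous behavior, then emits a real event as well, producing two:

```python
# Hypothetical sketch: choose the command spelling qemu advertises, and
# synthesize a cancel event only when the legacy spelling suggests qemu
# will not emit one itself. A build with the legacy name but async
# behavior yields the synthetic event here PLUS a real one from qemu.

def cancel_and_maybe_synthesize(mon, device, emit_event):
    spelling = ("block-job-cancel" if mon.has("block-job-cancel")
                else "block_job_cancel")
    mon.send(spelling, device=device)
    if spelling == "block_job_cancel":
        # Assume old synchronous qemu: no event will arrive, so fake one.
        emit_event({"device": device, "type": "cancelled", "synthetic": True})
```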

Comment 11 Eric Blake 2012-04-19 21:20:47 UTC
Also, Alex Jia found a bug detected by valgrind in 'virsh blockpull --wait ...', so I'm moving this back to ASSIGNED.

Comment 12 Wayne Sun 2012-04-20 06:32:06 UTC
(In reply to comment #10)

> 
> There's your problem. According to bug 812085, RHEV didn't supply the name
> block-job-cancel until qemu-kvm-rhev-0.12.1.2-2.278.el6
> 
My fault. After updating to qemu-kvm-rhev-0.12.1.2-2.282.el6.x86_64, I retested and checked the log:

2012-04-20 03:03:39.448+0000: 2157: debug : virJSONValueToString:1102 : result={"execute":"block-job-cancel","arguments":{"device":"drive-virtio-disk0"},"id":"libvirt-10"}
2012-04-20 03:03:39.448+0000: 2157: debug : virEventPollUpdateHandle:151 : EVENT_POLL_UPDATE_HANDLE: watch=16 events=15
2012-04-20 03:03:39.448+0000: 2157: debug : virEventPollInterruptLocked:706 : Interrupting
2012-04-20 03:03:39.448+0000: 2157: debug : qemuMonitorSend:823 : QEMU_MONITOR_SEND_MSG: mon=0x7ff57c007aa0 msg={"execute":"block-job-cancel","arguments":{"device":"drive-virtio-disk0"},"id":"libvirt-10"}^M
 fd=-1 

It is working as expected.
> > 
> > with --async libvirt still pass block_job_cancel command to qemu, this is not
> > desired, should send block-job-cancel, right? (p.s. the test shows no
> > difference with only use --abort)
> 
> You are seeing that libvirt _correctly_ detected the spelling provided by the
> build; however, as qemu-kvm*.275 has an asynchronous cancel but only the older
> synchronous name, you would also notice that libvirt ends up emitting double
> events from qemu (one synthesized by libvirt, since libvirt thinks qemu-kvm
> won't emit the event due to the wrong spelling, and one directly from the qemu
> event).
Thanks for explaining this.

-----
Other steps:
1. test with --wait with blockpull
# valgrind -v virsh blockpull libvirt_test_api vda --wait

2. partial blockpull 
# virsh blockpull libvirt_test_api vda --base /var/lib/libvirt/images/qed1.img 
Block Pull started

Comment 14 Eric Blake 2012-04-24 03:38:42 UTC
Bug 813593 may require one further patch for this BZ.

Comment 15 Eric Blake 2012-04-24 04:20:20 UTC
*** Bug 814080 has been marked as a duplicate of this bug. ***

Comment 16 Eric Blake 2012-04-24 04:22:16 UTC
Back to ASSIGNED while we wait on bug 813593; the memory leak has been split off to bug 814080.

Comment 17 EricLee 2012-04-24 08:39:24 UTC
Does the new version only support "block-job-cancel" and not "block_job_cancel"?
I tested without the '--async' option and got the same result as with '--async'.

The versions I used:
# rpm -qa libvirt qemu-kvm-rhev kernel
kernel-2.6.32-262.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.282.el6.x86_64
libvirt-0.9.10-13.el6.x86_64

# virsh blockpull qed /var/lib/libvirt/images/qed_backing.img 1
Block Pull started
# virsh blockjob qed /var/lib/libvirt/images/qed_backing.img --abort
returned immediately

And the log in libvirtd.log:
2012-04-24 08:16:00.380+0000: 1393: debug : qemuDomainObjBeginJobInternal:753 : Starting job: modify (async=none)
2012-04-24 08:16:00.416+0000: 1393: debug : qemuMonitorRef:201 : QEMU_MONITOR_REF: mon=0x7f6f70002960 refs=3
2012-04-24 08:16:00.416+0000: 1393: debug : qemuMonitorBlockJob:2745 : mon=0x7f6f70002960, device=drive-virtio-disk0, base=(null), bandwidth=0, info=(nil), mode=0, async=1
2012-04-24 08:16:00.416+0000: 1393: debug : qemuMonitorSend:823 : QEMU_MONITOR_SEND_MSG: mon=0x7f6f70002960 msg={"execute":"block-job-cancel","arguments":{"device":"drive-virtio-disk0"},"id":"libvirt-28"}^M
 fd=-1
2012-04-24 08:16:00.416+0000: 1392: debug : qemuMonitorRef:201 : QEMU_MONITOR_REF: mon=0x7f6f70002960 refs=4
2012-04-24 08:16:00.416+0000: 1392: debug : qemuMonitorIOWrite:432 : QEMU_MONITOR_IO_WRITE: mon=0x7f6f70002960 buf={"execute":"block-job-cancel","arguments":{"device":"drive-virtio-disk0"},"id":"libvirt-28"}^M
 len=94 ret=94 errno=11
2012-04-24 08:16:00.416+0000: 1392: debug : qemuMonitorUnref:210 : QEMU_MONITOR_UNREF: mon=0x7f6f70002960 refs=3

Comment 18 Paolo Bonzini 2012-04-24 08:43:34 UTC
> Does the new version only support "block-job-cancel", but not support
> "block_job_cancel"? 

Correct.

Comment 19 Eric Blake 2012-04-24 12:10:10 UTC
(In reply to comment #17)
> Does the new version only support "block-job-cancel", but not support
> "block_job_cancel"? 
> Because I have tested without '--async' option, and got the same result as with
> '--async'.

In practice, the window where async matters is very small.  But the general idea is that with RHEL 6.2, libvirt will issue 'block_job_cancel' in isolation, regardless of the --async flag; in RHEL 6.3, libvirt will issue 'block-job-cancel' in isolation with the --async flag, but without the --async flag libvirt will issue 'block-job-cancel' followed by one or more 'query-block-job' in succession (the first query-block-job will be as soon as possible, any additional calls will be in 500ms intervals).

Another thing to test is how many block job events are issued.  With libvirt from RHEL 6.2, you would not get an event on a block job abort from either 6.2 or 6.3 qemu.  With the new libvirt semantics, you should now get exactly one event on block job abort; and that event will either come from qemu (if you are using RHEV 6.3 qemu with block-job-cancel) or be synthesized by libvirt (if you are using RHEL 6.2 qemu with block_job_cancel).  If you ever get double events from libvirt, that's a sign of an impedance mismatch between libvirt and qemu.  Furthermore, if you are testing the RHEL 6.2 interface, remember that you have to test with QED images, as RHEL 6.2 didn't support block pull on qcow2.
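
The RHEL 6.3 cancel semantics described in the first paragraph can be sketched as follows. This is an illustrative Python sketch, not libvirt code; the monitor object and its send() method are hypothetical stand-ins:

```python
import time

def abort_block_job(mon, device, async_=False, interval=0.5):
    # Sketch of the semantics above: issue block-job-cancel, and unless the
    # caller asked for async behavior, poll query-block-jobs until the job
    # is gone -- the first query as soon as possible, subsequent queries at
    # `interval`-second (500 ms) spacing.
    mon.send("block-job-cancel", device=device)
    if async_:
        return
    while any(j.get("device") == device for j in mon.send("query-block-jobs")):
        time.sleep(interval)
```

With async_=True this matches the `virsh blockjob --abort --async` behavior (one command, no polling); without it, the polling loop reproduces the cancel-then-query pattern visible in the libvirtd.log excerpts below.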

Comment 20 Eric Blake 2012-04-24 14:10:15 UTC
Moving this back to ON_QA; any remaining changes that depend on the outcome bug 813953 will be split into a new patch, and we can already test that libvirt targets the names 'block-stream', 'block-job-set-speed', and 'block-job-cancel' when testing against qemu-kvm-rhev-0.12.1.2-2.282.el6.x86_64 or newer.

Comment 21 EricLee 2012-04-25 05:42:01 UTC
(In reply to comment #19)
> (In reply to comment #17)
> > Does the new version only support "block-job-cancel", but not support
> > "block_job_cancel"? 
> > Because I have tested without '--async' option, and got the same result as with
> > '--async'.
> 
> In practice, the window where async matters is very small.  But the general
> idea is that with RHEL 6.2, libvirt will issue 'block_job_cancel' in isolation,
> regardless of the --async flag; in RHEL 6.3, libvirt will issue
> 'block-job-cancel' in isolation with the --async flag, but without the --async
> flag libvirt will issue 'block-job-cancel' followed by one or more
> 'query-block-job' in succession (the first query-block-job will be as soon as
> possible, any additional calls will be in 500ms intervals).
> 

Thanks for explaining.

> Another thing to test is how many block job events are issued.  With libvirt
> from RHEL 6.2, you would not get an event on a block job abort from either 6.2
> or 6.3 qemu.  With the new libvirt semantics, you should now get exactly one
> event on block job abort; and that event will either come from qemu (if you are
> using RHEV 6.3 qemu with block-job-cancel) or be synthesized by libvirt (if you
> are using RHEL 6.2 qemu with block_job_cancel).  If you ever get double events
> from libvirt, that's a sign of an impedence mismatch between libvirt and qemu. 
> Furthermore, if you are testing the RHEL 6.2 interface, remember that you have
> to test with QED images, as RHEL 6.2 didn't support block pull on qcow2.

Comment 22 Wayne Sun 2012-04-25 07:38:55 UTC
pkgs:
libvirt-0.9.10-14.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.285.el6.x86_64
kernel-2.6.32-262.el6.x86_64

1. prepare a domain with a qed img disk
# virsh dumpxml dom
...
    <disk type='file' device='disk'>
      <driver name='qemu' type='qed' cache='none'/>
      <source file='/var/lib/libvirt/images/qed.img'/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
...

2. create img backing file
# qemu-img create -f qed -b /var/lib/libvirt/images/qed.img /var/lib/libvirt/images/qed1.img
Formatting '/var/lib/libvirt/images/qed1.img', fmt=qed size=8388608000 backing_file='/var/lib/libvirt/images/qed.img' cluster_size=0 table_size=0 

3. edit domain disk as using the backing file
# virsh edit dom
...
    <disk type='file' device='disk'>
      <driver name='qemu' type='qed' cache='none'/>
      <source file='/var/lib/libvirt/images/qed1.img'/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
...

4. start domain
# virsh start dom

5. check with/without async
5.1 check without --async
# virsh blockpull dom vda 1
Block Pull started
# virsh blockjob dom vda --abort

check log:
2012-04-25 05:14:49.601+0000: 2594: debug : virDomainBlockJobAbort:17883 : dom=0x7fe960054e20, (VM: name=dom, uuid=e0027b60-e4ed-8f4e-ee17-27a3159cd8f3), disk=vda, flags=0
2012-04-25 05:14:49.601+0000: 2594: debug : qemuDomainObjBeginJobInternal:753 : Starting job: modify (async=none)
2012-04-25 05:14:49.681+0000: 2594: debug : qemuMonitorRef:201 : QEMU_MONITOR_REF: mon=0x7fe960002e30 refs=3
2012-04-25 05:14:49.681+0000: 2594: debug : qemuMonitorBlockJob:2782 : mon=0x7fe960002e30, device=drive-virtio-disk0, base=(null), bandwidth=0, info=(nil), mode=0, async=1
...

2012-04-25 05:14:49.690+0000: 2594: debug : qemuMonitorSend:823 : QEMU_MONITOR_SEND_MSG: mon=0x7fe960002e30 msg={"execute":"block-job-cancel","arguments":{"device":"drive-virtio-disk0"},"id":"libvirt-9"}^M
 fd=-1

...
2012-04-25 05:14:49.690+0000: 2592: debug : qemuMonitorRef:201 : QEMU_MONITOR_REF: mon=0x7fe960002e30 refs=4
2012-04-25 05:14:49.691+0000: 2592: debug : qemuMonitorIOWrite:432 : QEMU_MONITOR_IO_WRITE: mon=0x7fe960002e30 buf={"execute":"block-job-cancel","arguments":{"device":"drive-virtio-disk0"},"id":"libvirt-9"}^M
 len=93 ret=93 errno=11
...

2012-04-25 05:14:49.717+0000: 2594: debug : virJSONValueToString:1105 : result={"execute":"query-block-jobs","id":"libvirt-10"}
2012-04-25 05:14:49.717+0000: 2594: debug : virEventPollUpdateHandle:151 : EVENT_POLL_UPDATE_HANDLE: watch=10 events=15
2012-04-25 05:14:49.717+0000: 2594: debug : virEventPollInterruptLocked:706 : Interrupting
2012-04-25 05:14:49.717+0000: 2594: debug : qemuMonitorSend:823 : QEMU_MONITOR_SEND_MSG: mon=0x7fe960002e30 msg={"execute":"query-block-jobs","id":"libvirt-10"}^M
 fd=-1

...

2012-04-25 05:14:49.718+0000: 2592: debug : qemuMonitorRef:201 : QEMU_MONITOR_REF: mon=0x7fe960002e30 refs=4
2012-04-25 05:14:49.718+0000: 2592: debug : qemuMonitorIOWrite:432 : QEMU_MONITOR_IO_WRITE: mon=0x7fe960002e30 buf={"execute":"query-block-jobs","id":"libvirt-10"}^M
 len=50 ret=50 errno=11

One query-block-jobs call followed block-job-cancel; only one event was issued on block job abort.


5.2 check with --async
# virsh blockpull dom vda 1
Block Pull started
# virsh blockjob dom vda --abort --async

check libvirtd.log:

As in comment 12, only one block-job-cancel command was found, with no query-block-jobs following.

6. test on 6.2
pkgs:
libvirt-0.9.4-23.el6.x86_64
qemu-kvm-0.12.1.2-2.209.el6.x86_64

# virsh blockpull dom vda 1

# virsh blockjob dom vda --abort

check in libvirtd.log:
14:28:20.856: 4272: debug : virDomainBlockJobAbort:16260 : dom=0x7fdf78009db0, (VM: name=dom, uuid=5b5fee7b-5e4d-ff9c-6a54-df6a51f75572), path=0x7fdf78008d00, flags=0
14:28:20.856: 4272: debug : qemuMonitorBlockJob:2562 : mon=0x7fdf74005de0, device=0x7fdf78001940, bandwidth=0, info=(nil), mode=0
14:28:20.871: 4272: debug : virDomainFree:2153 : dom=0x7fdf78009db0, (VM: name=dom, uuid=5b5fee7b-5e4d-ff9c-6a54-df6a51f75572), 
14:28:20.872: 4267: debug : virConnectClose:1323 : conn=0x7fdf7c0962d0
14:28:20.873: 4267: debug : qemuProcessAutoDestroyRun:3706 : conn=0x7fdf7c0962d0


No event on block job abort was found (neither block-job-cancel nor block_job_cancel).

7. test with libvirt on 6.2 but qemu on 6.3
pkgs:
libvirt-0.9.4-23.el6.x86_64
qemu-kvm-0.12.1.2-2.275.el6.x86_64

# virsh blockpull dom vda 1

# virsh blockjob dom vda --abort

check in libvirtd.log:

15:30:26.441: 6367: debug : virDomainBlockJobAbort:16260 : dom=0x7f0d38000b20, (VM: name=dom, uuid=5b5fee7b-5e4d-ff9c-6a54-df6a51f75572), path=0x7f0d38000970, flags=0
15:30:26.441: 6367: debug : qemuMonitorBlockJob:2562 : mon=0x7f0d40000ce0, device=0x7f0d380009b0, bandwidth=0, info=(nil), mode=0
15:30:26.442: 6367: debug : virDomainFree:2153 : dom=0x7f0d38000b20, (VM: name=dom, uuid=5b5fee7b-5e4d-ff9c-6a54-df6a51f75572), 
15:30:26.443: 6363: debug : virConnectClose:1323 : conn=0x7f0d30000a90
15:30:26.444: 6363: debug : qemuProcessAutoDestroyRun:3706 : conn=0x7f0d30000a90

Also, no event on block job abort was found (neither block-job-cancel nor block_job_cancel).


It works as expected.
So, is this enough to verify this bug?

Comment 23 Eric Blake 2012-04-25 13:57:31 UTC
Yes, I think you've verified it.

Comment 24 Wayne Sun 2012-04-26 02:10:32 UTC
Thanks, marking as verified.

Comment 26 errata-xmlrpc 2012-06-20 06:54:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0748.html

