Bug 753169 - QEMU driver mistakenly passes a plain file FD to QEMU for migration
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt
Version: 6.3
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Jiri Denemark
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-11-11 14:07 UTC by Daniel Berrangé
Modified: 2012-06-20 06:36 UTC (History)

Fixed In Version: libvirt-0.9.10-1.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-06-20 06:36:33 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHSA-2012:0748 | 0 | normal | SHIPPED_LIVE | Low: libvirt security, bug fix, and enhancement update | 2012-06-19 19:31:38 UTC

Description Daniel Berrangé 2011-11-11 14:07:10 UTC
Description of problem:
The QEMU migration 'fd' protocol requires that any FD passed to it properly supports non-blocking I/O, i.e. it will report EAGAIN when an operation would block.

File handles for regular files or block devices do not respect this.

In the libvirt O_DIRECT codepath we spawn the libvirt_iohelper and pass QEMU a pipe FD. We should use the I/O helper in the non-O_DIRECT codepath too.

Failure to respect this leads to blocking the QEMU main loop, which causes the migration to become non-live.

While this isn't a problem for save-to-file, this *is* particularly bad for libvirt with the core dump API, since we want the guest to remain live.
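
A minimal sketch of the difference described above, not taken from libvirt or QEMU: it sets O_NONBLOCK on a pipe and on a regular file and writes until EAGAIN shows up. The helper names and the byte limits are made up for illustration. On Linux the pipe reports EAGAIN once its buffer is full, while the regular file never does, so a writer relying on EAGAIN simply blocks inside write() whenever the disk is slow.

```python
import os
import tempfile


def fill_until_eagain(fd, limit, chunk=b"\0" * 65536):
    """Write to fd in non-blocking mode.

    Returns True if the kernel reported EAGAIN before `limit` bytes were
    accepted, False if every write succeeded.
    """
    os.set_blocking(fd, False)  # sets O_NONBLOCK on the descriptor
    written = 0
    while written < limit:
        try:
            written += os.write(fd, chunk)
        except BlockingIOError:  # raised for EAGAIN / EWOULDBLOCK
            return True
    return False


def pipe_reports_eagain():
    # Nobody reads from the pipe, so its buffer fills up quickly.
    read_fd, write_fd = os.pipe()
    try:
        return fill_until_eagain(write_fd, limit=64 * 1024 * 1024)
    finally:
        os.close(read_fd)
        os.close(write_fd)


def regular_file_reports_eagain():
    # O_NONBLOCK is accepted on a regular file but has no effect on writes.
    with tempfile.TemporaryFile() as f:
        return fill_until_eagain(f.fileno(), limit=1024 * 1024)
```

Calling `pipe_reports_eagain()` and `regular_file_reports_eagain()` side by side shows why a pipe is a suitable FD for the 'fd' protocol and a plain file is not.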


Version-Release number of selected component (if applicable):
libvirt-0.9.4-23.el6

How reproducible:
Sometimes

Steps to Reproduce:
It is mostly related to

virsh dump / virsh save

but actually measuring the problem is hard. It probably requires a systemtap script that instruments QEMU internals to identify long pauses in the main loop.

Actual results:


Expected results:


Additional info:

Comment 4 Jiri Denemark 2011-11-28 10:27:20 UTC
To test this, you need to run
# virsh dump --live guest guest.dump
and check if you can still interact with the guest while this virsh dump operation is running. Checking domain's state using virsh or virt-manager. By interacting with the guest, I mean either, for example, talking to it via virsh console or through ssh connection or even just pinging guest's IP.
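
One rough way to put a number on "can still interact with the guest" is to probe it at a fixed interval while the dump runs and record the longest gap between answers. The sketch below is only an illustration: the function names are hypothetical, the guest IP is whatever your guest uses, and the ping flags assume the Linux iputils ping.

```python
import subprocess
import time


def guest_responds(ip, timeout_s=1):
    """Return True if a single ICMP echo to `ip` is answered.

    Assumes the Linux iputils `ping` (-c count, -W timeout in seconds).
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def longest_stall(reply_times):
    """Largest gap, in seconds, between consecutive successful replies."""
    gaps = [later - earlier
            for earlier, later in zip(reply_times, reply_times[1:])]
    return max(gaps, default=0.0)


def watch_guest(ip, duration_s, interval_s=0.2):
    """Probe the guest for `duration_s` seconds and return the longest stall."""
    replies = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        if guest_responds(ip):
            replies.append(time.monotonic())
        time.sleep(interval_s)
    return longest_stall(replies)
```

Start `watch_guest(<guest IP>, <seconds>)` in one terminal and the virsh dump --live command in another. A stall much longer than the probe interval in the middle of the dump suggests the guest was not live during that time.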

Comment 5 Jiri Denemark 2012-02-06 14:52:25 UTC
Patches were sent upstream for review: https://www.redhat.com/archives/libvir-list/2012-February/msg00293.html

Comment 6 Jiri Denemark 2012-02-08 10:48:16 UTC
This is now fixed upstream by v0.9.10-rc2-3-gc8683f2:

commit c8683f231dd227da8540f3249d7e332ec7a75ad7
Author: Jiri Denemark <jdenemar>
Date:   Mon Feb 6 14:53:24 2012 +0100

    qemu: Always use iohelper for dumping domain core
    
    Qemu uses non-blocking I/O which doesn't play nice with regular file
    descriptors. We need to pass a pipe to qemu instead, which can easily be
    done using iohelper.
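
The pattern named in the commit message can be sketched as follows. This is not the libvirt code: a Python thread stands in for libvirt_iohelper, and `relay` and `dump_through_pipe` are hypothetical names. The producer (QEMU's role) only ever sees the non-blocking write end of a pipe, and the helper does the blocking file writes on its behalf.

```python
import os
import select
import threading


def relay(read_fd, dest_path):
    """Stand-in for the helper: blocking copy from the pipe to the file."""
    with open(dest_path, "wb") as dest:
        while True:
            chunk = os.read(read_fd, 65536)
            if not chunk:  # writer closed its end of the pipe
                break
            dest.write(chunk)
    os.close(read_fd)


def dump_through_pipe(data, dest_path):
    """Write `data` to `dest_path` via a pipe; return the number of EAGAINs seen."""
    read_fd, write_fd = os.pipe()
    helper = threading.Thread(target=relay, args=(read_fd, dest_path))
    helper.start()

    os.set_blocking(write_fd, False)  # the producer side never blocks in write()
    view = memoryview(data)
    sent = 0
    eagain_count = 0
    while sent < len(view):
        try:
            sent += os.write(write_fd, view[sent:sent + 65536])
        except BlockingIOError:
            # Pipe is full: a real main loop would go service other events
            # here; this sketch just waits until the pipe is writable again.
            eagain_count += 1
            select.select([], [write_fd], [])
    os.close(write_fd)
    helper.join()
    return eagain_count
```

The point of the design is that slow disk writes now stall only the helper, while the producer gets EAGAIN and can return to its event loop.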

Comment 9 weizhang 2012-02-14 12:27:05 UTC
I tested with
qemu-kvm-0.12.1.2-2.225.el6.x86_64
kernel-2.6.32-230.el6.x86_64
libvirt-0.9.10-1.el6.x86_64

The behavior of a live dump seems similar to libvirt-0.9.4-23.el6.x86_64. I checked by running virsh domstate in a loop during the live dump: at first the guest is running, but it pauses for a short time before the dump finishes. This happens on both libvirt-0.9.10-1.el6.x86_64 and libvirt-0.9.4-23.el6.x86_64, only the pause is shorter on libvirt-0.9.10-1.el6.x86_64. Is that what we want, or do we need to see the guest always running?

Comment 10 Jiri Denemark 2012-02-14 12:38:44 UTC
The guest will always pause for some time before live dump finishes and the length of this pause depends mainly on what the guest is doing. To be honest I'm not sure what would be the best way of testing this in case more or significantly longer stalls cannot be observed in the guest.

Comment 12 Eric Blake 2012-02-15 16:31:34 UTC
If the dump takes long enough, you can inspect ps and lsof output to see if the destination file descriptor is owned by an instance of libvirt_iohelper, with libvirt_iohelper and qemu sharing a pipe (fixed version), or whether the destination fd is directly owned by qemu (pre-fixed version).

Also, note that the fix for this bug may have caused bug 790668.

Comment 13 weizhang 2012-02-16 03:08:14 UTC
(In reply to comment #12)
> If the dump takes long enough, you can inspect ps and lsof output to see if the
> destination file descriptor is owned by an instance of libvirt_iohelper, with
> libvirt_iohelper and qemu sharing a pipe (fixed version), or whether the
> destination fd is directly owned by qemu (pre-fixed version).


Thanks to Eric for the help. I tested with the old and new versions of libvirt and found the difference Eric described:

1. For libvirt-0.9.4-23.el6.x86_64, I checked the fd with lsof during a live dump
# lsof /root/mig-rhel.dump 
COMMAND  PID USER   FD   TYPE DEVICE  SIZE/OFF    NODE NAME
libvirtd 465 root   22w   REG    8,1 210058004 1573224 /root/mig-rhel.dump
qemu-kvm 559 qemu   27w   REG    8,1 210058004 1573224 /root/mig-rhel.dump

The destination fd is directly owned by qemu

2. For libvirt-0.9.10-1.el6.x86_64
# lsof /root/mig-rhel.dump 
COMMAND    PID USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
libvirt_i 4210 root    1w   REG    8,1 25165824 262271 /root/mig-rhel.dump
# lsof |grep pipe|grep libvirt_i
libvirt_i  4210      root    0r     FIFO                0,8        0t0     396315 pipe
# lsof |grep pipe|grep qemu-kvm
qemu-kvm   4552      qemu   18w     FIFO                0,8        0t0     396315 pipe
...

The destination fd is owned by libvirt_iohelper, and libvirt_iohelper and qemu each hold one end of a pipe with the same node

So verification passes on libvirt-0.9.10-1.el6.x86_64
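
The manual lsof check above could be scripted roughly like this. It is a sketch that assumes the nine-column lsof layout shown in this comment (other lsof versions or options may print different columns). The sample text is copied from the output above, and the function names are hypothetical.

```python
OLD_LSOF = """\
COMMAND  PID USER   FD   TYPE DEVICE  SIZE/OFF    NODE NAME
libvirtd 465 root   22w   REG    8,1 210058004 1573224 /root/mig-rhel.dump
qemu-kvm 559 qemu   27w   REG    8,1 210058004 1573224 /root/mig-rhel.dump
"""

NEW_LSOF = """\
libvirt_i 4210 root    1w   REG    8,1 25165824 262271 /root/mig-rhel.dump
libvirt_i  4210      root    0r     FIFO                0,8        0t0     396315 pipe
qemu-kvm   4552      qemu   18w     FIFO                0,8        0t0     396315 pipe
"""


def parse_lsof(text):
    """Parse lsof lines with exactly nine whitespace-separated columns."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 9 or fields[0] == "COMMAND":
            continue  # skip the header and anything in another layout
        command, pid, _user, fd, ftype, _device, _size, node, name = fields
        rows.append({"command": command, "pid": int(pid), "fd": fd,
                     "type": ftype, "node": node, "name": name})
    return rows


def qemu_writes_file_directly(rows, path):
    """True if a qemu process holds the dump file itself (pre-fix behavior)."""
    return any(r["command"].startswith("qemu") and r["type"] == "REG"
               and r["name"] == path for r in rows)


def qemu_shares_pipe_with_iohelper(rows):
    """True if qemu and libvirt_iohelper hold a FIFO with the same node."""
    def pipe_nodes(prefix):
        return {r["node"] for r in rows
                if r["type"] == "FIFO" and r["command"].startswith(prefix)}
    # lsof truncates the command name, hence the "libvirt_i" prefix.
    return bool(pipe_nodes("libvirt_i") & pipe_nodes("qemu"))
```

With the old output, `qemu_writes_file_directly` is true for /root/mig-rhel.dump; with the new output it is false and `qemu_shares_pipe_with_iohelper` is true, which matches the conclusion above.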

Comment 15 errata-xmlrpc 2012-06-20 06:36:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0748.html

