Bug 1055578

Summary: bidirectional VMs migration between 2 hosts fail on VM doesn't exist / fatal error
Product: Red Hat Enterprise Linux 6
Reporter: Chris Pelland <cpelland>
Component: libvirt
Assignee: Peter Krempa <pkrempa>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Docs Contact:
Priority: high
Version: 6.5
CC: acathrow, ahadas, cpelland, dallan, dyuan, honzhang, iheim, istein, jdenemar, jiahu, lpeer, michal.skrivanek, mzhan, pkrempa, pm-eus, Rhev-m-bugs, yeylon, zhwang, zpeng
Target Milestone: rc
Keywords: Upstream, ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: libvirt-0.10.2-29.el6_5.3
Doc Type: Bug Fix
Doc Text:
A race condition was possible between a thread starting a virtual machine with a guest agent configured (during regular startup or while migrating) and a thread that was killing the VM process (or the process crashing). The race could result in the monitor object being freed by the thread that killed the VM process and later accessed by the thread that was attempting to start the VM. This resulted in a crash. The issue was fixed by checking the state of the VM after the attempted connection to the guest agent; if the VM has exited in the meantime, no further operations are attempted.
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-01-28 17:51:03 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1047659    
Bug Blocks:    
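
The race described in the Doc Text can be illustrated with a small toy model. This is only a sketch under stated assumptions, not libvirt's actual code: the `Domain` class, the `connect_agent` and `start_domain` functions, and the idea that the domain lock is released while waiting for the agent are all hypothetical stand-ins chosen to show the "re-check the VM state after the connection attempt" pattern.

```python
import threading


class GuestCrashed(Exception):
    """Raised when the VM exits while the agent connection is in progress."""


class Domain:
    """Toy stand-in for a VM object; every name here is hypothetical."""

    def __init__(self):
        self.lock = threading.Lock()
        self.monitor = object()  # stands in for the monitor object
        self.active = True

    def kill(self):
        """What the 'killer' thread does: mark the VM dead, free the monitor."""
        with self.lock:
            self.active = False
            self.monitor = None


def connect_agent(dom, during_connect=None):
    """Pretend to connect to the guest agent.

    Assumption: the domain lock is dropped while waiting, which opens a
    window for another thread. `during_connect` is a hook that simulates
    that other thread acting inside the window.
    """
    dom.lock.release()
    try:
        if during_connect is not None:
            during_connect()
    finally:
        dom.lock.acquire()
    return "agent"  # placeholder handle


def start_domain(dom, during_connect=None):
    with dom.lock:
        agent = connect_agent(dom, during_connect)
        # Re-check the VM state after the connection attempt and stop
        # if the VM has exited in the meantime.
        if not dom.active:
            raise GuestCrashed(
                "guest crashed while connecting to the guest agent")
        # Only past this point is it safe to use dom.monitor.
        return dom.monitor, agent
```

Calling `start_domain(dom, during_connect=dom.kill)` simulates the kill landing inside the window: the function raises `GuestCrashed` instead of going on to use a monitor that has already been freed.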

Description Chris Pelland 2014-01-20 14:57:27 UTC
This bug has been copied from bug #1047659 and has been proposed
to be backported to 6.5 z-stream (EUS).

Comment 7 zhenfeng wang 2014-01-22 12:14:23 UTC
I can reproduce this bug with the steps from comment 15 in bug 1047659. After updating libvirt to libvirt-0.10.2-29.el6_5.3, libvirtd no longer crashes and the qemu-guest-agent functions work as expected, so this bug can be marked verified.

Steps to reproduce
1. Rebuild the libvirt package as described in comment 15 of bug 1047659
2. Execute the following command
# virsh start rheltest2 & sleep 3; killall -9 qemu-kvm
[1] 29173
error: Failed to start domain rheltest2
error: End of file while reading data: Input/output error
error: One or more references were leaked after disconnect from the hypervisor
error: Failed to reconnect to the hypervisor

# ps aux|grep libvirtd
root      9765  0.0  0.0 103252   852 pts/0    S+   19:09   0:00 grep libvirtd
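
Note that `ps aux|grep libvirtd` also matches the grep process itself, so a line being printed does not by itself mean the daemon is alive. A minimal sketch of a stricter check follows; the helper name `libvirtd_running` is hypothetical, and it assumes the standard 11-column `ps aux` layout with the command line in the last column.

```python
def libvirtd_running(ps_output: str) -> bool:
    """Return True if any `ps aux` line is an actual libvirtd process."""
    for line in ps_output.splitlines():
        # Columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        command = fields[10].split()
        # Compare the executable's base name, so "grep libvirtd" is ignored.
        if command and command[0].rsplit("/", 1)[-1] == "libvirtd":
            return True
    return False


crashed = "root      9765  0.0  0.0 103252   852 pts/0    S+   19:09   0:00 grep libvirtd"
alive = "root      9784  5.1  0.0 1058464 18544 ?       Sl   19:09   2:03 libvirtd --daemon"
print(libvirtd_running(crashed), libvirtd_running(alive))
```

With the two sample lines from this report, the first (only grep) counts as "not running" and the second (`libvirtd --daemon`) counts as "running".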

Verification steps
1. Rebuild the libvirt package as described in comment 15 of bug 1047659
2. Execute the following command; libvirtd does not crash and the expected error is reported
# virsh start rheltest2 & sleep 3; killall -9 qemu-kvm
error: Failed to start domain rheltest2
error: internal error guest crashed while connecting to the guest agent

# ps aux|grep libvirtd
root      9784  5.1  0.0 1058464 18544 ?       Sl   19:09   2:03 libvirtd --daemon


3. Start the guest; after it has started successfully, suspend it to disk (S4)

# virsh start rheltest2
# virsh dompmsuspend rheltest2 --target disk
Domain rheltest2 successfully suspended
# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     rheltest2                      shut off

4. Execute the command from step 2; libvirtd does not crash and the expected error is reported
# virsh start rheltest2 & sleep 3; killall -9 qemu-kvm
error: Failed to start domain rheltest2
error: internal error guest crashed while connecting to the guest agent

# ps aux|grep libvirtd
root      9784  5.1  0.0 1058464 18544 ?       Sl   19:09   2:03 libvirtd --daemon


5. Start the guest; after it has started completely, run managedsave on it

# virsh start rheltest2
# virsh managedsave rheltest2

Domain rheltest2 state saved by libvirt

6. Execute the command from step 2; libvirtd does not crash and an error is reported
# virsh start rheltest2 & sleep 3; killall -9 qemu-kvm
error: Failed to start domain rheltest2
error: Unable to read from monitor: Connection reset by peer

# ps aux|grep libvirtd
root      9784  5.1  0.0 1058464 18544 ?       Sl   19:09   2:03 libvirtd --daemon

7. Test the above scenario without rebuilding the libvirt package; the guest ends up in the shut off state, no error is reported, and libvirtd does not crash

# virsh start rheltest2 & sleep 3; killall -9 qemu-kvm
[1] 14584

# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     rheltest2                      shut off

8. Start the guest, then check the qemu-guest-agent functions with the following commands; all of them work as expected
# virsh start rheltest2
# virsh setvcpus rheltest2 2 --guest

# virsh vcpucount rheltest2 --guest
2
# virsh dompmsuspend rheltest2 --target mem
# virsh dompmwakeup rheltest2
# virsh dompmsuspend rheltest2 --target disk
# virsh start rheltest2
# virsh shutdown rheltest2 --mode agent
# virsh reboot rheltest2 --mode agent
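
Several of the steps above confirm the guest state by reading `virsh list --all`. When scripting such a check, the table can be parsed roughly as follows. This is a sketch only: the helper name `parse_virsh_list` is hypothetical, and it assumes the three-column layout shown in this report and domain names without whitespace.

```python
def parse_virsh_list(output: str) -> dict:
    """Map domain name -> state from `virsh list --all` table output."""
    states = {}
    for line in output.splitlines():
        parts = line.split(None, 2)  # Id, Name, State (state may contain spaces)
        # Skip blank lines, the dashed separator, and the header row.
        if len(parts) < 3 or parts[0] == "Id":
            continue
        _dom_id, name, state = parts
        states[name] = state.strip()
    return states


sample = """\
 Id    Name                           State
----------------------------------------------------
 -     rheltest2                      shut off
"""
print(parse_virsh_list(sample))
```

For the sample table taken from this report, the result maps `rheltest2` to `shut off`.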

Comment 9 errata-xmlrpc 2014-01-28 17:51:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2014-0103.html