Bug 1017194

Summary: libvirtd crash when destroying a Linux guest that executed a series of S3 and save/restore operations
Product: Red Hat Enterprise Linux 6
Component: libvirt
Version: 6.4
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: Chris Pelland <cpelland>
Assignee: Eric Blake <eblake>
QA Contact: Virtualization Bugs <virt-bugs>
CC: acathrow, berrange, bili, cwei, dallan, dyuan, eblake, jdenemar, jiahu, jsvarova, mjenner, mzhan, pm-eus, shyu, zhwang
Target Milestone: rc
Keywords: Regression, ZStream
Fixed In Version: libvirt-0.10.2-18.el6_4.15
Doc Type: Bug Fix
Doc Text:
Some code refactoring to fix another bug left a case in which locks were cleaned up incorrectly. As a consequence, the libvirtd daemon could terminate unexpectedly in certain migration-to-file scenarios. With this update, the lock cleanup paths have been fixed and libvirtd no longer crashes when saving a domain to a file.
Last Closed: 2013-11-13 10:28:55 UTC
Bug Depends On: 928661

Description Chris Pelland 2013-10-09 12:22:03 UTC
This bug has been copied from bug #928661 and has been proposed
to be backported to 6.4 z-stream (EUS).

Comment 7 zhenfeng wang 2013-11-07 08:22:09 UTC
Verified this bug on libvirt-0.10.2-18.el6_4.15.x86_64. The verification steps follow.
Package info:
qemu-kvm-rhev-0.12.1.2-2.355.el6_4.9.x86_64
libvirt-0.10.2-18.el6_4.15.x86_64
kernel-2.6.32-358.26.1.el6.x86_64
qemu-guest-agent-0.12.1.2-2.355.el6_4.9.x86_64.rpm

Steps:
1. Prepare a running guest with qemu-ga installed.
# virsh list --all
 Id    Name                           State
----------------------------------------------------
 33    rhel64m                        running

guest# service qemu-ga start

# ps aux|grep qemu-ga
root      1523  0.0  0.0   7280   544 ?        Ss   01:03   0:00 /usr/bin/qemu-ga --daemonize --method virtio-serial --path /dev/virtio-ports/org.qemu.guest_agent.0 --logfile /var/log/qemu-ga.log --pidfile /var/run/qemu-ga.pid --blacklist guest-file-open guest-file-close guest-file-read guest-file-write guest-file-seek guest-file-flush
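
As a side check for step 1, the presence of the guest agent's virtio-serial channel can also be confirmed from the domain XML. A minimal sketch, assuming a working libvirtd connection; the helper name is ours, not a virsh feature:

```shell
#!/bin/sh
# has_agent_channel: read domain XML on stdin and succeed if the
# qemu-guest-agent virtio-serial channel is present.
has_agent_channel() {
    grep -q 'org\.qemu\.guest_agent\.0'
}

# Usage sketch against the domain from this report:
#   virsh dumpxml rhel64m | has_agent_channel && echo "agent channel present"
```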

2. Do S3/S4 on the guest. Currently, in RHEL 6.4.z, the second S3 operation reports an error; qemu bug 881585 tracks this issue. That bug will not be fixed in RHEL 6.4, but the developers offered two workarounds, so we can verify this bug with the workarounds in place. Testing without the workarounds, we usually get the following error:
 # virsh dompmsuspend rhel64m --target mem
 Domain rhel64m successfully suspended
 # virsh dompmwakeup rhel64m
 Domain rhel64m successfully woken up
 # virsh dompmsuspend rhel64m --target mem
 error: Domain rhel64m could not be suspended
 error: Guest agent is not responding: Guest agent not available for now
 # virsh dompmsuspend rhel64m --target mem
 error: Domain rhel64m could not be suspended
 error: Guest agent is not responding: Guest agent not available for now
 # virsh dompmsuspend rhel64m --target mem
 error: Domain rhel64m could not be suspended
 error: Guest agent is not responding: QEMU guest agent is not available due to an error
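
The "not available for now" failures above are transient, so a scripted run of these steps can use a bounded retry. A minimal sketch; the `retry` helper and its attempt budget are our own additions, not part of virsh:

```shell
#!/bin/sh
# retry N CMD...: run CMD up to N times, pausing between attempts,
# until it succeeds. Returns CMD's last failure status otherwise.
retry() {
    budget=$1; shift
    while :; do
        "$@" && return 0
        rc=$?
        budget=$((budget - 1))
        [ "$budget" -gt 0 ] || return "$rc"
        sleep 1
    done
}

# Usage sketch against the transient agent error above:
#   retry 5 virsh dompmsuspend rhel64m --target mem
```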

3. To work around the issue in step 2, remove pm-utils or set SELinux to permissive mode in the guest, then re-test S3/S4. Both S3 and S4 then work well:
# virsh dompmsuspend rhel64m --target mem
Domain rhel64m successfully suspended
# virsh dompmwakeup rhel64m
Domain rhel64m successfully woken up

# virsh dompmsuspend rhel64m --target disk
Domain rhel64m successfully suspended
# virsh start rhel64m
Domain rhel64m started

4. Execute a series of S3 and save/restore operations:
# virsh dompmsuspend rhel64m --target mem
Domain rhel64m successfully suspended
# virsh dompmwakeup rhel64m
Domain rhel64m successfully woken up
# virsh save rhel64m /tmp/rhel4m

Domain rhel64m saved to /tmp/rhel4m
# virsh restore /tmp/rhel4m 
Domain restored from /tmp/rhel4m

5. After restoring from the save file, re-do S3 on the guest. The command hangs; existing bug 890648 tracks this issue. That bug is not fixed yet, but it does not block this verification.
# virsh dompmsuspend rhel64m --target mem
^C
# virsh save rhel64m /tmp/rhel64m 
error: Failed to save domain rhel64m to /tmp/rhel64m
error: Timed out during operation: cannot acquire state change lock
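
Because the suspend in step 5 hangs until interrupted, a scripted verification can bound it with coreutils `timeout`, which exits with status 124 when the limit is hit. A sketch; the 30-second budget is an arbitrary choice of ours:

```shell
#!/bin/sh
# run_bounded SECS CMD...: run CMD but kill it after SECS seconds.
# coreutils timeout exits with status 124 on expiry.
run_bounded() {
    secs=$1; shift
    timeout "$secs" "$@"
}

# Usage sketch for the hang in step 5:
#   run_bounded 30 virsh dompmsuspend rhel64m --target mem
#   [ $? -eq 124 ] && echo "suspend hung (bug 890648); continuing"
```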

6. Destroy the guest. The guest is destroyed successfully, and the libvirtd service does not crash:
# virsh destroy rhel64m
Domain rhel64m destroyed

# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     rhel64m                        shut off
# service libvirtd status
libvirtd (pid  9290) is running...
# ps aux|grep libvirtd
root      9290  1.4  0.2 1060492 18804 ?       Sl   Nov06   3:06 libvirtd --daemon
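
The liveness check in step 6 can also be scripted: `kill -0` probes a PID without delivering any signal. A sketch; the pid file path shown in the usage comment is the conventional RHEL 6 location and is an assumption here:

```shell
#!/bin/sh
# daemon_alive PID: succeed if a process with that PID exists
# (signal 0 is never delivered; it only checks existence/permission).
daemon_alive() {
    kill -0 "$1" 2>/dev/null
}

# Usage sketch after destroying the guest:
#   daemon_alive "$(cat /var/run/libvirtd.pid)" && echo "libvirtd survived"
```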

7. Start the guest and repeat the steps above. All steps produce the same results as before, so this bug can be marked verified.

Comment 9 errata-xmlrpc 2013-11-13 10:28:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1517.html