Red Hat Bugzilla – Bug 892079
Libvirtd crash when destroying a Windows guest that is executing an S3/S4 operation
Last modified: 2014-03-25 05:59:10 EDT
After step 5, I did some further checking:

# service libvirtd status
libvirtd dead but pid file exists
# virsh list
error: Failed to reconnect to the hypervisor
error: no valid connection
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Connection refused
# ps aux | grep qemu
root     59813  0.0  0.0 103244   856 pts/5    S+   01:29   0:00 grep qemu

Start the libvirtd service, then check the guest's status; the guest has been destroyed:

# service libvirtd start
Starting libvirtd daemon:                                  [  OK  ]
# service libvirtd status
libvirtd (pid 60193) is running...
# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     win7-32                        shut off
Patch proposed upstream: https://www.redhat.com/archives/libvir-list/2013-January/msg00520.html
Moving to POST: http://post-office.corp.redhat.com/archives/rhvirt-patches/2013-January/msg00175.html
Created attachment 682311 [details] libvirtd crash log
crash log attached.
I can also still reproduce this bug with libvirt-0.10.2-16.el6.x86_64.
Okay guys, before claiming I fixed this, I've created a scratch build: https://brewweb.devel.redhat.com/taskinfo?taskID=5298609 Can you please give it a try?
Patch proposed upstream: https://www.redhat.com/archives/libvir-list/2013-January/msg01487.html
Since this is targeted for 6.5 now, and I've just pushed the patch upstream, I am moving this one to POST:

commit d960d06fc06a448f495c465caf06d3d0c74ea587
Author:     Michal Privoznik <mprivozn@redhat.com>
AuthorDate: Mon Jan 21 11:52:44 2013 +0100
Commit:     Michal Privoznik <mprivozn@redhat.com>
CommitDate: Wed Jan 23 15:35:44 2013 +0100

    qemu_agent: Ignore expected EOFs

    https://bugzilla.redhat.com/show_bug.cgi?id=892079

    One of my previous patches (f2a4e5f176c408) tried to fix libvirtd
    crashing on domain destroy. However, we need to copy the pattern from
    qemuProcessHandleMonitorEOF() instead of decrementing the reference
    counter. The rationale for this is: if the qemu process is dying
    because the domain is being destroyed, we obtain EOF on both the
    monitor and agent sockets. If the exit is expected, qemuProcessStop
    is called, which cleans up both the agent and monitor sockets. We
    want qemuAgentClose() to be called iff the EOF is not expected, so
    we don't leak an FD and memory. Moreover, there could be a race with
    qemuProcessHandleMonitorEOF(), which could have already closed the
    agent socket, in which case we don't want to do anything.

v1.0.1-401-gd960d06
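To illustrate the rationale in the commit message, here is a minimal, self-contained C sketch of the "close only on an unexpected EOF" pattern. All names (Agent, DomainPriv, handleAgentEOF, processStop) are illustrative stand-ins, not the real libvirt structures or functions; the real fix lives in qemuProcessHandleAgentEOF().

```c
#include <stdlib.h>

/* Hypothetical, simplified model of the fix: on agent EOF, close the
 * agent (releasing its fd and memory) only when the EOF is UNexpected.
 * When the guest is being destroyed, the stop path owns the cleanup,
 * so the EOF handler must do nothing. */

typedef struct {
    int open;             /* 1 while the agent socket is held */
} Agent;

typedef struct {
    Agent *agent;         /* NULL once the agent has been closed */
    int beingDestroyed;   /* set by the destroy path before qemu exits */
    int closes;           /* counts how many times agentClose() ran */
} DomainPriv;

static void agentClose(DomainPriv *priv)
{
    if (priv->agent) {
        priv->agent->open = 0;
        free(priv->agent);
        priv->agent = NULL;   /* prevents a second, double-free close */
        priv->closes++;
    }
}

/* EOF handler following the qemuProcessHandleMonitorEOF() pattern:
 * bail out early when the agent is already gone (another handler may
 * have raced us) or when the EOF is expected. */
static void handleAgentEOF(DomainPriv *priv)
{
    if (!priv->agent)          /* already closed elsewhere: nothing to do */
        return;
    if (priv->beingDestroyed)  /* expected EOF: the stop path cleans up */
        return;
    agentClose(priv);          /* unexpected EOF: avoid fd/memory leak */
}

/* Stand-in for the cleanup done on an expected exit (qemuProcessStop
 * in the real code): it closes both sockets exactly once. */
static void processStop(DomainPriv *priv)
{
    agentClose(priv);
    priv->beingDestroyed = 0;
}
```

With this shape, an expected EOF (destroy path) leaves the close to processStop(), an unexpected EOF closes immediately, and a duplicate EOF is a harmless no-op, which is exactly the behavior the refcount-decrement approach failed to guarantee.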
*** Bug 915653 has been marked as a duplicate of this bug. ***
Marking with TestBlocker since it fails our Jenkins Jobs testing RHEV
(In reply to comment #19)
> Marking with TestBlocker since it fails our Jenkins Jobs testing RHEV

Fair enough, thank you for the explanation of what's failing.
Verified this bug on libvirt-0.10.2-21.el6.x86_64. The following were my verification steps.

Package info:
kernel-2.6.32-396.el6.x86_64
libvirt-0.10.2-21.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.386.el6.x86_64
virtio-win-1.6.5-6.el6_4.noarch

Steps:
1. Prepare the test environment as in step 1 and step 2 of comment 0.
2. Execute S3 in the host; quit the command before it finishes:
# virsh dompmsuspend win7 --target mem
^C
3. Execute S4 in the host; quit the command before it finishes:
# virsh dompmsuspend win7 --target disk
^C
4. Then destroy the guest in the host:
# virsh destroy win7
Domain win7 destroyed
5. Check the libvirtd status:
# ps aux | grep libvirtd
root      5251  0.0  0.0 103244   836 pts/0    S+   18:13   0:00 grep libvirtd
root     30067  1.7  0.1 1027604 15896 ?       Sl   16:18   1:58 libvirtd --daemon
# service libvirtd status
libvirtd (pid 30067) is running...

Since libvirtd did not crash here, while I can still reproduce this bug on libvirt-0.10.2-13.el6.x86_64, I am marking this bug VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1581.html