Bug 928672 - Libvirtd crash when destroying Linux guest which executed a series of S3 and save/restore operations
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Assignee: Pavel Hrdina
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On: 928661
Blocks:
Reported: 2013-03-28 08:21 UTC by zhenfeng wang
Modified: 2016-04-07 07:17 UTC
CC List: 12 users

Fixed In Version: libvirt-1.1.1-1.el7
Doc Type: Bug Fix
Doc Text:
Clone Of: 928661
Environment:
Last Closed: 2014-06-13 10:13:29 UTC
Target Upstream Version:
Embargoed:


Attachments
the guest's xml (2.47 KB, text/plain)
2013-03-28 08:24 UTC, zhenfeng wang
The gdb info about libvirtd crash (12.43 KB, text/plain)
2013-03-29 02:38 UTC, zhenfeng wang

Description zhenfeng wang 2013-03-28 08:21:28 UTC
+++ This bug was initially created as a clone of Bug #928661 +++

Description of problem:
1. libvirtd crashes when destroying a guest that has executed the following sequence of operations:
dompmsuspend => dompmwakeup => save => restore => dompmsuspend (the virsh command hangs here) => save => destroy => libvirtd crashes
2. dompmwakeup does not work correctly for a RHEL 7 guest on a RHEL 7 host: the guest does not return to the state it was in before pmsuspend.

Version-Release number of selected component (if applicable):
kernel-3.7.0-0.36.el7.x86_64
libvirt-1.0.3-1.el7.x86_64
qemu-kvm-1.3.0-6.el7.x86_64
How reproducible:
100%

Steps
1. # getenforce
Enforcing

2. Prepare a guest with the qemu-ga (QEMU guest agent) environment
# virsh list --all
 Id    Name                           State
----------------------------------------------------
 7     rhel7                       running

3. Start the qemu-ga service in the guest
# systemctl start qemu-guest-agent.service

4. Do S3 with the guest, then wake it up; however, the guest does not return to the state it was in before pmsuspend
# virsh dompmsuspend rhel7 --target mem

# virsh dompmwakeup rhel7

5. Save and restore the guest
# virsh save rhel7 /tmp/rhel7.save

# virsh restore /tmp/rhel7.save

6. Do S3 with the guest again; the virsh command hangs here
# virsh dompmsuspend rhel7 --target mem

7. Save the guest again; the save fails
# virsh save rhel7 /tmp/rhelnew1.save
error: Failed to save domain rhel7 to /tmp/rhel7.save
error: internal error unexpected async job 3

# virsh domjobinfo rhel7
Job type:         None   

8. Destroy the guest; libvirtd crashes here
# virsh destroy rhel7
Domain rhelnew1 destroyed

# virsh list
error: Failed to reconnect to the hypervisor
error: no valid connection
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Connection refused

# ps aux|grep libvirtd
root      5123  0.0  0.0 103244   828 pts/2    S+   15:39   0:00 grep libvirtd

# service libvirtd status
libvirtd dead but pid file exists

Actual results:
libvirtd crashed.

Expected results:
1. The virsh command shouldn't hang.
2. The libvirtd service shouldn't crash.
3. When the guest wakes up, it should return to the state it was in before pmsuspend.

Comment 1 zhenfeng wang 2013-03-28 08:24:15 UTC
Created attachment 717507
the guest's xml

Comment 3 zhenfeng wang 2013-03-29 02:38:59 UTC
Created attachment 717918
The gdb info about libvirtd crash

Comment 4 Eric Blake 2013-06-06 02:57:29 UTC
libvirtd shouldn't crash, so this needs to be fixed.  That said, it would be nice if RHEL 7 qemu would first be fixed to allow migration without losing S3 state.

Comment 5 zhenfeng wang 2013-07-03 03:22:55 UTC
Hi Eric,
Since this bug has high priority and has been on the Development Freeze milestone list (https://url.corp.redhat.com/41ae300) for a long time, do we need to raise it to a blocker, or should we just downgrade the bug's priority? Thanks.

Comment 6 Eric Blake 2013-07-10 21:28:01 UTC
This is still on my list of things to investigate - we don't want to have crashers.  However, it is taking second seat to some implementation work that I have to complete before the libvirt 1.1.1 upstream release, and we can fix crashes after dev freeze (although I agree that getting the fix in before freeze is better).

Comment 7 Peter Krempa 2013-07-23 14:37:13 UTC
Fixed upstream:

commit 29c2208c045e16f55bbfd25db266c30e90fa3535
Author: Peter Krempa <pkrempa>
Date:   Tue Jul 23 15:35:02 2013 +0200

    qemu: Take error path if acquiring of job fails in qemuDomainSaveInternal
    
    Due to a goto statement missed when refactoring in 2771f8b74c1bf50d1fa
    when acquiring of a domain job failed the error path was not taken. This
    resulted into a crash afterwards as an extra reference was removed from a
    domain object leading to it being freed. An attempt to list the domains
    leaded to a crash of the daemon afterwards.

Comment 8 chentao 2013-08-02 10:20:14 UTC
Hi Peter, when I tried to verify this bug with the latest libvirt package, I found something wrong with S3/S4, and this blocks my verification of the bug. Please take a look: is it a new issue? Thanks.

Version-Release number of selected component (if applicable):
kernel-3.10.0-4.el7.kpq2.x86_64
qemu-kvm-1.5.2-2.el7.x86_64
libvirt-1.1.1-1.el7.x86_64

steps:
1. # getenforce
Enforcing

2. Prepare a guest with the qemu-ga (QEMU guest agent) environment
# virsh list --all
Id    Name                           State
----------------------------------------------------
7     rhel7                       running
# virsh dumpxml rhel7
......
  <pm>
    <suspend-to-mem enabled='yes'/>
    <suspend-to-disk enabled='yes'/>
  </pm>
......
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/rhel7.agent'/>
      <target type='virtio' name='org.qemu.guest_agent.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>
......

3. Start the qemu-guest-agent service in the guest
# systemctl start qemu-guest-agent.service
# service qemu-guest-agent status
Redirecting to /bin/systemctl status  qemu-guest-agent.service
qemu-guest-agent.service - QEMU Guest Agent
   Loaded: loaded (/usr/lib/systemd/system/qemu-guest-agent.service; static)
   Active: active (running) since Fri 2013-08-02 05:59:56 EDT; 2min 23s ago
 Main PID: 394 (qemu-ga)
   CGroup: name=systemd:/system/qemu-guest-agent.service
           `-394 /usr/bin/qemu-ga

4. Do S3 with the guest; an error is reported
# virsh dompmsuspend rhel7 --target mem
error: Domain rhel7 could not be suspended
error: internal error unable to execute QEMU agent command 'guest-suspend-ram': child process has failed to suspend

Comment 10 zhenfeng wang 2013-11-11 10:14:31 UTC
Verified this bug with libvirt-1.1.1-12.el7.x86_64. The issue in comment 8 went away after I updated all packages to the latest versions. The verification steps follow:

pkg info
kernel-3.10.0-47.el7.x86_64
qemu-kvm-rhev-1.5.3-14.el7.x86_64
libvirt-1.1.1-12.el7.x86_64

steps
1. # getenforce
Enforcing

2. Prepare a guest with the qemu-ga (QEMU guest agent) environment
# virsh list --all
Id    Name                           State
----------------------------------------------------
7     rhel7                       running
# virsh dumpxml rhel7
......
  <pm>
    <suspend-to-mem enabled='yes'/>
    <suspend-to-disk enabled='yes'/>
  </pm>
......
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/rhel7.agent'/>
      <target type='virtio' name='org.qemu.guest_agent.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>
......

3. Start the qemu-guest-agent service in the guest
# systemctl start qemu-guest-agent.service
# service qemu-guest-agent status
Redirecting to /bin/systemctl status  qemu-guest-agent.service
qemu-guest-agent.service - QEMU Guest Agent
   Loaded: loaded (/usr/lib/systemd/system/qemu-guest-agent.service; static)
   Active: active (running) since Fri 2013-08-02 05:59:56 EDT; 2min 23s ago
 Main PID: 394 (qemu-ga)
   CGroup: name=systemd:/system/qemu-guest-agent.service
           `-394 /usr/bin/qemu-ga

4. Do S3 with the guest, then wake it up; however, the guest does not return to the state it was in before pmsuspend
# virsh dompmsuspend rhel7 --target mem

# virsh dompmwakeup rhel7

5. Save and restore the guest
# virsh save rhel7 /tmp/rhel7.save

# virsh restore /tmp/rhel7.save

6. Do S3 with the guest again; the virsh command hangs here. There is an existing bug 890648 about this issue in RHEL 6.4 that is not fixed yet, so I cloned it to RHEL 7 as bug 1028927. Bug 1028927 does not block verification of this bug.
# virsh dompmsuspend rhel7 --target mem
^C

7. Save the guest again; it reports an error, which is caused by bug 1028927
# virsh save rhel7 /tmp/rhel7.save 
error: Failed to save domain rhel7 to /tmp/rhel7.save
error: Timed out during operation: cannot acquire state change lock

8. Destroy the guest (the step at which libvirtd previously crashed)
# virsh destroy rhel7
Domain rhel7 destroyed

# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     rhel7                          shut off

# ps aux|grep libvirtd
root     19089  1.8  0.2 1126080 21920 ?       Ssl  16:26   2:00 /usr/sbin/libvirtd

# service libvirtd status
Redirecting to /bin/systemctl status  libvirtd.service
libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled)
   Active: active (running) since Mon 2013-11-11 16:26:10 CST; 1h 46min ago
 Main PID: 19089 (libvirtd)
   CGroup: /system.slice/libvirtd.service
           ├─ 2418 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default...
           └─19089 /usr/sbin/libvirtd

9. Repeating the above steps several times gives the same result, so I am marking this bug as verified.

Comment 11 Ludek Smid 2014-06-13 10:13:29 UTC
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.

