Bug 681459

Summary: [Libvirt] When shutting down VMs and restarting libvirtd, libvirtd crashes when trying to start.
Product: Red Hat Enterprise Linux 6
Reporter: David Naori <dnaori>
Component: libvirt
Assignee: Jiri Denemark <jdenemar>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Priority: medium
Version: 6.1
CC: ajia, dallan, dnaori, eblake, gren, hateya, jyang, mgoldboi, mjenner, xen-maint, yoyzhang
Target Milestone: rc
Hardware: x86_64
OS: All
Fixed In Version: libvirt-0.8.7-10.el6
Doc Type: Bug Fix
Last Closed: 2011-05-19 13:28:20 UTC
Attachments:
  logs (flags: none)

Description David Naori 2011-03-02 09:28:01 UTC
Created attachment 481816 (logs)

Description of problem:
When VMs are stopped using vdsm and libvirtd is then restarted, libvirtd crashes while starting up.

bt:
#0  0x00000000004389d7 in qemuReconnectDomain (payload=0x11b0240, name=<value optimized out>, opaque=<value optimized out>) at qemu/qemu_driver.c:1315
#1  0x00000036bc230c6a in virHashForEach (table=0x117c750, iter=0x438980 <qemuReconnectDomain>, data=0x7fff8b47e990) at util/hash.c:495
#2  0x0000000000437f7d in qemuReconnectDomains (privileged=<value optimized out>) at qemu/qemu_driver.c:1390
#3  qemudStartup (privileged=<value optimized out>) at qemu/qemu_driver.c:1816
#4  0x00000036bc290790 in virStateInitialize (privileged=1) at libvirt.c:1020
#5  0x000000000041f920 in main (argc=<value optimized out>, argv=<value optimized out>) at libvirtd.c:3304

Version-Release number of selected component (if applicable):
- libvirt-0.8.7-8.el6.x86_64
- vdsm-4.9-51

Steps to Reproduce:
1. Stop VMs (using vdsm)
2. Restart libvirtd
  
libvirtd log and core dump attached.

Comment 1 Jiri Denemark 2011-03-02 09:58:20 UTC
Can you try reproducing this issue with the packages from http://people.redhat.com/jdenemar/libvirt/ ?

Comment 2 Jiri Denemark 2011-03-03 14:28:22 UTC
The problem was that qemuReconnectDomain can remove the domain object from the hash table that virHashForEach is iterating over, which may result in accessing memory that has already been freed.
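To make the failure mode concrete, here is a minimal C sketch of a chained hash table whose for-each walk tolerates a callback that removes the entry it was just handed. Everything in it (`struct table`, `table_add`, `table_remove`, `table_for_each`, the `drop_odd` callback, the bucket count) is hypothetical and is not libvirt's util/hash.c. The point it illustrates is that the walk reads `e->next` before invoking the callback, so freeing `e` inside the callback no longer leaves the loop reading from freed memory.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal chained hash table (not libvirt's util/hash.c). */
#define NBUCKETS 4

struct entry {
    const char *name;       /* key; caller-owned string */
    int value;
    struct entry *next;     /* next entry in the same bucket */
};

struct table {
    struct entry *buckets[NBUCKETS];
};

/* djb2-style string hash reduced to a bucket index. */
static unsigned bucket_of(const char *name)
{
    unsigned h = 5381;
    while (*name)
        h = h * 33 + (unsigned char)*name++;
    return h % NBUCKETS;
}

/* Insert at the head of the bucket's chain; -1 on allocation failure. */
static int table_add(struct table *t, const char *name, int value)
{
    struct entry *e = malloc(sizeof(*e));
    if (!e)
        return -1;
    unsigned b = bucket_of(name);
    e->name = name;
    e->value = value;
    e->next = t->buckets[b];
    t->buckets[b] = e;
    return 0;
}

/* Unlink and free the entry with the given name; -1 if absent. */
static int table_remove(struct table *t, const char *name)
{
    struct entry **link = &t->buckets[bucket_of(name)];
    while (*link) {
        if (strcmp((*link)->name, name) == 0) {
            struct entry *victim = *link;
            *link = victim->next;
            free(victim);
            return 0;
        }
        link = &(*link)->next;
    }
    return -1;
}

typedef void (*table_iter)(struct table *t, struct entry *e, void *opaque);

/* Visit every entry and return the number of callbacks made.
 * The successor is read BEFORE the callback runs, because the
 * callback may remove (and free) the entry it is given. */
static int table_for_each(struct table *t, table_iter fn, void *opaque)
{
    int visited = 0;
    for (unsigned i = 0; i < NBUCKETS; i++) {
        struct entry *e = t->buckets[i];
        while (e) {
            struct entry *next = e->next;
            fn(t, e, opaque);
            visited++;
            e = next;
        }
    }
    return visited;
}

/* Count the entries still in the table. */
static int table_count(const struct table *t)
{
    int n = 0;
    for (unsigned i = 0; i < NBUCKETS; i++)
        for (const struct entry *e = t->buckets[i]; e; e = e->next)
            n++;
    return n;
}

/* Example callback: drop entries with an odd value, loosely analogous
 * to qemuReconnectDomain discarding a domain it cannot reconnect to. */
static void drop_odd(struct table *t, struct entry *e, void *opaque)
{
    (void)opaque;
    if (e->value % 2 != 0)
        table_remove(t, e->name);
}
```

Caching `next` only protects against removal of the entry currently being visited; if a callback could also remove other entries, the cached pointer itself could end up dangling.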

A reliable reproducer, which results in the crash is:

1. create two transient (using virsh create) qemu domains with the following
   UUIDs:
   dom1: d5b3e8ff-2be6-4f81-a23e-6ec94f2338db and
   dom2: f0b4f8f7-0a56-4a76-ab7d-522bbe32ada3
   (the exact UUIDs are crucial since they need to be mapped to the same
   hash key so that the two objects form a linked list within the hash)
2. virsh shutdown dom2
3. stop libvirtd service before dom2 finishes its shutdown procedure
4. wait until dom2 shuts down completely
5. start libvirtd service
   the daemon should crash once it detects that it cannot connect to dom2's
   qemu monitor
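The reproducer depends on both domains landing in the same hash bucket. In a chained hash table, entries whose keys map to the same bucket are kept on one linked list, so a for-each walk has to step from one domain's entry to the other through a `next` pointer, which is where a walk that reads `next` after the callback has freed an entry can go wrong. The toy below is a hypothetical illustration only: `toy_hash` (a plain byte sum, chosen because any two anagrams collide under it) is not libvirt's hash function and the keys are made up, so it does not show why those two particular UUIDs collide in libvirt 0.8.7.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy hash: the sum of the key's bytes. Deliberately weak
 * (any two anagrams collide); libvirt's real hash function differs. */
static unsigned toy_hash(const char *key)
{
    unsigned h = 0;
    while (*key)
        h += (unsigned char)*key++;
    return h;
}

static unsigned toy_bucket(const char *key, unsigned nbuckets)
{
    return toy_hash(key) % nbuckets;
}

struct node {
    const char *key;
    struct node *next;   /* next node in the same bucket's chain */
};

/* Push a node onto the head of its bucket's chain. */
static void chain_insert(struct node **buckets, unsigned nbuckets,
                         struct node *n)
{
    unsigned b = toy_bucket(n->key, nbuckets);
    n->next = buckets[b];
    buckets[b] = n;
}

/* Number of nodes linked from one bucket head. */
static size_t chain_length(const struct node *head)
{
    size_t len = 0;
    for (; head; head = head->next)
        len++;
    return len;
}
```

Inserting two colliding keys leaves one bucket holding a two-node chain (the second insert first, then the first); keys that hash to different buckets would each sit alone at the head of their own bucket.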

The fix is now upstream as v0.8.8-84-g9677cd3:

commit 9677cd33eea4c65d78ba463b46b8b45ed2da1709
Author: Jiri Denemark <jdenemar>
Date:   Thu Mar 3 14:10:51 2011 +0100

    util: Allow removing hash entries in virHashForEach
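The commit title says the fix makes it legal to remove hash entries from inside virHashForEach. Below is a sketch of one way to do that, under stated assumptions: the iterator records which entry its callback is currently processing and caches that entry's successor before the call, and the remove function refuses (returns -1) to delete any other entry while a walk is in progress, since the iterator may already hold a pointer to it. The names `struct chain`, `chain_push`, `chain_remove`, `chain_for_each`, and the `iterating`/`current` fields are hypothetical; this is not the code from commit 9677cd3, and it uses a single linked chain instead of a full hash table to stay short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical guarded chain; not libvirt's virHash implementation. */
struct entry {
    int id;
    struct entry *next;
};

struct chain {
    struct entry *head;
    bool iterating;          /* true while chain_for_each is running */
    struct entry *current;   /* entry whose callback is running */
};

static int chain_push(struct chain *c, int id)
{
    struct entry *e = malloc(sizeof(*e));
    if (!e)
        return -1;
    e->id = id;
    e->next = c->head;
    c->head = e;
    return 0;
}

/* Remove the entry with `id`. During iteration only the entry currently
 * handed to the callback may be removed; anything else is refused. */
static int chain_remove(struct chain *c, int id)
{
    struct entry **link = &c->head;
    while (*link && (*link)->id != id)
        link = &(*link)->next;
    if (!*link)
        return -1;                       /* not found */
    if (c->iterating && *link != c->current)
        return -1;                       /* unsafe during iteration */
    struct entry *victim = *link;
    *link = victim->next;
    free(victim);
    return 0;
}

typedef void (*chain_iter)(struct chain *c, struct entry *e, void *opaque);

/* Visit every entry; returns the number visited, or -1 if nested. */
static int chain_for_each(struct chain *c, chain_iter fn, void *opaque)
{
    int visited = 0;
    if (c->iterating)
        return -1;
    c->iterating = true;
    for (struct entry *e = c->head; e; ) {
        struct entry *next = e->next;    /* cache before fn may free e */
        c->current = e;
        fn(c, e, opaque);
        c->current = NULL;
        visited++;
        e = next;
    }
    c->iterating = false;
    return visited;
}

/* Example callback: first tries to remove the entry after `e` (refused
 * during iteration), then removes `e` itself (allowed). */
static void remove_self_and_next(struct chain *c, struct entry *e,
                                 void *opaque)
{
    int *refused = opaque;
    int self = e->id;
    if (e->next && chain_remove(c, e->next->id) != 0)
        (*refused)++;
    chain_remove(c, self);
}
```

Refusing non-current removals is what keeps the cached `next` pointer valid; without the guard, a callback that deleted its neighbour would recreate the use-after-free.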

Comment 4 Jiri Denemark 2011-03-07 08:39:56 UTC
Additional note for testing steps provided in comment #2:

Depending on what garbage the code ends up accessing, libvirtd can also deadlock instead of crashing.

Comment 6 Gunannan Ren 2011-03-08 10:34:35 UTC
I tried to verify this bug on libvirt-0.8.7-8.el6.x86_64 following the steps in comment #2;
it reported the following errors:

# service libvirtd start
Starting libvirtd daemon:                                  [  OK  ]

# service libvirtd status
libvirtd dead but pid file exists

# virsh list --all
error: unable to connect to '/var/run/libvirt/libvirt-sock', libvirtd may need to be started: Connection refused
error: failed to connect to the hypervisor

# service libvirtd start
Starting libvirtd daemon: libvirtd: error: Unable to obtain pidfile. Check /var/log/messages or run without --daemon for more info.
                                                           [FAILED]

Comment 7 Gunannan Ren 2011-03-08 13:54:30 UTC
Correction: the results above were obtained on libvirt-0.8.7-10.el6.x86_64; the version given in comment 6 was a typo.

The retesting steps in detail:
1. libvirt version list:

libvirt-0.8.7-10.el6.x86_64
libvirt-devel-0.8.7-10.el6.x86_64
libvirt-client-0.8.7-10.el6.x86_64
libvirt-python-0.8.7-10.el6.x86_64

2. Create two transient guests as in comment #2, using the UUIDs given there:
# virsh create guest0.xml
Domain guest0 created from guest0.xml

# virsh create guest1.xml
Domain guest1 created from guest1.xml

3. Issue "init 0" in guest1 and, while it is shutting down, stop the libvirtd service:
# service libvirtd stop
Stopping libvirtd daemon:                                  [  OK  ]

4. After that, run "virsh list --all":
# virsh list --all
error: unable to connect to '/var/run/libvirt/libvirt-sock', libvirtd may need to be started: No such file or directory
error: failed to connect to the hypervisor

5. Start the libvirtd service and run "service libvirtd status" to check its state; it showed:
# service libvirtd start
Starting libvirtd daemon:                                  [  OK  ]
# service libvirtd status
libvirtd (pid  14105) is running...

6. Run "virsh list" again; it reported an error, so I checked the libvirtd status:
# virsh list
error: unable to connect to '/var/run/libvirt/libvirt-sock', libvirtd may need to be started: Connection refused
error: failed to connect to the hypervisor
# service libvirtd status
libvirtd dead but pid file exists

7. Restart the libvirtd service; it came back to normal:
# service libvirtd restart
Stopping libvirtd daemon:                                  [FAILED]
Starting libvirtd daemon:                                  [  OK  ]
# service libvirtd status
libvirtd (pid  14307) is running...

Comment 9 zhanghaiyan 2011-03-15 11:42:02 UTC
Reproduced this bug with libvirt-0.8.7-8.el6.x86_64
1. create two transient (using virsh create) qemu domains with the following
   UUIDs:
   dom1: d5b3e8ff-2be6-4f81-a23e-6ec94f2338db and
   dom2: f0b4f8f7-0a56-4a76-ab7d-522bbe32ada3
2. virsh shutdown dom2
3. stop libvirtd service before dom2 finishes its shutdown procedure
4. wait until dom2 shuts down completely
5. # service libvirtd start
Starting libvirtd daemon:                                  [  OK  ]
# virsh list --all
error: cannot recv data: : Connection reset by peer
error: failed to connect to the hypervisor
# service libvirtd status
libvirtd dead but pid file exists

Verified this bug as PASS with both libvirt-0.8.7-10.el6.x86_64 and libvirt-0.8.7-11.el6.x86_64
1. create two transient (using virsh create) qemu domains with the following
   UUIDs:
   dom1: d5b3e8ff-2be6-4f81-a23e-6ec94f2338db and
   dom2: f0b4f8f7-0a56-4a76-ab7d-522bbe32ada3
2. virsh shutdown dom2
3. stop libvirtd service before dom2 finishes its shutdown procedure
4. wait until dom2 shuts down completely
5. # service libvirtd start
Starting libvirtd daemon:                                  [  OK  ]
# virsh list --all
 Id Name                 State
----------------------------------
  2 dom1       running
  - cdrom_test           shut off
  - demo                 shut off
  - new                  shut off
  - pxe                  shut off
# service libvirtd status
libvirtd (pid  10225) is running...

Comment 12 errata-xmlrpc 2011-05-19 13:28:20 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0596.html