Created attachment 481816 [details]
logs

Description of problem:
When stopping VMs using vdsm and then restarting libvirtd, libvirtd crashes while trying to start.

bt:
#0  0x00000000004389d7 in qemuReconnectDomain (payload=0x11b0240, name=<value optimized out>, opaque=<value optimized out>) at qemu/qemu_driver.c:1315
#1  0x00000036bc230c6a in virHashForEach (table=0x117c750, iter=0x438980 <qemuReconnectDomain>, data=0x7fff8b47e990) at util/hash.c:495
#2  0x0000000000437f7d in qemuReconnectDomains (privileged=<value optimized out>) at qemu/qemu_driver.c:1390
#3  qemudStartup (privileged=<value optimized out>) at qemu/qemu_driver.c:1816
#4  0x00000036bc290790 in virStateInitialize (privileged=1) at libvirt.c:1020
#5  0x000000000041f920 in main (argc=<value optimized out>, argv=<value optimized out>) at libvirtd.c:3304

Version-Release number of selected component (if applicable):
- libvirt-0.8.7-8.el6.x86_64
- vdsm-4.9-51

Steps to Reproduce:
1. Stop the VMs.
2. Restart libvirtd.

libvirtd log and core dump are attached.
Can you try reproducing this issue with the packages from http://people.redhat.com/jdenemar/libvirt/ ?
The problem was that qemuReconnectDomain can remove the domain object from the hash it is iterating over, which may result in accessing memory that has already been freed.

A reliable reproducer that results in the crash is:

1. Create two transient (using virsh create) qemu domains with the following UUIDs:
   dom1: d5b3e8ff-2be6-4f81-a23e-6ec94f2338db
   dom2: f0b4f8f7-0a56-4a76-ab7d-522bbe32ada3
   (The exact UUIDs are crucial, since they need to map to the same hash key so that the two objects form a linked list within the hash.)
2. virsh shutdown dom2
3. Stop the libvirtd service before dom2 finishes its shutdown procedure.
4. Wait until dom2 shuts down completely.
5. Start the libvirtd service.

The daemon should crash once it detects that it cannot connect to dom2's qemu monitor.

The fix is now upstream as v0.8.8-84-g9677cd3:

commit 9677cd33eea4c65d78ba463b46b8b45ed2da1709
Author: Jiri Denemark <jdenemar>
Date:   Thu Mar 3 14:10:51 2011 +0100

    util: Allow removing hash entries in virHashForEach
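To make the iteration hazard and the fix concrete, here is a minimal C sketch, assuming a heavily simplified single-bucket hash table: the entry/bucket structs, bucket_remove(), foreach_unsafe()/foreach_safe() and reconnect_cb() are hypothetical names, not libvirt's real util/hash.c API. It only illustrates the general pattern: a walk that dereferences cur->next after the callback has freed cur is a use-after-free, while saving the next pointer first lets the callback remove the current entry safely.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct entry {
    const char *name;
    struct entry *next;
} entry;

typedef struct {
    entry *head;            /* a single bucket: dom1 and dom2 collide here */
} bucket;

/* Unlink and free the entry with the given name; this stands in for what
 * the reconnect callback effectively does when a domain's monitor is gone. */
void bucket_remove(bucket *b, const char *name)
{
    for (entry **pp = &b->head; *pp; pp = &(*pp)->next) {
        if (strcmp((*pp)->name, name) == 0) {
            entry *doomed = *pp;
            *pp = doomed->next;
            free(doomed);
            return;
        }
    }
}

typedef void (*iterator)(bucket *b, entry *e);

/* UNSAFE walk: if iter() removes and frees "cur", the loop's cur->next
 * dereference reads freed memory -- the pattern behind the crash in the
 * qemuReconnectDomain/virHashForEach backtrace above. Shown for contrast
 * only; deliberately not called from main(). */
void foreach_unsafe(bucket *b, iterator iter)
{
    for (entry *cur = b->head; cur; cur = cur->next)
        iter(b, cur);
}

/* SAFE walk (the idea named by "util: Allow removing hash entries in
 * virHashForEach"): remember next before invoking the callback, so the
 * callback may remove and free the current entry without breaking the walk. */
void foreach_safe(bucket *b, iterator iter)
{
    entry *cur = b->head;
    while (cur) {
        entry *next = cur->next;    /* saved before any possible removal */
        iter(b, cur);
        cur = next;
    }
}

/* Hypothetical callback playing the role of qemuReconnectDomain: it drops
 * the domain whose qemu monitor can no longer be reached. */
void reconnect_cb(bucket *b, entry *e)
{
    if (strcmp(e->name, "dom2") == 0)   /* pretend dom2 already shut down */
        bucket_remove(b, e->name);
}

int main(void)
{
    entry *dom2 = calloc(1, sizeof(*dom2));
    entry *dom1 = calloc(1, sizeof(*dom1));
    dom2->name = "dom2";
    dom1->name = "dom1";
    dom1->next = dom2;                  /* dom1 -> dom2 in the same bucket */
    bucket b = { dom1 };

    /* foreach_unsafe(&b, reconnect_cb) would touch freed memory once dom2
     * is removed; the safe variant survives the removal. */
    foreach_safe(&b, reconnect_cb);

    for (entry *e = b.head; e; e = e->next)
        printf("still present: %s\n", e->name);

    free(dom1);
    return 0;
}

Presumably the upstream commit applies the same save-next idea inside virHashForEach itself, so callbacks such as qemuReconnectDomain can drop domains whose monitor has gone away without corrupting the iteration.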
Additional note for the testing steps provided in comment #2: depending on what garbage the code ends up accessing, libvirtd can also deadlock instead of crashing.
I tried to verify this bug on libvirt-0.8.7-8.el6.x86_64 following the steps in comment #2; it reported the following errors:

# service libvirtd start
Starting libvirtd daemon: [ OK ]
# service libvirtd status
libvirtd dead but pid file exists
# virsh list --all
error: unable to connect to '/var/run/libvirt/libvirt-sock', libvirtd may need to be started: Connection refused
error: failed to connect to the hypervisor
# service libvirtd start
Starting libvirtd daemon: libvirtd: error: Unable to obtain pidfile. Check /var/log/messages or run without --daemon for more info. [FAILED]
Correction: the above comments apply to libvirt-0.8.7-10.el6.x86_64; that was a typo. I describe the retesting steps in detail here.

1. libvirt version list:
libvirt-0.8.7-10.el6.x86_64
libvirt-devel-0.8.7-10.el6.x86_64
libvirt-client-0.8.7-10.el6.x86_64
libvirt-python-0.8.7-10.el6.x86_64

2. Create two transient guests as in the steps in comment #2, with the same UUIDs.
# virsh create guest0.xml
Domain guest0 created from guest0.xml
# virsh create guest1.xml
Domain guest1 created from guest1.xml

3. Issue "init 0" in guest1; during the shutdown, stop the libvirtd service.
# service libvirtd stop
Stopping libvirtd daemon: [ OK ]

4. After that, run "virsh list --all":
# virsh list --all
error: unable to connect to '/var/run/libvirt/libvirt-sock', libvirtd may need to be started: No such file or directory
error: failed to connect to the hypervisor

5. Start the libvirtd service and run "service libvirtd status" to check its state; it showed:
# service libvirtd start
Starting libvirtd daemon: [ OK ]
# service libvirtd status
libvirtd (pid 14105) is running...

6. Run "virsh list" again; it reported an error, so I checked the libvirtd status:
# virsh list
error: unable to connect to '/var/run/libvirt/libvirt-sock', libvirtd may need to be started: Connection refused
error: failed to connect to the hypervisor
# service libvirtd status
libvirtd dead but pid file exists

7. Restart the libvirtd service; things came back to normal:
# service libvirtd restart
Stopping libvirtd daemon: [FAILED]
Starting libvirtd daemon: [ OK ]
# service libvirtd status
libvirtd (pid 14307) is running...
Reproduced this bug with libvirt-0.8.7-8.el6.x86_64:

1. Create two transient (using virsh create) qemu domains with the following UUIDs:
   dom1: d5b3e8ff-2be6-4f81-a23e-6ec94f2338db
   dom2: f0b4f8f7-0a56-4a76-ab7d-522bbe32ada3
2. virsh shutdown dom2
3. Stop the libvirtd service before dom2 finishes its shutdown procedure.
4. Wait until dom2 shuts down completely.
5. # service libvirtd start
   Starting libvirtd daemon: [ OK ]
   # virsh list --all
   error: cannot recv data: : Connection reset by peer
   error: failed to connect to the hypervisor
   # service libvirtd status
   libvirtd dead but pid file exists

Verified this bug as PASS with both libvirt-0.8.7-10.el6.x86_64 and libvirt-0.8.7-11.el6.x86_64:

1. Create two transient (using virsh create) qemu domains with the following UUIDs:
   dom1: d5b3e8ff-2be6-4f81-a23e-6ec94f2338db
   dom2: f0b4f8f7-0a56-4a76-ab7d-522bbe32ada3
2. virsh shutdown dom2
3. Stop the libvirtd service before dom2 finishes its shutdown procedure.
4. Wait until dom2 shuts down completely.
5. # service libvirtd start
   Starting libvirtd daemon: [ OK ]
   # virsh list --all
    Id Name                 State
   ----------------------------------
     2 dom1                 running
     - cdrom_test           shut off
     - demo                 shut off
     - new                  shut off
     - pxe                  shut off
   # service libvirtd status
   libvirtd (pid 10225) is running...
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0596.html