Created attachment 529485: detailed threaded backtrace

Description of problem:
I destroy a VM, then install a new one (with "virsh create"). After the new machine shuts off and disappears, libvirtd stops.

Version-Release number of selected component (if applicable):
libvirt-0.8.2-22.el5

How reproducible:
About 20% of runs (depends on how many machines are created)

Steps to Reproduce:
1. virsh --connect qemu+ssh://remote-machine destroy "${installed_pc}"
2. virsh --connect qemu+ssh://remote-machine create ~/"${installed_pc}-install.xml"
3. virsh --connect qemu+ssh://remote-machine domstate "${installed_pc}"-install
4. virsh --connect qemu+ssh://remote-machine start "${installed_pc}"

Actual results:
libvirtd dumped core

Expected results:
libvirtd continues serving requests
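Because the failure shows up in only about one run in five, looping over the steps above may be the quickest way to trigger it. The following is a minimal sketch, not the reporter's actual script: it assumes step 3 is polled until the transient "-install" guest disappears, and the names VIRSH, URI, POLL_INTERVAL, repro_once and repro_loop are hypothetical. Setting VIRSH to a stub command allows a dry run.

```shell
#!/usr/bin/env bash
# Hypothetical reproduction loop for the destroy/create/domstate/start sequence.
# URI, guest name and XML location are placeholders taken from the report.
# VIRSH may be overridden (e.g. with a stub function) for a dry run.
VIRSH=${VIRSH:-virsh}
URI=${URI:-qemu+ssh://remote-machine}

# One iteration; returns non-zero if create fails or libvirtd stops answering.
repro_once() {
    local guest=$1
    # Step 1: may fail harmlessly if the guest is already off.
    "$VIRSH" --connect "$URI" destroy "$guest" >/dev/null 2>&1
    # Step 2: start the transient installer guest (tilde kept outside the quotes
    # so the shell expands it; virsh reads the XML file on the client side).
    "$VIRSH" --connect "$URI" create ~/"${guest}-install.xml" >/dev/null || return 1
    # Step 3 (assumed to be polled): wait until the transient guest is gone.
    # domstate exits non-zero once the domain no longer exists.
    while "$VIRSH" --connect "$URI" domstate "${guest}-install" >/dev/null 2>&1; do
        sleep "${POLL_INTERVAL:-5}"
    done
    # Step 4: start the installed, persistent guest.
    "$VIRSH" --connect "$URI" start "$guest" >/dev/null 2>&1
    # Final check: does the daemon still answer a simple request?
    "$VIRSH" --connect "$URI" list >/dev/null 2>&1
}

# Run up to $2 iterations (default 10) and report the first failing one.
repro_loop() {
    local guest=$1 runs=${2:-10} i=1
    while [ "$i" -le "$runs" ]; do
        if ! repro_once "$guest"; then
            echo "iteration $i failed"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no failure in $runs iterations"
}
```

For example, `repro_loop "${installed_pc}" 20`. A failing `virsh list` is used as the sign that the daemon is gone, because a missing domain also makes `domstate` exit non-zero, so `domstate` alone cannot tell "guest finished" from "libvirtd crashed".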
0) Are steps 1-4 run in a script, or by hand? Where exactly in this sequence does the crash occur? From the description it sounds like it crashes after the guest is destroyed, but the backtrace suggests libvirtd is crashing during the "virsh domstate".

1) Normally "virsh create" brings up a running guest, so "virsh start" should return "error: Domain is already active". Can you explain what you were trying to do here?

2) Is anything else running that might be calling libvirt on remote-machine? In particular, is virt-manager running?

3) When you say "depends on how many machines are created", do you mean how many guests are active at that moment? Or do you mean that if you run create-destroy-create-destroy sequentially (never more than a single guest active at a time), the crash eventually occurs?

4) For completeness' sake, can you include the guest XML?

5) Does the same thing happen if you run virsh locally on "remote-machine" rather than remotely?

At first glance this looks like incorrect refcounting of the domain object, an area that has had problems with transient domains. It will take some digging to find the patches that corrected such problems and to determine whether they apply to libvirt-0.8.2, but an easy reproducer will help narrow things down.
I see many similarities between this and Bug 670848 (reported against RHEL6 / libvirt-0.8.7). All three patches in that bug appear to be relevant to RHEL5 / libvirt-0.8.2 as well, and there may be others (see Comment #18 of Bug 670848). This code is tricky and delicate, though, so we need to be very careful about what we take in to avoid unexpected regressions.
Development Management has reviewed and declined this request. You may appeal this decision by reopening this request.
We partly worked around this bug with a small watchdog script:

# cat bin/watchdog_libvirtd.sh
#!/usr/bin/env bash
while true; do
    service libvirtd status &> /dev/null || service libvirtd restart &> /dev/null
    sleep 5
done
#eof
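If the watchdog stays in place while the crash is being investigated, it may help to know when each restart happened, for example to compare the crash rate with the number of guests created. Below is a minimal variant of the script above, assuming the same SysV `service` wrapper and relying, as the original does, on `service libvirtd status` exiting non-zero when the daemon is not running. The log path and the names SERVICE, WATCHDOG_LOG, WATCHDOG_INTERVAL, check_libvirtd and watchdog_loop are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical watchdog variant that also logs every restart with a timestamp.
# SERVICE may be overridden (e.g. with a stub function) for testing.
SERVICE=${SERVICE:-service}
WATCHDOG_LOG=${WATCHDOG_LOG:-/var/log/watchdog_libvirtd.log}

# One check: if the status command fails, log the event and restart libvirtd.
check_libvirtd() {
    if ! "$SERVICE" libvirtd status >/dev/null 2>&1; then
        echo "$(date '+%Y-%m-%d %H:%M:%S') libvirtd down, restarting" >>"$WATCHDOG_LOG"
        "$SERVICE" libvirtd restart >/dev/null 2>&1
    fi
}

# Same polling loop as the original script (default interval 5 seconds).
watchdog_loop() {
    while true; do
        check_libvirtd
        sleep "${WATCHDOG_INTERVAL:-5}"
    done
}
```

To use it in place of the original, call `watchdog_loop` at the end of the file. The `>/dev/null 2>&1` redirections are equivalent to `&> /dev/null` but also work in a plain POSIX sh.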