When libvirtd is restarted on an execute machine that is running KVM or Xen jobs, the vm-gahp keeps running forever, even though it has lost communication with the VM and libvirtd has terminated the VM. Upstream reference ticket: http://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=883
(In reply to comment #0) More specifically, the upstream tracking ticket is http://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=1119, as 883 is a parent ticket.
Changes have been pushed upstream to the 7.4.2 branch; the fix will be in 7.4.2-0.6.
This bug is linked to a libvirtd issue: libvirtd does not properly support being restarted while images are running, at least on RHEL 5.x with KVM (Xen seems to work). When libvirtd is restarted, the VM is leaked. The new code promptly recognizes that libvirtd no longer sees the VM and terminates the job. It does not change Xen behaviour, which still works after the restart. Verified on RHEL 5.5, KVM/x86_64, Xen/i386, Xen/x86_64.
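For readers who want a picture of the behaviour described above, here is a minimal Python sketch of the idea: poll for the VM and terminate the job as soon as the lookup fails. It is not the actual vm-gahp code. The names `monitor_vm`, `lookup`, and `on_lost` are hypothetical, and the lookup is passed in as a plain callable so the sketch does not depend on libvirt being installed.

```python
import time


def monitor_vm(lookup, vm_name, on_lost, poll_interval=5.0,
               max_polls=None, sleep=time.sleep):
    """Poll for a VM and call on_lost(vm_name) once it can no longer be found.

    lookup    -- hypothetical callable; must raise LookupError when the
                 hypervisor daemon no longer knows about vm_name
    on_lost   -- hypothetical callback that terminates the job
    max_polls -- optional cap on successful polls (None means no cap)

    Returns the number of successful polls made before stopping.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        try:
            lookup(vm_name)
        except LookupError:
            # The daemon does not see the VM anymore: stop monitoring
            # and terminate the job instead of waiting forever.
            on_lost(vm_name)
            return polls
        polls += 1
        sleep(poll_interval)
    return polls
```

With the libvirt Python bindings, `lookup` could presumably wrap `conn.lookupByName(name)` and translate `libvirt.libvirtError` into `LookupError`; that wiring is an assumption and is left out here.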
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: When 'libvirtd' was restarted on an execute machine that was running KVM/Xen jobs, 'vm-gahp' would continue running forever even though it had lost communication with the VM and 'libvirtd' had terminated the VM. With this update, the case where 'libvirtd' no longer sees the VM is recognized and the job is terminated.
An advisory has been issued which should help with the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2010-0773.html