Bug 530594 - restart of libvirtd causes condor_vm-gahp to hang.
Summary: restart of libvirtd causes condor_vm-gahp to hang.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor
Version: 1.2
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: 1.3
Target Release: ---
Assignee: Timothy St. Clair
QA Contact: Luigi Toscano
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-10-23 16:27 UTC by Timothy St. Clair
Modified: 2010-10-14 16:15 UTC (History)

Fixed In Version: 7.4.2-0.6
Doc Type: Bug Fix
Doc Text:
Previously, when 'libvirtd' was restarted on an execute machine that was running KVM or Xen jobs, 'vm-gahp' continued running indefinitely even though it had lost communication with the VM and 'libvirtd' had terminated the VM. With this update, the code recognizes that 'libvirtd' no longer sees the VM and terminates the job.
Clone Of:
Environment:
Last Closed: 2010-10-14 16:15:28 UTC
Target Upstream Version:
Embargoed:




Links
System: Red Hat Product Errata
ID: RHSA-2010:0773
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Red Hat Enterprise MRG Messaging and Grid Version 1.3
Last Updated: 2010-10-14 15:56:44 UTC

Description Timothy St. Clair 2009-10-23 16:27:04 UTC
When you restart libvirtd on an execute machine that is running KVM or Xen jobs, the vm-gahp continues running indefinitely even though it has lost communication with the VM and libvirtd has terminated the VM.

Reference Ticket upstream is: http://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=883

Comment 1 Timothy St. Clair 2010-01-19 18:01:31 UTC
(In reply to comment #0)
More specifically, the upstream tracking ticket is:
http://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=1119

Ticket 883 is the parent ticket.

Comment 2 Timothy St. Clair 2010-01-25 17:49:28 UTC
Changes have been pushed upstream to the 7.4.2 branch.

The fix will be in 7.4.2-0.6.

Comment 3 Luigi Toscano 2010-05-31 15:08:47 UTC
This bug is linked to an issue in libvirtd, which does not restart properly while images are running, at least on RHEL 5.x with KVM (Xen seems to work). When libvirtd is restarted, the VM is leaked.

The new code promptly recognizes that libvirtd does not see the VM anymore and terminates the job. It does not change the Xen behaviour, which still works after the restart.

Verified on RHEL 5.5, KVM/x86_64, Xen/i386, Xen/x86_64.
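
The check described above can be illustrated with a minimal sketch. This is not the actual condor_vm-gahp code: the names DomainLookupError, vm_is_visible and monitor_vm are hypothetical, and the lookup callable stands in for whatever query the gahp sends to libvirtd. The idea is only that a failed lookup is treated as "the VM is gone", which ends the job instead of waiting forever.

```python
import time
from typing import Callable, Optional


class DomainLookupError(Exception):
    """Hypothetical error raised when the daemon no longer knows the VM."""


def vm_is_visible(lookup: Callable[[str], object], vm_name: str) -> bool:
    """Return True if the (assumed) lookup call still finds the VM."""
    try:
        lookup(vm_name)
        return True
    except DomainLookupError:
        return False


def monitor_vm(
    lookup: Callable[[str], object],
    vm_name: str,
    terminate_job: Callable[[str], None],
    poll_interval: float = 5.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll for the VM; terminate the job once it is no longer visible.

    Returns the number of polls performed. max_polls bounds the loop
    for testing; None means poll until the VM disappears.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        if not vm_is_visible(lookup, vm_name):
            # The daemon does not see the VM anymore: stop the job
            # instead of waiting on a VM that will never report back.
            terminate_job(vm_name)
            return polls
        sleep(poll_interval)
    return polls
```

With the libvirt Python bindings, lookup could plausibly wrap conn.lookupByName() and translate libvirt.libvirtError into DomainLookupError; that wiring is an assumption and is not taken from the upstream fix.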

Comment 4 Martin Prpič 2010-10-07 15:52:07 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Previously, when 'libvirtd' was restarted on an execute machine that was running KVM or Xen jobs, 'vm-gahp' continued running indefinitely even though it had lost communication with the VM and 'libvirtd' had terminated the VM. With this update, the code recognizes that 'libvirtd' no longer sees the VM and terminates the job.

Comment 6 errata-xmlrpc 2010-10-14 16:15:28 UTC
An advisory has been issued which should help with the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0773.html

