Bug 525470 - VM configuration status changes are not immediately visible in condor_status
Summary: VM configuration status changes are not immediately visible in condor_status
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor
Version: 1.2
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: 1.2
Target Release: ---
Assignee: Timothy St. Clair
QA Contact: Luigi Toscano
URL:
Whiteboard:
Depends On:
Blocks: 497881 527551
Reported: 2009-09-24 14:05 UTC by Luigi Toscano
Modified: 2009-12-03 09:20 UTC (History)

Fixed In Version: Fix should be in 7.4.0-0.1
Doc Type: Bug Fix
Doc Text:
Grid bug fix
C: VM support disabled (xend/libvirtd not running) and then enabled again after the status interval has elapsed
C: The new status information will not be sent to the collector
F: Status updates of VM support are made immediately visible to the Collector.
R: Condor_status now reports the current status
If VM support was disabled (xend/libvirtd not running) and then enabled again after the status interval had elapsed, the new status information was not being sent to the collector. Status updates of VM support were changed, and made immediately visible to the Collector. The condor_status command now reports the current status correctly.
Clone Of:
Environment:
Last Closed: 2009-12-03 09:20:07 UTC




Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2009:1633 normal SHIPPED_LIVE Red Hat Enterprise MRG Messaging and Grid Version 1.2 2009-12-03 09:15:33 UTC

Description Luigi Toscano 2009-09-24 14:05:13 UTC
Description of problem:
The new VM_RECHECK_INTERVAL parameter forces Condor to recheck the VM configuration status after the specified amount of time.

If VM support was disabled (xend/libvirtd not running) and is enabled again after the interval has elapsed, the new status is not sent to the Collector.

Steps to Reproduce:
1. Configure Condor with VM support enabled, stop xend/libvirtd, and start Condor.
2. Restart xend/libvirtd, wait VM_RECHECK_INTERVAL seconds, and check condor_status -long again.
  
Actual results:
Most of the time 
condor_status -long 
contains HasVM = FALSE, even if 
condor_status -long -direct <vmnode>
reports HasVM = TRUE. The status is updated after a while.
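
The mismatch described above can be checked mechanically. The following is a minimal sketch, not part of the original report: it assumes the `-long` output is one `Name = Value` pair per line with ads separated by blank lines, and the helper names (`parse_long_ads`, `has_vm`, `stale_has_vm`, `query`) are hypothetical.

```python
import subprocess


def parse_long_ads(text):
    """Split `condor_status -long` style output into a list of attribute dicts.

    Assumption: one `Name = Value` pair per line, ads separated by blank lines.
    """
    ads, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            if current:
                ads.append(current)
                current = {}
            continue
        name, sep, value = line.partition("=")
        if sep:
            current[name.strip()] = value.strip()
    if current:
        ads.append(current)
    return ads


def has_vm(ad):
    """Return True/False for the HasVM attribute, or None if it is absent."""
    value = ad.get("HasVM")
    if value is None:
        return None
    return value.upper() == "TRUE"


def stale_has_vm(collector_text, direct_text):
    """Return the Name values whose HasVM differs between the two outputs."""
    direct = {ad.get("Name"): has_vm(ad) for ad in parse_long_ads(direct_text)}
    stale = []
    for ad in parse_long_ads(collector_text):
        name = ad.get("Name")
        if name in direct and has_vm(ad) != direct[name]:
            stale.append(name)
    return stale


def query(*extra_args):
    """Run `condor_status -long` with extra arguments and return its stdout."""
    result = subprocess.run(
        ["condor_status", "-long", *extra_args],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On a live pool this would be called as `stale_has_vm(query(), query("-direct", "<vmnode>"))`, with `<vmnode>` replaced by the node under test. A non-empty result means the Collector still holds an outdated HasVM value.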

Expected results:
condor_status -long should report the new status, i.e. an update should be triggered on a state change for HasVM.
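
To illustrate the requested behavior, here is a toy model of "send on state change, otherwise on a timer". It is only a sketch of the idea and does not reflect the actual condor_startd code: the `AdPublisher` class, its `send` callback, and the 300-second interval are all hypothetical placeholders.

```python
class AdPublisher:
    """Toy model of a daemon that pushes its ad to a collector.

    `send` is a hypothetical callable that delivers the ad; `update_interval`
    is an arbitrary placeholder for the periodic update period in seconds.
    """

    def __init__(self, send, update_interval=300):
        self.send = send
        self.update_interval = update_interval
        self.last_sent_at = None
        self.last_has_vm = None

    def tick(self, now, has_vm):
        """Send an update if HasVM changed or the periodic update is due.

        Returns True when an update was sent, False otherwise.
        """
        changed = has_vm != self.last_has_vm
        due = (
            self.last_sent_at is None
            or now - self.last_sent_at >= self.update_interval
        )
        if changed or due:
            self.send({"HasVM": has_vm})
            self.last_sent_at = now
            self.last_has_vm = has_vm
            return True
        return False
```

Without the `changed` test, a flip of HasVM would only reach the Collector at the next periodic update, which matches the delay seen in the actual results.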

Comment 1 Timothy St. Clair 2009-09-30 18:32:28 UTC
Initial Investigation: 

I've often found that configuration variables can be misleading. Searching through the manual and the code, I found the following:

-------- Begin Manual Excerpt (Note the **)
VM_RECHECK_INTERVAL
    An integer number of seconds that defaults to 600 (ten minutes), representing the amount of time the condor_startd waits after a virtual machine error as reported by the condor_starter, and before checking a final time on the status of the virtual machine. If the check fails, Condor disables starting any new vm universe jobs by removing the **VM_Type** attribute from the machine ClassAd.
-------- End Manual Excerpt 

When I looked into the code and re-read the statement, I found it is rather literal: it only adjusts the VM_Type attribute.  It appears to me that HasVM will be updated the next time the ClassAd is "published", which can apparently happen for several reasons.
  
If we desire faster acknowledgement via this attribute, we can adjust, but as it stands I do not see a bug with the variable.  
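
For reference, the literal reading of the manual excerpt can be modeled as follows. This is a simplified sketch of the documented rule only, not the startd implementation: the function name, its arguments, and the dict standing in for the machine ClassAd are hypothetical, and 600 is the default quoted in the excerpt.

```python
def recheck_after_vm_error(ad, error_time, now, check_vm, recheck_interval=600):
    """Apply the recheck rule from the manual excerpt to a copy of `ad`.

    ad               -- dict standing in for the machine ClassAd (hypothetical)
    error_time, now  -- timestamps in seconds
    check_vm         -- callable returning True if the VM check passes
    recheck_interval -- seconds to wait after the error (manual default: 600)
    """
    updated = dict(ad)
    if now - error_time < recheck_interval:
        return updated  # still waiting, nothing changes yet
    if not check_vm():
        # Per the excerpt, only VM_Type is removed; HasVM is not touched here.
        updated.pop("VM_Type", None)
    return updated
```

In this model HasVM never changes, which is the point made above: as documented, the parameter only governs the VM_Type attribute.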

== Feedback is solicited ==

Comment 3 Timothy St. Clair 2009-10-01 13:09:58 UTC
UW Ticket
http://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=802

Comment 4 Timothy St. Clair 2009-10-05 03:26:45 UTC
Patch went upstream on 10/4/09
Fix should be in 7.4.0-0.6

Comment 6 Irina Boverman 2009-10-29 14:30:03 UTC
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
please see bug summary.

Comment 7 Luigi Toscano 2009-11-06 18:34:11 UTC
Status updates of VM support are immediately visible to the Collector.

Verified on RHEL 5.4, i386 Xen, x86_64 Xen, x86_64 KVM.

condor-vm-gahp-7.4.1-0.4.el5
condor-7.4.1-0.4.el5

Changing the status to VERIFIED.

Comment 8 Lana Brindley 2009-11-11 21:07:44 UTC
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1,8 @@
-please see bug summary.
+Grid bug fix
+
+C: VM support disabled (xend/libvirtd not running) and then enabled again after the status interval has elapsed
+C: The new status information will not be sent to the collector
+F: Status updates of VM support are made immediately visible to the Collector.
+R: Condor_status now reports the current status
+
+If VM support was disabled (xend/libvirtd not running) and then enabled again after the status interval had elapsed, the new status information was not being sent to the collector. Status updates of VM support were changed, and made immediately visible to the Collector. The condor_status command now reports the current status correctly.

Comment 10 errata-xmlrpc 2009-12-03 09:20:07 UTC
An advisory has been issued which should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-1633.html

