Bug 525470

Summary: VM configuration status changes are not immediately visible in condor_status
Product: Red Hat Enterprise MRG Reporter: Luigi Toscano <ltoscano>
Component: condor Assignee: Timothy St. Clair <tstclair>
Status: CLOSED ERRATA QA Contact: Luigi Toscano <ltoscano>
Severity: medium Docs Contact:
Priority: low    
Version: 1.2 CC: lbrindle, matt, tstclair
Target Milestone: 1.2   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Fix should be in 7.4.0-0.1 Doc Type: Bug Fix
Doc Text:
Grid bug fix

C: VM support disabled (xend/libvirtd not running) and then enabled again after the status interval has elapsed
C: The new status information will not be sent to the collector
F: Status updates of VM support are made immediately visible to the Collector.
R: Condor_status now reports the current status

If VM support was disabled (xend/libvirtd not running) and then enabled again after the status interval had elapsed, the new status information was not being sent to the collector. Status updates of VM support were changed, and made immediately visible to the Collector. The condor_status command now reports the current status correctly.
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-12-03 09:20:07 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 497881, 527551    

Description Luigi Toscano 2009-09-24 14:05:13 UTC
Description of problem:
The new VM_RECHECK_INTERVAL parameter forces Condor to recheck the VM configuration status after the specified amount of time.

If VM support was disabled (xend/libvirtd not running) but is enabled again after the interval has elapsed, the new status is not sent to the Collector.

Steps to Reproduce:
1. Configure Condor with VM support enabled, stop xend/libvirtd, and start Condor.
2. Re-enable xend/libvirtd, wait VM_RECHECK_INTERVAL seconds, and check condor_status -long again.
  
Actual results:
Most of the time 
condor_status -long 
contains HasVM = FALSE, even if 
condor_status -long -direct <vmnode>
reports HasVM = TRUE. The status is updated after a while.
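
The mismatch described above can be checked mechanically by comparing the HasVM value returned by the collector with the value from a direct query. The following is a minimal sketch and not part of the original report: the helper names (parse_has_vm, query_has_vm, is_stale) are hypothetical, and it assumes the long output prints attributes as "Name = Value" lines, as the HasVM = FALSE text above suggests.

```python
import re
import subprocess

# Matches a line such as "HasVM = TRUE" (case-insensitive), as quoted in this report.
_HAS_VM = re.compile(r"^\s*HasVM\s*=\s*(TRUE|FALSE)\s*$", re.IGNORECASE | re.MULTILINE)


def parse_has_vm(long_output):
    """Return True/False for the first HasVM line found, or None if absent."""
    match = _HAS_VM.search(long_output)
    if match is None:
        return None
    return match.group(1).upper() == "TRUE"


def query_has_vm(extra_args=()):
    """Run condor_status -long (plus optional extra arguments) and parse HasVM."""
    cmd = ["condor_status", "-long", *extra_args]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_has_vm(result.stdout)


def is_stale(collector_output, direct_output):
    """True when the collector's HasVM differs from the direct query's HasVM."""
    return parse_has_vm(collector_output) != parse_has_vm(direct_output)
```

On a machine with Condor installed, query_has_vm() would give the collector's view and query_has_vm(["-direct", vmnode]) the node's own view, where vmnode stands for the host name used in the commands above.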

Expected results:
condor_status -long should report the new status, i.e. an update should be triggered on a state change for HasVM.
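
The expected behaviour above amounts to pushing an update as soon as HasVM changes instead of waiting for the next periodic publish. The toy model below illustrates that idea only; it is not Condor's code. The class name HasVMPublisher and the send_update callback are hypothetical.

```python
class HasVMPublisher:
    """Toy model: push an update when HasVM changes, not only on a timer."""

    def __init__(self, send_update, initial=False):
        self._send_update = send_update  # hypothetical callback to the collector
        self._has_vm = initial           # last value that was published

    def on_recheck(self, has_vm_now):
        """Called after each VM recheck; returns True if an update was sent."""
        if has_vm_now == self._has_vm:
            return False                 # no state change, nothing to push
        self._has_vm = has_vm_now
        self._send_update({"HasVM": has_vm_now})
        return True
```

With this shape, a recheck that finds xend/libvirtd running again triggers exactly one update, and repeated rechecks with the same result send nothing.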

Comment 1 Timothy St. Clair 2009-09-30 18:32:28 UTC
Initial Investigation: 

I've often found that the variables can be misleading. When searching through the manual and code, I found the following:

-------- Begin Manual Excerpt (Note the **)
VM_RECHECK_INTERVAL
    An integer number of seconds that defaults to 600 (ten minutes), representing the amount of time the condor_startd waits after a virtual machine error as reported by the condor_starter, and before checking a final time on the status of the virtual machine. If the check fails, Condor disables starting any new vm universe jobs by removing the **VM_Type** attribute from the machine ClassAd.
-------- End Manual Excerpt 
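
To make the excerpt concrete, here is a minimal sketch of the final check it describes, with the machine ClassAd modelled as a plain dictionary. The function name final_vm_check and the sample attribute values are hypothetical; only VM_RECHECK_INTERVAL, its 600-second default, and the VM_Type attribute come from the excerpt.

```python
# Default taken from the manual excerpt: 600 seconds (ten minutes).
VM_RECHECK_INTERVAL = 600


def final_vm_check(machine_ad, vm_check_passes):
    """Return a copy of the ad; VM_Type is dropped if the final check fails.

    The excerpt names only VM_Type, so every other attribute is left as-is.
    """
    updated = dict(machine_ad)
    if not vm_check_passes:
        updated.pop("VM_Type", None)
    return updated
```

For example, final_vm_check({"VM_Type": "xen", "HasVM": True}, False) returns {"HasVM": True}: the sketch removes VM_Type and leaves HasVM alone.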

When I looked into the code and re-read the statement, I found it is rather literal: it only adjusts the VM_Type attribute.  It appears to me that HasVM will be updated the next time the ClassAd is "published", which can apparently happen for several reasons.
  
If we desire faster acknowledgement via this attribute, we can adjust, but as it stands I do not see a bug with the variable.  

== Feedback is solicited ==

Comment 3 Timothy St. Clair 2009-10-01 13:09:58 UTC
UW Ticket
http://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=802

Comment 4 Timothy St. Clair 2009-10-05 03:26:45 UTC
Patch went upstream on 10/4/09
Fix should be in 7.4.0-0.6

Comment 6 Irina Boverman 2009-10-29 14:30:03 UTC
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
please see bug summary.

Comment 7 Luigi Toscano 2009-11-06 18:34:11 UTC
Status updates of VM support are immediately visible to the Collector.

Verified on RHEL 5.4, i386 Xen, x86_64 Xen, x86_64 KVM.

condor-vm-gahp-7.4.1-0.4.el5
condor-7.4.1-0.4.el5

Changing the status to VERIFIED.

Comment 8 Lana Brindley 2009-11-11 21:07:44 UTC
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1,8 @@
-please see bug summary.
+Grid bug fix
+
+C: VM support disabled (xend/libvirtd not running) and then enabled again after the status interval has elapsed
+C: The new status information will not be sent to the collector
+F: Status updates of VM support are made immediately visible to the Collector.
+R: Condor_status now reports the current status
+
+If VM support was disabled (xend/libvirtd not running) and then enabled again after the status interval had elapsed, the new status information was not being sent to the collector. Status updates of VM support were changed, and made immediately visible to the Collector. The condor_status command now reports the current status correctly.

Comment 10 errata-xmlrpc 2009-12-03 09:20:07 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-1633.html