Bug 497881

Summary: Condor starts before libvirtd and Xen.
Product: Red Hat Enterprise MRG
Reporter: Charlie Wyse <cwyse>
Component: condor
Assignee: Benjamin Kreuter <bkreuter>
Status: CLOSED CURRENTRELEASE
QA Contact: Luigi Toscano <ltoscano>
Severity: medium
Docs Contact:
Priority: low
Version: Development
CC: iboverma, lans.carstensen, lbrindle, ltoscano, matt, tao, tross
Target Milestone: 1.2   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: condor-7.3.2-0.4
Doc Type: Bug Fix
Doc Text:
Grid bug fix

C: Starting Condor when the Xen VM type is used. Condor would sometimes start before Xen started.
C: VMGAHP would show an error, and Xen would fail to start correctly.
F: The Condor init script was changed, and the condor_startd now periodically checks for VM universe support when VM_TYPE is configured and the VM Universe support is not available on start up.
R: Condor and Xen now start reliably when being used together.

When using Condor with the Xen VM type, Condor would sometimes start before Xen started. This would cause VMGAHP to show an error, and Xen to fail to start correctly. The Condor init script was changed, and the condor_startd now periodically checks for VM universe support when VM_TYPE is configured and the VM Universe support is not available on start up. This means that Condor and Xen now start reliably when being used together.
Last Closed: 2009-11-06 18:33:53 UTC
Bug Depends On: 525470    
Bug Blocks: 527551    

Description Charlie Wyse 2009-04-27 17:49:13 UTC
Description of problem:
condor startup proceeds before Xen startup. This leads to errors in VMGAHP.

Version-Release number of selected component (if applicable):
condor-7.2.2-0.9.el5

From /var/log/condor/VMGahpLog:
4/20 11:01:21 VMGAHP[3765]: VM-GAHP initialized with run-mode 0
4/20 11:01:21 VMGAHP[3765]: Initial UID/GUID=0/0, EUID/EGUID=64/64, Condor UID/GID=64,64
4/20 11:01:21 VMGAHP[3765]: Initialize Uids: caller=root, job user=condor
4/20 11:01:22 VMGAHP[3765]: Command returned non-zero: /usr/sbin/condor_vm_xen_xslt.sh check
4/20 11:01:22 VMGAHP[3765]:   XM list error
4/20 11:01:22 VMGAHP[3765]:   libvir: Xen Daemon error : internal error failed to connect to xend
4/20 11:01:22 VMGAHP[3765]:   libvir: Xen Daemon error : internal error failed to connect to xend
4/20 11:01:22 VMGAHP[3765]:   error: failed to connect to the hypervisor
4/20 11:01:22 VMGAHP[3765]: Xen script check failed:
4/20 11:01:22 VMGAHP[3765]:
ERROR: the vm_type('xen') cannot be used. 
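
A minimal sketch of how the failure in this excerpt could be detected automatically. It only pattern-matches the two messages visible in the log; `diagnose` and the constant names are hypothetical and not part of Condor. It assumes each log entry is on a single line, as it would be in the file itself.

```python
import re

# Matches the final error line shown in the log excerpt.
VM_TYPE_ERROR = re.compile(r"ERROR: the vm_type\('([^']+)'\) cannot be used")
# Substring reported by libvirt when xend is not reachable.
XEND_ERROR = "failed to connect to xend"


def diagnose(log_text):
    """Return (vm_type, xend_unreachable) for a VMGahpLog excerpt.

    vm_type is None when no 'cannot be used' error is present.
    """
    match = VM_TYPE_ERROR.search(log_text)
    vm_type = match.group(1) if match else None
    return vm_type, XEND_ERROR in log_text
```

Reading `/var/log/condor/VMGahpLog` and passing its contents to `diagnose` would flag a rejected VM type together with an unreachable xend, which is the combination reported here.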

Looking at the startup ordering, condor starts before libvirtd and xend. The quick fix is to change the ordering or to restart condor, but the package itself should be set up to work with Xen out of the box.
/etc/rc.d/rc3.d/S90condor
/etc/rc.d/rc3.d/S91condor-ec2-enhanced
/etc/rc.d/rc3.d/S91condor-low-latency
/etc/rc.d/rc3.d/S97libvirtd
/etc/rc.d/rc3.d/S98xend
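
The ordering problem in this listing can be checked mechanically. The sketch below assumes the usual SysV behaviour that the S* links in a runlevel directory are run in sorted order of their names; the function names are hypothetical and only illustrate the check.

```python
import re

# Start links as listed in the report (runlevel 3), without the directory.
RC3_LINKS = [
    "S90condor",
    "S91condor-ec2-enhanced",
    "S91condor-low-latency",
    "S97libvirtd",
    "S98xend",
]

LINK_RE = re.compile(r"^S(\d{2})(.+)$")


def start_order(links):
    """Return service names in the order init would start them.

    Assumes the S* links are run in sorted order of the link name.
    """
    ordered = []
    for link in sorted(links):
        match = LINK_RE.match(link)
        if match:  # skip K* links and anything else
            ordered.append(match.group(2))
    return ordered


def starts_after(links, service, dependency):
    """True if `service` starts after `dependency`."""
    order = start_order(links)
    return order.index(service) > order.index(dependency)


def misordered(links, service, dependencies):
    """List the dependencies that start after `service`."""
    return [dep for dep in dependencies
            if not starts_after(links, service, dep)]
```

For the listing above, `misordered(RC3_LINKS, "condor", ["libvirtd", "xend"])` returns both dependencies, which matches the behaviour described in this report.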

Comment 1 Lans Carstensen 2009-06-10 15:30:10 UTC
Changing the condor init script to S98 (i.e. after libvirtd) is enough to make the init sequence work reliably for a vm_type of Xen.

Comment 2 Matthew Farrellee 2009-07-16 20:54:30 UTC
Fixed upstream, present for 7.3.2-0.4 build

Comment 3 Matthew Farrellee 2009-08-04 01:13:51 UTC
An additional fix for this has gone in upstream. The condor_startd will now periodically check for VM Universe support when VM_TYPE is configured and the VM Universe support is not available on startup.

Both fixes need to be verified.

Comment 4 Luigi Toscano 2009-09-03 15:59:36 UTC
Is the checking interval static or configurable? 
If it is static, how long is it?
Otherwise, how can it be changed?

Comment 5 Matthew Farrellee 2009-09-03 16:08:59 UTC
See: http://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=551

The param is VM_RECHECK_INTERVAL and it defaults to 600 seconds (10 minutes).
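
To illustrate the recheck behaviour, here is a minimal sketch of a periodic retry loop using the 600-second default. It is not the condor_startd implementation; `wait_for_vm_support`, `check`, `max_checks`, and `sleep` are hypothetical names chosen for the example.

```python
import time

VM_RECHECK_INTERVAL = 600  # seconds; the default given in this comment


def wait_for_vm_support(check, interval=VM_RECHECK_INTERVAL,
                        max_checks=None, sleep=time.sleep):
    """Call `check()` until it returns True, sleeping `interval` between tries.

    Returns the number of checks performed, or None if `max_checks`
    was reached without success. All parameter names are hypothetical.
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if check():
            return checks
        # Only sleep if another check will follow.
        if max_checks is None or checks < max_checks:
            sleep(interval)
    return None
```

Passing a custom `sleep` makes the loop testable without waiting; with the default interval, a hypervisor that comes up shortly after Condor would be picked up at the next check, at most 10 minutes later.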

Comment 10 Irina Boverman 2009-10-22 19:08:32 UTC
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
Changed condor init script to work reliably for vm_type of Xen. Additionally condor_startd will now periodically check for VM Universe support when VM_TYPE is configured and the VM Universe support is not available on start up (497881)

Comment 12 Luigi Toscano 2009-11-06 18:33:53 UTC
The order of startup scripts has been fixed and the new VM_RECHECK_INTERVAL option is working as expected.

Verified on RHEL 5.4, i386 Xen, x86_64 Xen, x86_64 KVM.

condor-vm-gahp-7.4.1-0.4.el5
condor-7.4.1-0.4.el5

Closing the bug.

Comment 13 Lana Brindley 2009-11-26 20:39:39 UTC
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1,9 @@
-Changed condor init script to work reliably for vm_type of Xen. Additionally condor_startd will now periodically check for VM Universe support when VM_TYPE is configured and the VM Universe support is not available on start up (497881)
+Grid bug fix
+
+C: Starting Condor when the Xen VM type is used. Condor would sometimes start before Xen started.
+C: VMGAHP would show an error, and Xen would fail to start correctly.
+F: The Condor init script was changed, and the condor_startd now periodically checks for VM universe support when VM_TYPE is configured and the VM Universe support is not available on start up
+R: Condor and Xen now start reliably when being used together.
+
+
+When using Condor with the Xen VM type, Condor would sometimes start before Xen started. This would cause VMGAHP to show an error, and Xen to fail to start correctly. The Condor init script was changed, and the condor_startd now periodically checks for VM universe support when VM_TYPE is configured and the VM Universe support is not available on start up. This means that Condor and Xen now start reliably when being used together.