Bug 497881 - Condor starts before libvirtd and Xen.
Summary: Condor starts before libvirtd and Xen.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor
Version: Development
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: 1.2
Target Release: ---
Assignee: Benjamin Kreuter
QA Contact: Luigi Toscano
URL:
Whiteboard:
Depends On: 525470
Blocks: 527551
 
Reported: 2009-04-27 17:49 UTC by Charlie Wyse
Modified: 2018-10-20 03:53 UTC

Fixed In Version: condor-7.3.2-0.4
Doc Type: Bug Fix
Doc Text:
Grid bug fix
C: Starting Condor when the Xen VM type is used. Condor would sometimes start before Xen started.
C: VMGAHP would show an error, and Xen would fail to start correctly.
F: The Condor init script was changed, and the condor_startd now periodically checks for VM universe support when VM_TYPE is configured and the VM Universe support is not available on start up.
R: Condor and Xen now start reliably when being used together.
When using Condor with the Xen VM type, Condor would sometimes start before Xen started. This would cause VMGAHP to show an error, and Xen would fail to start correctly. The Condor init script was changed, and the condor_startd now periodically checks for VM universe support when VM_TYPE is configured and the VM Universe support is not available on start up. This means that Condor and Xen now start reliably when being used together.
Clone Of:
Environment:
Last Closed: 2009-11-06 18:33:53 UTC
Target Upstream Version:
Embargoed:



Description Charlie Wyse 2009-04-27 17:49:13 UTC
Description of problem:
condor startup proceeds before Xen startup. This leads to errors in VMGAHP.

Version-Release number of selected component (if applicable):
condor-7.2.2-0.9.el5

From /var/log/condor/VMGahpLog:
4/20 11:01:21 VMGAHP[3765]: VM-GAHP initialized with run-mode 0
4/20 11:01:21 VMGAHP[3765]: Initial UID/GUID=0/0, EUID/EGUID=64/64, Condor UID/GID=64,64
4/20 11:01:21 VMGAHP[3765]: Initialize Uids: caller=root, job user=condor
4/20 11:01:22 VMGAHP[3765]: Command returned non-zero: /usr/sbin/condor_vm_xen_xslt.sh check
4/20 11:01:22 VMGAHP[3765]:   XM list error
4/20 11:01:22 VMGAHP[3765]:   libvir: Xen Daemon error : internal error failed to connect to xend
4/20 11:01:22 VMGAHP[3765]:   libvir: Xen Daemon error : internal error failed to connect to xend
4/20 11:01:22 VMGAHP[3765]:   error: failed to connect to the hypervisor
4/20 11:01:22 VMGAHP[3765]: Xen script check failed:
4/20 11:01:22 VMGAHP[3765]:
ERROR: the vm_type('xen') cannot be used. 
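
A minimal sketch of how an excerpt like the one above could be scanned for the rejected VM type. The line format is inferred only from this excerpt, and the helper names (`parse_vmgahp_log`, `rejected_vm_types`) are hypothetical, not part of Condor.

```python
import re

# Pattern inferred from the excerpt: "<M/D> <H:M:S> VMGAHP[<pid>]: <message>"
LOG_LINE = re.compile(r"^(\d+/\d+ \d+:\d+:\d+) VMGAHP\[(\d+)\]:\s*(.*)$")

SAMPLE = """\
4/20 11:01:22 VMGAHP[3765]: Command returned non-zero: /usr/sbin/condor_vm_xen_xslt.sh check
4/20 11:01:22 VMGAHP[3765]:   error: failed to connect to the hypervisor
4/20 11:01:22 VMGAHP[3765]: Xen script check failed:
4/20 11:01:22 VMGAHP[3765]:
ERROR: the vm_type('xen') cannot be used.
"""


def parse_vmgahp_log(text):
    """Split a log excerpt into records; unprefixed lines continue the previous record."""
    records = []
    for line in text.splitlines():
        match = LOG_LINE.match(line)
        if match:
            records.append({
                "time": match.group(1),
                "pid": int(match.group(2)),
                "message": match.group(3),
            })
        elif records and line.strip():
            # Continuation line (no timestamp prefix): append to the last message.
            previous = records[-1]
            previous["message"] = (previous["message"] + " " + line.strip()).strip()
    return records


def rejected_vm_types(records):
    """Return every vm_type that the log reports as unusable."""
    found = []
    for record in records:
        found.extend(re.findall(r"vm_type\('([^']+)'\) cannot be used", record["message"]))
    return found


if __name__ == "__main__":
    print(rejected_vm_types(parse_vmgahp_log(SAMPLE)))
```
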

Looking at the startup ordering, I see that condor is started before libvirtd and xend. The quick fix is to change the ordering or restart condor, but the package itself should work with Xen out of the box. The current ordering is:
/etc/rc.d/rc3.d/S90condor
/etc/rc.d/rc3.d/S91condor-ec2-enhanced
/etc/rc.d/rc3.d/S91condor-low-latency
/etc/rc.d/rc3.d/S97libvirtd
/etc/rc.d/rc3.d/S98xend
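
The ordering problem in the listing above can be checked mechanically. This is a sketch only, assuming the S* links in a runlevel directory are run in lexical order of their names; the function names are hypothetical, and the list is hard-coded from the listing rather than read from disk.

```python
import posixpath

# Entries copied from the listing in this report.
RC3_ENTRIES = [
    "/etc/rc.d/rc3.d/S90condor",
    "/etc/rc.d/rc3.d/S91condor-ec2-enhanced",
    "/etc/rc.d/rc3.d/S91condor-low-latency",
    "/etc/rc.d/rc3.d/S97libvirtd",
    "/etc/rc.d/rc3.d/S98xend",
]


def start_sequence(entries):
    """Return service names in the order their S<NN><name> links would run."""
    links = sorted(posixpath.basename(entry) for entry in entries)
    return [
        link[3:]
        for link in links
        if len(link) > 3 and link[0] == "S" and link[1:3].isdigit()
    ]


def late_dependencies(entries, service, dependencies):
    """Return the dependencies that are started after `service`."""
    sequence = start_sequence(entries)
    position = sequence.index(service)
    return [
        dep for dep in dependencies
        if dep in sequence and sequence.index(dep) > position
    ]


if __name__ == "__main__":
    # Both services condor relies on for the Xen VM type start after it.
    print(late_dependencies(RC3_ENTRIES, "condor", ["libvirtd", "xend"]))
```

An empty result would mean that every listed dependency is started before the service.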

Comment 1 Lans Carstensen 2009-06-10 15:30:10 UTC
Changing the condor init script to S98 (i.e. after libvirtd) is enough to make the init sequence work reliably for a vm_type of Xen.

Comment 2 Matthew Farrellee 2009-07-16 20:54:30 UTC
Fixed upstream, present for 7.3.2-0.4 build

Comment 3 Matthew Farrellee 2009-08-04 01:13:51 UTC
An additional fix for this has gone in upstream. The condor_startd will now periodically check for VM Universe support when VM_TYPE is configured and the VM Universe support is not available on startup.

Both fixes need to be verified.

Comment 4 Luigi Toscano 2009-09-03 15:59:36 UTC
Is the checking interval static or configurable? 
If it is static, what is its value?
Otherwise, how can it be changed?

Comment 5 Matthew Farrellee 2009-09-03 16:08:59 UTC
See: http://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=551

The param is VM_RECHECK_INTERVAL and it defaults to 600 seconds (10 minutes).
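
To illustrate the behaviour, here is a rough sketch of a periodic recheck loop. It is not the condor_startd implementation; `wait_for_vm_support` and its parameters are hypothetical, and only the 600-second default is taken from this comment.

```python
import time

DEFAULT_RECHECK_INTERVAL = 600  # seconds, the default quoted for VM_RECHECK_INTERVAL


def wait_for_vm_support(check, interval=DEFAULT_RECHECK_INTERVAL,
                        max_checks=None, sleep=time.sleep):
    """Call check() until it returns True, sleeping `interval` seconds between tries.

    Returns the number of checks made on success, or None if `max_checks`
    attempts all failed. `sleep` is injectable so the loop can be tested
    without actually waiting.
    """
    attempts = 0
    while max_checks is None or attempts < max_checks:
        attempts += 1
        if check():
            return attempts
        # Only wait if another attempt will follow.
        if max_checks is None or attempts < max_checks:
            sleep(interval)
    return None


if __name__ == "__main__":
    # Simulated check: VM support becomes available on the third attempt.
    outcomes = iter([False, False, True])
    waited = []
    print(wait_for_vm_support(lambda: next(outcomes), sleep=waited.append), waited)
```
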

Comment 10 Irina Boverman 2009-10-22 19:08:32 UTC
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
Changed condor init script to work reliably for vm_type of Xen. Additionally condor_startd will now periodically check for VM Universe support when VM_TYPE is configured and the VM Universe support is not available on start up (497881)

Comment 12 Luigi Toscano 2009-11-06 18:33:53 UTC
The order of startup scripts has been fixed and the new VM_RECHECK_INTERVAL option is working as expected.

Verified on RHEL 5.4, i386 Xen, x86_64 Xen, x86_64 KVM.

condor-vm-gahp-7.4.1-0.4.el5
condor-7.4.1-0.4.el5

Closing the bug.

Comment 13 Lana Brindley 2009-11-26 20:39:39 UTC
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1,9 @@
-Changed condor init script to work reliably for vm_type of Xen. Additionally condor_startd will now periodically check for VM Universe support when VM_TYPE is configured and the VM Universe support is not available on start up (497881)
+Grid bug fix
+
+C: Starting Condor when the Xen VM type is used. Condor would sometimes start before Xen started.
+C: VMGAHP would show an error, and Xen would fail to start correctly.
+F: The Condor init script was changed, and the condor_startd now periodically checks for VM universe support when VM_TYPE is configured and the VM Universe support is not available on start up
+R: Condor and Xen now start reliably when being used together.
+
+
+When using Condor with the Xen VM type, Condor would sometimes start before Xen started. This would cause VMGAHP to show an error, and Xen would fail to start correctly. The Condor init script was changed, and the condor_startd now periodically checks for VM universe support when VM_TYPE is configured and the VM Universe support is not available on start up. This means that Condor and Xen now start reliably when being used together.

