Bug 723031

Summary: Documentation, for the sake of usability we should provide an overview of cleaning up a condor queue
Product: [Retired] CloudForms Cloud Engine
Reporter: wes hayutin <whayutin>
Component: Documentation
Assignee: Justin Clift <jclift>
Status: CLOSED CURRENTRELEASE
QA Contact: wes hayutin <whayutin>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 1.0.0
CC: akarol, deltacloud-maint, kwade, lbrindle, morazi
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-06-07 06:29:07 UTC
Type: ---
Bug Depends On:    
Bug Blocks: 709348    

Description wes hayutin 2011-07-18 19:31:02 UTC
Description of problem:

I'm not sure whether other users of the product have noticed this, but it is not uncommon for a condor queue to end up with held jobs. It seems that once one or two jobs are held, future instances will not launch.

For instance, I found myself in a situation where I could not get instances to move to the running state:

-- Submitter: dell-pem600-01.rhts.eng.bos.redhat.com : <10.16.65.232:50154> : dell-pem600-01.rhts.eng.bos.redhat.com
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
   1.0   aeolus          7/18 13:26   0+01:32:59 R  0   0.0  job_ec2First_front
   2.0   aeolus          7/18 13:31   0+00:00:00 I  0   0.0  job_vmware_fronten
   3.0   aeolus          7/18 13:34   0+00:00:00 I  0   0.0  job_vmware02_front
   4.0   aeolus          7/18 14:01   0+00:00:00 H  0   0.0  job_ec2WEST_fronte
   5.0   aeolus          7/18 14:03   0+00:00:00 H  0   0.0  job_ec2WEST01_fron
   6.0   aeolus          7/18 14:09   0+00:49:09 R  0   0.0  job_ec2WEST02_fron
   7.0   aeolus          7/18 14:17   0+00:00:00 I  0   0.0  job_vmware03_front
   8.0   aeolus          7/18 14:21   0+00:00:00 H  0   0.0  job_vmware04_front
   9.0   aeolus          7/18 14:25   0+00:03:05 R  0   0.0  job_vmware05_front
  10.0   aeolus          7/18 14:28   0+00:02:45 R  0   0.0  job_vmware06_front
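
To make the state column concrete, here is a minimal Python sketch that tallies the ST column of the listing above and picks out the held (H) jobs. The sample rows are copied from this report. The helper name parse_queue is hypothetical, and the assumption that the state is the sixth whitespace-separated field is based only on this listing and may not hold for other condor_q output formats.

```python
from collections import Counter

# Rows copied from the condor_q listing in this report.
SAMPLE = """\
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
   1.0   aeolus          7/18 13:26   0+01:32:59 R  0   0.0  job_ec2First_front
   2.0   aeolus          7/18 13:31   0+00:00:00 I  0   0.0  job_vmware_fronten
   3.0   aeolus          7/18 13:34   0+00:00:00 I  0   0.0  job_vmware02_front
   4.0   aeolus          7/18 14:01   0+00:00:00 H  0   0.0  job_ec2WEST_fronte
   5.0   aeolus          7/18 14:03   0+00:00:00 H  0   0.0  job_ec2WEST01_fron
   6.0   aeolus          7/18 14:09   0+00:49:09 R  0   0.0  job_ec2WEST02_fron
   7.0   aeolus          7/18 14:17   0+00:00:00 I  0   0.0  job_vmware03_front
   8.0   aeolus          7/18 14:21   0+00:00:00 H  0   0.0  job_vmware04_front
   9.0   aeolus          7/18 14:25   0+00:03:05 R  0   0.0  job_vmware05_front
  10.0   aeolus          7/18 14:28   0+00:02:45 R  0   0.0  job_vmware06_front
"""


def parse_queue(text):
    """Return (job_id, state) pairs from condor_q-style rows (hypothetical helper)."""
    jobs = []
    for line in text.splitlines():
        fields = line.split()
        # Assumption: a job row has at least 9 fields, starts with an ID such
        # as "4.0", and carries the state (R, I, H, ...) in the sixth field.
        if len(fields) >= 9 and fields[0].replace(".", "", 1).isdigit():
            jobs.append((fields[0], fields[5]))
    return jobs


jobs = parse_queue(SAMPLE)
counts = Counter(state for _, state in jobs)
held = [job_id for job_id, state in jobs if state == "H"]

print("jobs per state:", dict(counts))
print("held jobs:", held)
```

In this listing the held jobs are the ones a user would need to release or remove before the queue is clean again.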

IMHO a user would have a very difficult time moving forward without some basic knowledge of how to clean up a condor queue. Such an overview would also help a user clean up the monitor dashboard in aeolus.

I suggest a brief explanation of the following:
1. condor_release: how to release jobs, and when to release them
2. condor_rm: how to remove jobs
3. condor_rm -forcex: how to force remove jobs
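
As a rough illustration of the three commands listed above, the following Python sketch builds the command lines for a set of held job IDs and, by default, only prints them. The function names cleanup_command and run, and the dry_run parameter, are hypothetical and exist only for this example. The job IDs are the held ones from the listing in this report. Whether releasing is appropriate depends on the hold reason (condor_q -hold shows it), and as far as I understand the Condor manual, -forcex only affects jobs that are already in the removed (X) state after a normal condor_rm.

```python
import subprocess

# Base command for each cleanup action named in the list above.
ACTIONS = {
    "release": ["condor_release"],
    "remove": ["condor_rm"],
    "force_remove": ["condor_rm", "-forcex"],
}


def cleanup_command(action, job_ids):
    """Build the argv for one cleanup action (hypothetical helper)."""
    return ACTIONS[action] + list(job_ids)


def run(argv, dry_run=True):
    """Print the command, and execute it only when dry_run is False."""
    print(" ".join(argv))
    if dry_run:
        return None
    return subprocess.run(argv, check=True)


held = ["4.0", "5.0", "8.0"]  # held job IDs taken from the listing in this report

run(["condor_q", "-hold"])                     # inspect why the jobs are held
run(cleanup_command("release", held))          # retry once the cause is fixed
run(cleanup_command("remove", held))           # or give up on the jobs
run(cleanup_command("force_remove", held))     # only if they linger in state X
```

Keeping the sketch in dry-run mode makes it safe to read through the sequence before running anything against a real queue.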

Comment 1 wes hayutin 2011-07-18 19:33:52 UTC
Talking to Slow seems to confirm that users will need some help here:

<Slow> weshay_hm: that may be.. IIRC it does bulk polling
<Slow> weshay_hm: so if one is screwing up maybe it can't poll the rest either

Comment 2 wes hayutin 2011-09-28 16:41:14 UTC
Making sure all the bugs are at the right version for future queries.

Comment 4 Mike Orazi 2011-10-13 15:13:53 UTC
This should no longer be necessary.

Comment 5 Aziza Karol 2011-10-24 07:23:21 UTC
As condor has been removed, the documentation is no longer necessary.

[root@dell-pesc430-03 ~]# rpm -qa | grep aeolus
aeolus-conductor-daemons-0.5.0-0.20111021031626gita5004c6.fc15.noarch
aeolus-all-0.5.0-0.20111021031626gita5004c6.fc15.noarch
rubygem-aeolus-cli-0.1.0-3.20111003133323git9451323.fc15.noarch
aeolus-conductor-doc-0.5.0-0.20111021031626gita5004c6.fc15.noarch
aeolus-configure-2.2.0-1.20111020103152gitc9b6c03.fc15.noarch
aeolus-conductor-0.5.0-0.20111021031626gita5004c6.fc15.noarch
rubygem-aeolus-image-0.1.0-3.20111003170706git8f23238.fc15.noarch

Comment 7 Lana Brindley 2012-06-07 06:29:07 UTC
Document is now available on docs.redhat.com. Please raise a new bug for any further issues.

LKB