Bug 723031 - Documentation, for the sake of usability we should provide an overview of cleaning up a condor queue
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: CloudForms Cloud Engine
Classification: Retired
Component: Documentation
Version: 1.0.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Assignee: Justin Clift
QA Contact: wes hayutin
URL:
Whiteboard:
Depends On:
Blocks: ce-p2-beta
Reported: 2011-07-18 19:31 UTC by wes hayutin
Modified: 2015-07-13 04:35 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2012-06-07 06:29:07 UTC
Embargoed:



Description wes hayutin 2011-07-18 19:31:02 UTC
Description of problem:

I'm not sure whether other users of the product have hit this, but it is not uncommon for a condor queue to end up with held jobs.  It seems that once you get one or two held jobs, future instances will not launch.

For instance, I found myself in a situation where I could not get instances to move to running:

-- Submitter: dell-pem600-01.rhts.eng.bos.redhat.com : <10.16.65.232:50154> : dell-pem600-01.rhts.eng.bos.redhat.com
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
   1.0   aeolus          7/18 13:26   0+01:32:59 R  0   0.0  job_ec2First_front
   2.0   aeolus          7/18 13:31   0+00:00:00 I  0   0.0  job_vmware_fronten
   3.0   aeolus          7/18 13:34   0+00:00:00 I  0   0.0  job_vmware02_front
   4.0   aeolus          7/18 14:01   0+00:00:00 H  0   0.0  job_ec2WEST_fronte
   5.0   aeolus          7/18 14:03   0+00:00:00 H  0   0.0  job_ec2WEST01_fron
   6.0   aeolus          7/18 14:09   0+00:49:09 R  0   0.0  job_ec2WEST02_fron
   7.0   aeolus          7/18 14:17   0+00:00:00 I  0   0.0  job_vmware03_front
   8.0   aeolus          7/18 14:21   0+00:00:00 H  0   0.0  job_vmware04_front
   9.0   aeolus          7/18 14:25   0+00:03:05 R  0   0.0  job_vmware05_front
  10.0   aeolus          7/18 14:28   0+00:02:45 R  0   0.0  job_vmware06_front

IMHO a user would have a very difficult time moving forward without some basic knowledge of how to clean up a condor queue.  That knowledge would also help a user clean up the monitor dashboard in aeolus.

I suggest a brief explanation of the following:
1. condor_release: how to release held jobs, and when to release them
2. condor_rm: how to remove jobs
3. condor_rm -forcex: how to force-remove jobs
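
To make the request concrete, here is a rough sketch (not tested against a live condor install) of the kind of walkthrough the documentation could cover. It parses the queue listing from this report, picks out the held (H) jobs, and builds the three command lines listed above without running them. The helper names jobs_by_state and cleanup_commands are made up for illustration, and the column positions are an assumption based on the listing shown here. As far as I understand it, condor_release puts a held job back in the queue to be retried, condor_rm removes it, and condor_rm -forcex is for jobs that stay stuck after a normal remove.

```python
# Sketch only: parse a condor_q listing and BUILD (not run) cleanup commands.
# The listing is copied from this report; helper names are hypothetical.

SAMPLE_QUEUE = """\
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
   1.0   aeolus          7/18 13:26   0+01:32:59 R  0   0.0  job_ec2First_front
   2.0   aeolus          7/18 13:31   0+00:00:00 I  0   0.0  job_vmware_fronten
   3.0   aeolus          7/18 13:34   0+00:00:00 I  0   0.0  job_vmware02_front
   4.0   aeolus          7/18 14:01   0+00:00:00 H  0   0.0  job_ec2WEST_fronte
   5.0   aeolus          7/18 14:03   0+00:00:00 H  0   0.0  job_ec2WEST01_fron
   6.0   aeolus          7/18 14:09   0+00:49:09 R  0   0.0  job_ec2WEST02_fron
   7.0   aeolus          7/18 14:17   0+00:00:00 I  0   0.0  job_vmware03_front
   8.0   aeolus          7/18 14:21   0+00:00:00 H  0   0.0  job_vmware04_front
   9.0   aeolus          7/18 14:25   0+00:03:05 R  0   0.0  job_vmware05_front
  10.0   aeolus          7/18 14:28   0+00:02:45 R  0   0.0  job_vmware06_front
"""


def jobs_by_state(listing):
    """Map each state letter (R, I, H, ...) to the job IDs in that state."""
    states = {}
    for line in listing.splitlines():
        fields = line.split()
        # Job rows start with an ID such as "4.0"; skip headers and blanks.
        if len(fields) < 6 or not fields[0].replace(".", "", 1).isdigit():
            continue
        # Assumed layout: ID, OWNER, date, time, RUN_TIME, ST, ...
        job_id, state = fields[0], fields[5]
        states.setdefault(state, []).append(job_id)
    return states


def cleanup_commands(job_ids):
    """Return the three command lines suggested in this report, per job."""
    return [
        {
            "release": f"condor_release {job_id}",
            "remove": f"condor_rm {job_id}",
            "force_remove": f"condor_rm -forcex {job_id}",
        }
        for job_id in job_ids
    ]


if __name__ == "__main__":
    held = jobs_by_state(SAMPLE_QUEUE).get("H", [])
    for entry in cleanup_commands(held):
        # Print the options side by side; the user picks one per job.
        print(entry["release"], "|", entry["remove"], "|", entry["force_remove"])
```

The sketch only prints the commands, so a user can read them and decide per job whether a release or a remove is appropriate before running anything.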

Comment 1 wes hayutin 2011-07-18 19:33:52 UTC
Talking to Slow seems to confirm that users will need some help here:

<Slow> weshay_hm: that may be.. IIRC it does bulk polling
<Slow> weshay_hm: so if one is screwing up maybe it can't poll the rest either

Comment 2 wes hayutin 2011-09-28 16:41:14 UTC
Making sure all the bugs are at the right version for future queries.

Comment 4 Mike Orazi 2011-10-13 15:13:53 UTC
This should no longer be necessary.

Comment 5 Aziza Karol 2011-10-24 07:23:21 UTC
As condor has been removed, the documentation is no longer necessary.

[root@dell-pesc430-03 ~]# rpm -qa | grep aeolus
aeolus-conductor-daemons-0.5.0-0.20111021031626gita5004c6.fc15.noarch
aeolus-all-0.5.0-0.20111021031626gita5004c6.fc15.noarch
rubygem-aeolus-cli-0.1.0-3.20111003133323git9451323.fc15.noarch
aeolus-conductor-doc-0.5.0-0.20111021031626gita5004c6.fc15.noarch
aeolus-configure-2.2.0-1.20111020103152gitc9b6c03.fc15.noarch
aeolus-conductor-0.5.0-0.20111021031626gita5004c6.fc15.noarch
rubygem-aeolus-image-0.1.0-3.20111003170706git8f23238.fc15.noarch

Comment 7 Lana Brindley 2012-06-07 06:29:07 UTC
Document is now available on docs.redhat.com. Please raise a new bug for any further issues.

LKB

