Description of problem:

I'm not sure whether other users of the product have noticed this, but it is not uncommon for a condor queue to end up with held jobs. It seems that once you have one or two held jobs, future instances will not launch. For instance, I found myself in a situation where I could not get instances to move to running:

-- Submitter: dell-pem600-01.rhts.eng.bos.redhat.com : <10.16.65.232:50154> : dell-pem600-01.rhts.eng.bos.redhat.com
 ID      OWNER     SUBMITTED     RUN_TIME ST PRI SIZE CMD
   1.0   aeolus   7/18 13:26   0+01:32:59 R  0   0.0  job_ec2First_front
   2.0   aeolus   7/18 13:31   0+00:00:00 I  0   0.0  job_vmware_fronten
   3.0   aeolus   7/18 13:34   0+00:00:00 I  0   0.0  job_vmware02_front
   4.0   aeolus   7/18 14:01   0+00:00:00 H  0   0.0  job_ec2WEST_fronte
   5.0   aeolus   7/18 14:03   0+00:00:00 H  0   0.0  job_ec2WEST01_fron
   6.0   aeolus   7/18 14:09   0+00:49:09 R  0   0.0  job_ec2WEST02_fron
   7.0   aeolus   7/18 14:17   0+00:00:00 I  0   0.0  job_vmware03_front
   8.0   aeolus   7/18 14:21   0+00:00:00 H  0   0.0  job_vmware04_front
   9.0   aeolus   7/18 14:25   0+00:03:05 R  0   0.0  job_vmware05_front
  10.0   aeolus   7/18 14:28   0+00:02:45 R  0   0.0  job_vmware06_front

IMHO a user would have a very difficult time moving forward without some basic knowledge of how to clean up a condor queue. That knowledge would also help a user clean up the monitor dashboard in aeolus.

I suggest a brief explanation of:
1. condor_release: how to release jobs, and when to release them
2. condor_rm: how to remove jobs
3. condor_rm -forcex: how to force remove jobs
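As a rough illustration of the kind of cleanup the documentation could describe, here is a minimal Python sketch that reads rows from the queue listing above, picks out the jobs in the H (held) state, and prints the three commands from the list for each one. The helper names (jobs_by_state, cleanup_commands) and the column-matching regular expression are assumptions made for this example; they are not part of condor or aeolus, and the sketch only prints commands, it does not run them.

```python
import re

# Rows copied from the condor_q listing in this report (banner and header omitted).
SAMPLE = """\
1.0 aeolus 7/18 13:26 0+01:32:59 R 0 0.0 job_ec2First_front
2.0 aeolus 7/18 13:31 0+00:00:00 I 0 0.0 job_vmware_fronten
3.0 aeolus 7/18 13:34 0+00:00:00 I 0 0.0 job_vmware02_front
4.0 aeolus 7/18 14:01 0+00:00:00 H 0 0.0 job_ec2WEST_fronte
5.0 aeolus 7/18 14:03 0+00:00:00 H 0 0.0 job_ec2WEST01_fron
6.0 aeolus 7/18 14:09 0+00:49:09 R 0 0.0 job_ec2WEST02_fron
7.0 aeolus 7/18 14:17 0+00:00:00 I 0 0.0 job_vmware03_front
8.0 aeolus 7/18 14:21 0+00:00:00 H 0 0.0 job_vmware04_front
9.0 aeolus 7/18 14:25 0+00:03:05 R 0 0.0 job_vmware05_front
10.0 aeolus 7/18 14:28 0+00:02:45 R 0 0.0 job_vmware06_front
"""

# Assumed row layout: ID, OWNER, submit date, submit time, RUN_TIME, then ST.
ROW = re.compile(r"^\s*(\d+\.\d+)\s+\S+\s+\S+\s+\S+\s+\S+\s+([A-Z<>])\s")


def jobs_by_state(listing):
    """Group job IDs by the value in the ST column."""
    states = {}
    for line in listing.splitlines():
        match = ROW.match(line)
        if match:
            job_id, state = match.groups()
            states.setdefault(state, []).append(job_id)
    return states


def cleanup_commands(listing):
    """Hypothetical helper: list candidate commands for each held (H) job."""
    commands = []
    for job_id in jobs_by_state(listing).get("H", []):
        # Release the job if the cause of the hold has been fixed.
        commands.append(f"condor_release {job_id}")
        # Otherwise remove the job from the queue.
        commands.append(f"condor_rm {job_id}")
        # Force removal if the job stays in the X state after condor_rm.
        commands.append(f"condor_rm -forcex {job_id}")
    return commands


if __name__ == "__main__":
    for command in cleanup_commands(SAMPLE):
        print(command)
```

Only one of the printed commands would normally be run for a given job: condor_release to retry it, or condor_rm (followed by condor_rm -forcex only if the job does not leave the queue) to drop it. Running condor_q -hold first should show why each job was held, which helps decide between the two.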
Talking to Slow seems to confirm that users will need some help here:

<Slow> weshay_hm: that may be.. IIRC it does bulk polling
<Slow> weshay_hm: so if one is screwing up maybe it can't poll the rest either
Making sure all the bugs are at the right version for future queries.
This should no longer be necessary.
Condor has been removed, so the documentation is no longer necessary.

[root@dell-pesc430-03 ~]# rpm -qa | grep aeolus
aeolus-conductor-daemons-0.5.0-0.20111021031626gita5004c6.fc15.noarch
aeolus-all-0.5.0-0.20111021031626gita5004c6.fc15.noarch
rubygem-aeolus-cli-0.1.0-3.20111003133323git9451323.fc15.noarch
aeolus-conductor-doc-0.5.0-0.20111021031626gita5004c6.fc15.noarch
aeolus-configure-2.2.0-1.20111020103152gitc9b6c03.fc15.noarch
aeolus-conductor-0.5.0-0.20111021031626gita5004c6.fc15.noarch
rubygem-aeolus-image-0.1.0-3.20111003170706git8f23238.fc15.noarch
Document is now available on docs.redhat.com. Please raise a new bug for any further issues. LKB