Bug 620455
Summary: | condor_rm - could not remove all jobs | ||
---|---|---|---|
Product: | Red Hat Enterprise MRG | Reporter: | Lubos Trilety <ltrilety> |
Component: | condor | Assignee: | Robert Rati <rrati> |
Status: | CLOSED ERRATA | QA Contact: | Tomas Rusnak <trusnak> |
Severity: | low | Docs Contact: | |
Priority: | low | ||
Version: | 1.0 | CC: | iboverma, ltoscano, matt, trusnak |
Target Milestone: | 2.0 | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | condor-7.5.6-0.1 | Doc Type: | Bug Fix |
Doc Text: |
C: Executing 'condor_rm -all' when there are no jobs in the queue
C: An error message is printed
F: The condor_rm tool better understands when there are no jobs in the queue
R: The condor_rm command now returns a different message (no jobs in queue) and a successful return code
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2011-06-23 15:41:38 UTC | Type: | --- |
Bug Blocks: | 693778 |
Description
Lubos Trilety
2010-08-02 15:05:25 UTC
<ltrilety> I found only this line 'actOnJobs: didn't do any work, aborting' in SchedLog

With SCHEDD_DEBUG = D_FULLDEBUG -

11:27:07am $ condor_q
-- Submitter: localhost.localdomain : <127.0.0.1:53683> : localhost.localdomain
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
0 jobs; 0 idle, 0 running, 0 held
11:27:10am $ condor_rm -a
Could not remove all jobs.
11:27:13am $ condor_rm -const TRUE
Couldn't find/remove all jobs matching constraint (TRUE)
11:27:17am $ condor_rm -const FALSE
Couldn't find/remove all jobs matching constraint (FALSE)
11:27:19am $ condor_rm -const 1!=1
Couldn't find/remove all jobs matching constraint (1!=1)
11:27:25am $ condor_rm -const 1==1
Couldn't find/remove all jobs matching constraint (1==1)
11:27:28am $ grep actOnJobs /var/log/condor/SchedLog
08/02 11:27:13 actOnJobs: didn't do any work, aborting
08/02 11:27:17 actOnJobs: didn't do any work, aborting
08/02 11:27:19 actOnJobs: didn't do any work, aborting
08/02 11:27:25 actOnJobs: didn't do any work, aborting
08/02 11:27:28 actOnJobs: didn't do any work, aborting

schedd.cpp -

...
// Set a single attribute which says if the action succeeded
// on at least one job or if it was a total failure
response_ad->Assign( ATTR_ACTION_RESULT, num_matches ? 1:0 );
...
if( num_matches == 0 ) {
    // We didn't do anything, so we want to bail out now...
    dprintf( D_FULLDEBUG, "actOnJobs: didn't do any work, aborting\n" );
    if( needs_transaction ) {
        AbortTransaction();
    }
    unsetQSock();
    return FALSE;
}
...

rm.cpp - -all is implemented with constraint: ClusterId >= 0

...
int result = FALSE;
if( !ad->LookupInteger(ATTR_ACTION_RESULT, result) || !result ) {
    had_error = true;
    rval = false;
}
...

The schedd is not returning enough information for rm to respond to the user appropriately. It is currently the case that an ATTR_ACTION_RESULT = 0 really just means that no jobs were modified, and rm could rely on that fact. A proper solution is to enhance the information the schedd sends to rm with the number of changed jobs.
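
To make the problem above concrete, here is a minimal C++ sketch of the interaction as the quoted excerpts describe it. It is an illustration only, not Condor code: `ActionReply`, `make_reply` and `rm_exit_code` are hypothetical names, and the real tools exchange a ClassAd rather than a plain struct.

```cpp
// Hypothetical stand-in for the reply ad the schedd sends back to condor_rm.
// Only the single result flag discussed in the report is modelled.
struct ActionReply {
    int action_result;  // models ATTR_ACTION_RESULT
};

// Schedd side, following the quoted schedd.cpp excerpt:
// the flag is 1 if at least one job matched, otherwise 0.
ActionReply make_reply(int num_matches) {
    return ActionReply{num_matches ? 1 : 0};
}

// rm side, following the quoted rm.cpp excerpt:
// a zero flag is treated as an error, whatever the reason for it.
int rm_exit_code(const ActionReply& reply) {
    bool had_error = (reply.action_result == 0);
    return had_error ? 1 : 0;
}
```

With this model, `rm_exit_code(make_reply(0))` yields 1 even though nothing went wrong: an empty queue and a real failure look the same to rm, which matches the transcript above.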
A downside to sending the number of changed jobs is a wire protocol change: a new rm will need backward compatibility to deal with an older schedd, and how user-friendly rm can be will be dictated by the version of the schedd it is interacting with.

Actually, rm.cpp has doWorkByConstraint, which has the option to provide more useful information. It even has a comment from 2002-03-29 (3930f2d2) stating,

// For now, just return true if the constraint worked on at least
// one job, false if not. Someday, we can fix up the tool to take
// advantage of all the slick info the schedd gives us back about this
// request.

The condor_rm command now returns a different message (no jobs in queue) and a success return code, rather than an error message and a return code of 1, when run against a schedd with no jobs. It should be noted that the schedd updates its internal statistics on number of jobs run every 10 seconds or so, and it is possible to receive the old error message during this time.

Fixed on branch V7_5-BZ620455-rm_all-result-cleanup

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
C: Executing 'condor_rm -all' when there are no jobs in the queue
C: An error message is printed
F: The condor_rm tool better understands when there are no jobs in the queue
R: The condor_rm command now returns a different message (no jobs in queue) and a successful return code

Reproduced on RHEL5,x86_64:
$CondorVersion: 7.4.5 Feb 4 2011 BuildID: RH-7.4.5-0.8.el5 PRE-RELEASE $
$CondorPlatform: X86_64-LINUX_RHEL5 $
# condor_rm -all
Could not remove all jobs.
# echo $?
1
Retested with the current version on all supported platforms (x86, x86_64 / RHEL5, RHEL6):
condor-7.6.1-0.4
# condor_rm -all
condor_rm:0:There are no jobs in the queue
# echo $?
0
Removing all jobs from a queue with no submitted jobs now returns no error, prints a more informative message, and ends with return code 0.
>>> VERIFIED
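
The backward-compatibility concern raised in the discussion (a newer rm talking to an older schedd) can be sketched roughly as follows. This is an assumption-laden illustration in C++17, not the change that was actually committed: `CountedReply`, `jobs_changed`, `Outcome`, `interpret_reply` and `exit_code_for` are hypothetical names, and the optional count stands in for an attribute that an older schedd would simply not send.

```cpp
#include <optional>

// Hypothetical reply: every schedd sends the result flag,
// only a newer one would also send how many jobs it changed.
struct CountedReply {
    int action_result;                // models ATTR_ACTION_RESULT
    std::optional<int> jobs_changed;  // hypothetical; absent from an older schedd
};

enum class Outcome { Removed, NothingToRemove, Error };

// Prefer the count when it is present; otherwise fall back to the
// old interpretation, where a zero flag can only be reported as an error.
Outcome interpret_reply(const CountedReply& reply) {
    if (reply.jobs_changed.has_value()) {
        return *reply.jobs_changed > 0 ? Outcome::Removed
                                       : Outcome::NothingToRemove;
    }
    return reply.action_result ? Outcome::Removed : Outcome::Error;
}

// Only a genuine error maps to a non-zero exit code.
int exit_code_for(Outcome outcome) {
    return outcome == Outcome::Error ? 1 : 0;
}
```

In this sketch the friendlier result is only available when the count is present, which is the trade-off noted in the discussion: the behaviour of rm depends on the version of the schedd it talks to.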
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHEA-2011-0889.html