Bug 647789 - wrong return code if related daemon stopped unexpectedly
Summary: wrong return code if related daemon stopped unexpectedly
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor
Version: 1.3
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: 2.0
: ---
Assignee: Matthew Farrellee
QA Contact: Tomas Rusnak
URL:
Whiteboard:
Depends On:
Blocks: 693778
Reported: 2010-10-29 13:17 UTC by Lubos Trilety
Modified: 2011-06-23 15:41 UTC (History)
CC List: 4 users

Fixed In Version: condor-7.5.6-0.1
Doc Type: Bug Fix
Doc Text:
C: Some command-line tools would not consistently report errors via their return code.
C: Scripting the tools can be more complicated than is necessary.
F: A number of tools (mentioned in BZ) were updated to report errors when connection problems occurred.
R: The tools are now easier to script as they report errors when they have connection problems.
Clone Of:
Environment:
Last Closed: 2011-06-23 15:41:21 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2011:0889 0 normal SHIPPED_LIVE Red Hat Enterprise MRG Grid 2.0 Release 2011-06-23 15:35:53 UTC

Description Lubos Trilety 2010-10-29 13:17:52 UTC
Description of problem:
The tools in the following list return the wrong exit code when they are run after the related daemon (such as condor_master, condor_schedd or condor_startd) has stopped unexpectedly.
condor_reschedule
condor_vacate
condor_off
condor_on
condor_reconfig
condor_restart

Version-Release number of selected component (if applicable):
condor-7.4.4-0.16

How reproducible:
100%

Steps to Reproduce:
1. Start condor, then kill condor_master with signal 9 (SIGKILL),
or simply run 'condor_master -n'.
2. Run 'condor_off; echo $?':
# condor_off; echo $?
Can't connect to local master
0
  
Actual results:
The tool prints an error message but returns 0.

Expected results:
A non-zero return code whenever an error occurs.

Additional info:

Comment 1 Matthew Farrellee 2011-02-03 19:46:22 UTC
Or...

$ echo "<1.2.3.4:1234>" > dummy_master_address
$ _CONDOR_MASTER_ADDRESS_FILE=$PWD/dummy_master_address condor_off
Can't connect to local master
$ echo $?
0

Running condor_master -n appears to be just a way to get a LOG/.master_address file written and left behind when the master exits.
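
The reproduction above can be wrapped into a small helper for repeated testing. This is only a sketch: the function name repro_with_dummy_address and the use of mktemp are assumptions for illustration and are not part of condor. The address <1.2.3.4:1234> and the _CONDOR_MASTER_ADDRESS_FILE variable are taken from the commands above.

```shell
#!/bin/sh
# Hypothetical helper (not part of condor): point a tool at a master
# address file that names an unreachable master, then report the
# tool's exit status.
repro_with_dummy_address() {
    tool=${1:-condor_off}
    dir=$(mktemp -d) || return 2
    echo "<1.2.3.4:1234>" > "$dir/dummy_master_address"
    status=0
    # The assignment prefix sets the variable only for this one command.
    _CONDOR_MASTER_ADDRESS_FILE="$dir/dummy_master_address" "$tool" || status=$?
    rm -rf "$dir"
    echo "$tool exit status: $status"
    return "$status"
}

# Usage (needs condor installed):
#   repro_with_dummy_address condor_off
```

On an affected build the last line would be expected to report status 0 despite the "Can't connect to local master" message; a fixed build should report a non-zero status.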

Comment 2 Matthew Farrellee 2011-02-03 20:03:44 UTC
Bad (before the fix)...

$ for t in condor_on condor_reconfig condor_restart condor_off; do $t; echo $?; done
Can't connect to local master
0
Can't connect to local master
0
Can't connect to local master
0
Can't connect to local master
0

Good (after the fix)...

$ for t in condor_on condor_reconfig condor_restart condor_off; do $t; echo $?; done                 
Can't connect to local master
1
Can't connect to local master
1
Can't connect to local master
1
Can't connect to local master
1
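
The same loop can be made self-reporting, so that a 0 from a failing tool is flagged explicitly. A minimal sketch, assuming it is run while the local master is unreachable; the function name expect_failure is made up for illustration.

```shell
#!/bin/sh
# Hypothetical check: every command passed in is expected to FAIL
# (for example because the local master is down). A command that
# exits 0 is reported as BAD.
expect_failure() {
    bad=0
    for t in "$@"; do
        rc=0
        "$t" >/dev/null 2>&1 || rc=$?
        if [ "$rc" -eq 0 ]; then
            echo "BAD  $t returned 0"
            bad=1
        else
            echo "GOOD $t returned $rc"
        fi
    done
    # Non-zero if at least one command wrongly reported success.
    return "$bad"
}

# Usage (needs condor installed, master stopped):
#   expect_failure condor_on condor_reconfig condor_restart condor_off
```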

Comment 3 Matthew Farrellee 2011-02-03 20:58:05 UTC
Upstream at https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=1895

Fixed for 7.5.6.

commit 6bac22ec1623d39e72e0310ef9485586d1d2e112
Author: Matthew Farrellee <matt@redhat>
Date:   Thu Feb 3 15:56:08 2011 -0500

    Report errors via tool exit codes when talking to local master, #1895

diff --git a/src/condor_tools/tool.cpp b/src/condor_tools/tool.cpp

Comment 4 Matthew Farrellee 2011-02-14 17:17:27 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
C: Some command-line tools would not consistently report errors via their return code.
C: Scripting the tools can be more complicated than is necessary.
F: A number of tools (mentioned in BZ) were updated to report errors when connection problems occurred.
R: The tools are now easier to script as they report errors when they have connection problems.
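
With the fix in place, a script can rely on the exit status alone instead of parsing messages such as "Can't connect to local master". A rough sketch of that pattern follows; the require helper is hypothetical, and only the condor tool names come from this bug.

```shell
#!/bin/sh
# Hypothetical helper: run a command and pass its exit status through,
# printing a note on stderr when the status is non-zero.
require() {
    "$@" && return 0
    rc=$?
    echo "error: '$*' exited with status $rc" >&2
    return "$rc"
}

# Usage (needs condor installed): stop at the first tool that fails.
#   require condor_reconfig && require condor_restart
```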

Comment 6 Tomas Rusnak 2011-05-04 12:53:59 UTC
Reproduced on x86_64/RHEL5 with:

$CondorVersion: 7.4.5 Feb  4 2011 BuildID: RH-7.4.5-0.8.el5 PRE-RELEASE $
$CondorPlatform: X86_64-LINUX_RHEL5 

# condor_off; echo $?
Can't connect to local master
0
#  condor_vacate; echo $?
Can't connect to local startd
0
# condor_off; echo $?
Can't connect to local master
0
# condor_on; echo $?
Can't connect to local master
0
# condor_reconfig; echo $?
Can't connect to local master
0
# condor_restart ; echo $?
Can't connect to local master
0

Comment 7 Tomas Rusnak 2011-05-04 12:58:21 UTC
Retested on all supported platforms (x86 and x86_64, RHEL5 and RHEL6) with:

condor-7.6.1-0.4

# condor_off; echo $?
Can't find address for local master
Perhaps you need to query another pool.
1
# condor_vacate; echo $?
Can't find address for local startd
Perhaps you need to query another pool.
1
# condor_off; echo $?
Can't find address for local master
Perhaps you need to query another pool.
1
# condor_on; echo $?
Can't find address for local master
Perhaps you need to query another pool.
1
# condor_reconfig; echo $?
Can't find address for local master
Perhaps you need to query another pool.
1
# condor_restart ; echo $?
Can't find address for local master
Perhaps you need to query another pool.
1


All tools now return the correct error code.

>>> VERIFIED

Comment 8 errata-xmlrpc 2011-06-23 15:41:21 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2011-0889.html

