Bug 709749

Summary: collector cannot be started remotely
Product: Red Hat Enterprise MRG
Reporter: Lubos Trilety <ltrilety>
Component: condor
Assignee: Matthew Farrellee <matt>
Status: CLOSED NOTABUG
QA Contact: MRG Quality Engineering <mrgqe-bugs>
Severity: medium
Priority: medium
Version: Development
CC: matt
Target Milestone: 2.0   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Doc Type: Bug Fix
Last Closed: 2011-06-01 15:32:29 UTC

Description Lubos Trilety 2011-06-01 14:51:58 UTC
Description of problem:
Any daemon can be stopped remotely using condor_off host (or -addr host). The problem is that once the collector is stopped, every attempt to remotely stop or start any daemon using condor_on/condor_off fails. There is no way to remotely start the collector again.

Version-Release number of selected component (if applicable):
condor-7.6.1-0.8

How reproducible:
100%

Steps to Reproduce:
1. Set up a pool with two machines: a central manager and an execute node
  (central manager: collector, negotiator, schedd, master; execute node: startd, master)
2. stop collector daemon remotely
# condor_off -addr hostname -subsystem collector
Sent "Kill-Daemon" command for "collector" to master hostname

3. try to start collector remotely
# condor_on hostname -subsystem collector
ERROR: can't connect to local collector
Can't find address for master hostname
Perhaps you need to query another pool.

the same result for:
# condor_on hostname:<port> -subsystem collector
# condor_on ip_address -subsystem collector
# condor_on ip_address:<port> -subsystem collector
# condor_on -addr hostname -subsystem collector
# condor_on -addr hostname:<port> -subsystem collector
# condor_on -addr ip_address -subsystem collector
# condor_on -addr ip_address:<port> -subsystem collector


<port> is the port on which the master on the remote machine listens, obtained using netstat:
# netstat -lpavne | grep condor
tcp 0  0 0.0.0.0:33753  0.0.0.0:*  LISTEN  64  23118662   28965/condor_master 

  
Actual results:
collector cannot be started remotely

Expected results:
There is a way to start the collector on a remote machine.

Additional info:

Comment 1 Matthew Farrellee 2011-06-01 15:32:29 UTC
$ condor_status -master -l | grep Address
MyAddress = "<127.0.0.1:51820>"

$ condor_off eeyore.local -collector
Sent "Kill-Daemon" command for "collector" to master eeyore.local

$ condor_on eeyore.local -collector    
ERROR: can't connect to local collector
Can't find address for master eeyore.local
Perhaps you need to query another pool.

$ condor_on -addr "<127.0.0.1:51820>" -collector
Sent "Spawn-Daemon" command for "collector" to collector at <127.0.0.1:51820>

--

The failure occurs because condor_on uses the collector to map the hostname to the master's endpoint. With the collector stopped, that lookup cannot succeed.

The -addr form works, but it needs a properly formed argument: the bracketed "<ip:port>" address reported as MyAddress above, not a bare hostname or ip:port pair.
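
To illustrate what "properly formed" means here, the following is a minimal sketch of a shape check for the -addr argument. The function name is_addr_form is hypothetical, and the pattern only covers the bracketed IPv4 "<ip:port>" form seen in the MyAddress output above; it is an assumption for illustration, not the parser condor_on actually uses.

```shell
#!/bin/sh
# Hypothetical helper (not part of condor): succeed if $1 has the
# bracketed IPv4 "<ip:port>" shape seen in MyAddress.
# This is a loose shape check only, not a full address parser.
is_addr_form() {
    printf '%s\n' "$1" | grep -Eq '^<[0-9]{1,3}(\.[0-9]{1,3}){3}:[0-9]+>$'
}

# Try the argument styles from the report plus the one that worked.
for candidate in "hostname" "hostname:33753" "127.0.0.1" \
                 "127.0.0.1:51820" "<127.0.0.1:51820>"; do
    if is_addr_form "$candidate"; then
        echo "well formed:     $candidate"
    else
        echo "not well formed: $candidate"
    fi
done
```

Under this check, only the last candidate passes, which matches the one -addr invocation that succeeded in this comment.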

Granted, the message printed from condor_on looks wrong.
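
Since the address lookup depends on the collector, one possible workaround is to record the master's address while the collector is still running and reuse it afterwards. The sketch below uses only the commands shown in this comment; the function names extract_master_addr and restart_collector are hypothetical, and it assumes the first MyAddress line in the condor_status -master -l output belongs to the master you want to reach (true for a single-master pool, not necessarily for larger ones).

```shell
#!/bin/sh
# Hypothetical helper: read ClassAd text on stdin and print the first
# bracketed address found on a MyAddress line, e.g. <127.0.0.1:51820>.
extract_master_addr() {
    sed -n 's/^MyAddress = "\(<[^>]*>\)".*$/\1/p' | head -n 1
}

# Hypothetical wrapper: look up the master address while the collector
# is still up, stop the collector, then restart it by address.
# Assumption: the first master ad returned is the one for host $1.
restart_collector() {
    host=$1
    addr=$(condor_status -master -l | extract_master_addr)
    if [ -z "$addr" ]; then
        echo "no master address found" >&2
        return 1
    fi
    condor_off "$host" -collector
    condor_on -addr "$addr" -collector
}
```

A call such as restart_collector eeyore.local would then follow the same sequence as the transcript above, with the address captured before the collector goes away.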