Description of problem:
Any daemon can be stopped remotely using condor_off host (or -addr host). The problem is that once the collector is stopped, every attempt to remotely stop or start any daemon with condor_on/condor_off fails, so there is no way to start the collector again remotely.

Version-Release number of selected component (if applicable):
condor-7.6.1-0.8

How reproducible:
100%

Steps to Reproduce:
1. Set up a pool with two machines: a central manager (collector, negotiator, schedd, master) and an execute node (startd, master).
2. Stop the collector daemon remotely:
   # condor_off -addr hostname -subsystem collector
   Sent "Kill-Daemon" command for "collector" to master hostname
3. Try to start the collector remotely:
   # condor_on hostname -subsystem collector
   ERROR: can't connect to local collector
   Can't find address for master hostname
   Perhaps you need to query another pool.

The same result occurs for:
   # condor_on hostname:<port> -subsystem collector
   # condor_on ip_address -subsystem collector
   # condor_on ip_address:<port> -subsystem collector
   # condor_on -addr hostname -subsystem collector
   # condor_on -addr hostname:<port> -subsystem collector
   # condor_on -addr ip_address -subsystem collector
   # condor_on -addr ip_address:<port> -subsystem collector

<port> is the port on which the master on the remote machine listens, obtained using netstat:
   # netstat -lpavne | grep condor
   tcp 0 0 0.0.0.0:33753 0.0.0.0:* LISTEN 64 23118662 28965/condor_master

Actual results:
The collector cannot be started remotely.

Expected results:
There is a way to start the collector on a remote machine.

Additional info:
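The report finds the master's port by reading netstat output on the remote machine. A minimal sketch of automating that lookup, assuming net-tools netstat output in the column layout shown in the report (local address in the fourth column, state in the sixth). The function name master_port_from_netstat is hypothetical and not part of HTCondor; it prints the port of the first listening TCP socket owned by condor_master.

```shell
#!/bin/sh
# Hypothetical helper (not an HTCondor tool): read net-tools netstat output
# on stdin and print the TCP port that condor_master is listening on.
# Assumes the column layout from the report:
#   proto recv-q send-q local-addr foreign-addr state user inode pid/program
master_port_from_netstat() {
    awk '/condor_master/ && $6 == "LISTEN" {
        # Local address is "ip:port" (or ":::port" for tcp6);
        # keep the part after the last colon.
        n = split($4, parts, ":")
        print parts[n]
        exit
    }'
}

# Usage on the remote machine (as root, so -p shows the owning program):
#   netstat -lpavne | master_port_from_netstat
```

Because the master in the report listens on 0.0.0.0 (all interfaces), this only recovers the port; the address half has to be the machine's own reachable IP, not 0.0.0.0.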
$ condor_status -master -l | grep Address
MyAddress = "<127.0.0.1:51820>"
$ condor_off eeyore.local -collector
Sent "Kill-Daemon" command for "collector" to master eeyore.local
$ condor_on eeyore.local -collector
ERROR: can't connect to local collector
Can't find address for master eeyore.local
Perhaps you need to query another pool.
$ condor_on -addr "<127.0.0.1:51820>" -collector
Sent "Spawn-Daemon" command for "collector" to collector at <127.0.0.1:51820>

--

The failure occurs because condor_on uses the collector to map the hostname to the master's endpoint, and with the collector down that lookup cannot succeed. The -addr forms do work, but they need a properly formed argument, i.e. an address of the form "<ip:port>", as shown above. Granted, the message printed by condor_on looks wrong.
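The -addr option expects the master's address in the "<ip:port>" form (HTCondor calls this a "sinful string"), exactly as in the working command above. A small sketch, assuming a POSIX shell, of a hypothetical helper named sinful that builds such an argument from an IP address and a port; it is an illustration only, not an HTCondor command.

```shell
#!/bin/sh
# Hypothetical helper: format an IP address and port as an HTCondor
# "<ip:port>" address for use with condor_on/condor_off -addr.
# An argument already in "<...>" form is printed unchanged.
sinful() {
    case "$1" in
        "<"*">") printf '%s\n' "$1" ;;
        *)       printf '<%s:%s>\n' "$1" "$2" ;;
    esac
}

# Example, matching the working command in the transcript:
#   condor_on -addr "$(sinful 127.0.0.1 51820)" -collector
```

Keep the result quoted on the command line: unquoted, the shell parses < and > as redirections, which is one easy way for an -addr argument to end up malformed.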