Bug 672533 - condor_configd "connect" to broker which is not running
Summary: condor_configd "connect" to broker which is not running
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor-wallaby-client
Version: 1.3
Hardware: Unspecified
OS: Linux
Priority: medium
Severity: unspecified
Target Milestone: 2.0
Target Release: ---
Assignee: Robert Rati
QA Contact: Tomas Rusnak
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-01-25 13:13 UTC by Martin Kudlej
Modified: 2011-06-27 15:33 UTC (History)

Fixed In Version: condor-wallaby-3.9-4
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-06-27 15:33:42 UTC
Target Upstream Version:



Description Martin Kudlej 2011-01-25 13:13:27 UTC
Description of problem:
While reproducing bug 667911, I stopped the broker on the remote configuration server. After Condor was restarted, configd started and tried to connect to the broker:

01/25 13:15:24 INFO: Starting Up
01/25 13:15:24 INFO: Hostname is "..."
01/25 13:15:24 INFO: Cleaning up temporary configuration files
01/25 13:15:24 DEBUG: "QMF_BROKER_PORT" is not defined. Using default (5672)
01/25 13:15:24 DEBUG: "QMF_BROKER_AUTH_MECHANISM" is not defined. Using defaults
01/25 13:15:24 DEBUG: Writing configuration file to "/var/lib/condor/wallaby_node.config"
01/25 13:15:25 DEBUG: Connected to broker "..:5672"
01/25 13:15:25 DEBUG: Looking for the store agent

The log reports that configd is connected to the broker, but that cannot be true because the broker is not running.

Before stopping the broker, I set up the pool through remote configuration:
Group "Internal Default Group":
Group ID: 1
Name: Internal Default Group
Features (priority: name):
  0: Master
  1: NodeAccess
  2: ExecuteNode
Parameters:
  ALLOW_WRITE = *
  CONDOR_HOST = ...
  ALLOW_READ = *

Version-Release number of selected component (if applicable):

I see it in:
condor-wallaby-client-3.6-6  RHEL 4/5 x i386/x86_64
condor-7.4.4-0.17
qpid-cpp-server-0.7.946106-22
qpid-cpp-client-0.7.946106-22
python-condorutils-1.4-5
python-qmf-0.7.946106-14
python-qpid-0.7.946106-14
python-wallabyclient-3.6-6

and also in:
condor-7.4.5-0.7.el5
condor-wallaby-client-3.9-2.el5 RHEL 4/5 x i386/x86_64
condor-wallaby-tools-3.9-2.el5
python-condorutils-1.4-6.el5
qpid-cpp-client-0.7.946106-27.el5
python-qpid-0.7.946106-15.el5
qpid-cpp-server-0.7.946106-27.el5

How reproducible:
100%

Steps to Reproduce:
1. set up condor pool by remote configuration
2. add the Master, NodeAccess and ExecuteNode features (the behaviour is the same without any features on the nodes, but I think this is the more typical use of remote configuration)
3. service qpidd stop
4. check /var/log/condor/ConfigdLog
  
Actual results:
configd reports that it is connected to a broker that is down.

Expected results:
If configd cannot connect to the broker, it should keep retrying until CONFIGD_CHECK_INTERVAL, and the addBroker function should raise an exception.
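
The expected retry behaviour can be sketched roughly as below. This is only an illustration, not configd's code: connect_with_retry and its connect argument are hypothetical names, and it assumes the check interval is the pause between attempts, which this report does not state.

```python
import time


def connect_with_retry(connect, check_interval, sleep=time.sleep, max_attempts=None):
    """Call connect() until it succeeds.

    connect is a hypothetical callable that returns a connection object or
    raises an exception on failure. check_interval is the pause in seconds
    between attempts (an assumption about how the interval would be used).
    Returns a tuple of (connection, number_of_attempts).
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return connect(), attempts
        except Exception:
            # Give up only if a limit was requested and has been reached.
            if max_attempts is not None and attempts >= max_attempts:
                raise
            sleep(check_interval)
```

Passing sleep as a parameter keeps the loop easy to exercise without real delays; with max_attempts left at None the helper retries indefinitely.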

Additional info:
I think this is more of a QMF bug, because when I modified the code around "Connected to broker" as follows:

try:
  self.broker = self.session.addBroker('amqp://%s' % broker_str, mechanisms=broker_auth_methods)
except:
  if stop_running == False:
    log(logging.CRITICAL, self.logger_name, 'Unable to connect to broker "%s"' % broker_str)
    return(False)

log(logging.DEBUG, self.logger_name, 'Connected to broker "%s", %s' % (broker_str, str(self.broker)))
return(True)

I got: 01/25 13:33:30 DEBUG: Connected to broker "10.34.37.168:5672", Disconnected Broker


The strange thing is that when I connect manually, I get an exception:
$ python
>>> import qmf.console
>>> session = qmf.console.Session()
>>> b=session.addBroker('amqp://localhost:5672', mechanisms='ANONYMOUS') # same result with an IP address, localhost, or a hostname
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.4/site-packages/qmf/console.py", line 639, in addBroker
    ssl = url.scheme == URL.AMQPS, connTimeout=timeout)
  File "/usr/lib/python2.4/site-packages/qmf/console.py", line 2070, in __init__
    raise self.conn_exc
socket.error: (111, 'Connection refused')
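
Since addBroker can evidently return an object that prints as "Disconnected Broker" without raising, one way to avoid the misleading log line is to check the object's state before reporting success. The sketch below is illustrative only: report_broker_state and FakeBroker are hypothetical names, and it assumes the broker object exposes an isConnected() method, which should be verified against the installed qmf.console version.

```python
import logging


def report_broker_state(broker, broker_str, logger):
    """Log 'Connected' only when the broker object reports a live connection.

    Assumes broker has an isConnected() method (an assumption, see above).
    Returns True if connected, False otherwise.
    """
    if broker is not None and broker.isConnected():
        logger.debug('Connected to broker "%s"', broker_str)
        return True
    # A returned object does not by itself prove that a connection exists.
    logger.debug('Connecting to broker "%s"', broker_str)
    return False


class FakeBroker(object):
    """Hypothetical stand-in for a broker object, used only for the demo."""

    def __init__(self, connected):
        self.connected = connected

    def isConnected(self):
        return self.connected
```

For example, report_broker_state(FakeBroker(False), 'localhost:5672', logging.getLogger('sketch')) returns False and logs "Connecting" rather than "Connected".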

Comment 1 Robert Rati 2011-01-28 21:06:00 UTC
The configd now uses events from the client library to denote broker connection/loss.

Fixed on branch broker-notification-messages
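
A minimal sketch of the event-driven approach, for illustration only and not the code on that branch: connection state is updated from connect/disconnect notifications instead of being inferred from addBroker returning. BrokerStateTracker is a hypothetical name. In real use the class would presumably subclass qmf.console.Console, whose brokerConnected and brokerDisconnected callbacks it mirrors; that is an assumption to check against the installed library, and the log messages here are illustrative.

```python
import logging


class BrokerStateTracker(object):
    """Tracks broker connectivity from connect/disconnect notifications.

    Hypothetical helper: the method names mirror console callbacks that the
    client library is assumed to invoke on connection and loss.
    """

    def __init__(self, logger):
        self.logger = logger
        self.connected = False

    def brokerConnected(self, broker):
        # Called when the library reports an established connection.
        self.connected = True
        self.logger.debug('Established connection to broker "%s"', broker)

    def brokerDisconnected(self, broker):
        # Called when the library reports a lost connection.
        self.connected = False
        self.logger.debug('Lost connection to broker "%s"', broker)
```

With this design the daemon reports "connected" only after the library has actually signalled it, so a stopped broker leaves connected at False.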

Comment 2 Robert Rati 2011-02-08 22:48:39 UTC
The log messages are only seen with debugging enabled, and the debug switch is not documented.  This doesn't need a release/tech note.

Comment 3 Tomas Rusnak 2011-05-03 11:43:59 UTC
Reproduced on:
# condor -v
$CondorVersion: 7.4.5 Feb  4 2011 BuildID: RH-7.4.5-0.8.el5 PRE-RELEASE $
$CondorPlatform: X86_64-LINUX_RHEL5 $


# tail -f /var/log/condor/ConfigLog 
05/03 14:38:14 DEBUG: Checked in with the store
05/03 14:38:14 DEBUG: The system is already running configuration version "0"
05/03 14:40:16 DEBUG: Lost connection to the configuration store
05/03 14:42:26 DEBUG: Established connection to the configuration store
05/03 14:42:38 DEBUG: Shutting down
05/03 14:42:38 DEBUG: Closing QMF connections
05/03 14:42:38 DEBUG: Lost connection to the configuration store
05/03 14:42:39 DEBUG: Closed QMF connections
05/03 14:42:39 DEBUG: Setting stop flag
05/03 14:42:39 INFO: Exiting
05/03 14:43:14 INFO: Starting Up
05/03 14:43:14 INFO: Hostname is "rhel5_64-old.mrg-qe-12.lab.eng.brq.redhat.com"
05/03 14:43:14 INFO: Cleaning up temporary configuration files
05/03 14:43:14 DEBUG: "QMF_BROKER_PORT" is not defined. Using default (5672)
05/03 14:43:14 DEBUG: "QMF_BROKER_AUTH_MECHANISM" is not defined. Using defaults
05/03 14:43:14 DEBUG: Writing configuration file to "/var/lib/condor/wallaby_node.config"
05/03 14:43:15 DEBUG: Connected to broker "localhost:5672"
05/03 14:43:15 DEBUG: Looking for the store agent

Comment 4 Tomas Rusnak 2011-05-03 16:26:48 UTC
Retested on all supported platforms (x86 and x86_64, RHEL5 and RHEL6) with:

condor-wallaby-client

# service condor stop
Stopping Condor daemons:                                   [  OK  ]
# service qpidd start
Starting Qpid AMQP daemon:                                 [  OK  ]
# service wallaby start
Starting wallaby-agent:                                    [  OK  ]
# condor_configure_pool -a -f Master,NodeAccess,ExecuteNode -n hostname

Apply these changes [Y/n] ? y
The following parameters need to be set for this configuration to be valid.
ALLOW_READ
ALLOW_WRITE
CONDOR_HOST
Set these parameters now ? [y/N] y
ALLOW_READ: *.redhat.com
ALLOW_WRITE: *
CONDOR_HOST: hostname
Configuration applied

Create a named snapshot of this configuration [y/N] ? 

Activate the changes [y/N] ? y
Activating configuration.  This may take a while, please be patient
Configuration activated
Configuration saved
# service qpidd stop
Stopping Qpid AMQP daemon:                                 [  OK  ]
# service condor start
Starting Condor daemons:                                   [  OK  ]
# tail -f /var/log/condor/ConfigLog 
05/03 15:19:19 DEBUG: Setting stop flag
05/03 15:19:19 INFO: Exiting
05/03 15:20:59 INFO: Starting Up
05/03 15:20:59 INFO: Hostname is "hostname"
05/03 15:20:59 INFO: Cleaning up temporary configuration files
05/03 15:20:59 DEBUG: "QMF_BROKER_PORT" is not defined. Using default (5672)
05/03 15:20:59 DEBUG: "QMF_BROKER_AUTH_MECHANISM" is not defined. Using defaults
05/03 15:20:59 DEBUG: Writing configuration file to "/var/lib/condor/wallaby_node.config"
05/03 15:20:59 DEBUG: Connecting to broker "localhost:5672"
05/03 15:20:59 DEBUG: Looking for the store agent

The configd daemon now waits for the broker instead of reporting a connection.

>>> VERIFIED

