| Summary: | condor_configd "connect" to broker which is not running | ||
|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | Martin Kudlej <mkudlej> |
| Component: | condor-wallaby-client | Assignee: | Robert Rati <rrati> |
| Status: | CLOSED ERRATA | QA Contact: | Tomas Rusnak <trusnak> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | medium | ||
| Version: | 1.3 | CC: | iboverma, jneedle, matt, trusnak |
| Target Milestone: | 2.0 | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | condor-wallaby-3.9-4 | Doc Type: | Bug Fix |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2011-06-27 15:33:42 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
The configd now uses events from the client library to denote broker connection/loss.
Fixed on branch broker-notification-messages

The log messages are only seen with debugging enabled, and the debug switch is not documented. This doesn't need a release/tech note.

Reproduced on:
# condor -v
$CondorVersion: 7.4.5 Feb 4 2011 BuildID: RH-7.4.5-0.8.el5 PRE-RELEASE $
$CondorPlatform: X86_64-LINUX_RHEL5 $
# tail -f /var/log/condor/ConfigLog
05/03 14:38:14 DEBUG: Checked in with the store
05/03 14:38:14 DEBUG: The system is already running configuration version "0"
05/03 14:40:16 DEBUG: Lost connection to the configuration store
05/03 14:42:26 DEBUG: Established connection to the configuration store
05/03 14:42:38 DEBUG: Shutting down
05/03 14:42:38 DEBUG: Closing QMF connections
05/03 14:42:38 DEBUG: Lost connection to the configuration store
05/03 14:42:39 DEBUG: Closed QMF connections
05/03 14:42:39 DEBUG: Setting stop flag
05/03 14:42:39 INFO: Exiting
05/03 14:43:14 INFO: Starting Up
05/03 14:43:14 INFO: Hostname is "rhel5_64-old.mrg-qe-12.lab.eng.brq.redhat.com"
05/03 14:43:14 INFO: Cleaning up temporary configuration files
05/03 14:43:14 DEBUG: "QMF_BROKER_PORT" is not defined. Using default (5672)
05/03 14:43:14 DEBUG: "QMF_BROKER_AUTH_MECHANISM" is not defined. Using defaults
05/03 14:43:14 DEBUG: Writing configuration file to "/var/lib/condor/wallaby_node.config"
05/03 14:43:15 DEBUG: Connected to broker "localhost:5672"
05/03 14:43:15 DEBUG: Looking for the store agent

Retested over all supported platforms x86,x86_64/RHEL5,RHEL6 with:
condor-wallaby-client
# service condor stop
Stopping Condor daemons: [ OK ]
# service qpidd start
Starting Qpid AMQP daemon: [ OK ]
# service wallaby start
Starting wallaby-agent: [ OK ]
# condor_configure_pool -a -f Master,NodeAccess,ExecuteNode -n hostname
Apply these changes [Y/n] ? y
The following parameters need to be set for this configuration to be valid.
ALLOW_READ
ALLOW_WRITE
CONDOR_HOST
Set these parameters now ? [y/N] y
ALLOW_READ: *.redhat.com
ALLOW_WRITE: *
CONDOR_HOST: hostname
Configuration applied
Create a named snapshot of this configuration [y/N] ?
Activate the changes [y/N] ? y
Activating configuration. This may take a while, please be patient
Configuration activated
Configuration saved
# service qpidd stop
Stopping Qpid AMQP daemon: [ OK ]
# service condor start
Starting Condor daemons: [ OK ]
# tail -f /var/log/condor/ConfigLog
05/03 15:19:19 DEBUG: Setting stop flag
05/03 15:19:19 INFO: Exiting
05/03 15:20:59 INFO: Starting Up
05/03 15:20:59 INFO: Hostname is "hostname"
05/03 15:20:59 INFO: Cleaning up temporary configuration files
05/03 15:20:59 DEBUG: "QMF_BROKER_PORT" is not defined. Using default (5672)
05/03 15:20:59 DEBUG: "QMF_BROKER_AUTH_MECHANISM" is not defined. Using defaults
05/03 15:20:59 DEBUG: Writing configuration file to "/var/lib/condor/wallaby_node.config"
05/03 15:20:59 DEBUG: Connecting to broker "localhost:5672"
05/03 15:20:59 DEBUG: Looking for the store agent
The configd daemon is now waiting for a broker.
>>> VERIFIED
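
The fix note above says configd now relies on client-library events to report broker connection and loss, instead of trusting the return of addBroker. A minimal sketch of that idea follows. It is not the code from the broker-notification-messages branch: `BrokerStateTracker` and `FakeBroker` are hypothetical names, the callback names are modelled on the `brokerConnected`/`brokerDisconnected` hooks that I believe `qmf.console.Console` offers, and qmf itself is not imported so the snippet runs standalone.

```python
import logging

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s: %(message)s")
log = logging.getLogger("configd-sketch")


class FakeBroker(object):
    """Hypothetical stand-in for the broker object passed to the callbacks."""

    def __init__(self, url):
        self.url = url


class BrokerStateTracker(object):
    """Hypothetical console object: connection state changes only on events."""

    def __init__(self):
        self.connected = False
        self.history = []

    def brokerConnected(self, broker):
        # Called by the client library once the connection is really up.
        self.connected = True
        self.history.append("connected")
        log.debug('Broker "%s" connected', broker.url)

    def brokerDisconnected(self, broker):
        # Called by the client library when the connection is lost.
        self.connected = False
        self.history.append("lost")
        log.debug('Broker "%s" disconnected', broker.url)


# Simulate the sequence seen in the log: connection lost, then re-established.
tracker = BrokerStateTracker()
broker = FakeBroker("amqp://localhost:5672")
tracker.brokerDisconnected(broker)
tracker.brokerConnected(broker)
```

With this shape, a "connected" message is only logged after the library has confirmed the connection, so a stopped broker cannot produce it.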
Description of problem:
While reproducing 667911, I stopped the broker on the remote configuration server. Configd started after Condor was restarted and tried to connect to the broker:
01/25 13:15:24 INFO: Starting Up
01/25 13:15:24 INFO: Hostname is "..."
01/25 13:15:24 INFO: Cleaning up temporary configuration files
01/25 13:15:24 DEBUG: "QMF_BROKER_PORT" is not defined. Using default (5672)
01/25 13:15:24 DEBUG: "QMF_BROKER_AUTH_MECHANISM" is not defined. Using defaults
01/25 13:15:24 DEBUG: Writing configuration file to "/var/lib/condor/wallaby_node.config"
01/25 13:15:25 DEBUG: Connected to broker "..:5672"
01/25 13:15:25 DEBUG: Looking for the store agent

The log says that configd is connected to the broker, but that cannot be true because the broker is not running.

Before switching off the broker, I set up the pool by remote configuration:
Group "Internal Default Group":
    Group ID: 1
    Name: Internal Default Group
    Features (priority: name):
        0: Master
        1: NodeAccess
        2: ExecuteNode
    Parameters:
        ALLOW_WRITE = *
        CONDOR_HOST = ...
        ALLOW_READ = *

Version-Release number of selected component (if applicable):
I see it in:
condor-wallaby-client-3.6-6
RHEL 4/5 x i386/x86_64
condor-7.4.4-0.17
qpid-cpp-server-0.7.946106-22
qpid-cpp-client-0.7.946106-22
python-condorutils-1.4-5
python-qmf-0.7.946106-14
python-qpid-0.7.946106-14
python-wallabyclient-3.6-6

and also in:
condor-7.4.5-0.7.el5
condor-wallaby-client-3.9-2.el5
RHEL 4/5 x i386/x86_64
condor-wallaby-tools-3.9-2.el5
python-condorutils-1.4-6.el5
qpid-cpp-client-0.7.946106-27.el5
python-qpid-0.7.946106-15.el5
qpid-cpp-server-0.7.946106-27.el5

How reproducible:
100%

Steps to Reproduce:
1. Set up a Condor pool by remote configuration.
2. Add Master,NodeAccess,ExecuteNode (the behaviour is similar without any features on the nodes, but I think this is the more usual way to use remote configuration).
3. service qpidd stop
4. Check /var/log/condor/ConfigdLog

Actual results:
It seems that configd is connected to a broker which is down.

Expected results:
If configd cannot connect to the broker properly, it should retry repeatedly until CONFIGD_CHECK_INTERVAL, and the addBroker function should raise an exception.

Additional info:
I think this is more of a QMF bug, because when I modified the code around "Connected to broker":

    try:
        self.broker = self.session.addBroker('amqp://%s' % broker_str, mechanisms=broker_auth_methods)
    except:
        if stop_running == False:
            log(logging.CRITICAL, self.logger_name, 'Unable to connect to broker "%s"' % broker_str)
        return(False)
    log(logging.DEBUG, self.logger_name, 'Connected to broker "%s", %s' % (broker_str, str(self.broker)))
    return(True)

I got:
01/25 13:33:30 DEBUG: Connected to broker "10.34.37.168:5672", Disconnected Broker

The strange thing is that if I connect manually, I get an exception:

$ python
>>> import qmf.console
>>> session = qmf.console.Session()
>>> b=session.addBroker('amqp://localhost:5672', mechanisms='ANONYMOUS') #it doesn't matter if there is ip, localhost or hostname
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.4/site-packages/qmf/console.py", line 639, in addBroker
    ssl = url.scheme == URL.AMQPS, connTimeout=timeout)
  File "/usr/lib/python2.4/site-packages/qmf/console.py", line 2070, in __init__
    raise self.conn_exc
socket.error: (111, 'Connection refused')
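
One way to cross-check a "Connected to broker" message independently of QMF is a plain TCP probe of the broker port, retried a few times. The sketch below uses only the standard library. `broker_reachable`, its parameters and the retry defaults are hypothetical and are not part of configd or qmf. It only shows whether something is listening on the port, not that an AMQP session can be established.

```python
import socket
import time


def broker_reachable(host, port, attempts=3, delay=0.1, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `attempts` tries."""
    for attempt in range(attempts):
        try:
            conn = socket.create_connection((host, port), timeout)
        except (socket.error, socket.timeout):
            # Connection refused or timed out: wait briefly, then retry.
            if attempt < attempts - 1:
                time.sleep(delay)
            continue
        conn.close()
        return True
    return False
```

For the setup in this report, `broker_reachable("localhost", 5672)` would be expected to return False while qpidd is stopped, which is the same "Connection refused" condition the manual addBroker call hits.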