Description of problem:
While reproducing bug 667911, I stopped the broker on the remote configuration server. Configd started after Condor was restarted and tried to connect to the broker:

01/25 13:15:24 INFO: Starting Up
01/25 13:15:24 INFO: Hostname is "..."
01/25 13:15:24 INFO: Cleaning up temporary configuration files
01/25 13:15:24 DEBUG: "QMF_BROKER_PORT" is not defined. Using default (5672)
01/25 13:15:24 DEBUG: "QMF_BROKER_AUTH_MECHANISM" is not defined. Using defaults
01/25 13:15:24 DEBUG: Writing configuration file to "/var/lib/condor/wallaby_node.config"
01/25 13:15:25 DEBUG: Connected to broker "..:5672"
01/25 13:15:25 DEBUG: Looking for the store agent

The log reports that configd is connected to the broker, but that cannot be true because the broker is not running.

Before switching off the broker, I set up the pool by remote configuration:

Group "Internal Default Group":
Group ID: 1
Name: Internal Default Group
Features (priority: name):
  0: Master
  1: NodeAccess
  2: ExecuteNode
Parameters:
  ALLOW_WRITE = *
  CONDOR_HOST = ...
  ALLOW_READ = *

Version-Release number of selected component (if applicable):
I see it in:
condor-wallaby-client-3.6-6
RHEL 4/5 x i386/x86_64
condor-7.4.4-0.17
qpid-cpp-server-0.7.946106-22
qpid-cpp-client-0.7.946106-22
python-condorutils-1.4-5
python-qmf-0.7.946106-14
python-qpid-0.7.946106-14
python-wallabyclient-3.6-6

and also in:
condor-7.4.5-0.7.el5
condor-wallaby-client-3.9-2.el5
RHEL 4/5 x i386/x86_64
condor-wallaby-tools-3.9-2.el5
python-condorutils-1.4-6.el5
qpid-cpp-client-0.7.946106-27.el5
python-qpid-0.7.946106-15.el5
qpid-cpp-server-0.7.946106-27.el5

How reproducible:
100%

Steps to Reproduce:
1. Set up a condor pool by remote configuration
2. Add Master,NodeAccess,ExecuteNode (the behaviour is similar without any features on the nodes, but I think this is the more typical usage of remote configuration)
3. service qpidd stop
4. Check /var/log/condor/ConfigdLog

Actual results:
Configd appears to be connected to a broker that is down.
Expected results:
If configd does not connect to the broker properly, it should keep trying to reconnect until CONFIGD_CHECK_INTERVAL, and the addBroker function should raise an exception.

Additional info:
I think this is more of a QMF bug, because when I modified the code around "Connected to broker":

      try:
         self.broker = self.session.addBroker('amqp://%s' % broker_str, mechanisms=broker_auth_methods)
      except:
         if stop_running == False:
            log(logging.CRITICAL, self.logger_name, 'Unable to connect to broker "%s"' % broker_str)
         return(False)
      log(logging.DEBUG, self.logger_name, 'Connected to broker "%s", %s' % (broker_str, str(self.broker)))
      return(True)

I got:

01/25 13:33:30 DEBUG: Connected to broker "10.34.37.168:5672", Disconnected Broker

The strange thing is that if I connect manually, I do get an exception:

$ python
>>> import qmf.console
>>> session = qmf.console.Session()
>>> b=session.addBroker('amqp://localhost:5672', mechanisms='ANONYMOUS') #it doesn't matter whether an IP, localhost or a hostname is used
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.4/site-packages/qmf/console.py", line 639, in addBroker
    ssl = url.scheme == URL.AMQPS, connTimeout=timeout)
  File "/usr/lib/python2.4/site-packages/qmf/console.py", line 2070, in __init__
    raise self.conn_exc
socket.error: (111, 'Connection refused')
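For illustration only, the retry behaviour described under "Expected results" could be sketched roughly as below. This is not configd code: connect_with_retry, its parameters, and the injected connect/sleep callables are hypothetical names, and the sketch is written for a current Python 3 rather than the Python 2.4 shown in the traceback. It assumes that the connect call raises a socket-level error (OSError) when the broker is down, as the manual addBroker call does.

```python
import time


def connect_with_retry(connect, interval, max_attempts=None, sleep=time.sleep):
    """Call connect() until it succeeds.

    connect      -- hypothetical callable that returns a broker object or
                    raises OSError (e.g. 'Connection refused')
    interval     -- seconds to wait between attempts
    max_attempts -- give up after this many attempts (None = retry forever)
    sleep        -- injectable so the loop can be exercised without waiting

    Returns whatever connect() returned, or None if all attempts failed.
    """
    attempt = 0
    while max_attempts is None or attempt < max_attempts:
        attempt += 1
        try:
            return connect()
        except OSError as error:
            print('Unable to connect to broker (attempt %d): %s' % (attempt, error))
            # Only wait if another attempt will follow.
            if max_attempts is None or attempt < max_attempts:
                sleep(interval)
    return None
```

In a daemon, connect would wrap the real broker connection call and interval would come from configuration; both are left abstract here on purpose.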
The configd now uses events from the client library to denote broker connection/loss. Fixed on branch broker-notification-messages
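As a rough idea of what event-based tracking can look like (a sketch, not the code on the branch): the handler below records connection state from two callbacks instead of trusting the return value of the connect call. The class name BrokerStateTracker and the log texts are hypothetical; the method names follow the brokerConnected/brokerDisconnected callbacks that, as far as I know, the qmf.console Console interface defines. The class does not import qmf, so it can be run on its own.

```python
import logging

logger = logging.getLogger('configd_sketch')  # hypothetical logger name


class BrokerStateTracker(object):
    """Tracks broker connectivity from connect/disconnect notifications."""

    def __init__(self):
        self.connected = False
        self.messages = []  # kept so the behaviour is easy to inspect

    def _log(self, message):
        self.messages.append(message)
        logger.debug(message)

    def brokerConnected(self, broker):
        # Called when a connection to the broker is actually established.
        self.connected = True
        self._log('Established connection to broker "%s"' % broker)

    def brokerDisconnected(self, broker):
        # Called when the connection to the broker is lost.
        self.connected = False
        self._log('Lost connection to broker "%s"' % broker)
```

With this shape, a "connected" message is only logged once the library reports a live connection, which avoids the misleading "Connected to broker" line from the original report.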
The log messages are only seen with debugging enabled, and the debug switch is not documented. This doesn't need a release/tech note.
Reproduced on:

# condor -v
$CondorVersion: 7.4.5 Feb 4 2011 BuildID: RH-7.4.5-0.8.el5 PRE-RELEASE $
$CondorPlatform: X86_64-LINUX_RHEL5 $

# tail -f /var/log/condor/ConfigLog
05/03 14:38:14 DEBUG: Checked in with the store
05/03 14:38:14 DEBUG: The system is already running configuration version "0"
05/03 14:40:16 DEBUG: Lost connection to the configuration store
05/03 14:42:26 DEBUG: Established connection to the configuration store
05/03 14:42:38 DEBUG: Shutting down
05/03 14:42:38 DEBUG: Closing QMF connections
05/03 14:42:38 DEBUG: Lost connection to the configuration store
05/03 14:42:39 DEBUG: Closed QMF connections
05/03 14:42:39 DEBUG: Setting stop flag
05/03 14:42:39 INFO: Exiting
05/03 14:43:14 INFO: Starting Up
05/03 14:43:14 INFO: Hostname is "rhel5_64-old.mrg-qe-12.lab.eng.brq.redhat.com"
05/03 14:43:14 INFO: Cleaning up temporary configuration files
05/03 14:43:14 DEBUG: "QMF_BROKER_PORT" is not defined. Using default (5672)
05/03 14:43:14 DEBUG: "QMF_BROKER_AUTH_MECHANISM" is not defined. Using defaults
05/03 14:43:14 DEBUG: Writing configuration file to "/var/lib/condor/wallaby_node.config"
05/03 14:43:15 DEBUG: Connected to broker "localhost:5672"
05/03 14:43:15 DEBUG: Looking for the store agent
Retested over all supported platforms x86,x86_64/RHEL5,RHEL6 with: condor-wallaby-client

# service condor stop
Stopping Condor daemons: [ OK ]
# service qpidd start
Starting Qpid AMQP daemon: [ OK ]
# service wallaby start
Starting wallaby-agent: [ OK ]
# condor_configure_pool -a -f Master,NodeAccess,ExecuteNode -n hostname
Apply these changes [Y/n] ? y
The following parameters need to be set for this configuration to be valid.
ALLOW_READ
ALLOW_WRITE
CONDOR_HOST
Set these parameters now ? [y/N] y
ALLOW_READ: *.redhat.com
ALLOW_WRITE: *
CONDOR_HOST: hostname
Configuration applied
Create a named snapshot of this configuration [y/N] ?
Activate the changes [y/N] ? y
Activating configuration. This may take a while, please be patient
Configuration activated
Configuration saved
# service qpidd stop
Stopping Qpid AMQP daemon: [ OK ]
# service condor start
Starting Condor daemons: [ OK ]
# tail -f /var/log/condor/ConfigLog
05/03 15:19:19 DEBUG: Setting stop flag
05/03 15:19:19 INFO: Exiting
05/03 15:20:59 INFO: Starting Up
05/03 15:20:59 INFO: Hostname is "hostname"
05/03 15:20:59 INFO: Cleaning up temporary configuration files
05/03 15:20:59 DEBUG: "QMF_BROKER_PORT" is not defined. Using default (5672)
05/03 15:20:59 DEBUG: "QMF_BROKER_AUTH_MECHANISM" is not defined. Using defaults
05/03 15:20:59 DEBUG: Writing configuration file to "/var/lib/condor/wallaby_node.config"
05/03 15:20:59 DEBUG: Connecting to broker "localhost:5672"
05/03 15:20:59 DEBUG: Looking for the store agent

The configd daemon now waits for a broker.

>>> VERIFIED