Description of problem:
The configd, which uses the native Python console, stops being notified of connections to the broker when the broker is restarted repeatedly. Eventually, the configd loses its connection to the broker and is not notified of a new broker connection when the broker becomes available again.

On my system I have qpid, wallaby, and configd installed. The broker itself seems OK, as wallaby is able to reconnect to it after each restart. The configd is still running and has not reported any error or exception. It appears that the Python console is no longer attempting to connect to the broker.

Expected connections from qpid-stat -c:

    Connections
      client-addr                     cproc           cpid   auth       connected  idle    msgIn  msgOut
      ==================================================================================================
      127.0.0.1:5672-127.0.0.1:55556  condor_configd  8859   anonymous  7s         0s      244    323
      [::1]:5672-[::1]:36755          qpid-stat       8901   anonymous  1s         0s      257    330
      [::1]:5672-[::1]:36720          wallaby-agent   19013  anonymous  24m 9s     24m 0s  79     62

In the failure case, I see:

    Connections
      client-addr             cproc          cpid   auth       connected  idle  msgIn  msgOut
      =======================================================================================
      [::1]:5672-[::1]:36711  wallaby-agent  19013  anonymous  9s         0s    79     62
      [::1]:5672-[::1]:36712  qpid-stat      8194   anonymous  0s         0s    257    330

Version-Release number of selected component (if applicable):
qpid 0.10 and qpid 0.14

How reproducible:
100%

Steps to Reproduce:
1. Start the broker, wallaby, and configd.
2. Restart the broker repeatedly, every 30 minutes or so.
3. The configd eventually logs a connection loss but never a subsequent connection success.

Actual results:
The configd stops being notified of connections to the broker, and qpid-stat -c shows no connection from the configd.

Expected results:
The configd should always receive connection notifications when the broker is available, regardless of how many times the broker has gone down.

Additional info:
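To check for this failure without reading the table by eye, the qpid-stat -c output can be scanned for the console's process name (condor_configd here). The following is a minimal Python sketch, assuming the layout shown in this report: a header row, a separator row of "=" characters, then one row per connection with the process name (cproc) as the second whitespace-separated field. The function connected_processes is a hypothetical helper, not part of qpid-stat; the sample strings reuse the rows quoted above.

```python
def connected_processes(qpid_stat_output):
    """Return the set of process names (the cproc column) in `qpid-stat -c` output.

    Assumes a separator row of "=" characters precedes the data rows and that
    cproc is the second whitespace-separated field of each data row.
    """
    procs = set()
    in_rows = False
    for line in qpid_stat_output.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("="):
            in_rows = True  # everything after the separator is a connection row
            continue
        if in_rows:
            fields = stripped.split()
            if len(fields) >= 2:
                procs.add(fields[1])
    return procs


# Sample rows taken from this report (column spacing normalized).
HEALTHY = """\
Connections
  client-addr                     cproc           cpid   auth       connected  idle    msgIn  msgOut
  ==================================================================================================
  127.0.0.1:5672-127.0.0.1:55556  condor_configd  8859   anonymous  7s         0s      244    323
  [::1]:5672-[::1]:36755          qpid-stat       8901   anonymous  1s         0s      257    330
  [::1]:5672-[::1]:36720          wallaby-agent   19013  anonymous  24m 9s     24m 0s  79     62
"""

FAILED = """\
Connections
  client-addr             cproc          cpid   auth       connected  idle  msgIn  msgOut
  =======================================================================================
  [::1]:5672-[::1]:36711  wallaby-agent  19013  anonymous  9s         0s    79     62
  [::1]:5672-[::1]:36712  qpid-stat      8194   anonymous  0s         0s    257    330
"""

if __name__ == "__main__":
    for name, output in (("healthy", HEALTHY), ("failed", FAILED)):
        print(name, "condor_configd" in connected_processes(output))
```

Running it prints `healthy True` and `failed False`. On a live system the input string could come from `subprocess.run(["qpid-stat", "-c"], capture_output=True, text=True).stdout` instead of the hard-coded samples.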
FYI: I've reproduced the same behaviour using the above description, except I am using qpid-tool instead of configd. qpid-tool is also based on the python console.py module.
Created attachment 555625: test patch
Rob, can you give the attached patch for console.py a try in your reproducer? Let me know, thanks.
I ran my test over the weekend, and the configd continues to connect to the broker and function as expected.
Patch applied upstream:
  https://issues.apache.org/jira/browse/QPID-3798
  http://svn.apache.org/viewvc?view=rev&rev=1239648
To reproduce the problem using qpid-tool:

1) Start qpid-tool and leave it running.
2) Start the qpidd daemon:

       /etc/init.d/qpidd start

3) Restart qpidd every minute. I used a bash script:

       while true; do
           sleep 60
           /etc/init.d/qpidd restart
       done

4) Observe the output from qpid-tool. It should always reconnect to the broker after each restart:

       [root@mrg44 py]# qpid-tool
       Management Tool for QPID
       qpid: Broker disconnected: Disconnected Broker
       Broker connected: Broker connected at: localhost:5672
       Broker disconnected: Disconnected Broker
       Broker connected: Broker connected at: localhost:5672

Prior to the fix, running this test would eventually leave qpid-tool disconnected without reconnecting, e.g.:

       Broker disconnected: Disconnected Broker
       Broker connected: Broker connected at: localhost:5672
       Broker disconnected: Disconnected Broker

(The output stops there; note that there is no "Broker connected" message after the broker was restarted.)
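When this reproducer runs unattended (for example overnight), the qpid-tool output can be captured to a file and checked afterwards for the failure signature described above: a "Broker disconnected" message with no later "Broker connected" message. The following is a minimal Python sketch, assuming the message text shown above; reconnected_after_every_drop is a hypothetical helper name. Because the restart loop may be caught mid-restart, the check is only meaningful once the broker has been up for a while after its last restart.

```python
def reconnected_after_every_drop(lines):
    """Return True if every "Broker disconnected" message is eventually
    followed by a "Broker connected" message.

    Assumes the qpid-tool message text shown in this report. "Broker connected"
    is not a substring of "Broker disconnected", so a disconnect message is
    never mistaken for a reconnect.
    """
    waiting_for_reconnect = False
    for line in lines:
        if "Broker disconnected" in line:
            waiting_for_reconnect = True
        elif "Broker connected" in line:
            waiting_for_reconnect = False
    return not waiting_for_reconnect


# Output after the fix, as quoted above.
FIXED_OUTPUT = """\
qpid: Broker disconnected: Disconnected Broker
Broker connected: Broker connected at: localhost:5672
Broker disconnected: Disconnected Broker
Broker connected: Broker connected at: localhost:5672
"""

# Output before the fix: the last disconnect is never followed by a connect.
BROKEN_OUTPUT = """\
Broker disconnected: Disconnected Broker
Broker connected: Broker connected at: localhost:5672
Broker disconnected: Disconnected Broker
"""

if __name__ == "__main__":
    print(reconnected_after_every_drop(FIXED_OUTPUT.splitlines()))   # True
    print(reconnected_after_every_drop(BROKEN_OUTPUT.splitlines()))  # False
```

Since iterating over an open file yields its lines, a captured log can be checked directly with `reconnected_after_every_drop(open(path))`.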
The issue is reliably fixed. Tested on RHEL 5.7 / 6.2, i[36]86 / x86_64, with the following packages:

RHEL 5:
  python-qpid-0.14-4.el5
  python-qpid-qmf-0.14-3.el5
  qpid-cpp*-0.14-7.el5
  qpid-java-*-0.14-3.el5
  qpid-qmf-0.14-3.el5
  qpid-qmf-devel-0.14-3.el5
  qpid-tests-0.14-1.el5
  qpid-tools-0.14-1.el5
  rh-qpid-cpp-tests-0.14-7.el5
  ruby-qpid-qmf-0.14-3.el5

RHEL 6:
  python-qpid-0.14-5.el6.noarch
  python-qpid-qmf-0.14-4.el6.x86_64
  qpid-cpp-*-0.14-7.el6.x86_64
  qpid-java-*-0.14-3.el6.noarch
  qpid-java-jca*-0.10-11.el6.noarch
  qpid-qmf-0.14-4.el6.x86_64
  qpid-qmf-devel-0.14-4.el6.x86_64
  qpid-tests-0.14-1.el6.noarch
  qpid-tools-0.14-1.el6.noarch
  rh-qpid-cpp-tests-0.14-7.el6.x86_64
  ruby-qpid-qmf-0.14-4.el6.x86_64

-> VERIFIED
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause: Restarting the broker may result in hung QMF consoles.
Consequence: Any Python-based QMF console will fail to notice the return of the broker and will be unable to monitor QMF agents.
Fix: A timeout was added to the connection cleanup logic, preventing the hang when a broker is restarted.
Result: The console will attempt to re-contact the broker periodically when a connection is lost. Once the broker is available, the console will connect to it and resume normal operation.
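The technical note describes the fix only at a high level: a bounded wait during connection cleanup, followed by periodic reconnect attempts. The following is a minimal, self-contained Python sketch of that general pattern. It is not the code from the QPID-3798 patch, which is not reproduced here, and it assumes the cleanup step can be run in a helper thread. close_with_timeout, reconnect_loop, and their parameters are hypothetical names chosen for illustration; max_attempts exists only so the sketch terminates, whereas a real console would keep retrying.

```python
import threading
import time


def close_with_timeout(cleanup, timeout):
    """Run cleanup() in a daemon thread and wait at most `timeout` seconds.

    Returns True if cleanup finished in time, False if it was abandoned.
    Using a daemon thread means an abandoned, hung cleanup cannot keep the
    process alive at exit.
    """
    worker = threading.Thread(target=cleanup, daemon=True)
    worker.start()
    worker.join(timeout)
    return not worker.is_alive()


def reconnect_loop(try_connect, cleanup, retry_interval, max_attempts,
                   cleanup_timeout=1.0):
    """Tear down the old connection with a bounded wait, then call
    try_connect() roughly every `retry_interval` seconds until it returns True.

    Returns the attempt number that succeeded, or None if every attempt failed.
    """
    # Without the timeout, a cleanup step that never returns would block here
    # forever and the console would never try to reconnect.
    close_with_timeout(cleanup, cleanup_timeout)
    for attempt in range(1, max_attempts + 1):
        if try_connect():
            return attempt
        time.sleep(retry_interval)
    return None


if __name__ == "__main__":
    never_set = threading.Event()

    def hung_cleanup():
        never_set.wait()  # simulates a cleanup step that never returns

    calls = {"n": 0}

    def broker_back_on_third_try():
        calls["n"] += 1
        return calls["n"] >= 3  # simulated broker is unavailable for two attempts

    print(reconnect_loop(broker_back_on_third_try, hung_cleanup,
                         retry_interval=0.05, max_attempts=5,
                         cleanup_timeout=0.1))  # 3
```

The design choice illustrated here is that the hung cleanup is abandoned rather than awaited, so the retry loop always gets a chance to run once the broker is back.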