Bug 773700

Summary: python console stops trying to connect to broker
Product: Red Hat Enterprise MRG Reporter: Robert Rati <rrati>
Component: qpid-qmf Assignee: Ken Giusti <kgiusti>
Status: CLOSED CURRENTRELEASE QA Contact: Frantisek Reznicek <freznice>
Severity: high Docs Contact:
Priority: high    
Version: 2.1 CC: esammons, freznice, jross, kgiusti, matt
Target Milestone: 2.1.2   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: qpid-qmf-0.14-3.el5, qpid-qmf-0.14-4.el6 Doc Type: Bug Fix
Doc Text:
Cause: Restarting the broker may result in hung QMF consoles.
Consequence: Any python-based QMF console will fail to notice the return of the broker and will be unable to monitor QMF agents.
Fix: A timeout was added to the connection cleanup logic, preventing the hang when a broker is restarted.
Result: The console will attempt to re-contact the broker periodically when a connection is lost. Once the broker is available, the console will connect to it and resume normal operation.
Last Closed: 2012-12-07 17:42:59 UTC Type: ---
Attachments:
Description Flags
test patch none

Description Robert Rati 2012-01-12 16:35:09 UTC
Description of problem:
The configd, which uses the native python console, stops being informed of connections to the broker when the broker is bounced continually.  Eventually, the configd will lose its connection to the broker and will not be informed of a new broker connection when the broker becomes available again.

On my system I have qpid, wallaby, and configd installed.  The broker seems ok as wallaby is able to connect to the broker after each bounce.  The configd is running and has not had an error or exception.  It appears the python console is no longer attempting to connect to the broker.

Expected connections from qpid-stat -c:
Connections
  client-addr                     cproc           cpid   auth       connected  idle    msgIn  msgOut
  ====================================================================================================
  127.0.0.1:5672-127.0.0.1:55556  condor_configd  8859   anonymous  7s         0s       244    323
  [::1]:5672-[::1]:36755          qpid-stat       8901   anonymous  1s         0s       257    330
  [::1]:5672-[::1]:36720          wallaby-agent   19013  anonymous  24m 9s     24m 0s    79     62


In the failure case, I see:
Connections
  client-addr             cproc          cpid   auth       connected  idle  msgIn  msgOut
  =========================================================================================
  [::1]:5672-[::1]:36711  wallaby-agent  19013  anonymous  9s         0s      79     62
  [::1]:5672-[::1]:36712  qpid-stat      8194   anonymous  0s         0s     257    330

Version-Release number of selected component (if applicable):
qpid 0.10 and qpid 0.14.

How reproducible:
100%

Steps to Reproduce:
1. start broker, wallaby, configd
2. restart broker repeatedly every 30 min or so
3. configd eventually logs a connection loss but never a connection success
  
Actual results:
The configd stops being informed of connections to the broker, and qpid-stat -c shows no connection from the configd.
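
A minimal sketch of how this condition could be checked automatically, assuming the `qpid-stat -c` output has the layout shown above (a header row, a row of `=`, then one row per connection with the process name in the second column).  The function name and the trimmed sample strings are illustrative only and are not part of any qpid tool:

```python
# Sample outputs copied from this report (trailing columns trimmed for brevity).
HEALTHY = """Connections
  client-addr                     cproc           cpid   auth
  ===========================================================
  127.0.0.1:5672-127.0.0.1:55556  condor_configd  8859   anonymous
  [::1]:5672-[::1]:36755          qpid-stat       8901   anonymous
  [::1]:5672-[::1]:36720          wallaby-agent   19013  anonymous
"""

FAILED = """Connections
  client-addr             cproc          cpid   auth
  ==================================================
  [::1]:5672-[::1]:36711  wallaby-agent  19013  anonymous
  [::1]:5672-[::1]:36712  qpid-stat      8194   anonymous
"""


def connected_procs(qpid_stat_output):
    """Return the set of cproc names found in `qpid-stat -c` style text.

    Hypothetical helper: rows are read only after the `====` separator,
    and reading stops at the first blank line after the table.
    """
    procs = set()
    in_table = False
    for line in qpid_stat_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("="):
            in_table = True          # separator row: data rows follow
            continue
        if not in_table:
            continue                 # still in the title/header rows
        if not stripped:
            break                    # blank line ends the table
        fields = stripped.split()
        if len(fields) >= 2:
            procs.add(fields[1])     # second column is cproc
    return procs


if __name__ == "__main__":
    for name, text in (("healthy", HEALTHY), ("failed", FAILED)):
        present = "condor_configd" in connected_procs(text)
        print(name, "configd connected:", present)
```

In a live test the text would come from running `qpid-stat -c` shortly after each broker restart; the check has to allow a short delay, since the configd is expected to be missing while the broker is down.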

Expected results:
The configd should always receive connection notifications when a broker is available, regardless of how many times the broker has gone down.


Additional info:

Comment 1 Ken Giusti 2012-01-13 19:02:42 UTC
FYI: I've reproduced the same behaviour using the above description, except I am using qpid-tool instead of configd.  qpid-tool is also based on the python console.py module.

Comment 2 Ken Giusti 2012-01-16 22:45:58 UTC
Created attachment 555625 [details]
test patch

Comment 3 Ken Giusti 2012-01-16 22:47:14 UTC
Rob,  can you give the attached patch for console.py a try in your reproducer?  Let me know, thanks.

Comment 4 Robert Rati 2012-01-30 15:36:53 UTC
I ran my test over the weekend, and the configd continues to connect to the broker and function as expected.

Comment 6 Ken Giusti 2012-02-13 15:38:25 UTC
To reproduce the problem, using qpid-tool:

1) Start up qpid-tool, and leave it running.
2) Start the qpidd daemon: /etc/init.d/qpidd start

3) Every minute, restart qpidd.  I used a bash script:


 while true; do
     sleep 60
     /etc/init.d/qpidd restart
 done

4) Observe the output from qpid-tool; it should always be able to reconnect to the broker after the restart:

[root@mrg44 py]# qpid-tool
Management Tool for QPID
qpid: Broker disconnected: Disconnected Broker
Broker connected: Broker connected at: localhost:5672
Broker disconnected: Disconnected Broker
Broker connected: Broker connected at: localhost:5672

Prior to the fix, running this test would eventually leave qpid-tool disconnected with no subsequent reconnect, e.g.:

Broker disconnected: Disconnected Broker
Broker connected: Broker connected at: localhost:5672
Broker disconnected: Disconnected Broker

(The output would stop there; note that no "Broker connected" message appears after the broker was restarted.)
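
The pass/fail distinction above can be sketched as a small check over captured qpid-tool output.  This is only an illustration: the function name is hypothetical, and it assumes the messages keep the exact "Broker connected:" and "Broker disconnected:" prefixes shown in this comment:

```python
GOOD_LOG = """qpid: Broker disconnected: Disconnected Broker
Broker connected: Broker connected at: localhost:5672
Broker disconnected: Disconnected Broker
Broker connected: Broker connected at: localhost:5672
"""

BAD_LOG = """Broker disconnected: Disconnected Broker
Broker connected: Broker connected at: localhost:5672
Broker disconnected: Disconnected Broker
"""


def ends_disconnected(log_text):
    """Return True if the last broker event in the log is a disconnect.

    Hypothetical helper: only the two message prefixes from this report
    are recognised; every other line is ignored.
    """
    connected = None  # None until the first broker event is seen
    for line in log_text.splitlines():
        if "Broker disconnected:" in line:
            connected = False
        elif "Broker connected:" in line:
            connected = True
    return connected is False


if __name__ == "__main__":
    print("good log ends disconnected:", ends_disconnected(GOOD_LOG))
    print("bad log ends disconnected:", ends_disconnected(BAD_LOG))
```

A trailing disconnect is only a failure if it persists after the broker is back up, so the log should be sampled some time after the restart completes.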

Comment 8 Frantisek Reznicek 2012-03-01 15:46:51 UTC
The issue is reliably fixed, tested on RHEL5.7 / 6.2 i[36]86 / x86_64 on packages:

  python-qpid-0.14-4.el5
  python-qpid-qmf-0.14-3.el5
  qpid-cpp*-0.14-7.el5
  qpid-java-*-0.14-3.el5
  qpid-qmf-0.14-3.el5
  qpid-qmf-devel-0.14-3.el5
  qpid-tests-0.14-1.el5
  qpid-tools-0.14-1.el5
  rh-qpid-cpp-tests-0.14-7.el5
  ruby-qpid-qmf-0.14-3.el5

  python-qpid-0.14-5.el6.noarch
  python-qpid-qmf-0.14-4.el6.x86_64
  qpid-cpp-*-0.14-7.el6.x86_64
  qpid-java-*-0.14-3.el6.noarch
  qpid-java-jca*-0.10-11.el6.noarch
  qpid-qmf-0.14-4.el6.x86_64
  qpid-qmf-devel-0.14-4.el6.x86_64
  qpid-tests-0.14-1.el6.noarch
  qpid-tools-0.14-1.el6.noarch
  rh-qpid-cpp-tests-0.14-7.el6.x86_64
  ruby-qpid-qmf-0.14-4.el6.x86_64


-> VERIFIED

Comment 9 Ken Giusti 2012-03-07 21:14:31 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause
    Restarting the broker may result in hung QMF consoles.
Consequence
    Any python-based QMF console will fail to notice the return of the broker and will be unable to monitor QMF agents.
Fix
    A timeout was added to the connection cleanup logic, preventing the hang when a broker is restarted.
Result
    The console will attempt to re-contact the broker periodically when a connection is lost. Once the broker is available, the console will connect to it and resume normal operation.
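
For illustration only, the general idea in the note above (a bounded wait during connection cleanup, followed by periodic reconnect attempts) might look like the sketch below.  This is not the attached console.py patch.  It assumes the cleanup waits on a worker thread, which this report does not state, and all names (`cleanup_connection`, `reconnect_loop`, `try_connect`) are hypothetical:

```python
import threading
import time


def cleanup_connection(worker, timeout=5.0):
    """Wait for a connection's worker thread, but never longer than `timeout`.

    An unbounded join() would block forever if the thread is stuck;
    passing a timeout lets the caller move on to reconnecting.
    Returns True if the thread finished, False if the wait timed out.
    """
    worker.join(timeout)
    return not worker.is_alive()


def reconnect_loop(try_connect, retry_delay=1.0, max_attempts=None):
    """Call `try_connect` periodically until it returns True.

    `try_connect` is a caller-supplied callable (an assumption of this
    sketch).  Returns the number of attempts made, or None if
    `max_attempts` was reached without success.
    """
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        if try_connect():
            return attempts
        time.sleep(retry_delay)  # wait before contacting the broker again
    return None


if __name__ == "__main__":
    # Simulate a cleanup that would hang: the thread waits on an event
    # that nobody sets until after the bounded join has given up.
    never = threading.Event()
    stuck = threading.Thread(target=never.wait, daemon=True)
    stuck.start()
    print("cleanup finished:", cleanup_connection(stuck, timeout=0.2))
    never.set()

    # Simulate a broker that comes back on the third attempt.
    outcomes = iter([False, False, True])
    print("attempts needed:", reconnect_loop(lambda: next(outcomes), retry_delay=0.1))
```

The point of the design is that a stuck cleanup step can no longer prevent the retry loop from running, so the console keeps trying to reach the broker however many times it is bounced.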