Bug 614132

Summary: qpid python client [high level API] failover not functional (reconnect from failover exchange)
Product: Red Hat Enterprise Linux 6 Reporter: Rafael H. Schloming <rafaels>
Component: python-qpidAssignee: Rafael H. Schloming <rafaels>
Status: CLOSED CURRENTRELEASE QA Contact: MRG Quality Engineering <mrgqe-bugs>
Severity: high Docs Contact:
Priority: high    
Version: 6.0CC: bche, freznice, jneedle, syeghiay, tross
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: python-qpid-0.7.946106-9.el6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 614054 Environment:
Last Closed: 2010-11-10 21:23:30 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 614054    
Bug Blocks:    

Description Rafael H. Schloming 2010-07-13 18:01:42 UTC
+++ This bug was initially created as a clone of Bug #614054 +++

Description of problem:

There seems to be the issue in the high-level API failover/reconnect mechanism (from failover exchange)

The testing scenario:
1] three machines C and A and B
2] C issues commands
3] A and B are two physical machines in (qpidd) cluster

The python clients can be downloaded from bug 613912 attachment
( https://bugzilla.redhat.com/attachment.cgi?id=431389 )

The drain and spout clients are set to enable reconnect & define reconnect limit and interval together with reconnection from failover exchange:
  qpid.messaging.util.auto_fetch_reconnect_urls(conn);

It was seen that reconnection from remote machine/host to current machine suffers from above listed defect, but if machine C and B are the same (i.e. you are reconnecting from current machine to another remote one works as expected)


Version-Release number of selected component (if applicable):
python-qpid-0.7.946106-4.el5

How reproducible:
100%

Steps to Reproduce:
1. set-up openais and qpidd, create msg.in with some text data
2. start qpidd on A and B, verify whether they are in cluster
3. C issues: 
   cat ./msgs.in | ../spout.py --address "addr1; \
       {create: sender, delete: receiver}"  --broker B \
       --reconnect --reconnect-interval 3 --reconnect-limit 10 \
       --reconnect-urls-from-failover-exchange
4. C issues: 
   ../drain.py --address "addr1; {create: sender, delete: receiver}" \
   --broker B  --reconnect --reconnect-interval 2 --reconnect-limit 5 \
   --reconnect-urls-from-failover-exchange --print "%(C)s"> msgs.out

5. while running 4] SIGINT the broker on B
6. expect that client from C will reconnect to A, which is not the case
  
Actual results:
Qpid python high-level API reconnection from failover exchange does not work.

Expected results:
Qpid python high-level API reconnection from failover exchange should work

Additional info:

# in this case C (dhcp-26-251) is F8 machine 
# with packages which match the A and B ( dhcp-26-227 and dhcp-27-226)
[freznice@dhcp-26-251 test]$ cat ./msgs.in | ../spout.py --address "addr1; {create: sender, delete: receiver}"   --broker dhcp-27-226 --reconnect --reconnect-interval 3 --reconnect-limit 10   --reconnect-urls-from-failover-exchange
2010-07-13 16:55:42,349 WARNING set reconnect_urls for conn <qpid.messaging.endpoints.Connection instance at 0x27657a0>: [u'10.34.26.227:5672,tcp', u'10.34.27.226:5672,tcp']
Exception in thread auto-fetch-reconnect-urls:
Traceback (most recent call last):
  File "/usr/lib64/python2.5/threading.py", line 460, in __bootstrap
    self.run()
  File "/usr/lib64/python2.5/threading.py", line 440, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.5/site-packages/qpid/messaging/util.py", line 36, in main
    msg = rcv.fetch()
  File "<string>", line 6, in fetch
  File "/usr/lib/python2.5/site-packages/qpid/messaging/endpoints.py", line 1002, in fetch
    self.check_closed()
  File "/usr/lib/python2.5/site-packages/qpid/messaging/endpoints.py", line 963, in check_closed
    raise LinkClosed()
LinkClosed: None

[freznice@dhcp-26-251 test]$ echo $?
0
[freznice@dhcp-26-251 test]$ ../qc_list_objects.py --broker dhcp-26-227 -qv --queue-filter=addr                                addr1  durable msg|byteEnq/Deq:2000/0|104935/0 unacked:0 consumer_cnt:0 binding_cnt:1
[freznice@dhcp-26-251 test]$ ../drain.py --address "addr1; {create: sender, delete: receiver}" --broker dhcp-27-226   --reconnect --reconnect-interval 2 --reconnect-limit 5   --reconnect-urls-from-failover-exchange --print "%(C)s"> msgs.out
2010-07-13 16:56:19,382 WARNING set reconnect_urls for conn <qpid.messaging.endpoints.Connection instance at 0x11039e0>: [u'10.34.26.227:5672,tcp', u'10.34.27.226:5672,tcp']
2010-07-13 16:56:21,073 WARNING recoverable error[attempt 0]: connection aborted
2010-07-13 16:56:21,073 WARNING sleeping 2 seconds
2010-07-13 16:56:23,073 WARNING trying: dhcp-27-226:5672
2010-07-13 16:56:23,075 WARNING recoverable error[attempt 1]: (111, 'Connection refused')
2010-07-13 16:56:23,075 WARNING sleeping 2 seconds
2010-07-13 16:56:25,075 WARNING trying: dhcp-27-226:5672
2010-07-13 16:56:25,077 WARNING recoverable error[attempt 2]: (111, 'Connection refused')
2010-07-13 16:56:25,077 WARNING sleeping 2 seconds
2010-07-13 16:56:27,077 WARNING trying: dhcp-27-226:5672
2010-07-13 16:56:27,079 WARNING recoverable error[attempt 3]: (111, 'Connection refused')
2010-07-13 16:56:27,079 WARNING sleeping 2 seconds
2010-07-13 16:56:29,079 WARNING trying: dhcp-27-226:5672
2010-07-13 16:56:29,081 WARNING recoverable error[attempt 4]: (111, 'Connection refused')
2010-07-13 16:56:29,081 WARNING sleeping 2 seconds
2010-07-13 16:56:31,081 WARNING trying: dhcp-27-226:5672
2010-07-13 16:56:31,083 WARNING recoverable error[attempt 5]: (111, 'Connection refused')
2010-07-13 16:56:31,083 WARNING sleeping 2 seconds
2010-07-13 16:56:33,083 WARNING trying: dhcp-27-226:5672
2010-07-13 16:56:33,085 WARNING trying: dhcp-27-226:5672
2010-07-13 16:56:33,086 WARNING trying: dhcp-27-226:5672
Traceback (most recent call last):
  File "../drain.py", line 114, in <module>
    msg = rcv.fetch(timeout=timeout)
  File "<string>", line 6, in fetch
  File "/usr/lib/python2.5/site-packages/qpid/messaging/endpoints.py", line 1005, in fetch
    self._ecwait(lambda: not self.draining)
  File "/usr/lib/python2.5/site-packages/qpid/messaging/endpoints.py", line 50, in _ecwait
    result = self._ewait(lambda: self.closed or predicate(), timeout)
  File "/usr/lib/python2.5/site-packages/qpid/messaging/endpoints.py", line 957, in _ewait
    result = self.session._ewait(lambda: self.error or predicate(), timeout)
  File "/usr/lib/python2.5/site-packages/qpid/messaging/endpoints.py", line 550, in _ewait
    result = self.connection._ewait(lambda: self.error or predicate(), timeout)
  File "/usr/lib/python2.5/site-packages/qpid/messaging/endpoints.py", line 194, in _ewait
    self.check_error()
  File "/usr/lib/python2.5/site-packages/qpid/messaging/endpoints.py", line 187, in check_error
    raise self.error
qpid.messaging.exceptions.ConnectError: (111, 'Connection refused')
2010-07-13 16:56:33,089 WARNING trying: dhcp-27-226:5672
Exception in thread auto-fetch-reconnect-urls:
Traceback (most recent call last):
  File "/usr/lib64/python2.5/threading.py", line 460, in __bootstrap
    self.run()
  File "/usr/lib64/python2.5/threading.py", line 440, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.5/site-packages/qpid/messaging/util.py", line 36, in main
    msg = rcv.fetch()
  File "<string>", line 6, in fetch
  File "/usr/lib/python2.5/site-packages/qpid/messaging/endpoints.py", line 1000, in fetch
    msg = self.session._get(self, timeout=timeout)
  File "<string>", line 6, in _get
  File "/usr/lib/python2.5/site-packages/qpid/messaging/endpoints.py", line 633, in _get
    timeout):
  File "/usr/lib/python2.5/site-packages/qpid/messaging/endpoints.py", line 550, in _ewait
    result = self.connection._ewait(lambda: self.error or predicate(), timeout)
  File "/usr/lib/python2.5/site-packages/qpid/messaging/endpoints.py", line 194, in _ewait
    self.check_error()
  File "/usr/lib/python2.5/site-packages/qpid/messaging/endpoints.py", line 187, in check_error
    raise self.error
ConnectError: (111, 'Connection refused')

[freznice@dhcp-26-251 test]$ ../qc_list_objects.py --broker dhcp-26-227 -qv --queue-filter=addr                                addr1  durable msg|byteEnq/Deq:2000/395|104935/20812 unacked:0 consumer_cnt:0 binding_cnt:1
[freznice@dhcp-26-251 test]$ ../drain.py --address "addr1; {create: sender, delete: receiver}" --broker dhcp-26-227   --reconnect --reconnect-interval 2 --reconnect-limit 5   --reconnect-urls-from-failover-exchange --print "%(C)s"> msgs.out
2010-07-13 16:57:08,881 WARNING set reconnect_urls for conn <qpid.messaging.endpoints.Connection instance at 0x128d9e0>: [u'10.34.26.227:5672,tcp']
Exception in thread auto-fetch-reconnect-urls:
Traceback (most recent call last):
  File "/usr/lib64/python2.5/threading.py", line 460, in __bootstrap
    self.run()
  File "/usr/lib64/python2.5/threading.py", line 440, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.5/site-packages/qpid/messaging/util.py", line 36, in main
    msg = rcv.fetch()
  File "<string>", line 6, in fetch
  File "/usr/lib/python2.5/site-packages/qpid/messaging/endpoints.py", line 1002, in fetch
    self.check_closed()
  File "/usr/lib/python2.5/site-packages/qpid/messaging/endpoints.py", line 963, in check_closed
    raise LinkClosed()
LinkClosed: None

[freznice@dhcp-26-251 test]$ ../qc_list_objects.py --broker dhcp-26-227 -qv --queue-filter=addr                                [freznice@dhcp-26-251 test]$ rpm -q python-qpid
python-qpid-0.7.946106-4.fc8

# retested once more in pure rhel5 environmanet which proved above issue:
[root@dhcp-27-226 test]# cat ./msgs.in | ../spout --address "addr1; {create: sender, delete: receiver}"   --broker dhcp-26-227 --reconnect --reconnect-interval 3 --reconnect-limit 10   --reconnect-urls-from-failover-exchange
[root@dhcp-27-226 test]# ../drain.py --address "addr1; {create: sender, delete: receiver}" --broker dhcp-26-227   --reconnect --reconnect-interval 2 --reconnect-limit 5   --reconnect-urls-from-failover-exchange --print "%(C)s"> msgs.out
2010-07-13 17:21:06,180 WARNING set reconnect_urls for conn <qpid.messaging.endpoints.Connection instance at 0x2b98e960d6c8>: [u'10.34.26.227:5672,tcp', u'10.34.27.226:5672,tcp']
2010-07-13 17:21:07,819 WARNING recoverable error[attempt 0]: connection aborted
2010-07-13 17:21:07,819 WARNING sleeping 2 seconds
2010-07-13 17:21:09,819 WARNING trying: dhcp-26-227:5672
2010-07-13 17:21:09,821 WARNING recoverable error[attempt 1]: (111, 'Connection refused')
2010-07-13 17:21:09,821 WARNING sleeping 2 seconds
2010-07-13 17:21:11,821 WARNING trying: dhcp-26-227:5672
2010-07-13 17:21:11,823 WARNING recoverable error[attempt 2]: (111, 'Connection refused')
2010-07-13 17:21:11,823 WARNING sleeping 2 seconds
2010-07-13 17:21:13,823 WARNING trying: dhcp-26-227:5672
2010-07-13 17:21:13,825 WARNING recoverable error[attempt 3]: (111, 'Connection refused')
2010-07-13 17:21:13,825 WARNING sleeping 2 seconds
2010-07-13 17:21:15,825 WARNING trying: dhcp-26-227:5672
2010-07-13 17:21:15,827 WARNING recoverable error[attempt 4]: (111, 'Connection refused')
2010-07-13 17:21:15,827 WARNING sleeping 2 seconds
2010-07-13 17:21:17,827 WARNING trying: dhcp-26-227:5672
2010-07-13 17:21:17,829 WARNING recoverable error[attempt 5]: (111, 'Connection refused')
2010-07-13 17:21:17,829 WARNING sleeping 2 seconds
2010-07-13 17:21:19,829 WARNING trying: dhcp-26-227:5672
... // thousands of messages like this
2010-07-13 17:21:21,275 WARNING trying: dhcp-26-227:5672
Traceback (most recent call last):
  File "../drain.py", line 119, in ?
    ssn.acknowledge()
  File "<string>", line 6, in acknowledge
  File "/usr/lib/python2.4/site-packages/qpid/messaging/endpoints.py", line 678, in acknowledge
    self._ecwait(lambda: not [m for m in messages if m in self.acked])
  File "/usr/lib/python2.4/site-packages/qpid/messaging/endpoints.py", line 50, in _ecwait
    result = self._ewait(lambda: self.closed or predicate(), timeout)
  File "/usr/lib/python2.4/site-packages/qpid/messaging/endpoints.py", line 550, in _ewait
    result = self.connection._ewait(lambda: self.error or predicate(), timeout)
  File "/usr/lib/python2.4/site-packages/qpid/messaging/endpoints.py", line 194, in _ewait
    self.check_error()
  File "/usr/lib/python2.4/site-packages/qpid/messaging/endpoints.py", line 187, in check_error
    raise self.error
qpid.messaging.exceptions.ConnectError: (111, 'Connection refused')
Exception in thread auto-fetch-reconnect-urls:
Traceback (most recent call last):
  File "/usr/lib64/python2.4/threading.py", line 442, in __bootstrap
    self.run()
  File "/usr/lib64/python2.4/threading.py", line 422, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.4/site-packages/qpid/messaging/util.py", line 36, in main
    msg = rcv.fetch()
  File "<string>", line 6, in fetch
  File "/usr/lib/python2.4/site-packages/qpid/messaging/endpoints.py", line 1000, in fetch
    msg = self.session._get(self, timeout=timeout)
  File "<string>", line 6, in _get
  File "/usr/lib/python2.4/site-packages/qpid/messaging/endpoints.py", line 633, in _get
    timeout):
  File "/usr/lib/python2.4/site-packages/qpid/messaging/endpoints.py", line 550, in _ewait
    result = self.connection._ewait(lambda: self.error or predicate(), timeout)
  File "/usr/lib/python2.4/site-packages/qpid/messaging/endpoints.py", line 194, in _ewait
    self.check_error()
  File "/usr/lib/python2.4/site-packages/qpid/messaging/endpoints.py", line 187, in check_error
    raise self.error
ConnectError: (111, 'Connection refused')

2010-07-13 17:21:21,292 WARNING trying: dhcp-26-227:5672
[root@dhcp-27-226 test]# ls
ecode.ecode     msgs.in          test_api_tc.sh       test_reconnect.sh.log
e.exp           msgs.out         test_reconnect.sh    test_reconnect.sh.log_
functions.sh    run_connect.sh_  test_reconnect.sh~   test_reconnect.sh.log_fail
functions.sh.0  test_api.sh      test_reconnect.sh.0  test_reconnect.sh.log_fail2
[root@dhcp-27-226 test]# ls -lA ./msgs.* ; diff ./msgs.in msgs.out | head            -rw-rw-r-- 1 root root 106935 Jul 13 16:43 ./msgs.in                                 -rw-rw-r-- 1 root root   7614 Jul 13 17:21 ./msgs.out
138,2000d137
< 7vVEkJVUt0B70dOtGKb0F4RRwQ04gmRsKUwGTnm4aExIuBBy1Te402rSYjHlukd9YccDCRbjqtF2GNJ52
< jELZTjTpCXPkbiIwW1fo6WL4CobgrjVB3gxygesCEJsF7olh5rxpr60
< 3JDMOnoMUUoP18W9nECW
< wlOO9BvU7kFkNjAPxBAftHyeqEkSzoR5
< QZYMT3TXOmiX
< MhlQsoLHFAzi3EdIzglObwNFTUaEdsDoYjNMdQhRmfTaAXnN5nQDTInu6ESSEDLVNG4WVGuo6RD4WBINXsSn8qv3rLFcG5qA
< GOECFMlTM1kqfW1fDzgMnUiTrFAvHLOYUw373zLduHr115ipo0Llzqy0A6wySKyLLvRDIu8OjImh38VkxbHJxPoYKk
< XZKLihWjEQbuKs2NPQUet6eFqA89E6xY9QCIE8z6NfX7i9lJEtaorcReO0EbkdvrUIvP8kOwXo047C
< k0oLfefQ0Fk0mqBz

Comment 2 releng-rhel@redhat.com 2010-11-10 21:23:30 UTC
Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.