Bug 1001772

Summary: [AMQP 1.0] support for automatic reconnect
Product: Red Hat Enterprise MRG
Reporter: Gordon Sim <gsim>
Component: qpid-cpp
Assignee: Gordon Sim <gsim>
Status: CLOSED CURRENTRELEASE
QA Contact: Petr Matousek <pematous>
Severity: unspecified
Priority: medium
Version: Development
CC: gsim, iboverma, jross, lzhaldyb, pematous
Target Milestone: 3.0
Keywords: Improvement
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version: qpid-cpp-0.22-30
Doc Type: Enhancement
Last Closed: 2014-10-20 13:50:39 UTC
Type: Bug
Bug Blocks: 1010399    

Description Gordon Sim 2013-08-27 17:21:24 UTC
Description of problem:

Over AMQP 0-10, the reconnect option can be set; if the connection is then lost, the library will attempt to reconnect, re-establish sessions, senders and receivers, and resend in-doubt messages. The same functionality should be available over AMQP 1.0.
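
For illustration, a minimal sketch of how a client might request this behaviour through the qpid::messaging C++ API. The helper names connectionOptions and receiveOne and the 30 second fetch timeout are assumptions made for this example; the option names protocol and reconnect are the ones used with drain in comment 7. The qpid-dependent part is guarded so that the sketch also compiles where the qpid-cpp headers are not installed.

```cpp
#include <string>

// Hypothetical helper (not part of qpid-cpp): builds the connection-options
// string that selects AMQP 1.0 and switches automatic reconnect on or off.
std::string connectionOptions(bool reconnect) {
    return std::string("{protocol: 'amqp1.0', reconnect: ") +
           (reconnect ? "true" : "false") + "}";
}

#if defined(__has_include)
#if __has_include(<qpid/messaging/Connection.h>)
#include <qpid/messaging/Connection.h>
#include <qpid/messaging/Duration.h>
#include <qpid/messaging/Message.h>
#include <qpid/messaging/Receiver.h>
#include <qpid/messaging/Session.h>

// Hypothetical helper: fetches a single message from 'address' over a
// connection opened with reconnect enabled. Returns false on timeout.
bool receiveOne(const std::string& url, const std::string& address,
                std::string& content) {
    qpid::messaging::Connection connection(url, connectionOptions(true));
    connection.open();
    qpid::messaging::Session session = connection.createSession();
    qpid::messaging::Receiver receiver = session.createReceiver(address);

    qpid::messaging::Message message;
    // The 30 second timeout is an arbitrary choice for this sketch.
    bool got = receiver.fetch(message, qpid::messaging::Duration::SECOND * 30);
    if (got) {
        content = message.getContent();
        session.acknowledge();
    }
    connection.close();
    return got;
}
#endif
#endif
```

A caller would pass a broker URL and an address such as amq.direct to receiveOne; with reconnect enabled, the expectation stated in this report is that a broker restart during the fetch is handled by the library rather than by the application.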

Comment 1 Gordon Sim 2013-08-28 15:01:23 UTC
Fixed upstream: https://svn.apache.org/r1518233 

(Applying this after https://bugzilla.redhat.com/show_bug.cgi?id=981636 and https://bugzilla.redhat.com/show_bug.cgi?id=967734 will make the merge easier.)

Comment 7 Petr Matousek 2013-11-28 15:55:01 UTC
An already connected consumer is not able to reconnect after a broker restart.

1. service qpidd stop
2. $cppapi/drain -f --connection-options "{protocol:'amqp1.0', reconnect: True}" amq.direct
3. service qpidd start
4. $cppapi/spout amq.direct
5. message received

This is OK so far: the client was started while the broker was down and was able to connect once the broker was started.

6. service qpidd restart
7. $cppapi/spout amq.direct
8. message NOT received, the consumer has not reconnected

Note: The producer (i.e. spout) does not suffer from this; it may reconnect multiple times.

The client seems to be stuck in the fetch method call:

Thread 2 (Thread 0xb77beb70 (LWP 23957)):
#0  0x008af416 in __kernel_vsyscall ()
#1  0x0043d5e6 in epoll_wait () from /lib/libc.so.6
#2  0x04fd50cc in qpid::sys::Poller::wait (this=0x856a580, timeout=...) at /usr/src/debug/qpid-0.22/cpp/src/qpid/sys/epoll/EpollPoller.cpp:566
#3  0x04fd58a3 in qpid::sys::Poller::run (this=0x856a580) at /usr/src/debug/qpid-0.22/cpp/src/qpid/sys/epoll/EpollPoller.cpp:518
#4  0x04fc9981 in qpid::sys::(anonymous namespace)::runRunnable (p=0x856a580) at /usr/src/debug/qpid-0.22/cpp/src/qpid/sys/posix/Thread.cpp:35
#5  0x004f9b39 in start_thread () from /lib/libpthread.so.0
#6  0x0043cd6e in clone () from /lib/libc.so.6

Thread 1 (Thread 0xb77c09d0 (LWP 23956)):
#0  0x008af416 in __kernel_vsyscall ()
#1  0x004fd794 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x0044c9f4 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libc.so.6
#3  0x009685d1 in wait (this=0x856dc58, until=...) at /usr/src/debug/qpid-0.22/cpp/src/qpid/sys/posix/Condition.h:69
#4  wait (this=0x856dc58, until=...) at /usr/src/debug/qpid-0.22/cpp/src/qpid/sys/Monitor.h:45
#5  qpid::messaging::amqp::ConnectionContext::waitUntil (this=0x856dc58, until=...) at /usr/src/debug/qpid-0.22/cpp/src/qpid/messaging/amqp/ConnectionContext.cpp:435
#6  0x0096887c in qpid::messaging::amqp::ConnectionContext::waitUntil (this=0x856dc58, ssn=..., lnk=..., until=...) at /usr/src/debug/qpid-0.22/cpp/src/qpid/messaging/amqp/ConnectionContext.cpp:460
#7  0x00968b0b in qpid::messaging::amqp::ConnectionContext::get (this=0x856dc58, ssn=..., lnk=..., message=..., timeout=...) at /usr/src/debug/qpid-0.22/cpp/src/qpid/messaging/amqp/ConnectionContext.cpp:216
#8  0x0096a7ad in qpid::messaging::amqp::ConnectionContext::fetch (this=0x856dc58, ssn=..., lnk=..., message=..., timeout=...)
    at /usr/src/debug/qpid-0.22/cpp/src/qpid/messaging/amqp/ConnectionContext.cpp:151
#9  0x00972195 in qpid::messaging::amqp::ReceiverHandle::fetch (this=0x856f7d0, message=..., timeout=...) at /usr/src/debug/qpid-0.22/cpp/src/qpid/messaging/amqp/ReceiverHandle.cpp:55
#10 0x009bd3ae in qpid::messaging::Receiver::fetch (this=0xbfc83490, message=..., timeout=...) at /usr/src/debug/qpid-0.22/cpp/src/qpid/messaging/Receiver.cpp:47
#11 0x0804d94d in main ()

Comment 9 Gordon Sim 2013-11-28 18:14:02 UTC
The issue here is that the receiver in question has no capacity set, and the 1.0 path doesn't allocate credit in this case after failover (qpid-receive has a non-zero capacity by default, which is why it works and drain doesn't). I've committed a fix upstream: https://svn.apache.org/r1546415
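
The credit behaviour described in this comment can be illustrated with a small toy model. This is not the qpid-cpp implementation and not the upstream fix: the type ToyReceiverLink, its members, and the rule that a fetch on a zero-capacity receiver grants a single credit on demand are all assumptions made for the illustration. The model only shows why a receiver whose credit after re-attach is derived solely from its capacity would stall when that capacity is zero, and why a non-zero capacity avoids the stall.

```cpp
#include <cstdint>

// Toy model of a receiving link (hypothetical, for illustration only).
struct ToyReceiverLink {
    std::uint32_t capacity;     // prefetch window; 0 means "no capacity set"
    std::uint32_t credit;       // credit currently granted to the peer
    bool fetchPending = false;  // a fetch is blocked waiting for a message

    explicit ToyReceiverLink(std::uint32_t cap) : capacity(cap), credit(cap) {}

    // A blocking fetch starts. Assumption: with no prefetch window,
    // one credit is granted on demand so that a message can arrive.
    void beginFetch() {
        fetchPending = true;
        if (capacity == 0 && credit == 0) credit = 1;
    }

    // The connection was lost and the link re-attached. Credit granted on
    // the old connection is gone; here it is restored from capacity only.
    void reattach() {
        credit = capacity;
    }

    // The peer can deliver a message only while credit remains.
    bool deliver() {
        if (credit == 0) return false;
        --credit;
        fetchPending = false;
        return true;
    }
};
```

In this model, a link created with capacity 0 that re-attaches during a pending fetch ends up with zero credit, so deliver() keeps returning false and the fetch never completes; a link created with capacity 10 receives normally after re-attach. In the real API, the prefetch window is set with qpid::messaging::Receiver::setCapacity.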

Comment 13 Petr Matousek 2014-01-06 14:55:06 UTC
QE note: issue from comment 7 has been fixed

Comment 14 Petr Matousek 2014-01-15 16:56:11 UTC
The reconnect capability is available and working for the AMQP 1.0 C++ client. Verified on RHEL 6.5 (x86_64, i386).

packages under test:
qpid-cpp-*-0.22-30.el6

-> VERIFIED