Bug 472928 - Enqueue completion lost/not signalled if queue is deleted
Status: CLOSED ERRATA
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 1.0
Hardware: All Linux
Priority: high Severity: high
Target Milestone: 1.1
Target Release: ---
Assigned To: Kim van der Riet
Kim van der Riet
Reported: 2008-11-25 12:23 EST by Gordon Sim
Modified: 2009-02-04 10:36 EST

Doc Type: Bug Fix
Last Closed: 2009-02-04 10:36:38 EST


Attachments: None
Description Gordon Sim 2008-11-25 12:23:19 EST
Run durable applications (I used txtest) and, in the middle of testing, try to shut down the broker (I used Ctrl-C). It seems to hang waiting for message completion:

Thread 6 (Thread 1094846784 (LWP 27838)):
#0  0x0000003835e0a687 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
#1  0x000000356016f2f6 in qpid::broker::Timer::run ()
#2  0x000000355fb7311a in qpid::sys::AbsTime::AbsTime ()
#3  0x0000003835e062f7 in start_thread () from /lib64/libpthread.so.0
#4  0x00000038356d1b6d in clone () from /lib64/libc.so.6
Thread 5 (Thread 1111365952 (LWP 27839)):
#0  0x0000003835e0a687 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
#1  0x000000356016f2f6 in qpid::broker::Timer::run ()
#2  0x000000355fb7311a in qpid::sys::AbsTime::AbsTime ()
#3  0x0000003835e062f7 in start_thread () from /lib64/libpthread.so.0
#4  0x00000038356d1b6d in clone () from /lib64/libc.so.6
Thread 4 (Thread 1121855808 (LWP 27840)):
#0  0x0000003835e0a687 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
#1  0x000000356016f2f6 in qpid::broker::Timer::run ()
#2  0x000000355fb7311a in qpid::sys::AbsTime::AbsTime ()
#3  0x0000003835e062f7 in start_thread () from /lib64/libpthread.so.0
#4  0x00000038356d1b6d in clone () from /lib64/libc.so.6
Thread 3 (Thread 1132345664 (LWP 27841)):
#0  0x0000003835e0a687 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
#1  0x000000356016f2f6 in qpid::broker::Timer::run ()
#2  0x000000355fb7311a in qpid::sys::AbsTime::AbsTime ()
#3  0x0000003835e062f7 in start_thread () from /lib64/libpthread.so.0
#4  0x00000038356d1b6d in clone () from /lib64/libc.so.6
Thread 2 (Thread 1174305088 (LWP 27845)):
#0  0x0000003835e0a496 in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x000000356011e37e in qpid::broker::IncompleteMessageList::process ()
#2  0x000000356016796c in qpid::broker::SessionState::handleCommand ()
#3  0x000000356016a608 in qpid::broker::SessionState::handleIn ()
#4  0x000000355fb9586e in qpid::amqp_0_10::SessionHandler::handleIn ()
#5  0x00000035600fa011 in qpid::broker::ConnectionHandler::handle ()
#6  0x00000035600f3cb0 in qpid::broker::Connection::received ()
#7  0x00000035600c563f in qpid::amqp_0_10::Connection::decode ()
#8  0x000000355fbbe3f3 in qpid::sys::AsynchIOHandler::readbuff ()
#9  0x000000355fb7099a in boost::function2<void, qpid::sys::AsynchIO&, qpid::sys::AsynchIOBufferBase*, std::allocator<boost::function_base> >::operator() ()
#10 0x000000355fb6c905 in qpid::sys::posix::AsynchIO::readable ()
#11 0x000000355fbc10c9 in boost::function1<void, qpid::sys::DispatchHandle&, std::allocator<boost::function_base> >::operator() ()
#12 0x000000355fbbf484 in qpid::sys::DispatchHandle::processEvent ()
#13 0x000000355fbbef3e in qpid::sys::Dispatcher::run ()
#14 0x000000355fb7311a in qpid::sys::AbsTime::AbsTime ()
#15 0x0000003835e062f7 in start_thread () from /lib64/libpthread.so.0
#16 0x00000038356d1b6d in clone () from /lib64/libc.so.6
Thread 1 (Thread 46986249193360 (LWP 27837)):
#0  0x0000003835e075b5 in pthread_join () from /lib64/libpthread.so.0
#1  0x000000355fb73322 in qpid::sys::Thread::join ()
#2  0x00000035600c72fd in qpid::broker::Broker::run ()
#3  0x0000000000406858 in qpid::log::Options::~Options ()
#4  0x0000000000405438 in __cxa_pure_virtual ()
#5  0x000000383561d8b4 in __libc_start_main () from /lib64/libc.so.6
#6  0x0000000000404eb9 in __cxa_pure_virtual ()
#7  0x00007fffd3fab978 in ?? ()
#8  0x0000000000000000 in ?? ()
#0  0x0000003835e075b5 in pthread_join () from /lib64/libpthread.so.0
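A dump like the one above can be scanned mechanically for the stuck thread. The following is a minimal sketch, assuming the dump uses the same "Thread N (Thread ... (LWP ...)):" and "#n 0x... in function ()" layout shown here; the names `parse_threads` and `untimed_waiters` are hypothetical helpers, not part of qpid or gdb. It groups frames by thread and reports threads whose innermost frame is an untimed `pthread_cond_wait`, which in this dump is the thread inside `IncompleteMessageList::process`.

```python
import re

# Header and frame patterns, assuming the layout of the dump in this report.
THREAD_RE = re.compile(r"^Thread (\d+) \(Thread \d+ \(LWP (\d+)\)\):")
FRAME_RE = re.compile(r"^#(\d+)\s+0x[0-9a-f]+ in (.+?) \(\)")


def parse_threads(dump):
    """Return {thread_number: [function names, innermost frame first]}."""
    threads = {}
    current = None
    for line in dump.splitlines():
        header = THREAD_RE.match(line)
        if header:
            current = int(header.group(1))
            threads[current] = []
            continue
        frame = FRAME_RE.match(line)
        if frame and current is not None:
            # Accept only the next expected frame index, so a repeated
            # "#0" line printed after the last thread is ignored.
            if int(frame.group(1)) == len(threads[current]):
                threads[current].append(frame.group(2))
    return threads


def untimed_waiters(threads):
    """Return threads blocked in pthread_cond_wait (not the timed variant)."""
    return {
        number: frames
        for number, frames in threads.items()
        if frames and frames[0].startswith("pthread_cond_wait")
    }
```

Timer threads sit in `pthread_cond_timedwait` and wake periodically, so they are not reported; a thread that stays in the untimed wait across several dumps is the one worth inspecting.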
Comment 1 Gordon Sim 2008-11-27 15:53:07 EST
On further investigation, it appears that this is unrelated to shutdown; it's just that one thread is hung waiting for an enqueue completion notification that never arrives.

Can reproduce by running:

while ./run-tests -v -s ../specs/amqp.0-10-qpid-errata.xml tests_0-10.persistence.PersistenceTests.test_delete_queue_after_publish; do true; done

concurrently with some other tests that load the broker (I find this speeds up the occurrence of the failure).

The Python test will eventually time out, and from that point a hung thread can be observed with pstack.
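The shell loop above runs until the test fails. A bounded variant can be sketched in Python as follows, assuming only the standard library; `run_until_failure`, `max_runs`, and `timeout_s` are hypothetical names chosen for this illustration. It stops at the first non-zero exit or timeout and reports which iteration failed, which is the point at which to capture a pstack of the broker.

```python
import subprocess


def run_until_failure(cmd, max_runs=1000, timeout_s=120):
    """Run cmd repeatedly.

    Returns (run_index, reason) for the first failing run, or None if
    all max_runs runs exit with status 0.
    """
    for run_index in range(1, max_runs + 1):
        try:
            result = subprocess.run(cmd, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            # The child is killed by subprocess.run before this is raised.
            return run_index, "timeout"
        if result.returncode != 0:
            return run_index, "exit %d" % result.returncode
    return None
```

For this bug, `cmd` would be the `./run-tests` invocation from the loop above, passed as a list of arguments, with other load on the broker running alongside it.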
Comment 2 Kim van der Riet 2008-12-02 16:34:15 EST
This looks like a holdover from the days before persistent messages used boost intrusive pointers. At that time, in order to prevent callbacks on deleted messages after shutdown, the journal would check its own shutdown status and suppress the callbacks if shutdown was in progress.

The Python persistence test, if run while txtest is also running, has a good chance of failing with this error because the additional disk activity increases the probability that there will be outstanding AIO from the Python test at the time it deletes the queue.
Comment 3 Kim van der Riet 2008-12-02 16:36:54 EST
Fixed in r.2908 by removing the check for the stopped condition, which blocked the callbacks.

QA: The above test should reproduce the failure easily.
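The failure mode and the fix described in comments 2 and 3 can be illustrated with a toy model. This is a sketch under stated assumptions, not the journal code: the `Journal` class, its `stop` and `aio_returned` methods, and the `suppress_when_stopped` flag are all hypothetical names, and it assumes that deleting the queue is what puts the journal into its stopped state. With suppression enabled, a write that completes after the stop never signals its waiter; with the check removed, the completion is always delivered.

```python
import threading


class Journal:
    """Toy stand-in for an asynchronous store (not the real journal)."""

    def __init__(self, suppress_when_stopped):
        self.suppress_when_stopped = suppress_when_stopped
        self.stopped = False
        self.pending = []  # completion callbacks for writes still in flight

    def enqueue(self, on_complete):
        self.pending.append(on_complete)

    def stop(self):
        # Assumed to be triggered when the owning queue is deleted.
        self.stopped = True

    def aio_returned(self):
        # Outstanding writes finish, possibly after stop() was called.
        callbacks, self.pending = self.pending, []
        for callback in callbacks:
            if self.suppress_when_stopped and self.stopped:
                continue  # old behaviour: the completion is silently dropped
            callback()


def completion_signalled(suppress_when_stopped, wait_s=0.2):
    """Return True if the enqueue completion reaches the waiter."""
    done = threading.Event()
    journal = Journal(suppress_when_stopped)
    journal.enqueue(done.set)
    journal.stop()  # queue deleted while the write is still outstanding
    io_thread = threading.Thread(target=journal.aio_returned)
    io_thread.start()
    io_thread.join()
    # The model uses a timed wait so it can report the lost signal;
    # the broker thread in the dump waits without a timeout and hangs.
    return done.wait(wait_s)
```

Calling `completion_signalled(True)` models the behaviour before the fix, and `completion_signalled(False)` models it after the stopped check is removed.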
Comment 5 David Sommerseth 2008-12-11 12:51:42 EST
Ran the test routine from comment #1 for 2 to 2.5 hours without any hangs. After consulting with Gordon, we agreed to move this one to verified.

Packages used:
rhm-0.4.2964-5.el5
qpidd-0.4.725317-2.el5
(+ a lot of other packages from the 0.4.725317-2 series)
Comment 7 errata-xmlrpc 2009-02-04 10:36:38 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-0035.html
