Bug 489351 - Possible issue with FailoverManager - segfaults registered
Summary: Possible issue with FailoverManager - segfaults registered
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 1.1
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: 1.1.1
Target Release: ---
Assignee: mick
QA Contact: Frantisek Reznicek
URL:
Whiteboard:
Duplicates: 489346
Depends On: 485682
Blocks: 478874
Reported: 2009-03-09 16:36 UTC by David Sommerseth
Modified: 2016-05-22 23:35 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-06-28 18:52:26 UTC
Target Upstream Version:
Embargoed:


Attachments
Client which fails ... FailOverManager Timed Sender (7.48 KB, text/x-c++src)
2009-03-09 16:39 UTC, David Sommerseth

Description David Sommerseth 2009-03-09 16:36:44 UTC
Using the FailoverManager, the client often segfaults when a broker in a cluster is killed.

** Preparation
- Configure and start openais

- Start 4 brokers, which together form a cluster:
$ rm -rf /var/lib/qpidd/qpid1/ && qpidd --worker-threads 10 --log-to-stderr 0 --daemon --cluster-name=rocks --log-to-file /var/log/qpidd1.log --data-dir /var/lib/qpidd/qpid1 --port 5672 --auth 0
$ rm -rf /var/lib/qpidd/qpid2/ && qpidd --worker-threads 10 --log-to-stderr 0 --daemon --cluster-name=rocks --log-to-file /var/log/qpidd2.log --data-dir /var/lib/qpidd/qpid2 --port 9102 --auth 0
$ rm -rf /var/lib/qpidd/qpid3/ && qpidd --worker-threads 10 --log-to-stderr 0 --daemon --cluster-name=rocks --log-to-file /var/log/qpidd3.log --data-dir /var/lib/qpidd/qpid3 --port 9103 --auth 0
$ rm -rf /var/lib/qpidd/qpid4/ && qpidd --worker-threads 10 --log-to-stderr 0 --daemon --cluster-name=rocks --log-to-file /var/log/qpidd4.log --data-dir /var/lib/qpidd/qpid4 --port 9104 --auth 0
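
The four startup lines differ only in the broker number and port, so they can be generated rather than typed. The following is a minimal sketch, not part of the original reproduction: the helper names broker_port and broker_cmd are hypothetical, and the qpidd flags are copied unchanged from the lines above. As written it only prints the commands.

```shell
#!/bin/sh
# Hypothetical helper: map broker number to the port used in the steps above
# (broker 1 listens on 5672, brokers 2-4 on 9102-9104).
broker_port() {
    if [ "$1" -eq 1 ]; then
        echo 5672
    else
        echo $((9100 + $1))
    fi
}

# Hypothetical helper: print the startup command line for broker N.
# All qpidd options are taken verbatim from the reproduction steps.
broker_cmd() {
    n=$1
    echo "rm -rf /var/lib/qpidd/qpid$n/ && qpidd --worker-threads 10 --log-to-stderr 0 --daemon --cluster-name=rocks --log-to-file /var/log/qpidd$n.log --data-dir /var/lib/qpidd/qpid$n --port $(broker_port "$n") --auth 0"
}

# Dry run: print the four commands without executing them.
for n in 1 2 3 4; do
    broker_cmd "$n"
done
```

Piping the output to sh would run the commands. To restart a single killed broker later, something like `broker_cmd 3 | sh` should be enough.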

- Start the fom_timed_sender client (will be attached soon)

- Kill qpid process 1 (qpid1)
- Do this loop until the client crashes:
  - Find out which broker the client reconnected to and kill that one too
  - Find out which broker the client reconnected to and kill that one too
  - Start up 2 of the killed qpid processes, using the startup lines above
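
The kill step in the loop above can be scripted as well. This is only a sketch under assumptions: broker_pattern and kill_broker are hypothetical helpers, and the brokers are assumed to have been started with the exact command lines from the preparation steps, so that each process can be identified by its --data-dir argument. Working out which broker the client reconnected to is still a manual step here.

```shell
#!/bin/sh
# Hypothetical helper: build a regular expression that matches the full
# command line of broker N, keyed on its --data-dir argument. The trailing
# space keeps "qpid1" from matching a longer directory name.
broker_pattern() {
    echo "qpidd .*--data-dir /var/lib/qpidd/qpid$1 "
}

# Hypothetical helper: send SIGKILL to broker N.
# pkill -f matches the pattern against the whole command line.
kill_broker() {
    pkill -KILL -f "$(broker_pattern "$1")"
}
```

For example, `kill_broker 1` would correspond to the "Kill qpid process 1" step. SIGKILL is an assumption; the report does not say which signal was used.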


On successful reproduction of this failure you will get this backtrace from the client:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x530c6940 (LWP 15568)]
0x0000003919e7b475 in memcpy () from /lib64/libc.so.6
(gdb) bt
#0  0x0000003919e7b475 in memcpy () from /lib64/libc.so.6
#1  0x000000303af5956c in qpid::sys::Socket::connect () from /usr/lib64/libqpidcommon.so.0
#2  0x000000303b46657c in qpid::client::TCPConnector::connect () from /usr/lib64/libqpidclient.so.0
#3  0x000000303b45b5e9 in qpid::client::ConnectionImpl::open () from /usr/lib64/libqpidclient.so.0
#4  0x000000303b44d745 in qpid::client::Connection::open () from /usr/lib64/libqpidclient.so.0
#5  0x000000303b47d082 in qpid::client::FailoverManager::attempt () from /usr/lib64/libqpidclient.so.0
#6  0x000000303b47da7d in qpid::client::FailoverManager::attempt () from /usr/lib64/libqpidclient.so.0
#7  0x000000303b47dce4 in qpid::client::FailoverManager::connect () from /usr/lib64/libqpidclient.so.0
#8  0x000000303b47e5f8 in qpid::client::FailoverManager::execute () from /usr/lib64/libqpidclient.so.0
#9  0x000000000040c558 in child_process (thrargs=<value optimized out>) at sender.cpp:160
#10 0x000000391a606367 in start_thread () from /lib64/libpthread.so.0
#11 0x0000003919ed30ad in clone () from /lib64/libc.so.6


At the moment, it is unclear if the error is due to a bug in the fom_timed_sender.cpp client, or if it is an issue with the FailoverManager interface.

The fom_timed_sender.cpp client is intended to start X threads (X is set by an argument), each of which continuously sends messages containing only a timestamp to randomly chosen queues (the number of available queues is also set by an argument).  The future consumer will do almost the same, but will pull down all available messages and track the time taken to deliver each one.

Once ready, this tester will be used to test and, hopefully, confirm bug #478874.

Comment 1 David Sommerseth 2009-03-09 16:37:54 UTC
*** Bug 489346 has been marked as a duplicate of this bug. ***

Comment 2 David Sommerseth 2009-03-09 16:39:12 UTC
Created attachment 334554
Client which fails ... FailOverManager Timed Sender

Comment 3 Andrew Stitcher 2009-03-09 21:26:21 UTC
I have fixed a bug which I believe could have caused this problem (BZ485682)

Comment 4 Gordon Sim 2009-03-10 09:31:03 UTC
As above, this appears to have been fixed by r751844.

Comment 5 David Sommerseth 2009-03-10 13:00:39 UTC
Verified fixed in SVN r751844.  After this patch I could kill brokers almost indefinitely without triggering this bug.  Considered fixed.

Comment 6 Justin Ross 2011-06-28 18:52:26 UTC
Fixed and verified; closing.

