Bug 486314

Summary: qpidd - c++ client failover hangs on receiver / sender client when authentication is enabled
Product: Red Hat Enterprise MRG Reporter: Frantisek Reznicek <freznice>
Component: qpid-cppAssignee: mick <mgoulish>
Status: CLOSED NOTABUG QA Contact: Frantisek Reznicek <freznice>
Severity: high Docs Contact:
Priority: high    
Version: 1.1CC: esammons, gsim
Target Milestone: 1.3Keywords: Reopened
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-08-05 13:50:42 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
the reproducer
none
qpidd.log for both nodes
none
Reproducer of the last hang (more complete) none

Description Frantisek Reznicek 2009-02-19 11:05:22 UTC
Description of problem:
qpid_test_mnode_cluster test triggered the issue, c++ client failover hangs in the middle  (either receiver of sender client).
Investigating further showed that lock mentioned below in Additional info section.

See details below or here http://pastebin.test.redhat.com/8054

Version-Release number of selected component (if applicable):
qpidd-0.4.744917-1.el5

How reproducible:
seldom ~10%

Steps to Reproduce:
1. run 2 qpidd on two different ports (on different hosts) ${SEL_QPIDD_PORT} ${SEL_QPIDD_PORT_ALT}
2. ./txjob -p ${SEL_QPIDD_PORT} --log-enable  ${clients_log_level} \
           --declare-queues --messages ${data_n} \
           --jobs ${data_m} --source ${q_src_name} --dest ${q_dest_name}
3. ./txjob -p ${SEL_QPIDD_PORT} --log-enable  ${clients_log_level} \
           --messages 1 --jobs 1 --source ${q_src_name} \
           --dest ${q_dest_name} --work-queue ${q_ctrl_name} --add-quit
4. ./sender -p ${SEL_QPIDD_PORT} --log-enable  ${clients_log_level} \
            --routing-key ${q_src_name} --send-eos 1 < ../test-data.in

here it sometimes hangs (following points summarizes the whole procedure)

5. ./txshift --work-queue ${q_ctrl_name} -p ${SEL_QPIDD_PORT} \
             --log-enable ${clients_log_level}

6. sleep 5...10

7. sleep X # wait for txshift finishes

8. ./receiver -p ${SEL_QPIDD_PORT_ALT} --log-enable  ${clients_log_level} \
              --queue ${q_dest_name} > test-data.out

9. diff test-data.out test-data.in to see whether all data are ok

Actual results:
Process hangs at point 4. or 8. sometimes passes

Expected results:
Whole process need to pass. (all steps)

Additional info:

[root@dell-pesc430-03 dell-pesc430-03.rhts.bos.redhat.com]# cat j0004.sh
./sender -p 35829 --log-enable  debug+ --routing-key txshift-source-h2-j2  --send-eos 1 < ../test-data.in

[root@dell-pesc430-03 dell-pesc430-03.rhts.bos.redhat.com]# cat j0004.sh.log
2009-feb-19 04:59:29 info Attempting to connect to localhost on 35829...
2009-feb-19 04:59:29 debug ConnectionImpl created for \x00-

2009-feb-19 04:59:29 info Connecting to tcp:localhost:35829
2009-feb-19 04:59:29 debug TCPConnector created for \x00-

2009-feb-19 04:59:29 debug RECV [42393 localhost:35829] INIT(0-10)
2009-feb-19 04:59:29 debug CyrusSasl::start(ANONYMOUS)
2009-feb-19 04:59:29 debug min_ssf: 0, max_ssf: 256
2009-feb-19 04:59:29 debug CyrusSasl::start(ANONYMOUS): selected ANONYMOUS response: 'anonymous.bos.redhat.com'
2009-feb-19 04:59:30 debug No security layer in place
2009-feb-19 04:59:30 debug SessionState::SessionState @: 0x1dc0a660
2009-feb-19 04:59:30 debug Known-brokers for connection: amqp:tcp:10.16.65.59:48767, amqp:tcp:10.16.64.64:49050, amqp:tcp:10.16.64.106:35829
2009-feb-19 04:59:31 info Connected to localhost on 35829
2009-feb-19 04:59:31 debug SessionState::SessionState @: 0x1dc0ad40
2009-feb-19 04:59:31 info Known-brokers update: amqp:tcp:10.16.65.59:48767, amqp:tcp:10.16.64.64:49050, amqp:tcp:10.16.64.106:35829
[root@dell-pesc430-03 dell-pesc430-03.rhts.bos.redhat.com]# !pst
pstree -p 13629
slave_runner.sh(13629)--bash(13631)--sender(13632)--{sender}(13633)
                                                   -{sender}(13812)
[root@dell-pesc430-03 dell-pesc430-03.rhts.bos.redhat.com]# pstack 13632
Thread 3 (Thread 0x42712940 (LWP 13633)):
#0  0x0000003ee4ed3498 in epoll_wait () from /lib64/libc.so.6
#1  0x00000034a357050d in qpid::sys::Poller::wait ()
#2  0x00000034a3571037 in qpid::sys::Poller::run ()
#3  0x00000034a3a6573b in qpid::client::TCPConnector::run ()
#4  0x00000034a3567e5a in ?? () from /usr/lib64/libqpidcommon.so.0
#5  0x0000003ee5606367 in start_thread () from /lib64/libpthread.so.0
#6  0x0000003ee4ed30ad in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x4125b940 (LWP 13812)):
#0  0x0000003ee560a899 in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x00000034a3a777be in qpid::client::Dispatcher::run ()
#2  0x00000034a3a83e1f in qpid::client::FailoverListener::run ()
#3  0x00000034a3567e5a in ?? () from /usr/lib64/libqpidcommon.so.0
#4  0x0000003ee5606367 in start_thread () from /lib64/libpthread.so.0
#5  0x0000003ee4ed30ad in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2b6604ae87c0 (LWP 13632)):
#0  0x0000003ee560a899 in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x00000034a3a9ff4d in qpid::sys::Waitable::wait ()
#2  0x00000034a3a98618 in qpid::client::SessionImpl::close ()
#3  0x00000034a3a7e6a2 in qpid::client::FailoverManager::execute ()
#4  0x000000000040a10d in main ()
[root@dell-pesc430-03 dell-pesc430-03.rhts.bos.redhat.com]# strace -p 13632
Process 13632 attached - interrupt to quit
futex(0x1dc0aa8c, FUTEX_WAIT_PRIVATE, 22627, NULL <unfinished ...>
Process 13632 detached
[root@dell-pesc430-03 dell-pesc430-03.rhts.bos.redhat.com]# netstat -nlp | grep qpidd
tcp        0      0 0.0.0.0:35829               0.0.0.0:*                   LISTEN      10971/qpidd
[root@dell-pesc430-03 dell-pesc430-03.rhts.bos.redhat.com]# rpm -q qpidd
qpidd-0.4.744917-1.el5
[root@dell-pesc430-03 dell-pesc430-03.rhts.bos.redhat.com]#

Comment 1 Gordon Sim 2009-02-19 11:16:17 UTC
Hypothesis is that the broker is killed as the client attempts to close the session. In SessionImpl::waitFor the thread closing the session waits until the state is DETACHED. If the broker dies before responding to the detach control this state will never be reached.

SessionImpl::handleClosed() should wake up any thread waiting on a particular state.

Comment 2 Gordon Sim 2009-02-19 19:17:41 UTC
Above hypothesis is false; the state _would_ be changed on the connection being detected as closed. Also stack trace shows the IO thread is still there. I.e. it appears that the detach has been sent but no response has been received from the broker though connection has not died.

Comment 3 Frantisek Reznicek 2009-02-23 15:35:15 UTC
The issue reproduction ongoing... (gathering more test data on it)

Comment 4 Gordon Sim 2009-05-08 18:51:25 UTC
Did we ever rule this in or out?

Comment 6 Frantisek Reznicek 2010-01-05 15:20:52 UTC
I have retested the behavior on latest packages:

I can still see that sender hangs the clustered broker:
root@mrg-qe-11:~/bz486314# bash test.sh
cnt:0|data_n:138, data_m:260, ps|wc-l:138 test-data.in.tmp, final_in|wc-l:42976 test-data.in
txjob1 ecode:0, dur:0
txjob2 ecode:0, dur:0
q_src_0008  msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0008  msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0008  msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1
q_src_0000  msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0000  msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0000  msg|byteEnq/Deq:262/0|2092/0 unacked:0 consumer_cnt:0 binding_cnt:1
sender ecode:0, dur:3
q_src_0008  msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0008  msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0008  msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1
q_src_0000  msg|byteEnq/Deq:35881/0|2849863/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0000  msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0000  msg|byteEnq/Deq:262/0|2092/0 unacked:0 consumer_cnt:0 binding_cnt:1
txshift ecode:0, dur:16
q_src_0008  msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0008  msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0008  msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1
q_src_0000  msg|byteEnq/Deq:35881/35881|2849863/2849863 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0000  msg|byteEnq/Deq:35881/0|2849863/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0000  msg|byteEnq/Deq:262/262|2092/2092 unacked:0 consumer_cnt:0 binding_cnt:1
receiver ecode:0, dur:1
q_src_0008  msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0008  msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0008  msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1
q_src_0000  msg|byteEnq/Deq:35881/35881|2849863/2849863 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0000  msg|byteEnq/Deq:35881/35880|2849863/2849860 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0000  msg|byteEnq/Deq:262/262|2092/2092 unacked:0 consumer_cnt:0 binding_cnt:1
diff ecode:0, dur:0
q_src_0008  msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0008  msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0008  msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1
q_src_0000  msg|byteEnq/Deq:35881/35881|2849863/2849863 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0000  msg|byteEnq/Deq:35881/35880|2849863/2849860 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0000  msg|byteEnq/Deq:262/262|2092/2092 unacked:0 consumer_cnt:0 binding_cnt:1
#loop:0 end: err_cnt:0 ------------------
cnt:1|data_n:138, data_m:325, ps|wc-l:138 test-data.in.tmp, final_in|wc-l:42976 test-data.in
txjob1 ecode:0, dur:0
txjob2 ecode:0, dur:0
q_src_0008  msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0008  msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0008  msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1
q_src_0001  msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0001  msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0001  msg|byteEnq/Deq:327/0|2612/0 unacked:0 consumer_cnt:0 binding_cnt:1
sender ecode:0, dur:3
q_src_0008  msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0008  msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0008  msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1
q_src_0001  msg|byteEnq/Deq:44851/0|3562328/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0001  msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0001  msg|byteEnq/Deq:327/0|2612/0 unacked:0 consumer_cnt:0 binding_cnt:1
txshift ecode:0, dur:20
q_src_0008  msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0008  msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0008  msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1
q_src_0001  msg|byteEnq/Deq:44851/44851|3562328/3562328 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0001  msg|byteEnq/Deq:44851/0|3562328/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0001  msg|byteEnq/Deq:327/327|2612/2612 unacked:0 consumer_cnt:0 binding_cnt:1
receiver ecode:0, dur:2
q_src_0008  msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0008  msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0008  msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1
q_src_0001  msg|byteEnq/Deq:44851/44851|3562328/3562328 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0001  msg|byteEnq/Deq:44851/44850|3562328/3562325 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0001  msg|byteEnq/Deq:327/327|2612/2612 unacked:0 consumer_cnt:0 binding_cnt:1
diff ecode:0, dur:0
q_src_0008  msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0008  msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0008  msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1
q_src_0001  msg|byteEnq/Deq:44851/44851|3562328/3562328 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0001  msg|byteEnq/Deq:44851/44850|3562328/3562325 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0001  msg|byteEnq/Deq:327/327|2612/2612 unacked:0 consumer_cnt:0 binding_cnt:1
#loop:1 end: err_cnt:0 ------------------
cnt:2|data_n:138, data_m:271, ps|wc-l:138 test-data.in.tmp, final_in|wc-l:42976 test-data.in
txjob1 ecode:0, dur:0
txjob2 ecode:0, dur:0
q_src_0008  msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0008  msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0008  msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1
q_src_0002  msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0002  msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0002  msg|byteEnq/Deq:273/0|2180/0 unacked:0 consumer_cnt:0 binding_cnt:1
sender ecode:0, dur:2
q_src_0008  msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0008  msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0008  msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1
q_src_0002  msg|byteEnq/Deq:37399/0|2970434/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0002  msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0002  msg|byteEnq/Deq:273/0|2180/0 unacked:0 consumer_cnt:0 binding_cnt:1
txshift ecode:0, dur:18
q_src_0008  msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0008  msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0008  msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1
q_src_0002  msg|byteEnq/Deq:37399/37399|2970434/2970434 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0002  msg|byteEnq/Deq:37399/0|2970434/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0002  msg|byteEnq/Deq:273/273|2180/2180 unacked:0 consumer_cnt:0 binding_cnt:1
receiver ecode:0, dur:2
q_src_0008  msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0008  msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0008  msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1
q_src_0002  msg|byteEnq/Deq:37399/37399|2970434/2970434 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0002  msg|byteEnq/Deq:37399/37398|2970434/2970431 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0002  msg|byteEnq/Deq:273/273|2180/2180 unacked:0 consumer_cnt:0 binding_cnt:1
diff ecode:0, dur:0
q_src_0008  msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0008  msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0008  msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1
q_src_0002  msg|byteEnq/Deq:37399/37399|2970434/2970434 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0002  msg|byteEnq/Deq:37399/37398|2970434/2970431 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0002  msg|byteEnq/Deq:273/273|2180/2180 unacked:0 consumer_cnt:0 binding_cnt:1
#loop:2 end: err_cnt:0 ------------------
cnt:3|data_n:138, data_m:208, ps|wc-l:138 test-data.in.tmp, final_in|wc-l:42976 test-data.in
txjob1 ecode:0, dur:0
txjob2 ecode:0, dur:0
q_src_0008  msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0008  msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0008  msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1
q_src_0003  msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0003  msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0003  msg|byteEnq/Deq:210/0|1676/0 unacked:0 consumer_cnt:0 binding_cnt:1
sender ecode:0, dur:2
q_src_0008  msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0008  msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0008  msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1
q_src_0003  msg|byteEnq/Deq:28705/0|2279891/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0003  msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0003  msg|byteEnq/Deq:210/0|1676/0 unacked:0 consumer_cnt:0 binding_cnt:1
txshift ecode:0, dur:13
q_src_0008  msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0008  msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0008  msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1
q_src_0003  msg|byteEnq/Deq:28705/28705|2279891/2279891 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0003  msg|byteEnq/Deq:28705/0|2279891/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0003  msg|byteEnq/Deq:210/210|1676/1676 unacked:0 consumer_cnt:0 binding_cnt:1
receiver ecode:0, dur:1
q_src_0008  msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0008  msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0008  msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1
q_src_0003  msg|byteEnq/Deq:28705/28705|2279891/2279891 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0003  msg|byteEnq/Deq:28705/28704|2279891/2279888 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0003  msg|byteEnq/Deq:210/210|1676/1676 unacked:0 consumer_cnt:0 binding_cnt:1
diff ecode:0, dur:0
q_src_0008  msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0008  msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0008  msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1
q_src_0003  msg|byteEnq/Deq:28705/28705|2279891/2279891 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0003  msg|byteEnq/Deq:28705/28704|2279891/2279888 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0003  msg|byteEnq/Deq:210/210|1676/1676 unacked:0 consumer_cnt:0 binding_cnt:1
#loop:3 end: err_cnt:0 ------------------
cnt:4|data_n:138, data_m:306, ps|wc-l:138 test-data.in.tmp, final_in|wc-l:42976 test-data.in
txjob1 ecode:0, dur:0
txjob2 ecode:0, dur:0
q_src_0008  msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0008  msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0008  msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1
q_src_0004  msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0004  msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0004  msg|byteEnq/Deq:308/0|2460/0 unacked:0 consumer_cnt:0 binding_cnt:1
sender ecode:0, dur:337
q_src_0008  msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0008  msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0008  msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1
q_src_0004  msg|byteEnq/Deq:42229/0|3354069/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0004  msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0004  msg|byteEnq/Deq:308/0|2460/0 unacked:0 consumer_cnt:0 binding_cnt:1
txshift ecode:0, dur:21
q_src_0008  msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0008  msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0008  msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1
q_src_0004  msg|byteEnq/Deq:42229/42229|3354069/3354069 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0004  msg|byteEnq/Deq:42229/0|3354069/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0004  msg|byteEnq/Deq:308/308|2460/2460 unacked:0 consumer_cnt:0 binding_cnt:1
receiver ecode:0, dur:2
q_src_0008  msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0008  msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0008  msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1
q_src_0004  msg|byteEnq/Deq:42229/42229|3354069/3354069 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0004  msg|byteEnq/Deq:42229/42228|3354069/3354066 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0004  msg|byteEnq/Deq:308/308|2460/2460 unacked:0 consumer_cnt:0 binding_cnt:1
diff ecode:0, dur:0
q_src_0008  msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0008  msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0008  msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1
q_src_0004  msg|byteEnq/Deq:42229/42229|3354069/3354069 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0004  msg|byteEnq/Deq:42229/42228|3354069/3354066 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0004  msg|byteEnq/Deq:308/308|2460/2460 unacked:0 consumer_cnt:0 binding_cnt:1
#loop:4 end: err_cnt:0 ------------------
cnt:5|data_n:138, data_m:250, ps|wc-l:138 test-data.in.tmp, final_in|wc-l:42976 test-data.in
txjob1 ecode:0, dur:0
txjob2 ecode:0, dur:0
q_src_0008  msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0008  msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0008  msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1
q_src_0005  msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0005  msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0005  msg|byteEnq/Deq:252/0|2012/0 unacked:0 consumer_cnt:0 binding_cnt:1
sender ecode:0, dur:2
q_src_0008  msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0008  msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0008  msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1
q_src_0005  msg|byteEnq/Deq:34501/0|2740253/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0005  msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0005  msg|byteEnq/Deq:252/0|2012/0 unacked:0 consumer_cnt:0 binding_cnt:1
txshift ecode:0, dur:17
q_src_0008  msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0008  msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0008  msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1
q_src_0005  msg|byteEnq/Deq:34501/34501|2740253/2740253 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0005  msg|byteEnq/Deq:34501/0|2740253/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0005  msg|byteEnq/Deq:252/252|2012/2012 unacked:0 consumer_cnt:0 binding_cnt:1
receiver ecode:0, dur:2
q_src_0008  msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0008  msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0008  msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1
q_src_0005  msg|byteEnq/Deq:34501/34501|2740253/2740253 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0005  msg|byteEnq/Deq:34501/34500|2740253/2740250 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0005  msg|byteEnq/Deq:252/252|2012/2012 unacked:0 consumer_cnt:0 binding_cnt:1
diff ecode:0, dur:0
q_src_0008  msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0008  msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0008  msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1
q_src_0005  msg|byteEnq/Deq:34501/34501|2740253/2740253 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0005  msg|byteEnq/Deq:34501/34500|2740253/2740250 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0005  msg|byteEnq/Deq:252/252|2012/2012 unacked:0 consumer_cnt:0 binding_cnt:1
#loop:5 end: err_cnt:0 ------------------
cnt:6|data_n:138, data_m:286, ps|wc-l:138 test-data.in.tmp, final_in|wc-l:42976 test-data.in
txjob1 ecode:0, dur:1
txjob2 ecode:0, dur:0
q_src_0008  msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0008  msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0008  msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1
q_src_0006  msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0006  msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0006  msg|byteEnq/Deq:288/0|2300/0 unacked:0 consumer_cnt:0 binding_cnt:1
sender ecode:0, dur:3
q_src_0008  msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0008  msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0008  msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1
q_src_0006  msg|byteEnq/Deq:39469/0|3134849/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0006  msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0006  msg|byteEnq/Deq:288/0|2300/0 unacked:0 consumer_cnt:0 binding_cnt:1
txshift ecode:0, dur:16
q_src_0008  msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0008  msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0008  msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1
q_src_0006  msg|byteEnq/Deq:39469/39469|3134849/3134849 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0006  msg|byteEnq/Deq:39469/0|3134849/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0006  msg|byteEnq/Deq:288/288|2300/2300 unacked:0 consumer_cnt:0 binding_cnt:1
receiver ecode:0, dur:2
q_src_0008  msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0008  msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0008  msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1
q_src_0006  msg|byteEnq/Deq:39469/39469|3134849/3134849 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0006  msg|byteEnq/Deq:39469/39468|3134849/3134846 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0006  msg|byteEnq/Deq:288/288|2300/2300 unacked:0 consumer_cnt:0 binding_cnt:1
diff ecode:0, dur:0
q_src_0008  msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0008  msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0008  msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1
q_src_0006  msg|byteEnq/Deq:39469/39469|3134849/3134849 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0006  msg|byteEnq/Deq:39469/39468|3134849/3134846 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0006  msg|byteEnq/Deq:288/288|2300/2300 unacked:0 consumer_cnt:0 binding_cnt:1
#loop:6 end: err_cnt:0 ------------------
cnt:7|data_n:138, data_m:319, ps|wc-l:138 test-data.in.tmp, final_in|wc-l:42976 test-data.in
txjob1 ecode:0, dur:0
txjob2 ecode:0, dur:0
q_src_0008  msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0008  msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0008  msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1
q_src_0007  msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0007  msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0007  msg|byteEnq/Deq:321/0|2564/0 unacked:0 consumer_cnt:0 binding_cnt:1
sender ecode:0, dur:2
q_src_0008  msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0008  msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0008  msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1
q_src_0007  msg|byteEnq/Deq:44023/0|3496562/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0007  msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0007  msg|byteEnq/Deq:321/0|2564/0 unacked:0 consumer_cnt:0 binding_cnt:1
txshift ecode:0, dur:21
q_src_0008  msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0008  msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0008  msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1
q_src_0007  msg|byteEnq/Deq:44023/44023|3496562/3496562 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0007  msg|byteEnq/Deq:44023/0|3496562/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0007  msg|byteEnq/Deq:321/321|2564/2564 unacked:0 consumer_cnt:0 binding_cnt:1
receiver ecode:0, dur:2
q_src_0008  msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0008  msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0008  msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1
q_src_0007  msg|byteEnq/Deq:44023/44023|3496562/3496562 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0007  msg|byteEnq/Deq:44023/44022|3496562/3496559 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0007  msg|byteEnq/Deq:321/321|2564/2564 unacked:0 consumer_cnt:0 binding_cnt:1
diff ecode:0, dur:0
q_src_0008  msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0008  msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0008  msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1
q_src_0007  msg|byteEnq/Deq:44023/44023|3496562/3496562 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0007  msg|byteEnq/Deq:44023/44022|3496562/3496559 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0007  msg|byteEnq/Deq:321/321|2564/2564 unacked:0 consumer_cnt:0 binding_cnt:1
#loop:7 end: err_cnt:0 ------------------
cnt:8|data_n:138, data_m:256, ps|wc-l:138 test-data.in.tmp, final_in|wc-l:42976 test-data.in
txjob1 ecode:0, dur:0
txjob2 ecode:0, dur:0
q_src_0008  msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0008  msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0008  msg|byteEnq/Deq:258/0|2060/0 unacked:0 consumer_cnt:0 binding_cnt:1
sender ecode:0, dur:2
q_src_0008  msg|byteEnq/Deq:35329/0|2806019/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0008  msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0008  msg|byteEnq/Deq:258/0|2060/0 unacked:0 consumer_cnt:0 binding_cnt:1
txshift ecode:0, dur:16
q_src_0008  msg|byteEnq/Deq:35329/35329|2806019/2806019 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0008  msg|byteEnq/Deq:35329/0|2806019/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0008  msg|byteEnq/Deq:258/258|2060/2060 unacked:0 consumer_cnt:0 binding_cnt:1
receiver ecode:0, dur:2
q_src_0008  msg|byteEnq/Deq:35329/35329|2806019/2806019 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0008  msg|byteEnq/Deq:35329/35328|2806019/2806016 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0008  msg|byteEnq/Deq:258/258|2060/2060 unacked:0 consumer_cnt:0 binding_cnt:1
diff ecode:0, dur:0
q_src_0008  msg|byteEnq/Deq:35329/35329|2806019/2806019 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0008  msg|byteEnq/Deq:35329/35328|2806019/2806016 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0008  msg|byteEnq/Deq:258/258|2060/2060 unacked:0 consumer_cnt:0 binding_cnt:1
#loop:8 end: err_cnt:0 ------------------
cnt:9|data_n:138, data_m:222, ps|wc-l:138 test-data.in.tmp, final_in|wc-l:42976 test-data.in
txjob1 ecode:0, dur:1
txjob2 ecode:0, dur:0
q_src_0009  msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0009  msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0009  msg|byteEnq/Deq:224/0|1788/0 unacked:0 consumer_cnt:0 binding_cnt:1
sender ecode:0, dur:2
q_src_0009  msg|byteEnq/Deq:30637/0|2433345/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0009  msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0009  msg|byteEnq/Deq:224/0|1788/0 unacked:0 consumer_cnt:0 binding_cnt:1
txshift ecode:0, dur:13
q_src_0009  msg|byteEnq/Deq:30637/30637|2433345/2433345 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0009  msg|byteEnq/Deq:30637/0|2433345/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0009  msg|byteEnq/Deq:224/224|1788/1788 unacked:0 consumer_cnt:0 binding_cnt:1
receiver ecode:0, dur:1
q_src_0009  msg|byteEnq/Deq:30637/30637|2433345/2433345 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0009  msg|byteEnq/Deq:30637/30636|2433345/2433342 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0009  msg|byteEnq/Deq:224/224|1788/1788 unacked:0 consumer_cnt:0 binding_cnt:1
diff ecode:0, dur:0
q_src_0009  msg|byteEnq/Deq:30637/30637|2433345/2433345 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0009  msg|byteEnq/Deq:30637/30636|2433345/2433342 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0009  msg|byteEnq/Deq:224/224|1788/1788 unacked:0 consumer_cnt:0 binding_cnt:1
#loop:9 end: err_cnt:0 ------------------
cnt:10|data_n:138, data_m:233, ps|wc-l:138 test-data.in.tmp, final_in|wc-l:42976 test-data.in
txjob1 ecode:0, dur:0
txjob2 ecode:0, dur:0
q_src_0010  msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_dest_0010  msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1
q_ctrl_0010  msg|byteEnq/Deq:235/0|1876/0 unacked:0 consumer_cnt:0 binding_cnt:1
sender    







sshd,30261
  ├─sshd,28220
  │   └─bash,28222
  │       ├─bash,3763 test.sh
  │       │   └─python,4148 /usr/bin/qpid-config del queue q_ctrl_0000
  │       │       ├─{python},4151
  │       │       ├─{python},4152
  │       │       └─{python},4153
  │       └─bash,7949 test.sh
  │           └─sender,12214 --username guest --password guest -p 5672 --log-enable info+ --routing-key q_src_0010 ...
  │               ├─{sender},12215
  │               └─{sender},12216
  └─sshd,30995
      └─bash,30997
          └─pstree,12217 -ap 30261
root@mrg-qe-11:~/bz486314# strace -p 12214
Process 12214 attached - interrupt to quit
futex(0x87e2d0c, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
Process 12214 detached
root@mrg-qe-11:~/bz486314# pstack 12214
Thread 3 (Thread 0xb7f10b90 (LWP 12215)):
#0  0x00e0a410 in __kernel_vsyscall ()
#1  0x0031a376 in epoll_wait () from /lib/libc.so.6
#2  0x004e08e9 in qpid::sys::Poller::wait () from /usr/lib/libqpidcommon.so.0
#3  0x004e1916 in qpid::sys::Poller::run () from /usr/lib/libqpidcommon.so.0
#4  0x005318d4 in qpid::sys::Dispatcher::run ()
#5  0x0016e6b5 in qpid::client::TCPConnector::run ()
#6  0x004d6871 in ?? () from /usr/lib/libqpidcommon.so.0
#7  0x0023373b in start_thread () from /lib/libpthread.so.0
#8  0x00319cfe in clone () from /lib/libc.so.6
Thread 2 (Thread 0xb750fb90 (LWP 12216)):
#0  0x00e0a410 in __kernel_vsyscall ()
#1  0x002379e5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x003263cd in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libc.so.6
#3  0x00187228 in qpid::sys::BlockingQueue<boost::shared_ptr<qpid::framing::FrameSet> >::pop () from /usr/lib/libqpidclient.so.0
#4  0x001825d7 in qpid::client::Dispatcher::run ()
#5  0x001b74f8 in qpid::client::SubscriptionManager::run ()
#6  0x00190c9b in qpid::client::FailoverListener::run ()
#7  0x004d6871 in ?? () from /usr/lib/libqpidcommon.so.0
#8  0x0023373b in start_thread () from /lib/libpthread.so.0
#9  0x00319cfe in clone () from /lib/libc.so.6
Thread 1 (Thread 0xb7f11720 (LWP 12214)):
#0  0x00e0a410 in __kernel_vsyscall ()
#1  0x002379e5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x003263cd in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libc.so.6
#3  0x0015067b in qpid::client::Bounds::expand ()
#4  0x001a5678 in qpid::client::SessionImpl::sendFrame ()
#5  0x001a57ec in qpid::client::SessionImpl::handleOut ()
#6  0x001a4c07 in qpid::client::SessionImpl::sendContent ()
#7  0x001ad2d6 in qpid::client::SessionImpl::sendCommand ()
#8  0x001ad637 in qpid::client::SessionImpl::send ()
#9  0x00144051 in qpid::client::no_keyword::AsyncSession_0_10::messageTransfer
#10 0x0019ad2a in qpid::client::MessageReplayTracker::ReplayRecord::send ()
#11 0x0019bdfb in qpid::client::MessageReplayTracker::send ()
#12 0x08051014 in Sender::execute ()
#13 0x0018ad5e in qpid::client::FailoverManager::execute ()
#14 0x08051371 in main ()
root@mrg-qe-11:~/bz486314# strace -p 12214
Process 12214 attached - interrupt to quit
futex(0x87e2d0c, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
Process 12214 detached
root@mrg-qe-11:~/bz486314# gdb
GNU gdb Fedora (6.8-37.el5)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu".
(gdb) attach 12214
Attaching to process 12214
Reading symbols from /root/bz486314/sender...done.
warning: .dynamic section for "/usr/lib/libsasl2.so.2" is not at the expected address
warning: difference appears to be caused by prelink, adjusting expectations
Reading symbols from /usr/lib/libqpidclient.so.0...Reading symbols from /usr/lib/debug/usr/lib/libqpidclient.so.0.1.0.debug...done.
done.
Loaded symbols for /usr/lib/libqpidclient.so.0
Reading symbols from /usr/lib/libstdc++.so.6...done.
Loaded symbols for /usr/lib/libstdc++.so.6
Reading symbols from /lib/libm.so.6...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /lib/libgcc_s.so.1...done.
Loaded symbols for /lib/libgcc_s.so.1
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /usr/lib/libqpidcommon.so.0...Reading symbols from /usr/lib/debug/usr/lib/libqpidcommon.so.0.1.0.debug...done.
done.
Loaded symbols for /usr/lib/libqpidcommon.so.0
Reading symbols from /usr/lib/libboost_program_options.so.2...done.
Loaded symbols for /usr/lib/libboost_program_options.so.2
Reading symbols from /lib/libuuid.so.1...done.
Loaded symbols for /lib/libuuid.so.1
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /usr/lib/libboost_filesystem.so.2...done.
Loaded symbols for /usr/lib/libboost_filesystem.so.2
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/librt.so.1...done.
Loaded symbols for /lib/librt.so.1
Reading symbols from /usr/lib/libsasl2.so.2...done.
Loaded symbols for /usr/lib/libsasl2.so.2
Reading symbols from /lib/libpthread.so.0...done.
[Thread debugging using libthread_db enabled]
[New Thread 0xb7f11720 (LWP 12214)]
[New Thread 0xb750fb90 (LWP 12216)]
[New Thread 0xb7f10b90 (LWP 12215)]
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/libcrypt.so.1...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /usr/lib/qpid/client/sslconnector.so...Reading symbols from /usr/lib/debug/usr/lib/qpid/client/sslconnector.so.debug...done.
done.
Loaded symbols for /usr/lib/qpid/client/sslconnector.so
Reading symbols from /usr/lib/libsslcommon.so.0...Reading symbols from /usr/lib/debug/usr/lib/libsslcommon.so.0.1.0.debug...done.
done.
Loaded symbols for /usr/lib/libsslcommon.so.0
Reading symbols from /usr/lib/libnss3.so...done.
Loaded symbols for /usr/lib/libnss3.so
Reading symbols from /usr/lib/libssl3.so...done.
Loaded symbols for /usr/lib/libssl3.so
Reading symbols from /usr/lib/libnspr4.so...done.
Loaded symbols for /usr/lib/libnspr4.so
Reading symbols from /usr/lib/libnssutil3.so...done.
Loaded symbols for /usr/lib/libnssutil3.so
Reading symbols from /usr/lib/libplc4.so...done.
Loaded symbols for /usr/lib/libplc4.so
Reading symbols from /usr/lib/libplds4.so...done.
Loaded symbols for /usr/lib/libplds4.so
Reading symbols from /usr/lib/qpid/client/rdmaconnector.so...Reading symbols from /usr/lib/debug/usr/lib/qpid/client/rdmaconnector.so.debug...done.
done.
Loaded symbols for /usr/lib/qpid/client/rdmaconnector.so
Reading symbols from /usr/lib/librdmawrap.so.0...Reading symbols from /usr/lib/debug/usr/lib/librdmawrap.so.0.1.0.debug...done.
done.
Loaded symbols for /usr/lib/librdmawrap.so.0
Reading symbols from /usr/lib/librdmacm.so.1...done.
Loaded symbols for /usr/lib/librdmacm.so.1
Reading symbols from /usr/lib/libibverbs.so.1...done.
Loaded symbols for /usr/lib/libibverbs.so.1
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /usr/lib/sasl2/liblogin.so.2...done.
Loaded symbols for /usr/lib/sasl2/liblogin.so.2
Reading symbols from /usr/lib/sasl2/libanonymous.so.2...done.
Loaded symbols for /usr/lib/sasl2/libanonymous.so.2
Reading symbols from /usr/lib/sasl2/libsasldb.so.2...done.
Loaded symbols for /usr/lib/sasl2/libsasldb.so.2
Reading symbols from /usr/lib/sasl2/libplain.so.2...done.
Loaded symbols for /usr/lib/sasl2/libplain.so.2
0x00e0a410 in __kernel_vsyscall ()
(gdb) info threads
  3 Thread 0xb7f10b90 (LWP 12215)  0x00e0a410 in __kernel_vsyscall ()
  2 Thread 0xb750fb90 (LWP 12216)  0x00e0a410 in __kernel_vsyscall ()
* 1 Thread 0xb7f11720 (LWP 12214)  0x00e0a410 in __kernel_vsyscall ()
(gdb) thread apply all bt

Thread 3 (Thread 0xb7f10b90 (LWP 12215)):
#0  0x00e0a410 in __kernel_vsyscall ()
#1  0x0031a376 in epoll_wait () from /lib/libc.so.6
#2  0x004e08e9 in qpid::sys::Poller::wait (this=0x87e2a78, timeout={nanosecs = 9223372036854775807})
    at qpid/sys/epoll/EpollPoller.cpp:439
#3  0x004e1916 in qpid::sys::Poller::run (this=0x87e2a78) at qpid/sys/epoll/EpollPoller.cpp:405
#4  0x005318d4 in qpid::sys::Dispatcher::run (this=0xb7f10318) at qpid/sys/Dispatcher.cpp:37
#5  0x0016e6b5 in qpid::client::TCPConnector::run (this=0x87e29b0) at qpid/client/Connector.cpp:408
#6  0x004d6871 in runRunnable (p=0x87e29bc) at qpid/sys/posix/Thread.cpp:35
#7  0x0023373b in start_thread () from /lib/libpthread.so.0
#8  0x00319cfe in clone () from /lib/libc.so.6

Thread 2 (Thread 0xb750fb90 (LWP 12216)):
#0  0x00e0a410 in __kernel_vsyscall ()
#1  0x002379e5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x003263cd in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libc.so.6
#3  0x00187228 in qpid::sys::BlockingQueue<boost::shared_ptr<qpid::framing::FrameSet> >::pop (this=0x87e7c38, timeout=
      {nanosecs = 9223372036854775807}) at qpid/sys/posix/Condition.h:63
#4  0x001825d7 in qpid::client::Dispatcher::run (this=0x87e4b8c) at qpid/client/Dispatcher.cpp:72
#5  0x001b74f8 in qpid::client::SubscriptionManager::run (this=0x87e4b70) at qpid/client/SubscriptionManager.cpp:95
#6  0x00190c9b in qpid::client::FailoverListener::run (this=0x87e3978) at qpid/client/FailoverListener.cpp:70
#7  0x004d6871 in runRunnable (p=0x87e397c) at qpid/sys/posix/Thread.cpp:35
#8  0x0023373b in start_thread () from /lib/libpthread.so.0
#9  0x00319cfe in clone () from /lib/libc.so.6

Thread 1 (Thread 0xb7f11720 (LWP 12214)):
#0  0x00e0a410 in __kernel_vsyscall ()
#1  0x002379e5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x003263cd in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libc.so.6
#3  0x0015067b in qpid::client::Bounds::expand (this=0x87e2cf0, sizeRequired=63, block=true)
    at qpid/sys/posix/Condition.h:63
#4  0x001a5678 in qpid::client::SessionImpl::sendFrame (this=0x89fcd48, frame=@0xbfa1c3d4, canBlock=true)
    at qpid/client/SessionImpl.cpp:498
#5  0x001a57ec in qpid::client::SessionImpl::handleOut (this=0x89fcd48, frame=@0xbfa1c3d4)
---Type <return> to continue, or q <return> to quit---
    at qpid/client/SessionImpl.cpp:482
#6  0x001a4c07 in qpid::client::SessionImpl::sendContent (this=0x89fcd48, content=@0xb6aabae8)
    at qpid/client/SessionImpl.cpp:403
#7  0x001ad2d6 in qpid::client::SessionImpl::sendCommand (this=0x89fcd48, command=@0xbfa1c628, content=0xb6aabae8)
    at qpid/client/SessionImpl.cpp:385
#8  0x001ad637 in qpid::client::SessionImpl::send (this=0x89fcd48, command=@0xbfa1c628, content=@0xb6aabae8)
    at qpid/client/SessionImpl.cpp:293
#9  0x00144051 in qpid::client::no_keyword::AsyncSession_0_10::messageTransfer (this=0xbfa1cbf8, destination=@0xb6aabba0,
    acceptMode=1 '\001', acquireMode=0 '\0', content=@0xb6aabae8, sync=false)
    at gen/qpid/client/no_keyword/AsyncSession_0_10.cpp:58
#10 0x0019ad2a in qpid::client::MessageReplayTracker::ReplayRecord::send (this=0xb6aabad0, tracker=@0xbfa1cbf8)
    at gen/qpid/client/AsyncSession_0_10.h:305
#11 0x0019bdfb in qpid::client::MessageReplayTracker::send (this=0xbfa1cbf8, message=@0xbfa1cc10, destination=@0xbfa1cbf4)
    at qpid/client/MessageReplayTracker.cpp:32
#12 0x08051014 in Sender::execute (this=0xbfa1cbf0, session=@0xbfa1ca54, isRetry=false) at sender.cpp:93
#13 0x0018ad5e in qpid::client::FailoverManager::execute (this=0xbfa1ccd0, c=@0xbfa1cbf0)
    at qpid/client/FailoverManager.cpp:49
#14 0x08051371 in main (argc=13, argv=0xbfa1ce24) at sender.cpp:108
(gdb) quit
The program is running.  Quit anyway (and detach it)? (y or n) y
Detaching from program: /root/bz486314/sender, process 12214
root@mrg-qe-11:~/bz486314# !netst
netstat -tlnp | grep qpidd
tcp        0      0 0.0.0.0:5672                0.0.0.0:*                   LISTEN      28480/qpidd

# <> CANNOT pstack/gdb qpidd process (Operation not permitted)


# from another session
root@mrg-qe-11:~# qpid-cluster
Traceback (most recent call last):
  File "/usr/bin/qpid-cluster", line 322, in ?
    if e.__repr__().find("connection aborted") > 0:
AttributeError: Timeout instance has no attribute '__repr__'
root@mrg-qe-11:~# pstack 12214
Thread 3 (Thread 0xb7f10b90 (LWP 12215)):
#0  0x00e0a410 in __kernel_vsyscall ()
#1  0x0031a376 in epoll_wait () from /lib/libc.so.6
#2  0x004e08e9 in qpid::sys::Poller::wait () from /usr/lib/libqpidcommon.so.0
#3  0x004e1916 in qpid::sys::Poller::run () from /usr/lib/libqpidcommon.so.0
#4  0x005318d4 in qpid::sys::Dispatcher::run ()
#5  0x0016e6b5 in qpid::client::TCPConnector::run ()
#6  0x004d6871 in ?? () from /usr/lib/libqpidcommon.so.0
#7  0x0023373b in start_thread () from /lib/libpthread.so.0
#8  0x00319cfe in clone () from /lib/libc.so.6
Thread 2 (Thread 0xb750fb90 (LWP 12216)):
#0  0x00e0a410 in __kernel_vsyscall ()
#1  0x002379e5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x003263cd in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libc.so.6
#3  0x00187228 in qpid::sys::BlockingQueue<boost::shared_ptr<qpid::framing::FrameSet> >::pop () from /usr/lib/libqpidclient.so.0
#4  0x001825d7 in qpid::client::Dispatcher::run ()
#5  0x001b74f8 in qpid::client::SubscriptionManager::run ()
#6  0x00190c9b in qpid::client::FailoverListener::run ()
#7  0x004d6871 in ?? () from /usr/lib/libqpidcommon.so.0
#8  0x0023373b in start_thread () from /lib/libpthread.so.0
#9  0x00319cfe in clone () from /lib/libc.so.6
Thread 1 (Thread 0xb7f11720 (LWP 12214)):
#0  0x00e0a410 in __kernel_vsyscall ()
#1  0x002379e5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x003263cd in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libc.so.6
#3  0x0015067b in qpid::client::Bounds::expand ()
#4  0x001a5678 in qpid::client::SessionImpl::sendFrame ()
#5  0x001a57ec in qpid::client::SessionImpl::handleOut ()
#6  0x001a4c07 in qpid::client::SessionImpl::sendContent ()
#7  0x001ad2d6 in qpid::client::SessionImpl::sendCommand ()
#8  0x001ad637 in qpid::client::SessionImpl::send ()
#9  0x00144051 in qpid::client::no_keyword::AsyncSession_0_10::messageTransfer
#10 0x0019ad2a in qpid::client::MessageReplayTracker::ReplayRecord::send ()
#11 0x0019bdfb in qpid::client::MessageReplayTracker::send ()
#12 0x08051014 in Sender::execute ()
#13 0x0018ad5e in qpid::client::FailoverManager::execute ()
#14 0x08051371 in main ()
root@mrg-qe-11:~# pstack 12214
Thread 3 (Thread 0xb7f10b90 (LWP 12215)):
#0  0x00e0a410 in __kernel_vsyscall ()
#1  0x0031a376 in epoll_wait () from /lib/libc.so.6
#2  0x004e08e9 in qpid::sys::Poller::wait () from /usr/lib/libqpidcommon.so.0
#3  0x004e1916 in qpid::sys::Poller::run () from /usr/lib/libqpidcommon.so.0
#4  0x005318d4 in qpid::sys::Dispatcher::run ()
#5  0x0016e6b5 in qpid::client::TCPConnector::run ()
#6  0x004d6871 in ?? () from /usr/lib/libqpidcommon.so.0
#7  0x0023373b in start_thread () from /lib/libpthread.so.0
#8  0x00319cfe in clone () from /lib/libc.so.6
Thread 2 (Thread 0xb750fb90 (LWP 12216)):
#0  0x00e0a410 in __kernel_vsyscall ()
#1  0x002379e5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x003263cd in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libc.so.6
#3  0x00187228 in qpid::sys::BlockingQueue<boost::shared_ptr<qpid::framing::FrameSet> >::pop () from /usr/lib/libqpidclient.so.0
#4  0x001825d7 in qpid::client::Dispatcher::run ()
#5  0x001b74f8 in qpid::client::SubscriptionManager::run ()
#6  0x00190c9b in qpid::client::FailoverListener::run ()
#7  0x004d6871 in ?? () from /usr/lib/libqpidcommon.so.0
#8  0x0023373b in start_thread () from /lib/libpthread.so.0
#9  0x00319cfe in clone () from /lib/libc.so.6
Thread 1 (Thread 0xb7f11720 (LWP 12214)):
#0  0x00e0a410 in __kernel_vsyscall ()
#1  0x002379e5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x003263cd in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libc.so.6
#3  0x0015067b in qpid::client::Bounds::expand ()
#4  0x001a5678 in qpid::client::SessionImpl::sendFrame ()
#5  0x001a57ec in qpid::client::SessionImpl::handleOut ()
#6  0x001a4c07 in qpid::client::SessionImpl::sendContent ()
#7  0x001ad2d6 in qpid::client::SessionImpl::sendCommand ()
#8  0x001ad637 in qpid::client::SessionImpl::send ()
#9  0x00144051 in qpid::client::no_keyword::AsyncSession_0_10::messageTransfer
#10 0x0019ad2a in qpid::client::MessageReplayTracker::ReplayRecord::send ()
#11 0x0019bdfb in qpid::client::MessageReplayTracker::send ()
#12 0x08051014 in Sender::execute ()
#13 0x0018ad5e in qpid::client::FailoverManager::execute ()
#14 0x08051371 in main ()

So broker is not responding and finally also other nodes become not responding:
root@mrg-qe-12:~/fcluster# qpid-stat -b
Failed: Timeout - Waiting for connection to be established with broker


root@mrg-qe-12:~/fcluster# cat /etc/qpidd.conf

cluster-mechanism=ANONYMOUS
#max-session-rate=10000
cluster-name=mrg_pepe_test_cluster_
cluster-cman=no
#cluster-url=amqp:tcp:192.168.157.9:5672,tcp:192.168.157.10:5672,tcp:192.168.157.11:5672,tcp:192.168.157.12:5672,tcp:192.168.157.12:5675
auth=yes
log-enable=info+
log-enable=debug+:cluster
log-to-file=/tmp/qpidd.log




testing packages:
root@mrg-qe-11:~# rpm -qa | grep -E '(qpid|ais)' | sort
openais-0.80.6-8.el5_4.1
openais-debuginfo-0.80.6-8.el5_4.1
openais-devel-0.80.6-8.el5_4.1
python-qpid-0.5.752581-4.el5
qpidc-0.5.752581-34.el5
qpidc-debuginfo-0.5.752581-34.el5
qpidc-devel-0.5.752581-34.el5
qpidc-rdma-0.5.752581-34.el5
qpidc-ssl-0.5.752581-34.el5
qpidd-0.5.752581-34.el5
qpidd-acl-0.5.752581-34.el5
qpidd-cluster-0.5.752581-34.el5
qpidd-devel-0.5.752581-34.el5
qpid-dotnet-0.4.738274-2.el5
qpidd-rdma-0.5.752581-34.el5
qpidd-ssl-0.5.752581-34.el5
qpidd-xml-0.5.752581-34.el5
qpid-java-client-0.5.751061-9.el5
qpid-java-common-0.5.751061-9.el5

Comment 7 Frantisek Reznicek 2010-01-05 15:27:10 UTC
Created attachment 381773 [details]
the reproducer

Attached shell script should be able to reproduce the problem.
Steps:
0] stop all daemons
1] configure openais and qpidd
2] start openais on node A and B
3] start qpidd on node A and then B
4] download txshift.cpp, txjob.cpp, receiver.cpp, sender.cp and *Options.h
   and compile txshift, txjob, receiver, sender
5] run attachment w/o params on one node - wait for hang

Comment 8 Frantisek Reznicek 2010-01-05 15:35:19 UTC
Created attachment 381777 [details]
qpidd.log for both nodes

The package contains qpidd.log for both nodes (both RHEL 5.4 i386)

These type of errors are expected, because the repro is not perfect and tries to delete non existing queue as clean-up:

channel error 13291670 on 192.168.157.11:28480-656(local): not
-found: Queue not found: q_dest_0009

Comment 9 mick 2010-02-08 15:58:38 UTC
Hi Frantisek -- I finally started trying to reproduce this, but your script requires a file called qc_list_objects.py that isn't in the svn tree.  Where can I get a copy of that?

Comment 10 Frantisek Reznicek 2010-06-17 08:13:11 UTC
Hi Mick, please find the clients stored here:
http://cvs.devel.redhat.com/cgi-bin/cvsweb.cgi/tests/distribution/MRG/Messaging/qpid_common/clients/python/

I'm sorry for late reply, I did not get notified.

Comment 11 mick 2010-07-29 19:24:08 UTC
It looks like I let this one age long enough that it got fixed by other changes.

In recent code -- svn rev 980427 --  I used a 2-broker cluster ( 1 on each box ) and a script very closely modelled on Frantisek's original description ( I will include it below ).  I have now run the script 150 times with no hang.  

If the original bug was reproducible about 10% of the time, then the odds of this many successful consecutive runs happening by accident are about 1 in 10 million.

The only way that my script is significantly different from the steps-to-repro in the original report is that I tell the receiver how many messages to expect.  If I leave that argument out -- it hangs.  It may be that the original script left out that arg -- but the receiver nevertheless terminated *most* of the time.

In any case, it looks like this bug has died of old age.


Here is the script I used ( not including the starting of the 2-box cluster )


#! /bin/bash

###############################################
#  486314.sh
###############################################

export LD_LIBRARY_PATH=$TRUNK/qpid/cpp/src/.libs
QPIDD_SRC=$TRUNK/qpid/cpp/src
QPIDD=$QPIDD_SRC/.libs/qpidd

count=0
while [ $count -lt 500 ]
do
  echo "==========================================="
  echo "TEST $count"
  date
  echo "==========================================="

  ####################################################
  ## define queues
  ####################################################
  echo "define queues"
  qpid-config -a 127.0.0.1:5801 add queue mick_0
  qpid-config -a 127.0.0.1:5801 add queue mick_1
  qpid-config -a 127.0.0.1:5801 add queue mick_2
  sleep 5

  ####################################################
  ## Make transaction job
  ####################################################
  echo "Make transaction job"
  $QPIDD_SRC/tests/txjob -p 5801 --log-enable info+ --messages 100 --jobs 10 --source mick_1 --dest mick_2 --work-queue mick_0 --add-quit
  sleep 5


  ####################################################
  ## generate 1000 messages.
  ####################################################
  echo "generate 1000 messages."
  $QPIDD_SRC/tests/datagen --count 1000 | $QPIDD_SRC/tests/sender -p 5801 --routing-key mick_1
  sleep 5


  ####################################################
  ## Shift from queue 1 to 2
  ####################################################
  echo "Shift from queue 1 to 2"
  $QPIDD_SRC/tests/txshift --work-queue mick_0  -p 5801  --log-enable info+


  ####################################################
  ## Receive the messages.
  ####################################################
  echo "Receive the messages."
  $QPIDD_SRC/tests/receiver --messages 1000 -p 5801 --log-enable info+  --queue mick_2  > test-data.out


  ####################################################
  ## Count the messages.
  ####################################################
  echo "Count the messages."
  wc -l test-data.out


  ####################################################
  ## undefine queues
  ####################################################
  echo "undefine queues"
  qpid-config -a 127.0.0.1:5801 del queue mick_0
  qpid-config -a 127.0.0.1:5801 del queue mick_1
  qpid-config -a 127.0.0.1:5801 del queue mick_2
  sleep 5


  count=$(( $count + 1 ))
done

Comment 12 Frantisek Reznicek 2010-08-03 11:58:52 UTC
The issue is currently under test for few hours w/o any issue.
I tend to CLOSED/NOTABUG. Final feedback soon (test need to run for about a day to be on the safe side of probability street).

Comment 13 Frantisek Reznicek 2010-08-03 15:18:21 UTC
Ok, I got the point here.

I retested my reproducer with authentication off and everything was perfect.

Once I switched on the authentication I triggered the issue again, currently during txshift run:
[root@mrg-qe-01 bz486314]# ./run.sh 1 2 10
aisexec dead but subsys locked
Starting OpenAIS daemon (aisexec):                         [  OK  ]
Stopping OpenAIS daemon (aisexec):                         [  OK  ]
Starting OpenAIS daemon (aisexec):                         [  OK  ]
10-08-03_11:08:53 slave():compiling clients from sources...
10-08-03_11:08:56 .done (err_cnt:0)
10-08-03_11:08:56 .broker[s] start
10-08-03_11:09:06 ..err_cnt:0, pids:12429,12443
10-08-03_11:09:06 .broker[s] check
tcp        0      0 0.0.0.0:5672                0.0.0.0:*                   LISTEN      12429/qpidd
tcp        0      0 0.0.0.0:5673                0.0.0.0:*                   LISTEN      12443/qpidd
10-08-03_11:09:06 ..err_cnt:0, ecode:00
cnt:0/10|data_n:148, data_m:99, ps|wc-l:148 test-data.in.tmp, final_in|wc-l:14652 test-data.in.0000
txjob1 ecode:0, dur:0
txjob2 ecode:0, dur:0
  q_ctrl_0000  msg|byteEnq/Deq:101/0|804/0 unacked:0 consumer_cnt:0 binding_cnt:1
  q_dest_0000  msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1
  q_src_0000  msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1
sender ecode:0, dur:0
  q_ctrl_0000  msg|byteEnq/Deq:101/0|804/0 unacked:0 consumer_cnt:0 binding_cnt:1
  q_dest_0000  msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1
  q_src_0000  msg|byteEnq/Deq:14652/0|1160280/0 unacked:0 consumer_cnt:0 binding_cnt:1
txshift
<hanged, for minutes, when expected time to complete is <20 seconds>

snapping process state shows (the same as above):

(gdb)   3 Thread 0xb7ef9b90 (LWP 12695)  0x00d7e410 in __kernel_vsyscall ()
  2 Thread 0xb72ffb90 (LWP 12696)  0x00d7e410 in __kernel_vsyscall ()
* 1 Thread 0xb7efa940 (LWP 12694)  0x00d7e410 in __kernel_vsyscall ()
(gdb)
Thread 3 (Thread 0xb7ef9b90 (LWP 12695)):
#0  0x00d7e410 in __kernel_vsyscall ()
#1  0x001e2486 in epoll_wait () from /lib/libc.so.6
#2  0x00562c2a in qpid::sys::Poller::wait (this=0x90bab18, timeout=...)
    at qpid/sys/epoll/EpollPoller.cpp:563
#3  0x00563853 in qpid::sys::Poller::run (this=0x90bab18)
    at qpid/sys/epoll/EpollPoller.cpp:515
#4  0x00558d01 in qpid::sys::(anonymous namespace)::runRunnable (p=0x90bab18)
    at qpid/sys/posix/Thread.cpp:35
#5  0x002c1832 in start_thread () from /lib/libpthread.so.0
#6  0x001e1e0e in clone () from /lib/libc.so.6

Thread 2 (Thread 0xb72ffb90 (LWP 12696)):
#0  0x00d7e410 in __kernel_vsyscall ()
#1  0x002c5bc5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x001ee4dd in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libc.so.6
#3  0x00cea88c in pop (this=0xb73008d0, timeout=...)
    at ../include/qpid/sys/posix/Condition.h:63
#4  qpid::sys::BlockingQueue<boost::shared_ptr<qpid::framing::FrameSet> >::pop
    (this=0xb73008d0, timeout=...) at qpid/sys/BlockingQueue.h:71
#5  0x00ce6e87 in qpid::client::Dispatcher::run (this=0xb7300a44)
    at qpid/client/Dispatcher.cpp:80
#6  0x00d1e008 in qpid::client::SubscriptionManagerImpl::run (this=0xb7300a20)
    at qpid/client/SubscriptionManagerImpl.cpp:98
#7  0x00d1c034 in qpid::client::SubscriptionManager::run (this=0xb7300548)
    at qpid/client/SubscriptionManager.cpp:60
#8  0x00ceb5d0 in qpid::client::FailoverListener::run (this=0xb7300508)
    at qpid/client/FailoverListener.cpp:52
#9  0x00558d01 in qpid::sys::(anonymous namespace)::runRunnable (p=0xb730050c)
    at qpid/sys/posix/Thread.cpp:35
#10 0x002c1832 in start_thread () from /lib/libpthread.so.0
#11 0x001e1e0e in clone () from /lib/libc.so.6

Thread 1 (Thread 0xb7efa940 (LWP 12694)):
#0  0x00d7e410 in __kernel_vsyscall ()
#1  0x002c5bc5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x001ee4dd in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libc.so.6
#3  0x00cea88c in pop (this=0xb7301310, timeout=...)
    at ../include/qpid/sys/posix/Condition.h:63
#4  qpid::sys::BlockingQueue<boost::shared_ptr<qpid::framing::FrameSet> >::pop
    (this=0xb7301310, timeout=...) at qpid/sys/BlockingQueue.h:71
#5  0x00ce6e87 in qpid::client::Dispatcher::run (this=0xb73013bc)
    at qpid/client/Dispatcher.cpp:80
#6  0x00d1e008 in qpid::client::SubscriptionManagerImpl::run (this=0xb7301398)
    at qpid/client/SubscriptionManagerImpl.cpp:98
#7  0x00d1c034 in qpid::client::SubscriptionManager::run (this=0xbfc1b274)
    at qpid/client/SubscriptionManager.cpp:60
#8  0x0805a730 in qpid::tests::Worker::execute(qpid::client::AsyncSession_0_10&, bool) ()
#9  0x00cf2268 in qpid::client::FailoverManager::execute (this=0xbfc1b664,
    c=...) at qpid/client/FailoverManager.cpp:49
#10 0x08059d71 in qpid::tests::Worker::run() ()
#11 0x08050ede in main ()
(gdb) Detaching from program: /root/bz/bz486314/txshift, process 12694
(gdb) quit


I may miss something fundamental here, but at the moment I need to return it to ASSIGNED.

Comment 14 Frantisek Reznicek 2010-08-03 15:20:42 UTC
Created attachment 436297 [details]
Reproducer of the last hang (more complete)

On RHEL5 run in clustered env. as
./run.sh 1 2 <nr-of-loops>  # authentication done
./run.sh 0 2 <nr-of-loops>  # no authentication done

on RHEL4 use:

./run.sh 1 1 <nr-of-loops>  # authentication done
./run.sh 0 1 <nr-of-loops>  # no authentication done

Comment 15 Frantisek Reznicek 2010-08-04 13:52:35 UTC
The issue reproduced on mrg35 where -11 source packages compiled and modified test repro ran. Data in /usr/src/redhat/BUILD/qpid-cpp-mrg-0.7.946106/cpp/src/tests/.

Tu run use:
LD_LIBRARY_PATH=/usr/src/redhat/BUILD/qpid-cpp-mrg-0.7.946106/cpp/src/.libs ./run.sh 1 1


Reproduced in non-clustering environment with txshift hanging here:
(*): Shared library is missing debugging information.
(gdb)   2 Thread 0x42405940 (LWP 29689)  0x00000032078d4108 in epoll_wait ()
   from /lib64/libc.so.6
* 1 Thread 0x2b4283852f70 (LWP 29668)  0x000000320840aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
(gdb)
Thread 2 (Thread 0x42405940 (LWP 29689)):
#0  0x00000032078d4108 in epoll_wait () from /lib64/libc.so.6
#1  0x00002b42832c4121 in qpid::sys::Poller::wait (this=0x1439c340,
    timeout=<value optimized out>) at qpid/sys/epoll/EpollPoller.cpp:563
#2  0x00002b42832c4bb7 in qpid::sys::Poller::run (this=0x1439c340)
    at qpid/sys/epoll/EpollPoller.cpp:515
#3  0x00002b42832bae3a in qpid::sys::(anonymous namespace)::runRunnable (p=0x6)
    at qpid/sys/posix/Thread.cpp:35
#4  0x000000320840673d in start_thread () from /lib64/libpthread.so.0
#5  0x00000032078d3d1d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x2b4283852f70 (LWP 29668)):
#0  0x000000320840aee9 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1  0x00002b4282f2e54b in pop (this=0x1439ea30)
    at ../include/qpid/sys/posix/Condition.h:63
#2  qpid::client::Dispatcher::run (this=0x1439ea30)
    at qpid/client/Dispatcher.cpp:80
#3  0x000000000040fca8 in qpid::tests::Worker::execute (this=0x7fffbf1261d0,
    session=..., isRetry=<value optimized out>) at txshift.cpp:160
#4  0x00002b4282f38648 in qpid::client::FailoverManager::execute (
    this=0x7fffbf1260c0, c=...) at qpid/client/FailoverManager.cpp:49
#5  0x000000000040b059 in main (argc=13, argv=<value optimized out>)
    at txshift.cpp:141
(gdb) Detaching from program: /usr/src/redhat/BUILD/qpid-cpp-mrg-0.7.946106/cpp/src/tests/.libs/lt-txshift, process 29668
(gdb) quit


Neither qpidd log nor hanging txshift doesn't show anything abnormal.

Comment 16 Frantisek Reznicek 2010-08-04 14:30:09 UTC
I revisited the hang backtraces and noticed that I overlooked the fact that recently reproduced hang in txshift (comment 13) is different from the original one (in sender).

The effort is now in making sure we understand what is blocking the txshift, once clear then I'll rerun and update BZ.

Comment 17 Frantisek Reznicek 2010-08-04 14:44:00 UTC
ok, my bad. The comment 13 hang is dummy caused by changing the reproducer.

Currently reproducing with authentication.
Feedback soon.

Comment 18 Frantisek Reznicek 2010-08-05 06:39:31 UTC
My testing over night proved that comments 13 - 17 was blind branch caused by issue in the reproducer.
I'm not able to trigger the issue anymore after (4 x 1100 runs on different machines).

I tend to close the bug as CLOSED/NOTABUG.

Mick, feel free to kick it there if you agree.

Comment 19 mick 2010-08-05 13:50:42 UTC
Frantisek and I together have run 1700 iterations of his script with no repro.  Deadbug.