Bug 486314
Summary: | qpidd - c++ client failover hangs on receiver / sender client when authentication is enabled | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise MRG | Reporter: | Frantisek Reznicek <freznice> | ||||||||
Component: | qpid-cpp | Assignee: | mick <mgoulish> | ||||||||
Status: | CLOSED NOTABUG | QA Contact: | Frantisek Reznicek <freznice> | ||||||||
Severity: | high | Docs Contact: | |||||||||
Priority: | high | ||||||||||
Version: | 1.1 | CC: | esammons, gsim | ||||||||
Target Milestone: | 1.3 | Keywords: | Reopened | ||||||||
Target Release: | --- | ||||||||||
Hardware: | All | ||||||||||
OS: | Linux | ||||||||||
Whiteboard: | |||||||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||||||
Doc Text: | Story Points: | --- | |||||||||
Clone Of: | Environment: | ||||||||||
Last Closed: | 2010-08-05 13:50:42 UTC | Type: | --- | ||||||||
Regression: | --- | Mount Type: | --- | ||||||||
Documentation: | --- | CRM: | |||||||||
Verified Versions: | Category: | --- | |||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||
Embargoed: | |||||||||||
Attachments: |
|
Description
Frantisek Reznicek
2009-02-19 11:05:22 UTC
Hypothesis is that the broker is killed as the client attempts to close the session. In SessionImpl::waitFor the thread closing the session waits until the state is DETACHED. If the broker dies before responding to the detach control this state will never be reached. SessionImpl::handleClosed() should wake up any thread waiting on a particular state. Above hypothesis is false; the state _would_ be changed on the connection being detected as closed. Also stack trace shows the IO thread is still there. I.e. it appears that the detach has been sent but no response has been received from the broker though connection has not died. The issue reproduction ongoing... (gathering more test data on it) Did we ever rule this in or out? I have retested the behavior on latest packages: I can still see that sender hangs the clustered broker: root@mrg-qe-11:~/bz486314# bash test.sh cnt:0|data_n:138, data_m:260, ps|wc-l:138 test-data.in.tmp, final_in|wc-l:42976 test-data.in txjob1 ecode:0, dur:0 txjob2 ecode:0, dur:0 q_src_0008 msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0008 msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0008 msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1 q_src_0000 msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0000 msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0000 msg|byteEnq/Deq:262/0|2092/0 unacked:0 consumer_cnt:0 binding_cnt:1 sender ecode:0, dur:3 q_src_0008 msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0008 msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0008 msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1 q_src_0000 msg|byteEnq/Deq:35881/0|2849863/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0000 msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0000 msg|byteEnq/Deq:262/0|2092/0 unacked:0 consumer_cnt:0 binding_cnt:1 txshift ecode:0, dur:16 q_src_0008 msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0008 msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0008 msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1 q_src_0000 msg|byteEnq/Deq:35881/35881|2849863/2849863 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0000 msg|byteEnq/Deq:35881/0|2849863/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0000 msg|byteEnq/Deq:262/262|2092/2092 unacked:0 consumer_cnt:0 binding_cnt:1 receiver ecode:0, dur:1 q_src_0008 msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0008 msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0008 msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1 q_src_0000 msg|byteEnq/Deq:35881/35881|2849863/2849863 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0000 msg|byteEnq/Deq:35881/35880|2849863/2849860 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0000 msg|byteEnq/Deq:262/262|2092/2092 unacked:0 consumer_cnt:0 binding_cnt:1 diff ecode:0, dur:0 q_src_0008 msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0008 msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0008 msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1 q_src_0000 msg|byteEnq/Deq:35881/35881|2849863/2849863 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0000 msg|byteEnq/Deq:35881/35880|2849863/2849860 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0000 msg|byteEnq/Deq:262/262|2092/2092 unacked:0 consumer_cnt:0 binding_cnt:1 #loop:0 end: err_cnt:0 ------------------ cnt:1|data_n:138, data_m:325, ps|wc-l:138 test-data.in.tmp, final_in|wc-l:42976 test-data.in txjob1 ecode:0, dur:0 txjob2 ecode:0, dur:0 q_src_0008 msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0008 msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0008 msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1 q_src_0001 msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0001 msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0001 msg|byteEnq/Deq:327/0|2612/0 unacked:0 consumer_cnt:0 binding_cnt:1 sender ecode:0, dur:3 q_src_0008 msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0008 msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0008 msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1 q_src_0001 msg|byteEnq/Deq:44851/0|3562328/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0001 msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0001 msg|byteEnq/Deq:327/0|2612/0 unacked:0 consumer_cnt:0 binding_cnt:1 txshift ecode:0, dur:20 q_src_0008 msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0008 msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0008 msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1 q_src_0001 msg|byteEnq/Deq:44851/44851|3562328/3562328 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0001 msg|byteEnq/Deq:44851/0|3562328/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0001 msg|byteEnq/Deq:327/327|2612/2612 unacked:0 consumer_cnt:0 binding_cnt:1 receiver ecode:0, dur:2 q_src_0008 msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0008 msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0008 msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1 q_src_0001 msg|byteEnq/Deq:44851/44851|3562328/3562328 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0001 msg|byteEnq/Deq:44851/44850|3562328/3562325 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0001 msg|byteEnq/Deq:327/327|2612/2612 unacked:0 consumer_cnt:0 binding_cnt:1 diff ecode:0, dur:0 q_src_0008 msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0008 msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0008 msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1 q_src_0001 msg|byteEnq/Deq:44851/44851|3562328/3562328 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0001 msg|byteEnq/Deq:44851/44850|3562328/3562325 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0001 msg|byteEnq/Deq:327/327|2612/2612 unacked:0 consumer_cnt:0 binding_cnt:1 #loop:1 end: err_cnt:0 ------------------ cnt:2|data_n:138, data_m:271, ps|wc-l:138 test-data.in.tmp, final_in|wc-l:42976 test-data.in txjob1 ecode:0, dur:0 txjob2 ecode:0, dur:0 q_src_0008 msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0008 msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0008 msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1 q_src_0002 msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0002 msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0002 msg|byteEnq/Deq:273/0|2180/0 unacked:0 consumer_cnt:0 binding_cnt:1 sender ecode:0, dur:2 q_src_0008 msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0008 msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0008 msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1 q_src_0002 msg|byteEnq/Deq:37399/0|2970434/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0002 msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0002 msg|byteEnq/Deq:273/0|2180/0 unacked:0 consumer_cnt:0 binding_cnt:1 txshift ecode:0, dur:18 q_src_0008 msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0008 msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0008 msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1 q_src_0002 msg|byteEnq/Deq:37399/37399|2970434/2970434 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0002 msg|byteEnq/Deq:37399/0|2970434/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0002 msg|byteEnq/Deq:273/273|2180/2180 unacked:0 consumer_cnt:0 binding_cnt:1 receiver ecode:0, dur:2 q_src_0008 msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0008 msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0008 msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1 q_src_0002 msg|byteEnq/Deq:37399/37399|2970434/2970434 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0002 msg|byteEnq/Deq:37399/37398|2970434/2970431 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0002 msg|byteEnq/Deq:273/273|2180/2180 unacked:0 consumer_cnt:0 binding_cnt:1 diff ecode:0, dur:0 q_src_0008 msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0008 msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0008 msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1 q_src_0002 msg|byteEnq/Deq:37399/37399|2970434/2970434 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0002 msg|byteEnq/Deq:37399/37398|2970434/2970431 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0002 msg|byteEnq/Deq:273/273|2180/2180 unacked:0 consumer_cnt:0 binding_cnt:1 #loop:2 end: err_cnt:0 ------------------ cnt:3|data_n:138, data_m:208, ps|wc-l:138 test-data.in.tmp, final_in|wc-l:42976 test-data.in txjob1 ecode:0, dur:0 txjob2 ecode:0, dur:0 q_src_0008 msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0008 msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0008 msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1 q_src_0003 msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0003 msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0003 msg|byteEnq/Deq:210/0|1676/0 unacked:0 consumer_cnt:0 binding_cnt:1 sender ecode:0, dur:2 q_src_0008 msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0008 msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0008 msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1 q_src_0003 msg|byteEnq/Deq:28705/0|2279891/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0003 msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0003 msg|byteEnq/Deq:210/0|1676/0 unacked:0 consumer_cnt:0 binding_cnt:1 txshift ecode:0, dur:13 q_src_0008 msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0008 msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0008 msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1 q_src_0003 msg|byteEnq/Deq:28705/28705|2279891/2279891 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0003 msg|byteEnq/Deq:28705/0|2279891/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0003 msg|byteEnq/Deq:210/210|1676/1676 unacked:0 consumer_cnt:0 binding_cnt:1 receiver ecode:0, dur:1 q_src_0008 msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0008 msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0008 msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1 q_src_0003 msg|byteEnq/Deq:28705/28705|2279891/2279891 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0003 msg|byteEnq/Deq:28705/28704|2279891/2279888 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0003 msg|byteEnq/Deq:210/210|1676/1676 unacked:0 consumer_cnt:0 binding_cnt:1 diff ecode:0, dur:0 q_src_0008 msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0008 msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0008 msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1 q_src_0003 msg|byteEnq/Deq:28705/28705|2279891/2279891 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0003 msg|byteEnq/Deq:28705/28704|2279891/2279888 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0003 msg|byteEnq/Deq:210/210|1676/1676 unacked:0 consumer_cnt:0 binding_cnt:1 #loop:3 end: err_cnt:0 ------------------ cnt:4|data_n:138, data_m:306, ps|wc-l:138 test-data.in.tmp, final_in|wc-l:42976 test-data.in txjob1 ecode:0, dur:0 txjob2 ecode:0, dur:0 q_src_0008 msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0008 msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0008 msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1 q_src_0004 msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0004 msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0004 msg|byteEnq/Deq:308/0|2460/0 unacked:0 consumer_cnt:0 binding_cnt:1 sender ecode:0, dur:337 q_src_0008 msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0008 msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0008 msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1 q_src_0004 msg|byteEnq/Deq:42229/0|3354069/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0004 msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0004 msg|byteEnq/Deq:308/0|2460/0 unacked:0 consumer_cnt:0 binding_cnt:1 txshift ecode:0, dur:21 q_src_0008 msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0008 msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0008 msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1 q_src_0004 msg|byteEnq/Deq:42229/42229|3354069/3354069 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0004 msg|byteEnq/Deq:42229/0|3354069/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0004 msg|byteEnq/Deq:308/308|2460/2460 unacked:0 consumer_cnt:0 binding_cnt:1 receiver ecode:0, dur:2 q_src_0008 msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0008 msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0008 msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1 q_src_0004 msg|byteEnq/Deq:42229/42229|3354069/3354069 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0004 msg|byteEnq/Deq:42229/42228|3354069/3354066 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0004 msg|byteEnq/Deq:308/308|2460/2460 unacked:0 consumer_cnt:0 binding_cnt:1 diff ecode:0, dur:0 q_src_0008 msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0008 msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0008 msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1 q_src_0004 msg|byteEnq/Deq:42229/42229|3354069/3354069 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0004 msg|byteEnq/Deq:42229/42228|3354069/3354066 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0004 msg|byteEnq/Deq:308/308|2460/2460 unacked:0 consumer_cnt:0 binding_cnt:1 #loop:4 end: err_cnt:0 ------------------ cnt:5|data_n:138, data_m:250, ps|wc-l:138 test-data.in.tmp, final_in|wc-l:42976 test-data.in txjob1 ecode:0, dur:0 txjob2 ecode:0, dur:0 q_src_0008 msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0008 msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0008 msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1 q_src_0005 msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0005 msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0005 msg|byteEnq/Deq:252/0|2012/0 unacked:0 consumer_cnt:0 binding_cnt:1 sender ecode:0, dur:2 q_src_0008 msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0008 msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0008 msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1 q_src_0005 msg|byteEnq/Deq:34501/0|2740253/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0005 msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0005 msg|byteEnq/Deq:252/0|2012/0 unacked:0 consumer_cnt:0 binding_cnt:1 txshift ecode:0, dur:17 q_src_0008 msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0008 msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0008 msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1 q_src_0005 msg|byteEnq/Deq:34501/34501|2740253/2740253 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0005 msg|byteEnq/Deq:34501/0|2740253/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0005 msg|byteEnq/Deq:252/252|2012/2012 unacked:0 consumer_cnt:0 binding_cnt:1 receiver ecode:0, dur:2 q_src_0008 msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0008 msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0008 msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1 q_src_0005 msg|byteEnq/Deq:34501/34501|2740253/2740253 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0005 msg|byteEnq/Deq:34501/34500|2740253/2740250 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0005 msg|byteEnq/Deq:252/252|2012/2012 unacked:0 consumer_cnt:0 binding_cnt:1 diff ecode:0, dur:0 q_src_0008 msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0008 msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0008 msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1 q_src_0005 msg|byteEnq/Deq:34501/34501|2740253/2740253 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0005 msg|byteEnq/Deq:34501/34500|2740253/2740250 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0005 msg|byteEnq/Deq:252/252|2012/2012 unacked:0 consumer_cnt:0 binding_cnt:1 #loop:5 end: err_cnt:0 ------------------ cnt:6|data_n:138, data_m:286, ps|wc-l:138 test-data.in.tmp, final_in|wc-l:42976 test-data.in txjob1 ecode:0, dur:1 txjob2 ecode:0, dur:0 q_src_0008 msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0008 msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0008 msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1 q_src_0006 msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0006 msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0006 msg|byteEnq/Deq:288/0|2300/0 unacked:0 consumer_cnt:0 binding_cnt:1 sender ecode:0, dur:3 q_src_0008 msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0008 msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0008 msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1 q_src_0006 msg|byteEnq/Deq:39469/0|3134849/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0006 msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0006 msg|byteEnq/Deq:288/0|2300/0 unacked:0 consumer_cnt:0 binding_cnt:1 txshift ecode:0, dur:16 q_src_0008 msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0008 msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0008 msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1 q_src_0006 msg|byteEnq/Deq:39469/39469|3134849/3134849 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0006 msg|byteEnq/Deq:39469/0|3134849/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0006 msg|byteEnq/Deq:288/288|2300/2300 unacked:0 consumer_cnt:0 binding_cnt:1 receiver ecode:0, dur:2 q_src_0008 msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0008 msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0008 msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1 q_src_0006 msg|byteEnq/Deq:39469/39469|3134849/3134849 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0006 msg|byteEnq/Deq:39469/39468|3134849/3134846 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0006 msg|byteEnq/Deq:288/288|2300/2300 unacked:0 consumer_cnt:0 binding_cnt:1 diff ecode:0, dur:0 q_src_0008 msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0008 msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0008 msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1 q_src_0006 msg|byteEnq/Deq:39469/39469|3134849/3134849 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0006 msg|byteEnq/Deq:39469/39468|3134849/3134846 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0006 msg|byteEnq/Deq:288/288|2300/2300 unacked:0 consumer_cnt:0 binding_cnt:1 #loop:6 end: err_cnt:0 ------------------ cnt:7|data_n:138, data_m:319, ps|wc-l:138 test-data.in.tmp, final_in|wc-l:42976 test-data.in txjob1 ecode:0, dur:0 txjob2 ecode:0, dur:0 q_src_0008 msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0008 msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0008 msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1 q_src_0007 msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0007 msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0007 msg|byteEnq/Deq:321/0|2564/0 unacked:0 consumer_cnt:0 binding_cnt:1 sender ecode:0, dur:2 q_src_0008 msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0008 msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0008 msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1 q_src_0007 msg|byteEnq/Deq:44023/0|3496562/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0007 msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0007 msg|byteEnq/Deq:321/0|2564/0 unacked:0 consumer_cnt:0 binding_cnt:1 txshift ecode:0, dur:21 q_src_0008 msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0008 msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0008 msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1 q_src_0007 msg|byteEnq/Deq:44023/44023|3496562/3496562 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0007 msg|byteEnq/Deq:44023/0|3496562/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0007 msg|byteEnq/Deq:321/321|2564/2564 unacked:0 consumer_cnt:0 binding_cnt:1 receiver ecode:0, dur:2 q_src_0008 msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0008 msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0008 msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1 q_src_0007 msg|byteEnq/Deq:44023/44023|3496562/3496562 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0007 msg|byteEnq/Deq:44023/44022|3496562/3496559 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0007 msg|byteEnq/Deq:321/321|2564/2564 unacked:0 consumer_cnt:0 binding_cnt:1 diff ecode:0, dur:0 q_src_0008 msg|byteEnq/Deq:40297/40297|3200615/3200615 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0008 msg|byteEnq/Deq:40297/11746|3200615/932927 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0008 msg|byteEnq/Deq:294/294|2348/2348 unacked:0 consumer_cnt:0 binding_cnt:1 q_src_0007 msg|byteEnq/Deq:44023/44023|3496562/3496562 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0007 msg|byteEnq/Deq:44023/44022|3496562/3496559 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0007 msg|byteEnq/Deq:321/321|2564/2564 unacked:0 consumer_cnt:0 binding_cnt:1 #loop:7 end: err_cnt:0 ------------------ cnt:8|data_n:138, data_m:256, ps|wc-l:138 test-data.in.tmp, final_in|wc-l:42976 test-data.in txjob1 ecode:0, dur:0 txjob2 ecode:0, dur:0 q_src_0008 msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0008 msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0008 msg|byteEnq/Deq:258/0|2060/0 unacked:0 consumer_cnt:0 binding_cnt:1 sender ecode:0, dur:2 q_src_0008 msg|byteEnq/Deq:35329/0|2806019/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0008 msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0008 msg|byteEnq/Deq:258/0|2060/0 unacked:0 consumer_cnt:0 binding_cnt:1 txshift ecode:0, dur:16 q_src_0008 msg|byteEnq/Deq:35329/35329|2806019/2806019 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0008 msg|byteEnq/Deq:35329/0|2806019/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0008 msg|byteEnq/Deq:258/258|2060/2060 unacked:0 consumer_cnt:0 binding_cnt:1 receiver ecode:0, dur:2 q_src_0008 msg|byteEnq/Deq:35329/35329|2806019/2806019 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0008 msg|byteEnq/Deq:35329/35328|2806019/2806016 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0008 msg|byteEnq/Deq:258/258|2060/2060 unacked:0 consumer_cnt:0 binding_cnt:1 diff ecode:0, dur:0 q_src_0008 msg|byteEnq/Deq:35329/35329|2806019/2806019 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0008 msg|byteEnq/Deq:35329/35328|2806019/2806016 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0008 msg|byteEnq/Deq:258/258|2060/2060 unacked:0 consumer_cnt:0 binding_cnt:1 #loop:8 end: err_cnt:0 ------------------ cnt:9|data_n:138, data_m:222, ps|wc-l:138 test-data.in.tmp, final_in|wc-l:42976 test-data.in txjob1 ecode:0, dur:1 txjob2 ecode:0, dur:0 q_src_0009 msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0009 msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0009 msg|byteEnq/Deq:224/0|1788/0 unacked:0 consumer_cnt:0 binding_cnt:1 sender ecode:0, dur:2 q_src_0009 msg|byteEnq/Deq:30637/0|2433345/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0009 msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0009 msg|byteEnq/Deq:224/0|1788/0 unacked:0 consumer_cnt:0 binding_cnt:1 txshift ecode:0, dur:13 q_src_0009 msg|byteEnq/Deq:30637/30637|2433345/2433345 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0009 msg|byteEnq/Deq:30637/0|2433345/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0009 msg|byteEnq/Deq:224/224|1788/1788 unacked:0 consumer_cnt:0 binding_cnt:1 receiver ecode:0, dur:1 q_src_0009 msg|byteEnq/Deq:30637/30637|2433345/2433345 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0009 msg|byteEnq/Deq:30637/30636|2433345/2433342 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0009 msg|byteEnq/Deq:224/224|1788/1788 unacked:0 consumer_cnt:0 binding_cnt:1 diff ecode:0, dur:0 q_src_0009 msg|byteEnq/Deq:30637/30637|2433345/2433345 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0009 msg|byteEnq/Deq:30637/30636|2433345/2433342 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0009 msg|byteEnq/Deq:224/224|1788/1788 unacked:0 consumer_cnt:0 binding_cnt:1 #loop:9 end: err_cnt:0 ------------------ cnt:10|data_n:138, data_m:233, ps|wc-l:138 test-data.in.tmp, final_in|wc-l:42976 test-data.in txjob1 ecode:0, dur:0 txjob2 ecode:0, dur:0 q_src_0010 msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0010 msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_ctrl_0010 msg|byteEnq/Deq:235/0|1876/0 unacked:0 consumer_cnt:0 binding_cnt:1 sender sshd,30261 ├─sshd,28220 │ └─bash,28222 │ ├─bash,3763 test.sh │ │ └─python,4148 /usr/bin/qpid-config del queue q_ctrl_0000 │ │ ├─{python},4151 │ │ ├─{python},4152 │ │ └─{python},4153 │ └─bash,7949 test.sh │ └─sender,12214 --username guest --password guest -p 5672 --log-enable info+ --routing-key q_src_0010 ... │ ├─{sender},12215 │ └─{sender},12216 └─sshd,30995 └─bash,30997 └─pstree,12217 -ap 30261 root@mrg-qe-11:~/bz486314# strace -p 12214 Process 12214 attached - interrupt to quit futex(0x87e2d0c, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...> Process 12214 detached root@mrg-qe-11:~/bz486314# pstack 12214 Thread 3 (Thread 0xb7f10b90 (LWP 12215)): #0 0x00e0a410 in __kernel_vsyscall () #1 0x0031a376 in epoll_wait () from /lib/libc.so.6 #2 0x004e08e9 in qpid::sys::Poller::wait () from /usr/lib/libqpidcommon.so.0 #3 0x004e1916 in qpid::sys::Poller::run () from /usr/lib/libqpidcommon.so.0 #4 0x005318d4 in qpid::sys::Dispatcher::run () #5 0x0016e6b5 in qpid::client::TCPConnector::run () #6 0x004d6871 in ?? () from /usr/lib/libqpidcommon.so.0 #7 0x0023373b in start_thread () from /lib/libpthread.so.0 #8 0x00319cfe in clone () from /lib/libc.so.6 Thread 2 (Thread 0xb750fb90 (LWP 12216)): #0 0x00e0a410 in __kernel_vsyscall () #1 0x002379e5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 #2 0x003263cd in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libc.so.6 #3 0x00187228 in qpid::sys::BlockingQueue<boost::shared_ptr<qpid::framing::FrameSet> >::pop () from /usr/lib/libqpidclient.so.0 #4 0x001825d7 in qpid::client::Dispatcher::run () #5 0x001b74f8 in qpid::client::SubscriptionManager::run () #6 0x00190c9b in qpid::client::FailoverListener::run () #7 0x004d6871 in ?? () from /usr/lib/libqpidcommon.so.0 #8 0x0023373b in start_thread () from /lib/libpthread.so.0 #9 0x00319cfe in clone () from /lib/libc.so.6 Thread 1 (Thread 0xb7f11720 (LWP 12214)): #0 0x00e0a410 in __kernel_vsyscall () #1 0x002379e5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 #2 0x003263cd in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libc.so.6 #3 0x0015067b in qpid::client::Bounds::expand () #4 0x001a5678 in qpid::client::SessionImpl::sendFrame () #5 0x001a57ec in qpid::client::SessionImpl::handleOut () #6 0x001a4c07 in qpid::client::SessionImpl::sendContent () #7 0x001ad2d6 in qpid::client::SessionImpl::sendCommand () #8 0x001ad637 in qpid::client::SessionImpl::send () #9 0x00144051 in qpid::client::no_keyword::AsyncSession_0_10::messageTransfer #10 0x0019ad2a in qpid::client::MessageReplayTracker::ReplayRecord::send () #11 0x0019bdfb in qpid::client::MessageReplayTracker::send () #12 0x08051014 in Sender::execute () #13 0x0018ad5e in qpid::client::FailoverManager::execute () #14 0x08051371 in main () root@mrg-qe-11:~/bz486314# strace -p 12214 Process 12214 attached - interrupt to quit futex(0x87e2d0c, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...> Process 12214 detached root@mrg-qe-11:~/bz486314# gdb GNU gdb Fedora (6.8-37.el5) Copyright (C) 2008 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "i386-redhat-linux-gnu". (gdb) attach 12214 Attaching to process 12214 Reading symbols from /root/bz486314/sender...done. warning: .dynamic section for "/usr/lib/libsasl2.so.2" is not at the expected address warning: difference appears to be caused by prelink, adjusting expectations Reading symbols from /usr/lib/libqpidclient.so.0...Reading symbols from /usr/lib/debug/usr/lib/libqpidclient.so.0.1.0.debug...done. done. Loaded symbols for /usr/lib/libqpidclient.so.0 Reading symbols from /usr/lib/libstdc++.so.6...done. Loaded symbols for /usr/lib/libstdc++.so.6 Reading symbols from /lib/libm.so.6...done. Loaded symbols for /lib/libm.so.6 Reading symbols from /lib/libgcc_s.so.1...done. Loaded symbols for /lib/libgcc_s.so.1 Reading symbols from /lib/libc.so.6...done. Loaded symbols for /lib/libc.so.6 Reading symbols from /usr/lib/libqpidcommon.so.0...Reading symbols from /usr/lib/debug/usr/lib/libqpidcommon.so.0.1.0.debug...done. done. Loaded symbols for /usr/lib/libqpidcommon.so.0 Reading symbols from /usr/lib/libboost_program_options.so.2...done. Loaded symbols for /usr/lib/libboost_program_options.so.2 Reading symbols from /lib/libuuid.so.1...done. Loaded symbols for /lib/libuuid.so.1 Reading symbols from /lib/ld-linux.so.2...done. Loaded symbols for /lib/ld-linux.so.2 Reading symbols from /usr/lib/libboost_filesystem.so.2...done. Loaded symbols for /usr/lib/libboost_filesystem.so.2 Reading symbols from /lib/libdl.so.2...done. Loaded symbols for /lib/libdl.so.2 Reading symbols from /lib/librt.so.1...done. Loaded symbols for /lib/librt.so.1 Reading symbols from /usr/lib/libsasl2.so.2...done. Loaded symbols for /usr/lib/libsasl2.so.2 Reading symbols from /lib/libpthread.so.0...done. [Thread debugging using libthread_db enabled] [New Thread 0xb7f11720 (LWP 12214)] [New Thread 0xb750fb90 (LWP 12216)] [New Thread 0xb7f10b90 (LWP 12215)] Loaded symbols for /lib/libpthread.so.0 Reading symbols from /lib/libresolv.so.2...done. Loaded symbols for /lib/libresolv.so.2 Reading symbols from /lib/libcrypt.so.1...done. Loaded symbols for /lib/libcrypt.so.1 Reading symbols from /usr/lib/qpid/client/sslconnector.so...Reading symbols from /usr/lib/debug/usr/lib/qpid/client/sslconnector.so.debug...done. done. Loaded symbols for /usr/lib/qpid/client/sslconnector.so Reading symbols from /usr/lib/libsslcommon.so.0...Reading symbols from /usr/lib/debug/usr/lib/libsslcommon.so.0.1.0.debug...done. done. Loaded symbols for /usr/lib/libsslcommon.so.0 Reading symbols from /usr/lib/libnss3.so...done. Loaded symbols for /usr/lib/libnss3.so Reading symbols from /usr/lib/libssl3.so...done. Loaded symbols for /usr/lib/libssl3.so Reading symbols from /usr/lib/libnspr4.so...done. Loaded symbols for /usr/lib/libnspr4.so Reading symbols from /usr/lib/libnssutil3.so...done. Loaded symbols for /usr/lib/libnssutil3.so Reading symbols from /usr/lib/libplc4.so...done. Loaded symbols for /usr/lib/libplc4.so Reading symbols from /usr/lib/libplds4.so...done. Loaded symbols for /usr/lib/libplds4.so Reading symbols from /usr/lib/qpid/client/rdmaconnector.so...Reading symbols from /usr/lib/debug/usr/lib/qpid/client/rdmaconnector.so.debug...done. done. Loaded symbols for /usr/lib/qpid/client/rdmaconnector.so Reading symbols from /usr/lib/librdmawrap.so.0...Reading symbols from /usr/lib/debug/usr/lib/librdmawrap.so.0.1.0.debug...done. done. Loaded symbols for /usr/lib/librdmawrap.so.0 Reading symbols from /usr/lib/librdmacm.so.1...done. Loaded symbols for /usr/lib/librdmacm.so.1 Reading symbols from /usr/lib/libibverbs.so.1...done. Loaded symbols for /usr/lib/libibverbs.so.1 Reading symbols from /lib/libnss_files.so.2...done. Loaded symbols for /lib/libnss_files.so.2 Reading symbols from /usr/lib/sasl2/liblogin.so.2...done. Loaded symbols for /usr/lib/sasl2/liblogin.so.2 Reading symbols from /usr/lib/sasl2/libanonymous.so.2...done. Loaded symbols for /usr/lib/sasl2/libanonymous.so.2 Reading symbols from /usr/lib/sasl2/libsasldb.so.2...done. Loaded symbols for /usr/lib/sasl2/libsasldb.so.2 Reading symbols from /usr/lib/sasl2/libplain.so.2...done. Loaded symbols for /usr/lib/sasl2/libplain.so.2 0x00e0a410 in __kernel_vsyscall () (gdb) info threads 3 Thread 0xb7f10b90 (LWP 12215) 0x00e0a410 in __kernel_vsyscall () 2 Thread 0xb750fb90 (LWP 12216) 0x00e0a410 in __kernel_vsyscall () * 1 Thread 0xb7f11720 (LWP 12214) 0x00e0a410 in __kernel_vsyscall () (gdb) thread apply all bt Thread 3 (Thread 0xb7f10b90 (LWP 12215)): #0 0x00e0a410 in __kernel_vsyscall () #1 0x0031a376 in epoll_wait () from /lib/libc.so.6 #2 0x004e08e9 in qpid::sys::Poller::wait (this=0x87e2a78, timeout={nanosecs = 9223372036854775807}) at qpid/sys/epoll/EpollPoller.cpp:439 #3 0x004e1916 in qpid::sys::Poller::run (this=0x87e2a78) at qpid/sys/epoll/EpollPoller.cpp:405 #4 0x005318d4 in qpid::sys::Dispatcher::run (this=0xb7f10318) at qpid/sys/Dispatcher.cpp:37 #5 0x0016e6b5 in qpid::client::TCPConnector::run (this=0x87e29b0) at qpid/client/Connector.cpp:408 #6 0x004d6871 in runRunnable (p=0x87e29bc) at qpid/sys/posix/Thread.cpp:35 #7 0x0023373b in start_thread () from /lib/libpthread.so.0 #8 0x00319cfe in clone () from /lib/libc.so.6 Thread 2 (Thread 0xb750fb90 (LWP 12216)): #0 0x00e0a410 in __kernel_vsyscall () #1 0x002379e5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 #2 0x003263cd in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libc.so.6 #3 0x00187228 in qpid::sys::BlockingQueue<boost::shared_ptr<qpid::framing::FrameSet> >::pop (this=0x87e7c38, timeout= {nanosecs = 9223372036854775807}) at qpid/sys/posix/Condition.h:63 #4 0x001825d7 in qpid::client::Dispatcher::run (this=0x87e4b8c) at qpid/client/Dispatcher.cpp:72 #5 0x001b74f8 in qpid::client::SubscriptionManager::run (this=0x87e4b70) at qpid/client/SubscriptionManager.cpp:95 #6 0x00190c9b in qpid::client::FailoverListener::run (this=0x87e3978) at qpid/client/FailoverListener.cpp:70 #7 0x004d6871 in runRunnable (p=0x87e397c) at qpid/sys/posix/Thread.cpp:35 #8 0x0023373b in start_thread () from /lib/libpthread.so.0 #9 0x00319cfe in clone () from /lib/libc.so.6 Thread 1 (Thread 0xb7f11720 (LWP 12214)): #0 0x00e0a410 in __kernel_vsyscall () #1 0x002379e5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 #2 0x003263cd in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libc.so.6 #3 0x0015067b in qpid::client::Bounds::expand (this=0x87e2cf0, sizeRequired=63, block=true) at qpid/sys/posix/Condition.h:63 #4 0x001a5678 in qpid::client::SessionImpl::sendFrame (this=0x89fcd48, frame=@0xbfa1c3d4, canBlock=true) at qpid/client/SessionImpl.cpp:498 #5 0x001a57ec in qpid::client::SessionImpl::handleOut (this=0x89fcd48, frame=@0xbfa1c3d4) ---Type <return> to continue, or q <return> to quit--- at qpid/client/SessionImpl.cpp:482 #6 0x001a4c07 in qpid::client::SessionImpl::sendContent (this=0x89fcd48, content=@0xb6aabae8) at qpid/client/SessionImpl.cpp:403 #7 0x001ad2d6 in qpid::client::SessionImpl::sendCommand (this=0x89fcd48, command=@0xbfa1c628, content=0xb6aabae8) at qpid/client/SessionImpl.cpp:385 #8 0x001ad637 in qpid::client::SessionImpl::send (this=0x89fcd48, command=@0xbfa1c628, content=@0xb6aabae8) at qpid/client/SessionImpl.cpp:293 #9 0x00144051 in qpid::client::no_keyword::AsyncSession_0_10::messageTransfer (this=0xbfa1cbf8, destination=@0xb6aabba0, acceptMode=1 '\001', acquireMode=0 '\0', content=@0xb6aabae8, sync=false) at gen/qpid/client/no_keyword/AsyncSession_0_10.cpp:58 #10 0x0019ad2a in qpid::client::MessageReplayTracker::ReplayRecord::send (this=0xb6aabad0, tracker=@0xbfa1cbf8) at gen/qpid/client/AsyncSession_0_10.h:305 #11 0x0019bdfb in qpid::client::MessageReplayTracker::send (this=0xbfa1cbf8, message=@0xbfa1cc10, destination=@0xbfa1cbf4) at qpid/client/MessageReplayTracker.cpp:32 #12 0x08051014 in Sender::execute (this=0xbfa1cbf0, session=@0xbfa1ca54, isRetry=false) at sender.cpp:93 #13 0x0018ad5e in qpid::client::FailoverManager::execute (this=0xbfa1ccd0, c=@0xbfa1cbf0) at qpid/client/FailoverManager.cpp:49 #14 0x08051371 in main (argc=13, argv=0xbfa1ce24) at sender.cpp:108 (gdb) quit The program is running. Quit anyway (and detach it)? (y or n) y Detaching from program: /root/bz486314/sender, process 12214 root@mrg-qe-11:~/bz486314# !netst netstat -tlnp | grep qpidd tcp 0 0 0.0.0.0:5672 0.0.0.0:* LISTEN 28480/qpidd # <> CANNOT pstack/gdb qpidd process (Operation not permitted) # from another session root@mrg-qe-11:~# qpid-cluster Traceback (most recent call last): File "/usr/bin/qpid-cluster", line 322, in ? if e.__repr__().find("connection aborted") > 0: AttributeError: Timeout instance has no attribute '__repr__' root@mrg-qe-11:~# pstack 12214 Thread 3 (Thread 0xb7f10b90 (LWP 12215)): #0 0x00e0a410 in __kernel_vsyscall () #1 0x0031a376 in epoll_wait () from /lib/libc.so.6 #2 0x004e08e9 in qpid::sys::Poller::wait () from /usr/lib/libqpidcommon.so.0 #3 0x004e1916 in qpid::sys::Poller::run () from /usr/lib/libqpidcommon.so.0 #4 0x005318d4 in qpid::sys::Dispatcher::run () #5 0x0016e6b5 in qpid::client::TCPConnector::run () #6 0x004d6871 in ?? () from /usr/lib/libqpidcommon.so.0 #7 0x0023373b in start_thread () from /lib/libpthread.so.0 #8 0x00319cfe in clone () from /lib/libc.so.6 Thread 2 (Thread 0xb750fb90 (LWP 12216)): #0 0x00e0a410 in __kernel_vsyscall () #1 0x002379e5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 #2 0x003263cd in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libc.so.6 #3 0x00187228 in qpid::sys::BlockingQueue<boost::shared_ptr<qpid::framing::FrameSet> >::pop () from /usr/lib/libqpidclient.so.0 #4 0x001825d7 in qpid::client::Dispatcher::run () #5 0x001b74f8 in qpid::client::SubscriptionManager::run () #6 0x00190c9b in qpid::client::FailoverListener::run () #7 0x004d6871 in ?? () from /usr/lib/libqpidcommon.so.0 #8 0x0023373b in start_thread () from /lib/libpthread.so.0 #9 0x00319cfe in clone () from /lib/libc.so.6 Thread 1 (Thread 0xb7f11720 (LWP 12214)): #0 0x00e0a410 in __kernel_vsyscall () #1 0x002379e5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 #2 0x003263cd in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libc.so.6 #3 0x0015067b in qpid::client::Bounds::expand () #4 0x001a5678 in qpid::client::SessionImpl::sendFrame () #5 0x001a57ec in qpid::client::SessionImpl::handleOut () #6 0x001a4c07 in qpid::client::SessionImpl::sendContent () #7 0x001ad2d6 in qpid::client::SessionImpl::sendCommand () #8 0x001ad637 in qpid::client::SessionImpl::send () #9 0x00144051 in qpid::client::no_keyword::AsyncSession_0_10::messageTransfer #10 0x0019ad2a in qpid::client::MessageReplayTracker::ReplayRecord::send () #11 0x0019bdfb in qpid::client::MessageReplayTracker::send () #12 0x08051014 in Sender::execute () #13 0x0018ad5e in qpid::client::FailoverManager::execute () #14 0x08051371 in main () root@mrg-qe-11:~# pstack 12214 Thread 3 (Thread 0xb7f10b90 (LWP 12215)): #0 0x00e0a410 in __kernel_vsyscall () #1 0x0031a376 in epoll_wait () from /lib/libc.so.6 #2 0x004e08e9 in qpid::sys::Poller::wait () from /usr/lib/libqpidcommon.so.0 #3 0x004e1916 in qpid::sys::Poller::run () from /usr/lib/libqpidcommon.so.0 #4 0x005318d4 in qpid::sys::Dispatcher::run () #5 0x0016e6b5 in qpid::client::TCPConnector::run () #6 0x004d6871 in ?? () from /usr/lib/libqpidcommon.so.0 #7 0x0023373b in start_thread () from /lib/libpthread.so.0 #8 0x00319cfe in clone () from /lib/libc.so.6 Thread 2 (Thread 0xb750fb90 (LWP 12216)): #0 0x00e0a410 in __kernel_vsyscall () #1 0x002379e5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 #2 0x003263cd in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libc.so.6 #3 0x00187228 in qpid::sys::BlockingQueue<boost::shared_ptr<qpid::framing::FrameSet> >::pop () from /usr/lib/libqpidclient.so.0 #4 0x001825d7 in qpid::client::Dispatcher::run () #5 0x001b74f8 in qpid::client::SubscriptionManager::run () #6 0x00190c9b in qpid::client::FailoverListener::run () #7 0x004d6871 in ?? () from /usr/lib/libqpidcommon.so.0 #8 0x0023373b in start_thread () from /lib/libpthread.so.0 #9 0x00319cfe in clone () from /lib/libc.so.6 Thread 1 (Thread 0xb7f11720 (LWP 12214)): #0 0x00e0a410 in __kernel_vsyscall () #1 0x002379e5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 #2 0x003263cd in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libc.so.6 #3 0x0015067b in qpid::client::Bounds::expand () #4 0x001a5678 in qpid::client::SessionImpl::sendFrame () #5 0x001a57ec in qpid::client::SessionImpl::handleOut () #6 0x001a4c07 in qpid::client::SessionImpl::sendContent () #7 0x001ad2d6 in qpid::client::SessionImpl::sendCommand () #8 0x001ad637 in qpid::client::SessionImpl::send () #9 0x00144051 in qpid::client::no_keyword::AsyncSession_0_10::messageTransfer #10 0x0019ad2a in qpid::client::MessageReplayTracker::ReplayRecord::send () #11 0x0019bdfb in qpid::client::MessageReplayTracker::send () #12 0x08051014 in Sender::execute () #13 0x0018ad5e in qpid::client::FailoverManager::execute () #14 0x08051371 in main () So broker is not responding and finally also other nodes become not responding: root@mrg-qe-12:~/fcluster# qpid-stat -b Failed: Timeout - Waiting for connection to be established with broker root@mrg-qe-12:~/fcluster# cat /etc/qpidd.conf cluster-mechanism=ANONYMOUS #max-session-rate=10000 cluster-name=mrg_pepe_test_cluster_ cluster-cman=no #cluster-url=amqp:tcp:192.168.157.9:5672,tcp:192.168.157.10:5672,tcp:192.168.157.11:5672,tcp:192.168.157.12:5672,tcp:192.168.157.12:5675 auth=yes log-enable=info+ log-enable=debug+:cluster log-to-file=/tmp/qpidd.log testing packages: root@mrg-qe-11:~# rpm -qa | grep -E '(qpid|ais)' | sort openais-0.80.6-8.el5_4.1 openais-debuginfo-0.80.6-8.el5_4.1 openais-devel-0.80.6-8.el5_4.1 python-qpid-0.5.752581-4.el5 qpidc-0.5.752581-34.el5 qpidc-debuginfo-0.5.752581-34.el5 qpidc-devel-0.5.752581-34.el5 qpidc-rdma-0.5.752581-34.el5 qpidc-ssl-0.5.752581-34.el5 qpidd-0.5.752581-34.el5 qpidd-acl-0.5.752581-34.el5 qpidd-cluster-0.5.752581-34.el5 qpidd-devel-0.5.752581-34.el5 qpid-dotnet-0.4.738274-2.el5 qpidd-rdma-0.5.752581-34.el5 qpidd-ssl-0.5.752581-34.el5 qpidd-xml-0.5.752581-34.el5 qpid-java-client-0.5.751061-9.el5 qpid-java-common-0.5.751061-9.el5 Created attachment 381773 [details]
the reproducer
Attached shell script should be able to reproduce the problem.
Steps:
0] stop all daemons
1] configure openais and qpidd
2] start openais on node A and B
3] start qpidd on node A and then B
4] download txshift.cpp, txjob.cpp, receiver.cpp, sender.cp and *Options.h
and compile txshift, txjob, receiver, sender
5] run attachment w/o params on one node - wait for hang
Created attachment 381777 [details]
qpidd.log for both nodes
The package contains qpidd.log for both nodes (both RHEL 5.4 i386)
These type of errors are expected, because the repro is not perfect and tries to delete non existing queue as clean-up:
channel error 13291670 on 192.168.157.11:28480-656(local): not
-found: Queue not found: q_dest_0009
Hi Frantisek -- I finally started trying to reproduce this, but your script requires a file called qc_list_objects.py that isn't in the svn tree. Where can I get a copy of that? Hi Mick, please find the clients stored here: http://cvs.devel.redhat.com/cgi-bin/cvsweb.cgi/tests/distribution/MRG/Messaging/qpid_common/clients/python/ I'm sorry for late reply, I did not get notified. It looks like I let this one age long enough that it got fixed by other changes. In recent code -- svn rev 980427 -- I used a 2-broker cluster ( 1 on each box ) and a script very closely modelled on Frantisek's original description ( I will include it below ). I have now run the script 150 times with no hang. If the original bug was reproducible about 10% of the time, then the odds of this many successful consecutive runs happening by accident are about 1 in 10 million. The only way that my script is significantly different from the steps-to-repro in the original report is that I tell the receiver how many messages to expect. If I leave that argument out -- it hangs. It may be that the original script left out that arg -- but the receiver nevertheless terminated *most* of the time. In any case, it looks like this bug has died of old age. Here is the script I used ( not including the starting of the 2-box cluster ) #! /bin/bash ############################################### # 486314.sh ############################################### export LD_LIBRARY_PATH=$TRUNK/qpid/cpp/src/.libs QPIDD_SRC=$TRUNK/qpid/cpp/src QPIDD=$QPIDD_SRC/.libs/qpidd count=0 while [ $count -lt 500 ] do echo "===========================================" echo "TEST $count" date echo "===========================================" #################################################### ## define queues #################################################### echo "define queues" qpid-config -a 127.0.0.1:5801 add queue mick_0 qpid-config -a 127.0.0.1:5801 add queue mick_1 qpid-config -a 127.0.0.1:5801 add queue mick_2 sleep 5 #################################################### ## Make transaction job #################################################### echo "Make transaction job" $QPIDD_SRC/tests/txjob -p 5801 --log-enable info+ --messages 100 --jobs 10 --source mick_1 --dest mick_2 --work-queue mick_0 --add-quit sleep 5 #################################################### ## generate 1000 messages. #################################################### echo "generate 1000 messages." $QPIDD_SRC/tests/datagen --count 1000 | $QPIDD_SRC/tests/sender -p 5801 --routing-key mick_1 sleep 5 #################################################### ## Shift from queue 1 to 2 #################################################### echo "Shift from queue 1 to 2" $QPIDD_SRC/tests/txshift --work-queue mick_0 -p 5801 --log-enable info+ #################################################### ## Receive the messages. #################################################### echo "Receive the messages." $QPIDD_SRC/tests/receiver --messages 1000 -p 5801 --log-enable info+ --queue mick_2 > test-data.out #################################################### ## Count the messages. #################################################### echo "Count the messages." wc -l test-data.out #################################################### ## undefine queues #################################################### echo "undefine queues" qpid-config -a 127.0.0.1:5801 del queue mick_0 qpid-config -a 127.0.0.1:5801 del queue mick_1 qpid-config -a 127.0.0.1:5801 del queue mick_2 sleep 5 count=$(( $count + 1 )) done The issue is currently under test for few hours w/o any issue. I tend to CLOSED/NOTABUG. Final feedback soon (test need to run for about a day to be on the safe side of probability street). Ok, I got the point here. I retested my reproducer with authentication off and everything was perfect. Once I switched on the authentication I triggered the issue again, currently during txshift run: [root@mrg-qe-01 bz486314]# ./run.sh 1 2 10 aisexec dead but subsys locked Starting OpenAIS daemon (aisexec): [ OK ] Stopping OpenAIS daemon (aisexec): [ OK ] Starting OpenAIS daemon (aisexec): [ OK ] 10-08-03_11:08:53 slave():compiling clients from sources... 10-08-03_11:08:56 .done (err_cnt:0) 10-08-03_11:08:56 .broker[s] start 10-08-03_11:09:06 ..err_cnt:0, pids:12429,12443 10-08-03_11:09:06 .broker[s] check tcp 0 0 0.0.0.0:5672 0.0.0.0:* LISTEN 12429/qpidd tcp 0 0 0.0.0.0:5673 0.0.0.0:* LISTEN 12443/qpidd 10-08-03_11:09:06 ..err_cnt:0, ecode:00 cnt:0/10|data_n:148, data_m:99, ps|wc-l:148 test-data.in.tmp, final_in|wc-l:14652 test-data.in.0000 txjob1 ecode:0, dur:0 txjob2 ecode:0, dur:0 q_ctrl_0000 msg|byteEnq/Deq:101/0|804/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0000 msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_src_0000 msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1 sender ecode:0, dur:0 q_ctrl_0000 msg|byteEnq/Deq:101/0|804/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_dest_0000 msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1 q_src_0000 msg|byteEnq/Deq:14652/0|1160280/0 unacked:0 consumer_cnt:0 binding_cnt:1 txshift <hanged, for minutes, when expected time to complete is <20 seconds> snapping process state shows (the same as above): (gdb) 3 Thread 0xb7ef9b90 (LWP 12695) 0x00d7e410 in __kernel_vsyscall () 2 Thread 0xb72ffb90 (LWP 12696) 0x00d7e410 in __kernel_vsyscall () * 1 Thread 0xb7efa940 (LWP 12694) 0x00d7e410 in __kernel_vsyscall () (gdb) Thread 3 (Thread 0xb7ef9b90 (LWP 12695)): #0 0x00d7e410 in __kernel_vsyscall () #1 0x001e2486 in epoll_wait () from /lib/libc.so.6 #2 0x00562c2a in qpid::sys::Poller::wait (this=0x90bab18, timeout=...) at qpid/sys/epoll/EpollPoller.cpp:563 #3 0x00563853 in qpid::sys::Poller::run (this=0x90bab18) at qpid/sys/epoll/EpollPoller.cpp:515 #4 0x00558d01 in qpid::sys::(anonymous namespace)::runRunnable (p=0x90bab18) at qpid/sys/posix/Thread.cpp:35 #5 0x002c1832 in start_thread () from /lib/libpthread.so.0 #6 0x001e1e0e in clone () from /lib/libc.so.6 Thread 2 (Thread 0xb72ffb90 (LWP 12696)): #0 0x00d7e410 in __kernel_vsyscall () #1 0x002c5bc5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 #2 0x001ee4dd in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libc.so.6 #3 0x00cea88c in pop (this=0xb73008d0, timeout=...) at ../include/qpid/sys/posix/Condition.h:63 #4 qpid::sys::BlockingQueue<boost::shared_ptr<qpid::framing::FrameSet> >::pop (this=0xb73008d0, timeout=...) at qpid/sys/BlockingQueue.h:71 #5 0x00ce6e87 in qpid::client::Dispatcher::run (this=0xb7300a44) at qpid/client/Dispatcher.cpp:80 #6 0x00d1e008 in qpid::client::SubscriptionManagerImpl::run (this=0xb7300a20) at qpid/client/SubscriptionManagerImpl.cpp:98 #7 0x00d1c034 in qpid::client::SubscriptionManager::run (this=0xb7300548) at qpid/client/SubscriptionManager.cpp:60 #8 0x00ceb5d0 in qpid::client::FailoverListener::run (this=0xb7300508) at qpid/client/FailoverListener.cpp:52 #9 0x00558d01 in qpid::sys::(anonymous namespace)::runRunnable (p=0xb730050c) at qpid/sys/posix/Thread.cpp:35 #10 0x002c1832 in start_thread () from /lib/libpthread.so.0 #11 0x001e1e0e in clone () from /lib/libc.so.6 Thread 1 (Thread 0xb7efa940 (LWP 12694)): #0 0x00d7e410 in __kernel_vsyscall () #1 0x002c5bc5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 #2 0x001ee4dd in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libc.so.6 #3 0x00cea88c in pop (this=0xb7301310, timeout=...) at ../include/qpid/sys/posix/Condition.h:63 #4 qpid::sys::BlockingQueue<boost::shared_ptr<qpid::framing::FrameSet> >::pop (this=0xb7301310, timeout=...) at qpid/sys/BlockingQueue.h:71 #5 0x00ce6e87 in qpid::client::Dispatcher::run (this=0xb73013bc) at qpid/client/Dispatcher.cpp:80 #6 0x00d1e008 in qpid::client::SubscriptionManagerImpl::run (this=0xb7301398) at qpid/client/SubscriptionManagerImpl.cpp:98 #7 0x00d1c034 in qpid::client::SubscriptionManager::run (this=0xbfc1b274) at qpid/client/SubscriptionManager.cpp:60 #8 0x0805a730 in qpid::tests::Worker::execute(qpid::client::AsyncSession_0_10&, bool) () #9 0x00cf2268 in qpid::client::FailoverManager::execute (this=0xbfc1b664, c=...) at qpid/client/FailoverManager.cpp:49 #10 0x08059d71 in qpid::tests::Worker::run() () #11 0x08050ede in main () (gdb) Detaching from program: /root/bz/bz486314/txshift, process 12694 (gdb) quit I may miss something fundamental here, but at the moment I need to return it to ASSIGNED. Created attachment 436297 [details]
Reproducer of the last hang (more complete)
On RHEL5 run in clustered env. as
./run.sh 1 2 <nr-of-loops> # authentication done
./run.sh 0 2 <nr-of-loops> # no authentication done
on RHEL4 use:
./run.sh 1 1 <nr-of-loops> # authentication done
./run.sh 0 1 <nr-of-loops> # no authentication done
The issue reproduced on mrg35 where -11 source packages compiled and modified test repro ran. Data in /usr/src/redhat/BUILD/qpid-cpp-mrg-0.7.946106/cpp/src/tests/. Tu run use: LD_LIBRARY_PATH=/usr/src/redhat/BUILD/qpid-cpp-mrg-0.7.946106/cpp/src/.libs ./run.sh 1 1 Reproduced in non-clustering environment with txshift hanging here: (*): Shared library is missing debugging information. (gdb) 2 Thread 0x42405940 (LWP 29689) 0x00000032078d4108 in epoll_wait () from /lib64/libc.so.6 * 1 Thread 0x2b4283852f70 (LWP 29668) 0x000000320840aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 (gdb) Thread 2 (Thread 0x42405940 (LWP 29689)): #0 0x00000032078d4108 in epoll_wait () from /lib64/libc.so.6 #1 0x00002b42832c4121 in qpid::sys::Poller::wait (this=0x1439c340, timeout=<value optimized out>) at qpid/sys/epoll/EpollPoller.cpp:563 #2 0x00002b42832c4bb7 in qpid::sys::Poller::run (this=0x1439c340) at qpid/sys/epoll/EpollPoller.cpp:515 #3 0x00002b42832bae3a in qpid::sys::(anonymous namespace)::runRunnable (p=0x6) at qpid/sys/posix/Thread.cpp:35 #4 0x000000320840673d in start_thread () from /lib64/libpthread.so.0 #5 0x00000032078d3d1d in clone () from /lib64/libc.so.6 Thread 1 (Thread 0x2b4283852f70 (LWP 29668)): #0 0x000000320840aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00002b4282f2e54b in pop (this=0x1439ea30) at ../include/qpid/sys/posix/Condition.h:63 #2 qpid::client::Dispatcher::run (this=0x1439ea30) at qpid/client/Dispatcher.cpp:80 #3 0x000000000040fca8 in qpid::tests::Worker::execute (this=0x7fffbf1261d0, session=..., isRetry=<value optimized out>) at txshift.cpp:160 #4 0x00002b4282f38648 in qpid::client::FailoverManager::execute ( this=0x7fffbf1260c0, c=...) at qpid/client/FailoverManager.cpp:49 #5 0x000000000040b059 in main (argc=13, argv=<value optimized out>) at txshift.cpp:141 (gdb) Detaching from program: /usr/src/redhat/BUILD/qpid-cpp-mrg-0.7.946106/cpp/src/tests/.libs/lt-txshift, process 29668 (gdb) quit Neither qpidd log nor hanging txshift doesn't show anything abnormal. I revisited the hang backtraces and noticed that I overlooked the fact that recently reproduced hang in txshift (comment 13) is different from the original one (in sender). The effort is now in making sure we understand what is blocking the txshift, once clear then I'll rerun and update BZ. ok, my bad. The comment 13 hang is dummy caused by changing the reproducer. Currently reproducing with authentication. Feedback soon. My testing over night proved that comments 13 - 17 was blind branch caused by issue in the reproducer. I'm not able to trigger the issue anymore after (4 x 1100 runs on different machines). I tend to close the bug as CLOSED/NOTABUG. Mick, feel free to kick it there if you agree. Frantisek and I together have run 1700 iterations of his script with no repro. Deadbug. |