Description of problem: While bug 456272 regression testing, there was detected new problem related to sudden and unexpected SessionDetached()exception propagation from broker to client. This exception on client side is most probably linked to broker's error: 2010-08-13 16:19:17 error framing-error: Frame size too small 0 (qpid/framing/AMQFrame.cpp:101) The python client is creating simultaneously multiple queues in multiple threads in the broker. The testing environment also consist of loading 'find' processes which take huge amount of disk I/O and CPU utilization. The environment under which the problem occurrend can be clearly titled as stressy. Version-Release number of selected component (if applicable): python-qmf-0.7.946106-8.el5 python-qpid-0.7.946106-12.el5 qpid-cpp-*-0.7.946106-12.el5 How reproducible: very rarely (<1%) Steps to Reproduce: Run the MRG/Messaging/qpid_broker_jfail_bz456272 test in the loop or alternatively: 1. start broker with message store 2. launch few machine loading find processes (find /usr -type f -exec cat {} \; | \ split -b 10m - ${FIND_RESULT_LOG}.$(date +%Y%m%d_%H%M%S) ) & # split allows to splitting for search result deletion 3. launch bz456272 reproducer in the loop, can be found here (*.py) http://cvs.devel.redhat.com/cgi-bin/cvsweb.cgi/tests/distribution/MRG/Messaging/qpid_broker_jfail_bz456272/ 'python ./bz456272.py' or better 'python ./bz456272.pyc' Actual results: Broker rarely exits the python client with SessionDetached() and prints framing error: 2010-08-13 16:19:17 error framing-error: Frame size too small 0 (qpid/framing/AMQFrame.cpp:101) Expected results: No framing error, no exception on client's side: Additional info (the log from the failing case): [16:19:12] run 10/12--------------------------[err_cnt:0]- qpidd-transcript------------- 2010-08-13 16:19:12 info Loaded Module: /usr/lib64/qpid/daemon/msgstore.so 2010-08-13 16:19:12 info Loaded Module: /usr/lib64/qpid/daemon/ssl.so 2010-08-13 16:19:12 info Loaded Module: /usr/lib64/qpid/daemon/watchdog.so 2010-08-13 16:19:12 info Loaded Module: /usr/lib64/qpid/daemon/rdma.so 2010-08-13 16:19:12 info Loaded Module: /usr/lib64/qpid/daemon/xml.so 2010-08-13 16:19:12 info Loaded Module: /usr/lib64/qpid/daemon/cluster.so 2010-08-13 16:19:12 info Loaded Module: /usr/lib64/qpid/daemon/replicating_listener.so 2010-08-13 16:19:12 info Loaded Module: /usr/lib64/qpid/daemon/replication_exchange.so 2010-08-13 16:19:12 info Loaded Module: /usr/lib64/qpid/daemon/acl.so 2010-08-13 16:19:12 info No message store configured, persistence is disabled. 2010-08-13 16:19:12 info Management enabled 2010-08-13 16:19:12 info ManagementAgent generated broker ID: 37763838-616d-4418-8e8d-9b3805e3b105 2010-08-13 16:19:12 notice Journal "TplStore": Created 2010-08-13 16:19:12 notice Store module initialized; store-dir=/mnt/tests/distribution/MRG/Messaging/qpid_broker_jfail_bz456272/rhts_qpidd/20100813_154811/broker.9670 2010-08-13 16:19:12 info > Default files per journal: 16 2010-08-13 16:19:12 info > Default journal file size: 32 (wpgs) 2010-08-13 16:19:12 info > Default write cache page size: 32 (Kib) 2010-08-13 16:19:12 info > Default number of write cache pages: 32 2010-08-13 16:19:12 info > TPL files per journal: 8 2010-08-13 16:19:12 info > TPL journal file size: 32 (wpgs) 2010-08-13 16:19:12 info > TPL write cache page size: 128 (Kib) 2010-08-13 16:19:12 info > TPL number of write cache pages: 8 2010-08-13 16:19:12 info Registered xml exchange 2010-08-13 16:19:12 info Registered replication exchange 2010-08-13 16:19:12 info Most recent persistence id found: 0x0 2010-08-13 16:19:12 notice SASL disabled: No Authentication Performed 2010-08-13 16:19:12 notice Listening on TCP port 5672 2010-08-13 16:19:12 info SSL plugin not enabled, you must set --ssl-cert-db to enable it. 2010-08-13 16:19:12 info Rdma: Disabled: no rdma devices found 2010-08-13 16:19:12 info Policy file not specified. ACL Disabled, no ACL checking being done! 2010-08-13 16:19:12 notice Broker running 2010-08-13 16:19:16 info Queue "mgmt-rhts_tyan-gt24-04.rhts.eng.bos.redhat.com.29819_0_{(2010, 8, 13, 16, 19, 16, 4, 225, 1)}": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:19:16 info Queue "repl-rhts_tyan-gt24-04.rhts.eng.bos.redhat.com.29819_0_{(2010, 8, 13, 16, 19, 16, 4, 225, 1)}": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:19:16 info Queue "mgmt-rhts_tyan-gt24-04.rhts.eng.bos.redhat.com.29819_1_{(2010, 8, 13, 16, 19, 16, 4, 225, 1)}": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:19:16 info Queue "repl-rhts_tyan-gt24-04.rhts.eng.bos.redhat.com.29819_1_{(2010, 8, 13, 16, 19, 16, 4, 225, 1)}": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:19:16 info Queue "mgmt-rhts_tyan-gt24-04.rhts.eng.bos.redhat.com.29819_2_{(2010, 8, 13, 16, 19, 16, 4, 225, 1)}": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:19:16 info Queue "repl-rhts_tyan-gt24-04.rhts.eng.bos.redhat.com.29819_2_{(2010, 8, 13, 16, 19, 16, 4, 225, 1)}": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:19:16 info Queue "mgmt-rhts_tyan-gt24-04.rhts.eng.bos.redhat.com.29819_3_{(2010, 8, 13, 16, 19, 16, 4, 225, 1)}": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:19:16 info Queue "repl-rhts_tyan-gt24-04.rhts.eng.bos.redhat.com.29819_3_{(2010, 8, 13, 16, 19, 16, 4, 225, 1)}": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:19:16 info Queue "mgmt-rhts_tyan-gt24-04.rhts.eng.bos.redhat.com.29819_4_{(2010, 8, 13, 16, 19, 16, 4, 225, 1)}": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:19:16 info Queue "repl-rhts_tyan-gt24-04.rhts.eng.bos.redhat.com.29819_4_{(2010, 8, 13, 16, 19, 16, 4, 225, 1)}": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:19:17 info Queue "mgmt-rhts_tyan-gt24-04.rhts.eng.bos.redhat.com.29819_5_{(2010, 8, 13, 16, 19, 16, 4, 225, 1)}": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:19:17 info Queue "repl-rhts_tyan-gt24-04.rhts.eng.bos.redhat.com.29819_5_{(2010, 8, 13, 16, 19, 16, 4, 225, 1)}": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:19:17 error framing-error: Frame size too small 0 (qpid/framing/AMQFrame.cpp:101) 2010-08-13 16:19:41 notice Journal "q_queue_000100": Created 2010-08-13 16:19:41 notice Journal "q_queue_000001": Created 2010-08-13 16:19:42 notice Journal "q_queue_000010": Created 2010-08-13 16:19:42 notice Journal "q_queue_100000": Created 2010-08-13 16:19:42 notice Journal "q_queue_001000": Created 2010-08-13 16:19:42 info Queue "mgmt-rhts_tyan-gt24-04.rhts.eng.bos.redhat.com.29819_4_{(2010, 8, 13, 16, 19, 42, 4, 225, 1)}": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:19:42 info Queue "repl-rhts_tyan-gt24-04.rhts.eng.bos.redhat.com.29819_4_{(2010, 8, 13, 16, 19, 42, 4, 225, 1)}": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:19:43 info Queue "q_queue_000001": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:19:43 info Queue "q_queue_000100": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:19:43 info Queue "q_queue_000010": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:19:43 info Queue "q_queue_100000": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:19:43 info Queue "q_queue_001000": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:20:00 notice Journal "q_queue_000002": Created 2010-08-13 16:20:00 notice Journal "q_queue_200000": Created 2010-08-13 16:20:00 notice Journal "q_queue_020000": Created 2010-08-13 16:20:00 notice Journal "q_queue_002000": Created 2010-08-13 16:20:00 notice Journal "q_queue_000200": Created 2010-08-13 16:20:00 notice Journal "q_queue_000020": Created 2010-08-13 16:20:01 warning Timer callback overran by 2ms [taking 5000ns] 2010-08-13 16:20:01 info Queue "q_queue_000002": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:20:01 info Queue "q_queue_000200": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:20:01 info Queue "q_queue_000020": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:20:01 info Queue "q_queue_002000": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:20:01 info Queue "q_queue_020000": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:20:01 info Queue "q_queue_200000": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:20:15 warning Timer callback overran by 2ms [taking 4000ns] 2010-08-13 16:20:17 warning Timer callback overran by 2ms [taking 22000ns] 2010-08-13 16:20:17 warning Timer callback overran by 2ms [taking 3000ns] 2010-08-13 16:20:18 notice Journal "q_queue_000003": Created 2010-08-13 16:20:18 notice Journal "q_queue_030000": Created 2010-08-13 16:20:18 notice Journal "q_queue_300000": Created 2010-08-13 16:20:18 notice Journal "q_queue_000030": Created 2010-08-13 16:20:18 notice Journal "q_queue_003000": Created 2010-08-13 16:20:18 notice Journal "q_queue_000300": Created 2010-08-13 16:20:19 info Queue "q_queue_003000": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:20:19 info Queue "q_queue_000003": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:20:19 info Queue "q_queue_300000": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:20:19 info Queue "q_queue_030000": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:20:19 info Queue "q_queue_000300": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:20:19 info Queue "q_queue_000030": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:20:27 warning Timer callback overran by 2ms [taking 15000ns] 2010-08-13 16:20:27 warning Timer callback overran by 2ms [taking 3000ns] 2010-08-13 16:20:28 warning Timer callback overran by 2ms [taking 5000ns] 2010-08-13 16:20:28 warning Timer callback overran by 2ms [taking 3000ns] 2010-08-13 16:20:28 warning Timer callback overran by 2ms [taking 13000ns] 2010-08-13 16:20:28 warning Timer callback overran by 2ms [taking 3000ns] 2010-08-13 16:20:32 warning Timer callback overran by 2ms [taking 10000ns] 2010-08-13 16:20:32 warning Timer callback overran by 2ms [taking 16000ns] 2010-08-13 16:20:33 warning Timer callback overran by 2ms [taking 15000ns] 2010-08-13 16:20:33 warning Timer callback overran by 2ms [taking 3000ns] 2010-08-13 16:20:36 notice Journal "q_queue_040000": Created 2010-08-13 16:20:36 notice Journal "q_queue_000400": Created 2010-08-13 16:20:36 notice Journal "q_queue_000004": Created 2010-08-13 16:20:36 notice Journal "q_queue_004000": Created 2010-08-13 16:20:36 notice Journal "q_queue_000040": Created 2010-08-13 16:20:36 notice Journal "q_queue_400000": Created 2010-08-13 16:20:37 warning Timer callback overran by 2ms [taking 12000ns] 2010-08-13 16:20:37 warning Timer callback overran by 2ms [taking 10000ns] 2010-08-13 16:20:37 warning Timer callback overran by 2ms [taking 3000ns] 2010-08-13 16:20:37 warning Timer callback overran by 2ms [taking 3000ns] 2010-08-13 16:20:38 info Queue "q_queue_000400": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:20:38 info Queue "q_queue_400000": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:20:39 info Queue "q_queue_040000": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:20:39 info Queue "q_queue_004000": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:20:39 info Queue "q_queue_000004": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:20:39 info Queue "q_queue_000040": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:20:42 warning Timer callback overran by 2ms [taking 10000ns] 2010-08-13 16:20:42 warning Timer callback overran by 2ms [taking 4000ns] 2010-08-13 16:20:47 warning Timer callback overran by 2ms [taking 8000ns] 2010-08-13 16:20:53 warning Timer callback overran by 2ms [taking 18000ns] 2010-08-13 16:20:54 notice Journal "q_queue_000050": Created 2010-08-13 16:20:54 notice Journal "q_queue_050000": Created 2010-08-13 16:20:54 notice Journal "q_queue_500000": Created 2010-08-13 16:20:54 notice Journal "q_queue_000005": Created 2010-08-13 16:20:54 notice Journal "q_queue_005000": Created 2010-08-13 16:20:54 notice Journal "q_queue_000500": Created 2010-08-13 16:20:56 info Queue "q_queue_500000": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:20:56 info Queue "q_queue_005000": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:20:56 info Queue "q_queue_000500": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:20:56 info Queue "q_queue_050000": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:20:56 info Queue "q_queue_000050": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:20:56 info Queue "q_queue_000005": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:21:00 warning Timer callback overran by 2ms [taking 13000ns] 2010-08-13 16:21:00 warning Timer callback overran by 2ms [taking 3000ns] 2010-08-13 16:21:00 warning Timer callback overran by 2ms [taking 5000ns] 2010-08-13 16:21:01 warning Timer callback overran by 2ms [taking 17000ns] 2010-08-13 16:21:02 warning Timer callback overran by 2ms [taking 9000ns] 2010-08-13 16:21:02 warning Timer callback overran by 2ms [taking 4000ns] 2010-08-13 16:21:02 warning Timer callback overran by 2ms [taking 9000ns] 2010-08-13 16:21:04 warning Timer callback overran by 2ms [taking 5000ns] 2010-08-13 16:21:04 warning Timer callback overran by 2ms [taking 3000ns] 2010-08-13 16:21:09 warning Timer callback overran by 2ms [taking 7000ns] 2010-08-13 16:21:09 warning Timer callback overran by 2ms [taking 9000ns] 2010-08-13 16:21:12 notice Journal "q_queue_000600": Created 2010-08-13 16:21:12 notice Journal "q_queue_600000": Created 2010-08-13 16:21:12 notice Journal "q_queue_000060": Created 2010-08-13 16:21:12 notice Journal "q_queue_000006": Created 2010-08-13 16:21:12 notice Journal "q_queue_060000": Created 2010-08-13 16:21:12 notice Journal "q_queue_006000": Created 2010-08-13 16:21:14 info Queue "q_queue_060000": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:21:14 info Queue "q_queue_000006": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:21:14 info Queue "q_queue_006000": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:21:14 info Queue "q_queue_600000": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:21:14 info Queue "q_queue_000600": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:21:14 info Queue "q_queue_000060": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:21:18 warning Timer callback overran by 2ms [taking 3000ns] 2010-08-13 16:21:18 warning Timer callback overran by 2ms [taking 3000ns] 2010-08-13 16:21:18 warning Timer callback overran by 2ms [taking 3000ns] 2010-08-13 16:21:18 warning Timer callback overran by 2ms [taking 11000ns] 2010-08-13 16:21:22 warning Timer callback overran by 2ms [taking 8000ns] 2010-08-13 16:21:22 warning Timer callback overran by 2ms [taking 13000ns] 2010-08-13 16:21:22 warning Timer callback overran by 2ms [taking 12000ns] 2010-08-13 16:21:22 warning Timer callback overran by 2ms [taking 3000ns] 2010-08-13 16:21:22 warning Timer callback overran by 2ms [taking 12000ns] 2010-08-13 16:21:27 warning Timer callback overran by 2ms [taking 4000ns] 2010-08-13 16:21:27 warning Timer callback overran by 2ms [taking 8000ns] 2010-08-13 16:21:27 warning Timer callback overran by 2ms [taking 4000ns] 2010-08-13 16:21:27 warning Timer callback overran by 2ms [taking 4000ns] 2010-08-13 16:21:27 warning Timer callback overran by 2ms [taking 10000ns] 2010-08-13 16:21:30 notice Journal "q_queue_000007": Created 2010-08-13 16:21:30 notice Journal "q_queue_070000": Created 2010-08-13 16:21:30 notice Journal "q_queue_700000": Created 2010-08-13 16:21:30 notice Journal "q_queue_000700": Created 2010-08-13 16:21:30 notice Journal "q_queue_000070": Created 2010-08-13 16:21:30 notice Journal "q_queue_007000": Created 2010-08-13 16:21:32 info Queue "q_queue_000070": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:21:32 info Queue "q_queue_000007": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:21:32 info Queue "q_queue_007000": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:21:32 info Queue "q_queue_000700": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:21:32 info Queue "q_queue_070000": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:21:32 info Queue "q_queue_700000": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:21:43 warning Timer callback overran by 2ms [taking 15000ns] 2010-08-13 16:21:44 warning Timer callback overran by 2ms [taking 17000ns] 2010-08-13 16:21:44 warning Timer callback overran by 2ms [taking 4000ns] 2010-08-13 16:21:44 warning Timer callback overran by 2ms [taking 5000ns] 2010-08-13 16:21:45 warning Timer callback overran by 2ms [taking 5000ns] 2010-08-13 16:21:45 warning Timer callback overran by 2ms [taking 4000ns] 2010-08-13 16:21:45 warning Timer callback overran by 2ms [taking 5000ns] 2010-08-13 16:21:45 warning Timer callback overran by 2ms [taking 12000ns] 2010-08-13 16:21:46 warning Timer callback overran by 2ms [taking 12000ns] 2010-08-13 16:21:46 warning Timer callback overran by 2ms [taking 4000ns] 2010-08-13 16:21:46 warning Timer callback overran by 2ms [taking 10000ns] 2010-08-13 16:21:48 notice Journal "q_queue_008000": Created 2010-08-13 16:21:48 notice Journal "q_queue_000008": Created 2010-08-13 16:21:48 notice Journal "q_queue_800000": Created 2010-08-13 16:21:48 notice Journal "q_queue_080000": Created 2010-08-13 16:21:48 notice Journal "q_queue_000080": Created 2010-08-13 16:21:48 notice Journal "q_queue_000800": Created 2010-08-13 16:21:49 warning Timer callback overran by 2ms [taking 5000ns] 2010-08-13 16:21:49 warning Timer callback overran by 2ms [taking 3000ns] 2010-08-13 16:21:49 warning Timer callback overran by 2ms [taking 3000ns] 2010-08-13 16:21:54 warning Timer callback overran by 2ms [taking 16000ns] 2010-08-13 16:21:55 info Queue "q_queue_800000": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:21:55 info Queue "q_queue_000008": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:21:55 info Queue "q_queue_000080": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:21:56 warning Timer callback overran by 2ms [taking 10000ns] 2010-08-13 16:21:57 warning Timer callback overran by 2ms [taking 12000ns] 2010-08-13 16:21:57 warning Timer callback overran by 2ms [taking 4000ns] 2010-08-13 16:21:57 warning Timer callback overran by 2ms [taking 3000ns] 2010-08-13 16:21:57 warning Timer callback overran by 2ms [taking 3000ns] 2010-08-13 16:21:57 warning Timer callback overran by 2ms [taking 7000ns] 2010-08-13 16:21:57 warning Timer callback overran by 2ms [taking 9000ns] 2010-08-13 16:21:57 warning Timer callback overran by 2ms [taking 4000ns] 2010-08-13 16:21:58 warning Timer callback overran by 2ms [taking 6000ns] 2010-08-13 16:21:58 warning Timer callback overran by 2ms [taking 7000ns] 2010-08-13 16:21:58 warning Timer callback overran by 2ms [taking 3000ns] 2010-08-13 16:21:58 warning Timer callback overran by 2ms [taking 3000ns] 2010-08-13 16:21:58 warning Timer callback overran by 2ms [taking 4000ns] 2010-08-13 16:21:58 warning Timer callback overran by 2ms [taking 5000ns] 2010-08-13 16:21:59 warning Timer callback overran by 2ms [taking 23000ns] 2010-08-13 16:22:01 warning Timer callback overran by 2ms [taking 5000ns] 2010-08-13 16:22:01 warning Timer callback overran by 2ms [taking 3000ns] 2010-08-13 16:22:06 notice Journal "q_queue_900000": Created 2010-08-13 16:22:06 notice Journal "q_queue_000090": Created 2010-08-13 16:22:06 notice Journal "q_queue_000009": Created 2010-08-13 16:22:07 info Queue "q_queue_008000": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:22:07 notice Journal "q_queue_009000": Created 2010-08-13 16:22:07 info Queue "q_queue_080000": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:22:07 notice Journal "q_queue_090000": Created 2010-08-13 16:22:08 info Queue "q_queue_000800": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:22:08 notice Journal "q_queue_000900": Created 2010-08-13 16:22:09 info Queue "q_queue_900000": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:22:09 info Queue "q_queue_000090": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:22:10 info Queue "q_queue_000009": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:22:18 info Queue "q_queue_090000": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:22:18 info Queue "q_queue_009000": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:22:18 info Queue "q_queue_000900": Policy created: type=reject; maxCount=0; maxSize=104857600 2010-08-13 16:22:19 warning Timer callback overran by 2ms [taking 7000ns] 2010-08-13 16:22:19 warning Timer callback overran by 2ms [taking 4000ns] 2010-08-13 16:22:22 notice Shut down core-test-transcript--------- Exception in thread Thread-19: Traceback (most recent call last): File "/usr/lib64/python2.4/threading.py", line 442, in __bootstrap self.run() File "/usr/lib64/python2.4/threading.py", line 422, in run self.__target(*self.__args, **self.__kwargs) File "/usr/lib/python2.4/site-packages/qpid/queue.py", line 84, in run self.listener(o) File "/usr/lib/python2.4/site-packages/qpid/management.py", line 200, in topicCb self.ssn.channel.session_completed(self.ssn.receiver._completed) File "/usr/lib/python2.4/site-packages/qpid/generator.py", line 25, in <lambda> method = lambda self, *args, **kwargs: self.invoke(op, args, kwargs) File "/usr/lib/python2.4/site-packages/qpid/connection.py", line 229, in invoke self.connection.write_op(ctl) File "/usr/lib/python2.4/site-packages/qpid/connection.py", line 200, in write_op self.flush() File "/usr/lib/python2.4/site-packages/qpid/framer.py", line 57, in flush self._write(cipher_buf) File "/usr/lib/python2.4/site-packages/qpid/framer.py", line 68, in _write n = self.sock.send(buf) error: (32, 'Broken pipe') Traceback (most recent call last): File "bz456272.py", line 72, in run self.mgr.AddQueue(qid, True) File "rhtslib_management.py", line 105, in AddQueue arguments=args) File "/usr/lib/python2.4/site-packages/qpid/generator.py", line 25, in <lambda> method = lambda self, *args, **kwargs: self.invoke(op, args, kwargs) File "/usr/lib/python2.4/site-packages/qpid/session.py", line 138, in invoke return self.do_invoke(op, args, kwargs) File "/usr/lib/python2.4/site-packages/qpid/session.py", line 150, in do_invoke raise SessionDetached() SessionDetached File "/usr/lib64/python2.4/threading.py", line 442, in __bootstrap self.run() File "bz456272.py", line 85, in run traceback.print_stack() ===> Preparing ... AddQueue(0) - connecting to broker AddQueue(0) - connected AddQueue(0) ... Waiting for phase 1 ... AddQueue(1) - connecting to broker AddQueue(1) - connected AddQueue(1) ... Waiting for phase 1 ... AddQueue(2) - connecting to broker AddQueue(2) - connected AddQueue(2) ... Waiting for phase 1 ... AddQueue(3) - connecting to broker AddQueue(3) - connected AddQueue(3) ... Waiting for phase 1 ... AddQueue(4) - connecting to broker AddQueue(4) - connected AddQueue(5) - connecting to broker AddQueue(4) ... Waiting for phase 1 ... AddQueue(5) - connected AddQueue(5) ... Waiting for phase 1 ... ===> All threads ready for phase 1 ... sleeping a few (10) seconds! AddQueue(0) ... waiting for phase 2... AddQueue(1) ... waiting for phase 2... AddQueue(2) ... waiting for phase 2... AddQueue(3) ... waiting for phase 2... AddQueue(4) ... waiting for phase 2... AddQueue(5) ... waiting for phase 2... ===> Doing loop 1 of 10 ---> Sleeping for another 15 seconds, to allow all threads to get ready for phase 2 ... ---> Removing lock [1281730781.998739] AddQueue(2) ... Adding queue 'q_queue_000100' [1281730781.999683] AddQueue(0) ... Adding queue 'q_queue_000001' [1281730782.000617] AddQueue(1) ... Adding queue 'q_queue_000010' [1281730782.001600] AddQueue(0) done (time used: 0.001917) AddQueue(0) ... Waiting for phase 1 ... [1281730782.001675] AddQueue(5) ... Adding queue 'q_queue_100000' [1281730782.001935] AddQueue(2) done (time used: 0.003196) AddQueue(2) ... Waiting for phase 1 ... [1281730782.002523] AddQueue(1) done (time used: 0.001906) AddQueue(1) ... Waiting for phase 1 ... [1281730782.002666] AddQueue(5) done (time used: 0.000991) AddQueue(5) ... Waiting for phase 1 ... [1281730782.002771] AddQueue(3) ... Adding queue 'q_queue_001000' [1281730782.003716] AddQueue(4) ... Adding queue 'q_queue_010000' [1281730782.003806] AddQueue(4) FAILED (time used: 0.000090) : Exception Info: [1281730782.003970] AddQueue(3) done (time used: 0.001199) AddQueue(3) ... Waiting for phase 1 ... Stack Trace: AddQueue(4) ... Waiting for phase 1 ... ---> Waiting for all threads to get ready again ---> ready: 0 ===> Doing loop 2 of 10 ---> Sleeping for another 15 seconds, to allow all threads to get ready for phase 2 ... AddQueue(0) ... waiting for phase 2... AddQueue(2) ... waiting for phase 2... AddQueue(5) ... waiting for phase 2... AddQueue(3) ... waiting for phase 2... AddQueue(1) ... waiting for phase 2... AddQueue(4) ... waiting for phase 2... ---> Removing lock [1281730800.124553] AddQueue(0) ... Adding queue 'q_queue_000002' [1281730800.125485] AddQueue(5) ... Adding queue 'q_queue_200000' [1281730800.126431] AddQueue(4) ... Adding queue 'q_queue_020000' [1281730800.127374] AddQueue(5) done (time used: 0.001889) AddQueue(5) ... Waiting for phase 1 ... [1281730800.127460] AddQueue(3) ... Adding queue 'q_queue_002000' [1281730800.128119] AddQueue(4) done (time used: 0.001688) AddQueue(4) ... Waiting for phase 1 ... [1281730800.128193] AddQueue(2) ... Adding queue 'q_queue_000200' [1281730800.128846] AddQueue(0) done (time used: 0.004293) AddQueue(0) ... Waiting for phase 1 ... [1281730800.129006] AddQueue(3) done (time used: 0.001546) AddQueue(3) ... Waiting for phase 1 ... [1281730800.129126] AddQueue(1) ... Adding queue 'q_queue_000020' [1281730800.129878] AddQueue(1) done (time used: 0.000752) AddQueue(1) ... Waiting for phase 1 ... [1281730800.130141] AddQueue(2) done (time used: 0.001948) AddQueue(2) ... Waiting for phase 1 ... ---> Waiting for all threads to get ready again ---> ready: 0 ===> Doing loop 3 of 10 ---> Sleeping for another 15 seconds, to allow all threads to get ready for phase 2 ... AddQueue(5) ... waiting for phase 2... AddQueue(4) ... waiting for phase 2... AddQueue(3) ... waiting for phase 2... AddQueue(0) ... waiting for phase 2... AddQueue(1) ... waiting for phase 2... AddQueue(2) ... waiting for phase 2... ---> Removing lock [1281730818.157759] AddQueue(0) ... Adding queue 'q_queue_000003' [1281730818.158421] AddQueue(4) ... Adding queue 'q_queue_030000' [1281730818.159349] AddQueue(0) done (time used: 0.001590) AddQueue(0) ... Waiting for phase 1 ... [1281730818.159455] AddQueue(2) ... Adding queue 'q_queue_000300' [1281730818.159642] AddQueue(5) ... Adding queue 'q_queue_300000' [1281730818.160329] AddQueue(4) done (time used: 0.001908) AddQueue(4) ... Waiting for phase 1 ... [1281730818.160437] AddQueue(5) done (time used: 0.000795) AddQueue(5) ... Waiting for phase 1 ... [1281730818.160484] AddQueue(1) ... Adding queue 'q_queue_000030' [1281730818.160878] AddQueue(1) done (time used: 0.000394) AddQueue(1) ... Waiting for phase 1 ... [1281730818.160906] AddQueue(3) ... Adding queue 'q_queue_003000' [1281730818.161177] AddQueue(3) done (time used: 0.000271) AddQueue(3) ... Waiting for phase 1 ... [1281730818.161332] AddQueue(2) done (time used: 0.001877) AddQueue(2) ... Waiting for phase 1 ... ---> Waiting for all threads to get ready again ---> ready: 0 ===> Doing loop 4 of 10 ---> Sleeping for another 15 seconds, to allow all threads to get ready for phase 2 ... AddQueue(0) ... waiting for phase 2... AddQueue(4) ... waiting for phase 2... AddQueue(5) ... waiting for phase 2... AddQueue(2) ... waiting for phase 2... AddQueue(1) ... waiting for phase 2... AddQueue(3) ... waiting for phase 2... ---> Removing lock [1281730836.155377] AddQueue(0) ... Adding queue 'q_queue_000004' [1281730836.155513] AddQueue(4) ... Adding queue 'q_queue_040000' [1281730836.156124] AddQueue(4) done (time used: 0.000611) AddQueue(4) ... Waiting for phase 1 ... [1281730836.156194] AddQueue(2) ... Adding queue 'q_queue_000400' [1281730836.157592] AddQueue(5) ... Adding queue 'q_queue_400000' [1281730836.157899] AddQueue(0) done (time used: 0.002522) AddQueue(0) ... Waiting for phase 1 ... [1281730836.158040] AddQueue(2) done (time used: 0.001846) AddQueue(2) ... Waiting for phase 1 ... [1281730836.158115] AddQueue(3) ... Adding queue 'q_queue_004000' [1281730836.158714] AddQueue(1) ... Adding queue 'q_queue_000040' [1281730836.159196] AddQueue(3) done (time used: 0.001081) AddQueue(3) ... Waiting for phase 1 ... [1281730836.159287] AddQueue(1) done (time used: 0.000573) AddQueue(1) ... Waiting for phase 1 ... [1281730836.159772] AddQueue(5) done (time used: 0.002180) AddQueue(5) ... Waiting for phase 1 ... ---> Waiting for all threads to get ready again ---> ready: 0 ===> Doing loop 5 of 10 ---> Sleeping for another 15 seconds, to allow all threads to get ready for phase 2 ... AddQueue(4) ... waiting for phase 2... AddQueue(0) ... waiting for phase 2... AddQueue(2) ... waiting for phase 2... AddQueue(5) ... waiting for phase 2... AddQueue(1) ... waiting for phase 2... AddQueue(3) ... waiting for phase 2... ---> Removing lock [1281730854.161455] AddQueue(1) ... Adding queue 'q_queue_000050' [1281730854.162380] AddQueue(4) ... Adding queue 'q_queue_050000' [1281730854.163361] AddQueue(1) done (time used: 0.001906) AddQueue(1) ... Waiting for phase 1 ... [1281730854.163444] AddQueue(5) ... Adding queue 'q_queue_500000' [1281730854.164083] AddQueue(0) ... Adding queue 'q_queue_000005' [1281730854.164477] AddQueue(5) done (time used: 0.001033) AddQueue(5) ... Waiting for phase 1 ... [1281730854.164616] AddQueue(4) done (time used: 0.002236) AddQueue(4) ... Waiting for phase 1 ... [1281730854.165453] AddQueue(0) done (time used: 0.001370) AddQueue(0) ... Waiting for phase 1 ... [1281730854.165771] AddQueue(3) ... Adding queue 'q_queue_005000' [1281730854.166363] AddQueue(2) ... Adding queue 'q_queue_000500' [1281730854.167016] AddQueue(3) done (time used: 0.001245) AddQueue(3) ... Waiting for phase 1 ... [1281730854.167135] AddQueue(2) done (time used: 0.000772) AddQueue(2) ... Waiting for phase 1 ... ---> Waiting for all threads to get ready again ---> ready: 0 ===> Doing loop 6 of 10 ---> Sleeping for another 15 seconds, to allow all threads to get ready for phase 2 ... AddQueue(1) ... waiting for phase 2... AddQueue(0) ... waiting for phase 2... AddQueue(2) ... waiting for phase 2... AddQueue(4) ... waiting for phase 2... AddQueue(5) ... waiting for phase 2... AddQueue(3) ... waiting for phase 2... ---> Removing lock [1281730872.159513] AddQueue(2) ... Adding queue 'q_queue_000600' [1281730872.160431] AddQueue(5) ... Adding queue 'q_queue_600000' [1281730872.161379] AddQueue(1) ... Adding queue 'q_queue_000060' [1281730872.162357] AddQueue(5) done (time used: 0.001926) AddQueue(5) ... Waiting for phase 1 ... [1281730872.162531] AddQueue(2) done (time used: 0.003018) AddQueue(2) ... Waiting for phase 1 ... [1281730872.162699] AddQueue(1) done (time used: 0.001320) AddQueue(1) ... Waiting for phase 1 ... [1281730872.162774] AddQueue(4) ... Adding queue 'q_queue_060000' [1281730872.162952] AddQueue(0) ... Adding queue 'q_queue_000006' [1281730872.163990] AddQueue(0) done (time used: 0.001038) AddQueue(0) ... Waiting for phase 1 ... [1281730872.164055] AddQueue(3) ... Adding queue 'q_queue_006000' [1281730872.164615] AddQueue(3) done (time used: 0.000560) AddQueue(3) ... Waiting for phase 1 ... [1281730872.164704] AddQueue(4) done (time used: 0.001930) AddQueue(4) ... Waiting for phase 1 ... ---> Waiting for all threads to get ready again ---> ready: 0 ===> Doing loop 7 of 10 ---> Sleeping for another 15 seconds, to allow all threads to get ready for phase 2 ... AddQueue(2) ... waiting for phase 2... AddQueue(5) ... waiting for phase 2... AddQueue(1) ... waiting for phase 2... AddQueue(0) ... waiting for phase 2... AddQueue(4) ... waiting for phase 2... AddQueue(3) ... waiting for phase 2... ---> Removing lock [1281730890.162640] AddQueue(0) ... Adding queue 'q_queue_000007' [1281730890.163562] AddQueue(4) ... Adding queue 'q_queue_070000' [1281730890.164529] AddQueue(0) done (time used: 0.001889) AddQueue(0) ... Waiting for phase 1 ... [1281730890.164643] AddQueue(1) ... Adding queue 'q_queue_000070' [1281730890.164866] AddQueue(5) ... Adding queue 'q_queue_700000' [1281730890.165660] AddQueue(4) done (time used: 0.002098) AddQueue(4) ... Waiting for phase 1 ... [1281730890.165800] AddQueue(5) done (time used: 0.000934) AddQueue(5) ... Waiting for phase 1 ... [1281730890.165872] AddQueue(2) ... Adding queue 'q_queue_000700' [1281730890.168876] AddQueue(2) done (time used: 0.003004) AddQueue(2) ... Waiting for phase 1 ... [1281730890.168976] AddQueue(1) done (time used: 0.004333) AddQueue(1) ... Waiting for phase 1 ... [1281730890.169028] AddQueue(3) ... Adding queue 'q_queue_007000' [1281730890.169553] AddQueue(3) done (time used: 0.000525) AddQueue(3) ... Waiting for phase 1 ... ---> Waiting for all threads to get ready again ---> ready: 0 ===> Doing loop 8 of 10 ---> Sleeping for another 15 seconds, to allow all threads to get ready for phase 2 ... AddQueue(0) ... waiting for phase 2... AddQueue(5) ... waiting for phase 2... AddQueue(4) ... waiting for phase 2... AddQueue(2) ... waiting for phase 2... AddQueue(1) ... waiting for phase 2... AddQueue(3) ... waiting for phase 2... ---> Removing lock [1281730908.164828] AddQueue(1) ... Adding queue 'q_queue_000080' [1281730908.164997] AddQueue(3) ... Adding queue 'q_queue_008000' [1281730908.165667] AddQueue(5) ... Adding queue 'q_queue_800000' [1281730908.166178] AddQueue(0) ... Adding queue 'q_queue_000008' [1281730908.167496] AddQueue(4) ... Adding queue 'q_queue_080000' [1281730908.168318] AddQueue(3) done (time used: 0.003321) AddQueue(3) ... Waiting for phase 1 ... [1281730908.168476] AddQueue(0) done (time used: 0.002298) AddQueue(0) ... Waiting for phase 1 ... [1281730908.168646] AddQueue(4) done (time used: 0.001150) AddQueue(4) ... Waiting for phase 1 ... [1281730908.168800] AddQueue(5) done (time used: 0.003133) AddQueue(5) ... Waiting for phase 1 ... [1281730908.169423] AddQueue(1) done (time used: 0.004595) AddQueue(1) ... Waiting for phase 1 ... [1281730908.169476] AddQueue(2) ... Adding queue 'q_queue_000800' [1281730908.169962] AddQueue(2) done (time used: 0.000486) AddQueue(2) ... Waiting for phase 1 ... ---> Waiting for all threads to get ready again ---> ready: 0 ===> Doing loop 9 of 10 ---> Sleeping for another 15 seconds, to allow all threads to get ready for phase 2 ... AddQueue(3) ... waiting for phase 2... AddQueue(1) ... waiting for phase 2... AddQueue(4) ... waiting for phase 2... AddQueue(5) ... waiting for phase 2... AddQueue(2) ... waiting for phase 2... AddQueue(0) ... waiting for phase 2... ---> Removing lock [1281730926.168189] AddQueue(4) ... Adding queue 'q_queue_090000' [1281730926.169123] AddQueue(5) ... Adding queue 'q_queue_900000' [1281730926.170089] AddQueue(4) done (time used: 0.001900) AddQueue(4) ... Waiting for phase 1 ... [1281730926.171932] AddQueue(3) ... Adding queue 'q_queue_009000' [1281730926.172504] AddQueue(0) ... Adding queue 'q_queue_000009' [1281730926.172886] AddQueue(5) done (time used: 0.003763) AddQueue(5) ... Waiting for phase 1 ... [1281730926.173000] AddQueue(3) done (time used: 0.001068) AddQueue(3) ... Waiting for phase 1 ... [1281730926.173099] AddQueue(1) ... Adding queue 'q_queue_000090' [1281730926.174987] AddQueue(1) done (time used: 0.001888) AddQueue(1) ... Waiting for phase 1 ... [1281730926.175138] AddQueue(0) done (time used: 0.002634) AddQueue(0) ... Waiting for phase 1 ... [1281730926.175237] AddQueue(2) ... Adding queue 'q_queue_000900' [1281730926.175968] AddQueue(2) done (time used: 0.000731) AddQueue(2) ... Waiting for phase 1 ... ---> Waiting for all threads to get ready again ---> ready: 0 ---> Exiting... AddQueue(4) ... waiting for phase 2... AddQueue(5) ... waiting for phase 2... AddQueue(3) ... waiting for phase 2... AddQueue(1) ... waiting for phase 2... AddQueue(0) ... waiting for phase 2... AddQueue(2) ... waiting for phase 2... AddQueue(0) ... Disconnecting AddQueue(1) ... Disconnecting AddQueue(2) ... Disconnecting AddQueue(3) ... Disconnecting AddQueue(4) ... Disconnecting AddQueue(5) ... Disconnecting
Reproducibility is higher now as I succeeded to trigger the issue again on RHEL 5.5 x86_64 on different machine (with same package set). How reproducible: 40% on RHEL 5.5 x86_64
After examining the logs with trace-level logging enabled, it seems these failures are connected with qmf traffic (if I understand Gordon's analysis correctly), specifically when a qmf session is closed. In all cases I reproduced, the frame size 0 error was immediately preceded by a SessionCompletedBody with very large (corrupted) command completion values. In once case, this corrupted message also sparked an Execution exception on account of bad session completion information. 2010-08-23 19:14:01 trace RECV [127.0.0.1:57357]: Frame[BEbe; channel=1; {SessionCompletedBody: commands={ [0,10] [65537,50462837] }; }] 2010-08-23 19:14:01 debug anonymous.rhts_mrg10.lab.bos.redhat.com.11352_3_{(2010, 8, 23, 19, 14, 1, 0, 235, 1)}: sender marked completed: { [0,10] [65537,50462837] } 2010-08-23 19:14:01 debug anonymous.rhts_mrg10.lab.bos.redhat.com.11352_3_{(2010, 8, 23, 19, 14, 1, 0, 235, 1)}: sender confirmed point moved to (11+0) 2010-08-23 19:14:01 debug Exception constructed: Frame size too small 0 (qpid/framing/AMQFrame.cpp:101) 2010-08-23 19:14:01 error framing-error: Frame size too small 0 (qpid/framing/AMQFrame.cpp:101) I commented out the management client from the test code (rhtslib_management.py:58, 59, 78) and reran the test. Although client connection timeout errors were observed in some runs, no further occurrence of the "Frame size 0" error was seen. Since the client is using the old API which has known issues with thread safety, and since this is a multi-threaded client, I would suggest that in this case, the broker is not at fault, and is behaving correctly given the corrupted frame is likely being sent from the Python client. I would propose changing this bug to state CLOSED/NOTABUG unless there are any objections or other considerations/tests.
Thread-safety issue in old management client is now tracked by bug 628856 (whether or not this is deemed worth fixing given it is now superceded is another question). I'm going to mark this closed as suggested by Kim.