Bug 624988

Summary: qpidd broker suffers from error 'framing-error: Frame size too small 0 (qpid/framing/AMQFrame.cpp:101)' which can result in client getting SessionDetached()
Product: Red Hat Enterprise MRG Reporter: Frantisek Reznicek <freznice>
Component: qpid-cppAssignee: Kim van der Riet <kim.vdriet>
Status: CLOSED NOTABUG QA Contact: MRG Quality Engineering <mrgqe-bugs>
Severity: high Docs Contact:
Priority: high    
Version: betaCC: esammons, gsim
Target Milestone: 1.3   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-08-31 08:24:46 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Frantisek Reznicek 2010-08-18 10:16:17 UTC
Description of problem:

While bug 456272 regression testing, there was detected new problem related to sudden and unexpected SessionDetached()exception propagation from broker to client.

This exception on client side is most probably linked to broker's error:
  2010-08-13 16:19:17 error framing-error: Frame size too small 0 (qpid/framing/AMQFrame.cpp:101)

The python client is creating simultaneously multiple queues in multiple threads in the broker. The testing environment also consist of loading 'find' processes which take huge amount of disk I/O and CPU utilization. The environment under which the problem occurrend can be clearly titled as stressy.



Version-Release number of selected component (if applicable):
python-qmf-0.7.946106-8.el5
python-qpid-0.7.946106-12.el5
qpid-cpp-*-0.7.946106-12.el5

How reproducible:
very rarely (<1%)

Steps to Reproduce:
Run the MRG/Messaging/qpid_broker_jfail_bz456272 test in the loop or alternatively:
1. start broker with message store
2. launch few machine loading find processes
  (find /usr -type f -exec cat {} \; | \
   split -b 10m -  ${FIND_RESULT_LOG}.$(date +%Y%m%d_%H%M%S) ) &
   # split allows to splitting for search result deletion
3. launch bz456272 reproducer in the loop, can be found here (*.py)
   http://cvs.devel.redhat.com/cgi-bin/cvsweb.cgi/tests/distribution/MRG/Messaging/qpid_broker_jfail_bz456272/
   'python ./bz456272.py' or better 'python ./bz456272.pyc'

Actual results:
Broker rarely exits the python client with SessionDetached() and prints framing error:
2010-08-13 16:19:17 error framing-error: Frame size too small 0 (qpid/framing/AMQFrame.cpp:101)

Expected results:
No framing error, no exception on client's side:

Additional info (the log from the failing case):

[16:19:12] run 10/12--------------------------[err_cnt:0]-
qpidd-transcript-------------
2010-08-13 16:19:12 info Loaded Module: /usr/lib64/qpid/daemon/msgstore.so
2010-08-13 16:19:12 info Loaded Module: /usr/lib64/qpid/daemon/ssl.so
2010-08-13 16:19:12 info Loaded Module: /usr/lib64/qpid/daemon/watchdog.so
2010-08-13 16:19:12 info Loaded Module: /usr/lib64/qpid/daemon/rdma.so
2010-08-13 16:19:12 info Loaded Module: /usr/lib64/qpid/daemon/xml.so
2010-08-13 16:19:12 info Loaded Module: /usr/lib64/qpid/daemon/cluster.so
2010-08-13 16:19:12 info Loaded Module: /usr/lib64/qpid/daemon/replicating_listener.so
2010-08-13 16:19:12 info Loaded Module: /usr/lib64/qpid/daemon/replication_exchange.so
2010-08-13 16:19:12 info Loaded Module: /usr/lib64/qpid/daemon/acl.so
2010-08-13 16:19:12 info No message store configured, persistence is disabled.
2010-08-13 16:19:12 info Management enabled
2010-08-13 16:19:12 info ManagementAgent generated broker ID: 37763838-616d-4418-8e8d-9b3805e3b105
2010-08-13 16:19:12 notice Journal "TplStore": Created
2010-08-13 16:19:12 notice Store module initialized; store-dir=/mnt/tests/distribution/MRG/Messaging/qpid_broker_jfail_bz456272/rhts_qpidd/20100813_154811/broker.9670
2010-08-13 16:19:12 info > Default files per journal: 16
2010-08-13 16:19:12 info > Default journal file size: 32 (wpgs)
2010-08-13 16:19:12 info > Default write cache page size: 32 (Kib)
2010-08-13 16:19:12 info > Default number of write cache pages: 32
2010-08-13 16:19:12 info > TPL files per journal: 8
2010-08-13 16:19:12 info > TPL journal file size: 32 (wpgs)
2010-08-13 16:19:12 info > TPL write cache page size: 128 (Kib)
2010-08-13 16:19:12 info > TPL number of write cache pages: 8
2010-08-13 16:19:12 info Registered xml exchange
2010-08-13 16:19:12 info Registered replication exchange
2010-08-13 16:19:12 info Most recent persistence id found: 0x0
2010-08-13 16:19:12 notice SASL disabled: No Authentication Performed
2010-08-13 16:19:12 notice Listening on TCP port 5672
2010-08-13 16:19:12 info SSL plugin not enabled, you must set --ssl-cert-db to enable it.
2010-08-13 16:19:12 info Rdma: Disabled: no rdma devices found
2010-08-13 16:19:12 info Policy file not specified. ACL Disabled, no ACL checking being done!
2010-08-13 16:19:12 notice Broker running
2010-08-13 16:19:16 info Queue "mgmt-rhts_tyan-gt24-04.rhts.eng.bos.redhat.com.29819_0_{(2010, 8, 13, 16, 19, 16, 4, 225, 1)}": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:19:16 info Queue "repl-rhts_tyan-gt24-04.rhts.eng.bos.redhat.com.29819_0_{(2010, 8, 13, 16, 19, 16, 4, 225, 1)}": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:19:16 info Queue "mgmt-rhts_tyan-gt24-04.rhts.eng.bos.redhat.com.29819_1_{(2010, 8, 13, 16, 19, 16, 4, 225, 1)}": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:19:16 info Queue "repl-rhts_tyan-gt24-04.rhts.eng.bos.redhat.com.29819_1_{(2010, 8, 13, 16, 19, 16, 4, 225, 1)}": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:19:16 info Queue "mgmt-rhts_tyan-gt24-04.rhts.eng.bos.redhat.com.29819_2_{(2010, 8, 13, 16, 19, 16, 4, 225, 1)}": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:19:16 info Queue "repl-rhts_tyan-gt24-04.rhts.eng.bos.redhat.com.29819_2_{(2010, 8, 13, 16, 19, 16, 4, 225, 1)}": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:19:16 info Queue "mgmt-rhts_tyan-gt24-04.rhts.eng.bos.redhat.com.29819_3_{(2010, 8, 13, 16, 19, 16, 4, 225, 1)}": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:19:16 info Queue "repl-rhts_tyan-gt24-04.rhts.eng.bos.redhat.com.29819_3_{(2010, 8, 13, 16, 19, 16, 4, 225, 1)}": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:19:16 info Queue "mgmt-rhts_tyan-gt24-04.rhts.eng.bos.redhat.com.29819_4_{(2010, 8, 13, 16, 19, 16, 4, 225, 1)}": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:19:16 info Queue "repl-rhts_tyan-gt24-04.rhts.eng.bos.redhat.com.29819_4_{(2010, 8, 13, 16, 19, 16, 4, 225, 1)}": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:19:17 info Queue "mgmt-rhts_tyan-gt24-04.rhts.eng.bos.redhat.com.29819_5_{(2010, 8, 13, 16, 19, 16, 4, 225, 1)}": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:19:17 info Queue "repl-rhts_tyan-gt24-04.rhts.eng.bos.redhat.com.29819_5_{(2010, 8, 13, 16, 19, 16, 4, 225, 1)}": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:19:17 error framing-error: Frame size too small 0 (qpid/framing/AMQFrame.cpp:101)
2010-08-13 16:19:41 notice Journal "q_queue_000100": Created
2010-08-13 16:19:41 notice Journal "q_queue_000001": Created
2010-08-13 16:19:42 notice Journal "q_queue_000010": Created
2010-08-13 16:19:42 notice Journal "q_queue_100000": Created
2010-08-13 16:19:42 notice Journal "q_queue_001000": Created
2010-08-13 16:19:42 info Queue "mgmt-rhts_tyan-gt24-04.rhts.eng.bos.redhat.com.29819_4_{(2010, 8, 13, 16, 19, 42, 4, 225, 1)}": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:19:42 info Queue "repl-rhts_tyan-gt24-04.rhts.eng.bos.redhat.com.29819_4_{(2010, 8, 13, 16, 19, 42, 4, 225, 1)}": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:19:43 info Queue "q_queue_000001": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:19:43 info Queue "q_queue_000100": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:19:43 info Queue "q_queue_000010": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:19:43 info Queue "q_queue_100000": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:19:43 info Queue "q_queue_001000": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:20:00 notice Journal "q_queue_000002": Created
2010-08-13 16:20:00 notice Journal "q_queue_200000": Created
2010-08-13 16:20:00 notice Journal "q_queue_020000": Created
2010-08-13 16:20:00 notice Journal "q_queue_002000": Created
2010-08-13 16:20:00 notice Journal "q_queue_000200": Created
2010-08-13 16:20:00 notice Journal "q_queue_000020": Created
2010-08-13 16:20:01 warning Timer callback overran by 2ms [taking 5000ns]
2010-08-13 16:20:01 info Queue "q_queue_000002": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:20:01 info Queue "q_queue_000200": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:20:01 info Queue "q_queue_000020": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:20:01 info Queue "q_queue_002000": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:20:01 info Queue "q_queue_020000": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:20:01 info Queue "q_queue_200000": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:20:15 warning Timer callback overran by 2ms [taking 4000ns]
2010-08-13 16:20:17 warning Timer callback overran by 2ms [taking 22000ns]
2010-08-13 16:20:17 warning Timer callback overran by 2ms [taking 3000ns]
2010-08-13 16:20:18 notice Journal "q_queue_000003": Created
2010-08-13 16:20:18 notice Journal "q_queue_030000": Created
2010-08-13 16:20:18 notice Journal "q_queue_300000": Created
2010-08-13 16:20:18 notice Journal "q_queue_000030": Created
2010-08-13 16:20:18 notice Journal "q_queue_003000": Created
2010-08-13 16:20:18 notice Journal "q_queue_000300": Created
2010-08-13 16:20:19 info Queue "q_queue_003000": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:20:19 info Queue "q_queue_000003": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:20:19 info Queue "q_queue_300000": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:20:19 info Queue "q_queue_030000": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:20:19 info Queue "q_queue_000300": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:20:19 info Queue "q_queue_000030": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:20:27 warning Timer callback overran by 2ms [taking 15000ns]
2010-08-13 16:20:27 warning Timer callback overran by 2ms [taking 3000ns]
2010-08-13 16:20:28 warning Timer callback overran by 2ms [taking 5000ns]
2010-08-13 16:20:28 warning Timer callback overran by 2ms [taking 3000ns]
2010-08-13 16:20:28 warning Timer callback overran by 2ms [taking 13000ns]
2010-08-13 16:20:28 warning Timer callback overran by 2ms [taking 3000ns]
2010-08-13 16:20:32 warning Timer callback overran by 2ms [taking 10000ns]
2010-08-13 16:20:32 warning Timer callback overran by 2ms [taking 16000ns]
2010-08-13 16:20:33 warning Timer callback overran by 2ms [taking 15000ns]
2010-08-13 16:20:33 warning Timer callback overran by 2ms [taking 3000ns]
2010-08-13 16:20:36 notice Journal "q_queue_040000": Created
2010-08-13 16:20:36 notice Journal "q_queue_000400": Created
2010-08-13 16:20:36 notice Journal "q_queue_000004": Created
2010-08-13 16:20:36 notice Journal "q_queue_004000": Created
2010-08-13 16:20:36 notice Journal "q_queue_000040": Created
2010-08-13 16:20:36 notice Journal "q_queue_400000": Created
2010-08-13 16:20:37 warning Timer callback overran by 2ms [taking 12000ns]
2010-08-13 16:20:37 warning Timer callback overran by 2ms [taking 10000ns]
2010-08-13 16:20:37 warning Timer callback overran by 2ms [taking 3000ns]
2010-08-13 16:20:37 warning Timer callback overran by 2ms [taking 3000ns]
2010-08-13 16:20:38 info Queue "q_queue_000400": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:20:38 info Queue "q_queue_400000": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:20:39 info Queue "q_queue_040000": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:20:39 info Queue "q_queue_004000": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:20:39 info Queue "q_queue_000004": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:20:39 info Queue "q_queue_000040": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:20:42 warning Timer callback overran by 2ms [taking 10000ns]
2010-08-13 16:20:42 warning Timer callback overran by 2ms [taking 4000ns]
2010-08-13 16:20:47 warning Timer callback overran by 2ms [taking 8000ns]
2010-08-13 16:20:53 warning Timer callback overran by 2ms [taking 18000ns]
2010-08-13 16:20:54 notice Journal "q_queue_000050": Created
2010-08-13 16:20:54 notice Journal "q_queue_050000": Created
2010-08-13 16:20:54 notice Journal "q_queue_500000": Created
2010-08-13 16:20:54 notice Journal "q_queue_000005": Created
2010-08-13 16:20:54 notice Journal "q_queue_005000": Created
2010-08-13 16:20:54 notice Journal "q_queue_000500": Created
2010-08-13 16:20:56 info Queue "q_queue_500000": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:20:56 info Queue "q_queue_005000": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:20:56 info Queue "q_queue_000500": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:20:56 info Queue "q_queue_050000": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:20:56 info Queue "q_queue_000050": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:20:56 info Queue "q_queue_000005": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:21:00 warning Timer callback overran by 2ms [taking 13000ns]
2010-08-13 16:21:00 warning Timer callback overran by 2ms [taking 3000ns]
2010-08-13 16:21:00 warning Timer callback overran by 2ms [taking 5000ns]
2010-08-13 16:21:01 warning Timer callback overran by 2ms [taking 17000ns]
2010-08-13 16:21:02 warning Timer callback overran by 2ms [taking 9000ns]
2010-08-13 16:21:02 warning Timer callback overran by 2ms [taking 4000ns]
2010-08-13 16:21:02 warning Timer callback overran by 2ms [taking 9000ns]
2010-08-13 16:21:04 warning Timer callback overran by 2ms [taking 5000ns]
2010-08-13 16:21:04 warning Timer callback overran by 2ms [taking 3000ns]
2010-08-13 16:21:09 warning Timer callback overran by 2ms [taking 7000ns]
2010-08-13 16:21:09 warning Timer callback overran by 2ms [taking 9000ns]
2010-08-13 16:21:12 notice Journal "q_queue_000600": Created
2010-08-13 16:21:12 notice Journal "q_queue_600000": Created
2010-08-13 16:21:12 notice Journal "q_queue_000060": Created
2010-08-13 16:21:12 notice Journal "q_queue_000006": Created
2010-08-13 16:21:12 notice Journal "q_queue_060000": Created
2010-08-13 16:21:12 notice Journal "q_queue_006000": Created
2010-08-13 16:21:14 info Queue "q_queue_060000": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:21:14 info Queue "q_queue_000006": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:21:14 info Queue "q_queue_006000": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:21:14 info Queue "q_queue_600000": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:21:14 info Queue "q_queue_000600": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:21:14 info Queue "q_queue_000060": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:21:18 warning Timer callback overran by 2ms [taking 3000ns]
2010-08-13 16:21:18 warning Timer callback overran by 2ms [taking 3000ns]
2010-08-13 16:21:18 warning Timer callback overran by 2ms [taking 3000ns]
2010-08-13 16:21:18 warning Timer callback overran by 2ms [taking 11000ns]
2010-08-13 16:21:22 warning Timer callback overran by 2ms [taking 8000ns]
2010-08-13 16:21:22 warning Timer callback overran by 2ms [taking 13000ns]
2010-08-13 16:21:22 warning Timer callback overran by 2ms [taking 12000ns]
2010-08-13 16:21:22 warning Timer callback overran by 2ms [taking 3000ns]
2010-08-13 16:21:22 warning Timer callback overran by 2ms [taking 12000ns]
2010-08-13 16:21:27 warning Timer callback overran by 2ms [taking 4000ns]
2010-08-13 16:21:27 warning Timer callback overran by 2ms [taking 8000ns]
2010-08-13 16:21:27 warning Timer callback overran by 2ms [taking 4000ns]
2010-08-13 16:21:27 warning Timer callback overran by 2ms [taking 4000ns]
2010-08-13 16:21:27 warning Timer callback overran by 2ms [taking 10000ns]
2010-08-13 16:21:30 notice Journal "q_queue_000007": Created
2010-08-13 16:21:30 notice Journal "q_queue_070000": Created
2010-08-13 16:21:30 notice Journal "q_queue_700000": Created
2010-08-13 16:21:30 notice Journal "q_queue_000700": Created
2010-08-13 16:21:30 notice Journal "q_queue_000070": Created
2010-08-13 16:21:30 notice Journal "q_queue_007000": Created
2010-08-13 16:21:32 info Queue "q_queue_000070": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:21:32 info Queue "q_queue_000007": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:21:32 info Queue "q_queue_007000": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:21:32 info Queue "q_queue_000700": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:21:32 info Queue "q_queue_070000": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:21:32 info Queue "q_queue_700000": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:21:43 warning Timer callback overran by 2ms [taking 15000ns]
2010-08-13 16:21:44 warning Timer callback overran by 2ms [taking 17000ns]
2010-08-13 16:21:44 warning Timer callback overran by 2ms [taking 4000ns]
2010-08-13 16:21:44 warning Timer callback overran by 2ms [taking 5000ns]
2010-08-13 16:21:45 warning Timer callback overran by 2ms [taking 5000ns]
2010-08-13 16:21:45 warning Timer callback overran by 2ms [taking 4000ns]
2010-08-13 16:21:45 warning Timer callback overran by 2ms [taking 5000ns]
2010-08-13 16:21:45 warning Timer callback overran by 2ms [taking 12000ns]
2010-08-13 16:21:46 warning Timer callback overran by 2ms [taking 12000ns]
2010-08-13 16:21:46 warning Timer callback overran by 2ms [taking 4000ns]
2010-08-13 16:21:46 warning Timer callback overran by 2ms [taking 10000ns]
2010-08-13 16:21:48 notice Journal "q_queue_008000": Created
2010-08-13 16:21:48 notice Journal "q_queue_000008": Created
2010-08-13 16:21:48 notice Journal "q_queue_800000": Created
2010-08-13 16:21:48 notice Journal "q_queue_080000": Created
2010-08-13 16:21:48 notice Journal "q_queue_000080": Created
2010-08-13 16:21:48 notice Journal "q_queue_000800": Created
2010-08-13 16:21:49 warning Timer callback overran by 2ms [taking 5000ns]
2010-08-13 16:21:49 warning Timer callback overran by 2ms [taking 3000ns]
2010-08-13 16:21:49 warning Timer callback overran by 2ms [taking 3000ns]
2010-08-13 16:21:54 warning Timer callback overran by 2ms [taking 16000ns]
2010-08-13 16:21:55 info Queue "q_queue_800000": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:21:55 info Queue "q_queue_000008": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:21:55 info Queue "q_queue_000080": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:21:56 warning Timer callback overran by 2ms [taking 10000ns]
2010-08-13 16:21:57 warning Timer callback overran by 2ms [taking 12000ns]
2010-08-13 16:21:57 warning Timer callback overran by 2ms [taking 4000ns]
2010-08-13 16:21:57 warning Timer callback overran by 2ms [taking 3000ns]
2010-08-13 16:21:57 warning Timer callback overran by 2ms [taking 3000ns]
2010-08-13 16:21:57 warning Timer callback overran by 2ms [taking 7000ns]
2010-08-13 16:21:57 warning Timer callback overran by 2ms [taking 9000ns]
2010-08-13 16:21:57 warning Timer callback overran by 2ms [taking 4000ns]
2010-08-13 16:21:58 warning Timer callback overran by 2ms [taking 6000ns]
2010-08-13 16:21:58 warning Timer callback overran by 2ms [taking 7000ns]
2010-08-13 16:21:58 warning Timer callback overran by 2ms [taking 3000ns]
2010-08-13 16:21:58 warning Timer callback overran by 2ms [taking 3000ns]
2010-08-13 16:21:58 warning Timer callback overran by 2ms [taking 4000ns]
2010-08-13 16:21:58 warning Timer callback overran by 2ms [taking 5000ns]
2010-08-13 16:21:59 warning Timer callback overran by 2ms [taking 23000ns]
2010-08-13 16:22:01 warning Timer callback overran by 2ms [taking 5000ns]
2010-08-13 16:22:01 warning Timer callback overran by 2ms [taking 3000ns]
2010-08-13 16:22:06 notice Journal "q_queue_900000": Created
2010-08-13 16:22:06 notice Journal "q_queue_000090": Created
2010-08-13 16:22:06 notice Journal "q_queue_000009": Created
2010-08-13 16:22:07 info Queue "q_queue_008000": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:22:07 notice Journal "q_queue_009000": Created
2010-08-13 16:22:07 info Queue "q_queue_080000": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:22:07 notice Journal "q_queue_090000": Created
2010-08-13 16:22:08 info Queue "q_queue_000800": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:22:08 notice Journal "q_queue_000900": Created
2010-08-13 16:22:09 info Queue "q_queue_900000": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:22:09 info Queue "q_queue_000090": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:22:10 info Queue "q_queue_000009": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:22:18 info Queue "q_queue_090000": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:22:18 info Queue "q_queue_009000": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:22:18 info Queue "q_queue_000900": Policy created: type=reject; maxCount=0; maxSize=104857600
2010-08-13 16:22:19 warning Timer callback overran by 2ms [taking 7000ns]
2010-08-13 16:22:19 warning Timer callback overran by 2ms [taking 4000ns]
2010-08-13 16:22:22 notice Shut down
core-test-transcript---------
Exception in thread Thread-19:
Traceback (most recent call last):
  File "/usr/lib64/python2.4/threading.py", line 442, in __bootstrap
    self.run()
  File "/usr/lib64/python2.4/threading.py", line 422, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.4/site-packages/qpid/queue.py", line 84, in run
    self.listener(o)
  File "/usr/lib/python2.4/site-packages/qpid/management.py", line 200, in topicCb
    self.ssn.channel.session_completed(self.ssn.receiver._completed)
  File "/usr/lib/python2.4/site-packages/qpid/generator.py", line 25, in <lambda>
    method = lambda self, *args, **kwargs: self.invoke(op, args, kwargs)
  File "/usr/lib/python2.4/site-packages/qpid/connection.py", line 229, in invoke
    self.connection.write_op(ctl)
  File "/usr/lib/python2.4/site-packages/qpid/connection.py", line 200, in write_op
    self.flush()
  File "/usr/lib/python2.4/site-packages/qpid/framer.py", line 57, in flush
    self._write(cipher_buf)
  File "/usr/lib/python2.4/site-packages/qpid/framer.py", line 68, in _write
    n = self.sock.send(buf)
error: (32, 'Broken pipe')

Traceback (most recent call last):
  File "bz456272.py", line 72, in run
    self.mgr.AddQueue(qid, True)
  File "rhtslib_management.py", line 105, in AddQueue
    arguments=args)
  File "/usr/lib/python2.4/site-packages/qpid/generator.py", line 25, in <lambda>
    method = lambda self, *args, **kwargs: self.invoke(op, args, kwargs)
  File "/usr/lib/python2.4/site-packages/qpid/session.py", line 138, in invoke
    return self.do_invoke(op, args, kwargs)
  File "/usr/lib/python2.4/site-packages/qpid/session.py", line 150, in do_invoke
    raise SessionDetached()
SessionDetached
  File "/usr/lib64/python2.4/threading.py", line 442, in __bootstrap
    self.run()
  File "bz456272.py", line 85, in run
    traceback.print_stack()
===> Preparing ... 
AddQueue(0) - connecting to broker
AddQueue(0) - connected
AddQueue(0) ... Waiting for phase 1 ...
AddQueue(1) - connecting to broker
AddQueue(1) - connected
AddQueue(1) ... Waiting for phase 1 ...
AddQueue(2) - connecting to broker
AddQueue(2) - connected
AddQueue(2) ... Waiting for phase 1 ...
AddQueue(3) - connecting to broker
AddQueue(3) - connected
AddQueue(3) ... Waiting for phase 1 ...
AddQueue(4) - connecting to broker
AddQueue(4) - connected
AddQueue(5) - connecting to broker
AddQueue(4) ... Waiting for phase 1 ...
AddQueue(5) - connected
AddQueue(5) ... Waiting for phase 1 ...
===> All threads ready for phase 1 ... sleeping a few (10) seconds!
AddQueue(0) ... waiting for phase 2... 
AddQueue(1) ... waiting for phase 2... 
AddQueue(2) ... waiting for phase 2... 
AddQueue(3) ... waiting for phase 2... 
AddQueue(4) ... waiting for phase 2... 
AddQueue(5) ... waiting for phase 2... 
===> Doing loop 1 of 10
---> Sleeping for another 15 seconds, to allow all threads to get ready for phase 2 ...
---> Removing lock
[1281730781.998739] AddQueue(2) ... Adding queue 'q_queue_000100'
[1281730781.999683] AddQueue(0) ... Adding queue 'q_queue_000001'
[1281730782.000617] AddQueue(1) ... Adding queue 'q_queue_000010'
[1281730782.001600] AddQueue(0) done (time used: 0.001917)
AddQueue(0) ... Waiting for phase 1 ...
[1281730782.001675] AddQueue(5) ... Adding queue 'q_queue_100000'
[1281730782.001935] AddQueue(2) done (time used: 0.003196)
AddQueue(2) ... Waiting for phase 1 ...
[1281730782.002523] AddQueue(1) done (time used: 0.001906)
AddQueue(1) ... Waiting for phase 1 ...
[1281730782.002666] AddQueue(5) done (time used: 0.000991)
AddQueue(5) ... Waiting for phase 1 ...
[1281730782.002771] AddQueue(3) ... Adding queue 'q_queue_001000'
[1281730782.003716] AddQueue(4) ... Adding queue 'q_queue_010000'
[1281730782.003806] AddQueue(4) FAILED (time used: 0.000090) : 
Exception Info:
[1281730782.003970] AddQueue(3) done (time used: 0.001199)
AddQueue(3) ... Waiting for phase 1 ...
Stack Trace:
AddQueue(4) ... Waiting for phase 1 ...
---> Waiting for all threads to get ready again
---> ready: 0


===> Doing loop 2 of 10
---> Sleeping for another 15 seconds, to allow all threads to get ready for phase 2 ...
AddQueue(0) ... waiting for phase 2... 
AddQueue(2) ... waiting for phase 2... 
AddQueue(5) ... waiting for phase 2... 
AddQueue(3) ... waiting for phase 2... 
AddQueue(1) ... waiting for phase 2... 
AddQueue(4) ... waiting for phase 2... 
---> Removing lock
[1281730800.124553] AddQueue(0) ... Adding queue 'q_queue_000002'
[1281730800.125485] AddQueue(5) ... Adding queue 'q_queue_200000'
[1281730800.126431] AddQueue(4) ... Adding queue 'q_queue_020000'
[1281730800.127374] AddQueue(5) done (time used: 0.001889)
AddQueue(5) ... Waiting for phase 1 ...
[1281730800.127460] AddQueue(3) ... Adding queue 'q_queue_002000'
[1281730800.128119] AddQueue(4) done (time used: 0.001688)
AddQueue(4) ... Waiting for phase 1 ...
[1281730800.128193] AddQueue(2) ... Adding queue 'q_queue_000200'
[1281730800.128846] AddQueue(0) done (time used: 0.004293)
AddQueue(0) ... Waiting for phase 1 ...
[1281730800.129006] AddQueue(3) done (time used: 0.001546)
AddQueue(3) ... Waiting for phase 1 ...
[1281730800.129126] AddQueue(1) ... Adding queue 'q_queue_000020'
[1281730800.129878] AddQueue(1) done (time used: 0.000752)
AddQueue(1) ... Waiting for phase 1 ...
[1281730800.130141] AddQueue(2) done (time used: 0.001948)
AddQueue(2) ... Waiting for phase 1 ...
---> Waiting for all threads to get ready again
---> ready: 0


===> Doing loop 3 of 10
---> Sleeping for another 15 seconds, to allow all threads to get ready for phase 2 ...
AddQueue(5) ... waiting for phase 2... 
AddQueue(4) ... waiting for phase 2... 
AddQueue(3) ... waiting for phase 2... 
AddQueue(0) ... waiting for phase 2... 
AddQueue(1) ... waiting for phase 2... 
AddQueue(2) ... waiting for phase 2... 
---> Removing lock
[1281730818.157759] AddQueue(0) ... Adding queue 'q_queue_000003'
[1281730818.158421] AddQueue(4) ... Adding queue 'q_queue_030000'
[1281730818.159349] AddQueue(0) done (time used: 0.001590)
AddQueue(0) ... Waiting for phase 1 ...
[1281730818.159455] AddQueue(2) ... Adding queue 'q_queue_000300'
[1281730818.159642] AddQueue(5) ... Adding queue 'q_queue_300000'
[1281730818.160329] AddQueue(4) done (time used: 0.001908)
AddQueue(4) ... Waiting for phase 1 ...
[1281730818.160437] AddQueue(5) done (time used: 0.000795)
AddQueue(5) ... Waiting for phase 1 ...
[1281730818.160484] AddQueue(1) ... Adding queue 'q_queue_000030'
[1281730818.160878] AddQueue(1) done (time used: 0.000394)
AddQueue(1) ... Waiting for phase 1 ...
[1281730818.160906] AddQueue(3) ... Adding queue 'q_queue_003000'
[1281730818.161177] AddQueue(3) done (time used: 0.000271)
AddQueue(3) ... Waiting for phase 1 ...
[1281730818.161332] AddQueue(2) done (time used: 0.001877)
AddQueue(2) ... Waiting for phase 1 ...
---> Waiting for all threads to get ready again
---> ready: 0


===> Doing loop 4 of 10
---> Sleeping for another 15 seconds, to allow all threads to get ready for phase 2 ...
AddQueue(0) ... waiting for phase 2... 
AddQueue(4) ... waiting for phase 2... 
AddQueue(5) ... waiting for phase 2... 
AddQueue(2) ... waiting for phase 2... 
AddQueue(1) ... waiting for phase 2... 
AddQueue(3) ... waiting for phase 2... 
---> Removing lock
[1281730836.155377] AddQueue(0) ... Adding queue 'q_queue_000004'
[1281730836.155513] AddQueue(4) ... Adding queue 'q_queue_040000'
[1281730836.156124] AddQueue(4) done (time used: 0.000611)
AddQueue(4) ... Waiting for phase 1 ...
[1281730836.156194] AddQueue(2) ... Adding queue 'q_queue_000400'
[1281730836.157592] AddQueue(5) ... Adding queue 'q_queue_400000'
[1281730836.157899] AddQueue(0) done (time used: 0.002522)
AddQueue(0) ... Waiting for phase 1 ...
[1281730836.158040] AddQueue(2) done (time used: 0.001846)
AddQueue(2) ... Waiting for phase 1 ...
[1281730836.158115] AddQueue(3) ... Adding queue 'q_queue_004000'
[1281730836.158714] AddQueue(1) ... Adding queue 'q_queue_000040'
[1281730836.159196] AddQueue(3) done (time used: 0.001081)
AddQueue(3) ... Waiting for phase 1 ...
[1281730836.159287] AddQueue(1) done (time used: 0.000573)
AddQueue(1) ... Waiting for phase 1 ...
[1281730836.159772] AddQueue(5) done (time used: 0.002180)
AddQueue(5) ... Waiting for phase 1 ...
---> Waiting for all threads to get ready again
---> ready: 0


===> Doing loop 5 of 10
---> Sleeping for another 15 seconds, to allow all threads to get ready for phase 2 ...
AddQueue(4) ... waiting for phase 2... 
AddQueue(0) ... waiting for phase 2... 
AddQueue(2) ... waiting for phase 2... 
AddQueue(5) ... waiting for phase 2... 
AddQueue(1) ... waiting for phase 2... 
AddQueue(3) ... waiting for phase 2... 
---> Removing lock
[1281730854.161455] AddQueue(1) ... Adding queue 'q_queue_000050'
[1281730854.162380] AddQueue(4) ... Adding queue 'q_queue_050000'
[1281730854.163361] AddQueue(1) done (time used: 0.001906)
AddQueue(1) ... Waiting for phase 1 ...
[1281730854.163444] AddQueue(5) ... Adding queue 'q_queue_500000'
[1281730854.164083] AddQueue(0) ... Adding queue 'q_queue_000005'
[1281730854.164477] AddQueue(5) done (time used: 0.001033)
AddQueue(5) ... Waiting for phase 1 ...
[1281730854.164616] AddQueue(4) done (time used: 0.002236)
AddQueue(4) ... Waiting for phase 1 ...
[1281730854.165453] AddQueue(0) done (time used: 0.001370)
AddQueue(0) ... Waiting for phase 1 ...
[1281730854.165771] AddQueue(3) ... Adding queue 'q_queue_005000'
[1281730854.166363] AddQueue(2) ... Adding queue 'q_queue_000500'
[1281730854.167016] AddQueue(3) done (time used: 0.001245)
AddQueue(3) ... Waiting for phase 1 ...
[1281730854.167135] AddQueue(2) done (time used: 0.000772)
AddQueue(2) ... Waiting for phase 1 ...
---> Waiting for all threads to get ready again
---> ready: 0


===> Doing loop 6 of 10
---> Sleeping for another 15 seconds, to allow all threads to get ready for phase 2 ...
AddQueue(1) ... waiting for phase 2... 
AddQueue(0) ... waiting for phase 2... 
AddQueue(2) ... waiting for phase 2... 
AddQueue(4) ... waiting for phase 2... 
AddQueue(5) ... waiting for phase 2... 
AddQueue(3) ... waiting for phase 2... 
---> Removing lock
[1281730872.159513] AddQueue(2) ... Adding queue 'q_queue_000600'
[1281730872.160431] AddQueue(5) ... Adding queue 'q_queue_600000'
[1281730872.161379] AddQueue(1) ... Adding queue 'q_queue_000060'
[1281730872.162357] AddQueue(5) done (time used: 0.001926)
AddQueue(5) ... Waiting for phase 1 ...
[1281730872.162531] AddQueue(2) done (time used: 0.003018)
AddQueue(2) ... Waiting for phase 1 ...
[1281730872.162699] AddQueue(1) done (time used: 0.001320)
AddQueue(1) ... Waiting for phase 1 ...
[1281730872.162774] AddQueue(4) ... Adding queue 'q_queue_060000'
[1281730872.162952] AddQueue(0) ... Adding queue 'q_queue_000006'
[1281730872.163990] AddQueue(0) done (time used: 0.001038)
AddQueue(0) ... Waiting for phase 1 ...
[1281730872.164055] AddQueue(3) ... Adding queue 'q_queue_006000'
[1281730872.164615] AddQueue(3) done (time used: 0.000560)
AddQueue(3) ... Waiting for phase 1 ...
[1281730872.164704] AddQueue(4) done (time used: 0.001930)
AddQueue(4) ... Waiting for phase 1 ...
---> Waiting for all threads to get ready again
---> ready: 0


===> Doing loop 7 of 10
---> Sleeping for another 15 seconds, to allow all threads to get ready for phase 2 ...
AddQueue(2) ... waiting for phase 2... 
AddQueue(5) ... waiting for phase 2... 
AddQueue(1) ... waiting for phase 2... 
AddQueue(0) ... waiting for phase 2... 
AddQueue(4) ... waiting for phase 2... 
AddQueue(3) ... waiting for phase 2... 
---> Removing lock
[1281730890.162640] AddQueue(0) ... Adding queue 'q_queue_000007'
[1281730890.163562] AddQueue(4) ... Adding queue 'q_queue_070000'
[1281730890.164529] AddQueue(0) done (time used: 0.001889)
AddQueue(0) ... Waiting for phase 1 ...
[1281730890.164643] AddQueue(1) ... Adding queue 'q_queue_000070'
[1281730890.164866] AddQueue(5) ... Adding queue 'q_queue_700000'
[1281730890.165660] AddQueue(4) done (time used: 0.002098)
AddQueue(4) ... Waiting for phase 1 ...
[1281730890.165800] AddQueue(5) done (time used: 0.000934)
AddQueue(5) ... Waiting for phase 1 ...
[1281730890.165872] AddQueue(2) ... Adding queue 'q_queue_000700'
[1281730890.168876] AddQueue(2) done (time used: 0.003004)
AddQueue(2) ... Waiting for phase 1 ...
[1281730890.168976] AddQueue(1) done (time used: 0.004333)
AddQueue(1) ... Waiting for phase 1 ...
[1281730890.169028] AddQueue(3) ... Adding queue 'q_queue_007000'
[1281730890.169553] AddQueue(3) done (time used: 0.000525)
AddQueue(3) ... Waiting for phase 1 ...
---> Waiting for all threads to get ready again
---> ready: 0


===> Doing loop 8 of 10
---> Sleeping for another 15 seconds, to allow all threads to get ready for phase 2 ...
AddQueue(0) ... waiting for phase 2... 
AddQueue(5) ... waiting for phase 2... 
AddQueue(4) ... waiting for phase 2... 
AddQueue(2) ... waiting for phase 2... 
AddQueue(1) ... waiting for phase 2... 
AddQueue(3) ... waiting for phase 2... 
---> Removing lock
[1281730908.164828] AddQueue(1) ... Adding queue 'q_queue_000080'
[1281730908.164997] AddQueue(3) ... Adding queue 'q_queue_008000'
[1281730908.165667] AddQueue(5) ... Adding queue 'q_queue_800000'
[1281730908.166178] AddQueue(0) ... Adding queue 'q_queue_000008'
[1281730908.167496] AddQueue(4) ... Adding queue 'q_queue_080000'
[1281730908.168318] AddQueue(3) done (time used: 0.003321)
AddQueue(3) ... Waiting for phase 1 ...
[1281730908.168476] AddQueue(0) done (time used: 0.002298)
AddQueue(0) ... Waiting for phase 1 ...
[1281730908.168646] AddQueue(4) done (time used: 0.001150)
AddQueue(4) ... Waiting for phase 1 ...
[1281730908.168800] AddQueue(5) done (time used: 0.003133)
AddQueue(5) ... Waiting for phase 1 ...
[1281730908.169423] AddQueue(1) done (time used: 0.004595)
AddQueue(1) ... Waiting for phase 1 ...
[1281730908.169476] AddQueue(2) ... Adding queue 'q_queue_000800'
[1281730908.169962] AddQueue(2) done (time used: 0.000486)
AddQueue(2) ... Waiting for phase 1 ...
---> Waiting for all threads to get ready again
---> ready: 0


===> Doing loop 9 of 10
---> Sleeping for another 15 seconds, to allow all threads to get ready for phase 2 ...
AddQueue(3) ... waiting for phase 2... 
AddQueue(1) ... waiting for phase 2... 
AddQueue(4) ... waiting for phase 2... 
AddQueue(5) ... waiting for phase 2... 
AddQueue(2) ... waiting for phase 2... 
AddQueue(0) ... waiting for phase 2... 
---> Removing lock
[1281730926.168189] AddQueue(4) ... Adding queue 'q_queue_090000'
[1281730926.169123] AddQueue(5) ... Adding queue 'q_queue_900000'
[1281730926.170089] AddQueue(4) done (time used: 0.001900)
AddQueue(4) ... Waiting for phase 1 ...
[1281730926.171932] AddQueue(3) ... Adding queue 'q_queue_009000'
[1281730926.172504] AddQueue(0) ... Adding queue 'q_queue_000009'
[1281730926.172886] AddQueue(5) done (time used: 0.003763)
AddQueue(5) ... Waiting for phase 1 ...
[1281730926.173000] AddQueue(3) done (time used: 0.001068)
AddQueue(3) ... Waiting for phase 1 ...
[1281730926.173099] AddQueue(1) ... Adding queue 'q_queue_000090'
[1281730926.174987] AddQueue(1) done (time used: 0.001888)
AddQueue(1) ... Waiting for phase 1 ...
[1281730926.175138] AddQueue(0) done (time used: 0.002634)
AddQueue(0) ... Waiting for phase 1 ...
[1281730926.175237] AddQueue(2) ... Adding queue 'q_queue_000900'
[1281730926.175968] AddQueue(2) done (time used: 0.000731)
AddQueue(2) ... Waiting for phase 1 ...
---> Waiting for all threads to get ready again
---> ready: 0


---> Exiting...
AddQueue(4) ... waiting for phase 2... 
AddQueue(5) ... waiting for phase 2... 
AddQueue(3) ... waiting for phase 2... 
AddQueue(1) ... waiting for phase 2... 
AddQueue(0) ... waiting for phase 2... 
AddQueue(2) ... waiting for phase 2... 
AddQueue(0) ... Disconnecting
AddQueue(1) ... Disconnecting
AddQueue(2) ... Disconnecting
AddQueue(3) ... Disconnecting
AddQueue(4) ... Disconnecting
AddQueue(5) ... Disconnecting

Comment 1 Frantisek Reznicek 2010-08-23 10:23:49 UTC
Reproducibility is higher now as I succeeded to trigger the issue again on RHEL 5.5 x86_64 on different machine (with same package set).

How reproducible:
40% on RHEL 5.5 x86_64

Comment 2 Kim van der Riet 2010-08-27 13:42:13 UTC
After examining the logs with trace-level logging enabled, it seems these failures are connected with qmf traffic (if I understand Gordon's analysis correctly), specifically when a qmf session is closed. In all cases I reproduced, the frame size 0 error was immediately preceded by a SessionCompletedBody with very large (corrupted) command completion values. In once case, this corrupted message also sparked an Execution exception on account of bad session completion information.

2010-08-23 19:14:01 trace RECV [127.0.0.1:57357]: Frame[BEbe; channel=1; {SessionCompletedBody: commands={ [0,10] [65537,50462837] }; }]
2010-08-23 19:14:01 debug anonymous.rhts_mrg10.lab.bos.redhat.com.11352_3_{(2010, 8, 23, 19, 14, 1, 0, 235, 1)}: sender marked completed: { [0,10] [65537,50462837] }
2010-08-23 19:14:01 debug anonymous.rhts_mrg10.lab.bos.redhat.com.11352_3_{(2010, 8, 23, 19, 14, 1, 0, 235, 1)}: sender confirmed point moved to (11+0)
2010-08-23 19:14:01 debug Exception constructed: Frame size too small 0 (qpid/framing/AMQFrame.cpp:101)
2010-08-23 19:14:01 error framing-error: Frame size too small 0 (qpid/framing/AMQFrame.cpp:101)


I commented out the management client from the test code (rhtslib_management.py:58, 59, 78) and reran the test. Although client connection timeout errors were observed in some runs, no further occurrence of the "Frame size 0" error was seen.

Since the client is using the old API which has known issues with thread safety, and since this is a multi-threaded client, I would suggest that in this case, the broker is not at fault, and is behaving correctly given the corrupted frame is likely being sent from the Python client.

I would propose changing this bug to state CLOSED/NOTABUG unless there are any objections or other considerations/tests.

Comment 3 Gordon Sim 2010-08-31 08:24:46 UTC
Thread-safety issue in old management client is now tracked by bug 628856 (whether or not this is deemed worth fixing given it is now superceded is another question). I'm going to mark this closed as suggested by Kim.