Description of problem:
If multiple subscribers are bound to a flow-to-disk (FTD) queue, the queue may get stuck after a JERR__UNEXPRESPONSE / RHM_IORES_EMPTY exception is raised.

Version-Release number of selected component (if applicable):
# rpm -qa | grep qpid
qpid-java-example-0.10-9.el6.noarch
qpid-cpp-client-devel-docs-0.10-8.el6_1.noarch
python-qpid-0.12-1.el6.noarch
qpid-tools-0.12-2.el6.noarch
qpid-cpp-client-ssl-0.12-6.el6.x86_64
qpid-cpp-client-0.12-6.el6.x86_64
qpid-qmf-0.12-6.el6.x86_64
python-qpid-qmf-0.12-6.el6.x86_64
qpid-cpp-server-devel-0.12-6.el6.x86_64
qpid-cpp-server-store-0.12-6.el6.x86_64
qpid-java-client-0.10-9.el6.noarch
qpid-java-common-0.10-9.el6.noarch
qpid-cpp-server-0.12-6.el6.x86_64
qpid-cpp-client-devel-0.12-6.el6.x86_64
qpid-cpp-server-xml-0.12-6.el6.x86_64
#

How reproducible:
100% within a few minutes

Steps to Reproduce:
Run this script:

./drain "ftd_queue; { create:always, node:{ type:queue, durable:true, x-declare:{ arguments:{ qpid.policy_type:flow_to_disk, qpid.max_count:100, qpid.max_size:0, qpid.flow_stop_count:0, qpid.flow_stop_size:0, qpid.file_count:4, qpid.file_size:512 } } }}"
while true; do
  echo "$(date): starting next round"
  ./spout -c 10000 --content "some content some content some content some content some content some content some content some content some content some content some content some content some content some content some content some content some content some content" "ftd_queue" &
  ./drain -f -c 5000 "ftd_queue" &
  ./drain -f -c 5000 "ftd_queue" &
  wait
done

(The script creates an FTD queue, then in parallel sends messages to it on one connection and drains them on two parallel connections - each loop iteration sends and drains 10k messages.)

Actual results:
Several issues are seen:

1) After a while, the drain program raises an exception:

2012-01-09 16:04:14 warning Broker closed connection: 320, Queue ftd_queue: loadContent() failed: jexception 0x0108 JournalImpl::loadMsgContent() threw JERR__UNEXPRESPONSE: Unexpected response to call or event. (read_data_record() returned RHM_IORES_EMPTY) (MessageStoreImpl.cpp:1249)
connection-forced: Queue ftd_queue: loadContent() failed: jexception 0x0108 JournalImpl::loadMsgContent() threw JERR__UNEXPRESPONSE: Unexpected response to call or event. (read_data_record() returned RHM_IORES_EMPTY) (MessageStoreImpl.cpp:1249)

2) Sometimes (easy to reproduce by repeating the cycle from the script), the queue becomes "stuck": it still reports messages (per qpid-stat -q), but drain fails with the same exception even when fetching a single message. The queue is then completely unusable.

3) Rarely, the bug even segfaults the qpidd process - coredump to be added.

Expected results:
None of the above behaviors (1, 2 or 3).

Additional info:
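For reference, a minimal Python sketch of the same reproduction pattern, using the qpid.messaging API from the python-qpid package listed above. The address string mirrors the one passed to ./drain; the broker URL, message counts and payload below are illustrative assumptions, not the exact values from the script.

# Minimal sketch, assuming a local broker on localhost:5672.
from qpid.messaging import Connection, Message

ADDRESS = """ftd_queue; { create: always, node: { type: queue, durable: True,
  x-declare: { arguments: { 'qpid.policy_type': 'flow_to_disk',
    'qpid.max_count': 100, 'qpid.max_size': 0,
    'qpid.flow_stop_count': 0, 'qpid.flow_stop_size': 0,
    'qpid.file_count': 4, 'qpid.file_size': 512 } } } }"""

conn = Connection("localhost:5672")
conn.open()
try:
    session = conn.session()
    sender = session.sender(ADDRESS)
    receiver = session.receiver(ADDRESS)

    # Send more messages than qpid.max_count so the flow-to-disk policy
    # forces the overflow onto the store (and it must be read back later).
    for i in range(1000):
        sender.send(Message(content="some content " * 20, durable=True))

    # Drain everything back; with the bug present, fetching eventually fails
    # once the broker cannot load message content from the journal.
    for i in range(1000):
        receiver.fetch(timeout=10)
        session.acknowledge()
finally:
    conn.close()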
Created attachment 551581 [details]
coredump of segfault (see actual results, point 3)

/var/log/messages recorded the segfault with these logs:

Jan 9 14:46:19 pmoravec-mrg-train6 qpidd[16887]: 2012-01-09 14:46:19 error Connection 127.0.0.1:5672-127.0.0.1:58718 closed by error: Queue ftd_queue: loadContent() failed: jexception 0x0901 rmgr::read() threw JERR_RMGR_RIDMISMATCH: RID mismatch between current record and dtok RID (rid=0xcd; dtok_rid=0xcc; dtok_id=0x7) (MessageStoreImpl.cpp:1249)(320)
Jan 9 14:46:40 pmoravec-mrg-train6 kernel: qpidd[16896] general protection ip:3863833fc4 sp:7f483b821930 error:0 in libc-2.12.so[3863800000+197000]
Jan 9 14:46:41 pmoravec-mrg-train6 abrt[17432]: saved core dump of pid 16887 (/usr/sbin/qpidd) to /var/spool/abrt/ccpp-1326116800-16887.new/coredump (56401920 bytes)
Jan 9 14:46:41 pmoravec-mrg-train6 abrtd: Directory 'ccpp-1326116800-16887' creation detected
Jan 9 14:46:41 pmoravec-mrg-train6 abrtd: New crash /var/spool/abrt/ccpp-1326116800-16887, processing
*** This bug has been marked as a duplicate of bug 769828 ***