2008-jun-06 09:06:10 error Commit failed with exception: Exception: Error commitjexception 0x0b01 txn_map::get_remove_tdata_list() threw JERR_MAP_NOTFOUND: Key not found in map. (xid=rhm-tid\97) (TxnCtxt.h:87)

Seen in a transactional publish/consume test (from the Java client) with two messages sent/consumed per txn.
Similar sort of issue found using c++ txtest on file-18:

./src/tests/txtest --queues 5 --messages-per-tx 2 --total-messages 2000 --tx-count 100000 --size 512

led to:

terminate called after throwing an instance of 'rhm::journal::jexception'
  what(): jexception 0x0b01 wmgr::get_events() threw JERR_MAP_NOTFOUND: Key not found in map. (_txn_pending_set: commit xid="rhm-tid�")

#0  0x000000300b030055 in raise () from /lib64/libc.so.6
#1  0x000000300b031af0 in abort () from /lib64/libc.so.6
#2  0x00000030140bec34 in __gnu_cxx::__verbose_terminate_handler () from /usr/lib64/libstdc++.so.6
#3  0x00000030140bcdf6 in std::set_unexpected () from /usr/lib64/libstdc++.so.6
#4  0x00000030140bce23 in std::terminate () from /usr/lib64/libstdc++.so.6
#5  0x00000030140bcf0a in __cxa_throw () from /usr/lib64/libstdc++.so.6
#6  0x00002aaaab988524 in rhm::journal::wmgr::get_events (this=<value optimized out>, state=rhm::journal::pmgr::UNUSED) at jrnl/wmgr.cpp:848
#7  0x00002aaaab967895 in rhm::journal::jcntl::get_wr_events (this=0x1d73b078) at jrnl/jcntl.cpp:399
#8  0x00002aaaab9526f0 in rhm::bdbstore::JournalImpl::getEventsFire (this=0x1d73b070) at JournalImpl.cpp:422
#9  0x00002aaaab952782 in rhm::bdbstore::GetEventsFireEvent::fire (this=0x1d73ae30) at JournalImpl.cpp:45
#10 0x00002aaaaabd0e47 in qpid::broker::Timer::run () from /home/gordon/qpid/cpp/src/.libs/libqpidbroker.so.0
#11 0x00002aaaaaf84e3a in qpid::sys::Thread::runRunnable () from /home/gordon/qpid/cpp/src/.libs/libqpidcommon.so.0
#12 0x000000300bc062f7 in start_thread () from /lib64/libpthread.so.0
#13 0x000000300b0ce85d in clone () from /lib64/libc.so.6
2 messages appeared to be lost after recovering from the above crash.
There are two issues with how the transaction id (the 'xid' in these messages) is constructed:

(1) It uses a static counter protected by an instance-specific lock. The counter is shared across instances but each instance takes its own lock, so the increment is unsafe.
(2) It tries to append a count to a string via (essentially) tid += count, which is also not safe.

Fixing these appears to solve the issues I have been seeing, though much more testing is still needed to validate this.
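
For illustration only, here is a minimal sketch of one way to avoid both problems. It is not the actual patch. It assumes the tid is a std::string, in which case tid += count appends a single character whose value is the count rather than its decimal digits, which would be consistent with the unprintable xids in the logs above. The names make_xid and next_xid are hypothetical and do not come from the store code.

```cpp
#include <atomic>
#include <cstdint>
#include <string>

// Hypothetical helper: builds an xid from the "rhm-tid" prefix and a count.
// std::to_string writes the count as decimal text, whereas `tid += count`
// on a std::string would append one char with that numeric value.
inline std::string make_xid(std::uint64_t count) {
    return "rhm-tid" + std::to_string(count);
}

// Hypothetical generator: the counter is shared by every caller, so it is
// incremented atomically instead of under a per-instance lock.
inline std::string next_xid() {
    static std::atomic<std::uint64_t> counter{0};
    return make_xid(counter.fetch_add(1, std::memory_order_relaxed));
}
```

A static mutex shared by all instances would work equally well for the counter; the atomic is just the shortest way to show the idea.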
qpidc-0.2.667603-1.el5, qpidc-perftest-0.2.667603-1.el5, qpidd-0.2.667603-1.el5, and rhm-0.2.2153-1.el5 have been pushed to the staging repo for testing.