Red Hat Bugzilla – Bug 450280
transaction failure: txn_map::get_remove_tdata_list() threw JERR_MAP_NOTFOUND: Key not found in map.
Last modified: 2009-05-07 16:09:42 EDT
2008-jun-06 09:06:10 error Commit failed with exception: Exception: Error
commitjexception 0x0b01 txn_map::get_remove_tdata_list() threw
JERR_MAP_NOTFOUND: Key not found in map. (xid=rhm-tid\97) (TxnCtxt.h:87)
Seen in a transactional publish/consume test (from a Java client) with two messages
sent and consumed per transaction.
A similar issue was found using the C++ txtest on file-18:
./src/tests/txtest --queues 5 --messages-per-tx 2 --total-messages 2000 --tx-count 100000 --size 512
terminate called after throwing an instance of 'rhm::journal::jexception'
what(): jexception 0x0b01 wmgr::get_events() threw JERR_MAP_NOTFOUND: Key not
found in map. (_txn_pending_set: commit xid="rhm-tid�")
#0 0x000000300b030055 in raise () from /lib64/libc.so.6
#1 0x000000300b031af0 in abort () from /lib64/libc.so.6
#2 0x00000030140bec34 in __gnu_cxx::__verbose_terminate_handler ()
#3 0x00000030140bcdf6 in std::set_unexpected () from /usr/lib64/libstdc++.so.6
#4 0x00000030140bce23 in std::terminate () from /usr/lib64/libstdc++.so.6
#5 0x00000030140bcf0a in __cxa_throw () from /usr/lib64/libstdc++.so.6
#6 0x00002aaaab988524 in rhm::journal::wmgr::get_events (
this=<value optimized out>, state=rhm::journal::pmgr::UNUSED)
#7 0x00002aaaab967895 in rhm::journal::jcntl::get_wr_events (this=0x1d73b078)
#8 0x00002aaaab9526f0 in rhm::bdbstore::JournalImpl::getEventsFire (
this=0x1d73b070) at JournalImpl.cpp:422
#9 0x00002aaaab952782 in rhm::bdbstore::GetEventsFireEvent::fire (
this=0x1d73ae30) at JournalImpl.cpp:45
#10 0x00002aaaaabd0e47 in qpid::broker::Timer::run ()
#11 0x00002aaaaaf84e3a in qpid::sys::Thread::runRunnable ()
#12 0x000000300bc062f7 in start_thread () from /lib64/libpthread.so.0
#13 0x000000300b0ce85d in clone () from /lib64/libc.so.6
Two messages appeared to be lost after recovering from the above crash.
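There are two problems with the way the transaction ID (the 'xid' in these
messages) is constructed:
(1) it uses a static counter protected by an instance-specific lock, so two
instances can increment the shared counter concurrently and the increment is
unsafe;
(2) it appends a count to a string via (essentially) tid += count, which is also
unsafe: if count is an integer, this appends a single character rather than the
count's decimal digits, which would be consistent with the non-printable
characters in the xids above, and different counts can produce the same xid.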
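A minimal C++ sketch of both problems and one possible fix, assuming the count is an integer. The names buggy_xid, XidGenerator and count_unique_xids are hypothetical and are not the actual rhm code; the fix shown (a single process-wide std::atomic counter plus std::to_string) is one option, and a single static mutex guarding the static counter would also work.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <thread>
#include <vector>

// Hypothetical reconstruction of problem (2): with an integer count,
// `tid += count` resolves to std::string::operator+=(char), so only one
// character (the low byte of count) is appended, not decimal digits.
// The cast below makes that implicit conversion explicit.
// Counts 1 and 257 therefore yield the same xid, and 97 yields "rhm-tida".
std::string buggy_xid(int count) {
    std::string tid = "rhm-tid";
    tid += static_cast<char>(count);  // what `tid += count` does implicitly
    return tid;
}

// Hypothetical fixed generator: the counter is one process-wide atomic,
// so increments from any instance or thread never race (fixes problem 1),
// and std::to_string formats the value as decimal digits (fixes problem 2).
class XidGenerator {
public:
    std::string next() {
        std::uint64_t n = counter_.fetch_add(1, std::memory_order_relaxed);
        return "rhm-tid" + std::to_string(n);
    }

private:
    static std::atomic<std::uint64_t> counter_;
};
std::atomic<std::uint64_t> XidGenerator::counter_{0};

// Hypothetical check: threads drawing xids from two generator instances
// (mirroring the per-instance-lock scenario) should never see a duplicate.
std::size_t count_unique_xids(int threads, int per_thread) {
    XidGenerator a, b;
    std::vector<std::vector<std::string>> results(threads);
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&, t] {
            XidGenerator& gen = (t % 2 == 0) ? a : b;
            for (int i = 0; i < per_thread; ++i) results[t].push_back(gen.next());
        });
    }
    for (auto& w : workers) w.join();
    std::set<std::string> unique;
    for (const auto& r : results) unique.insert(r.begin(), r.end());
    return unique.size();
}
```

With 4 threads drawing 10000 xids each, count_unique_xids(4, 10000) should return 40000, whereas buggy_xid(1) and buggy_xid(257) compare equal. On Linux, compile with -pthread.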
Fixing these appears to resolve the issues I have been seeing, though much more
testing is still needed to confirm this.
qpidc-0.2.667603-1.el5, qpidc-perftest-0.2.667603-1.el5, qpidd-0.2.667603-1.el5, and rhm-0.2.2153-1.el5 have been pushed to the staging repo for testing.