Bug 450280 - transaction failure: txn_map::get_remove_tdata_list() threw JERR_MAP_NOTFOUND: Key not found in map.
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Hardware: All
OS: Linux
Priority: urgent
Severity: high
Assigned To: Gordon Sim
Kim van der Riet
Reported: 2008-06-06 09:17 EDT by Gordon Sim
Modified: 2009-05-07 16:09 EDT (History)
Doc Type: Bug Fix
Last Closed: 2008-12-02 11:06:23 EST
Description Gordon Sim 2008-06-06 09:17:15 EDT
2008-jun-06 09:06:10 error Commit failed with exception: Exception: Error
commitjexception 0x0b01 txn_map::get_remove_tdata_list() threw
JERR_MAP_NOTFOUND: Key not found in map. (xid=rhm-tid\97) (TxnCtxt.h:87)

Seen while running a transactional publish/consume test (from the Java client)
with two messages sent/consumed per transaction.
Comment 1 Gordon Sim 2008-06-10 04:53:12 EDT
A similar issue was found using the C++ txtest on file-18:

./src/tests/txtest --queues 5 --messages-per-tx 2 --total-messages 2000
--tx-count 100000 --size 512

led to:

terminate called after throwing an instance of 'rhm::journal::jexception'
  what():  jexception 0x0b01 wmgr::get_events() threw JERR_MAP_NOTFOUND: Key not
found in map. (_txn_pending_set: commit xid="rhm-tid�")

#0  0x000000300b030055 in raise () from /lib64/libc.so.6
#1  0x000000300b031af0 in abort () from /lib64/libc.so.6
#2  0x00000030140bec34 in __gnu_cxx::__verbose_terminate_handler ()
   from /usr/lib64/libstdc++.so.6
#3  0x00000030140bcdf6 in std::set_unexpected () from /usr/lib64/libstdc++.so.6
#4  0x00000030140bce23 in std::terminate () from /usr/lib64/libstdc++.so.6
#5  0x00000030140bcf0a in __cxa_throw () from /usr/lib64/libstdc++.so.6
#6  0x00002aaaab988524 in rhm::journal::wmgr::get_events (
    this=<value optimized out>, state=rhm::journal::pmgr::UNUSED)
    at jrnl/wmgr.cpp:848
#7  0x00002aaaab967895 in rhm::journal::jcntl::get_wr_events (this=0x1d73b078)
    at jrnl/jcntl.cpp:399
#8  0x00002aaaab9526f0 in rhm::bdbstore::JournalImpl::getEventsFire (
    this=0x1d73b070) at JournalImpl.cpp:422
#9  0x00002aaaab952782 in rhm::bdbstore::GetEventsFireEvent::fire (
    this=0x1d73ae30) at JournalImpl.cpp:45
#10 0x00002aaaaabd0e47 in qpid::broker::Timer::run ()
   from /home/gordon/qpid/cpp/src/.libs/libqpidbroker.so.0
#11 0x00002aaaaaf84e3a in qpid::sys::Thread::runRunnable ()
   from /home/gordon/qpid/cpp/src/.libs/libqpidcommon.so.0
#12 0x000000300bc062f7 in start_thread () from /lib64/libpthread.so.0
#13 0x000000300b0ce85d in clone () from /lib64/libc.so.6
Comment 2 Gordon Sim 2008-06-10 05:06:05 EDT
2 messages appeared to be lost after recovering from the above crash.
Comment 3 Gordon Sim 2008-06-10 07:17:25 EDT
There are two problems with the way the transaction id (the 'xid' in these
messages) is constructed:

(1) it uses a static counter protected by an instance-specific lock, so the
lock does not serialise access across instances and the increment is unsafe

(2) it tries to append the count to a string via (essentially) tid += count,
which is also not safe

Fixing these appears to solve the issues I have been seeing, though much more
testing is still needed to validate this.
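
A minimal sketch of the two problems described in this comment and one way to
avoid them. This is not the actual broker or store code: the names unsafe_tid
and next_tid are hypothetical, and it assumes a C++11 compiler (std::atomic,
std::to_string), which is newer than what the code base used at the time. It
also assumes the integer-append reading of "tid += count": with std::string,
+= on an integer appends a single character whose code is that integer rather
than its decimal digits, which would be consistent with the unprintable bytes
in the xids logged above.

```cpp
#include <atomic>
#include <cstdint>
#include <string>

// Hypothetical illustration of problem (2): appending an integer to a
// std::string with += adds ONE character whose code is the integer value.
// The explicit casts below reproduce that behaviour in a well-defined way.
std::string unsafe_tid(unsigned count) {
    std::string tid("rhm-tid");
    tid += static_cast<char>(static_cast<unsigned char>(count));
    return tid; // ids repeat once count wraps past 255
}

// Hypothetical fix for both problems:
//  - one process-wide atomic counter, so no per-instance lock is relied on
//    to guard shared static state (problem 1);
//  - the count is formatted as decimal digits before being appended
//    (problem 2).
std::string next_tid() {
    static std::atomic<std::uint64_t> counter{0};
    const std::uint64_t n = counter.fetch_add(1) + 1; // unique per call
    return "rhm-tid" + std::to_string(n);
}
```

With the unsafe form, counts 97 and 353 both produce the same id, so two
distinct transactions could share an xid; that is one plausible route to a
"Key not found in map" error when the second commit looks up an entry the
first has already removed. Whether that is the exact failure path here is an
assumption.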
Comment 4 Mike Bonnet 2008-06-19 23:53:20 EDT
qpidc-0.2.667603-1.el5, qpidc-perftest-0.2.667603-1.el5, qpidd-0.2.667603-1.el5, and rhm-0.2.2153-1.el5 have been pushed to the staging repo for testing.
