Bug 450280 - transaction failure: txn_map::get_remove_tdata_list() threw JERR_MAP_NOTFOUND: Key not found in map.
Summary: transaction failure: txn_map::get_remove_tdata_list() threw JERR_MAP_NOTFOUND...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: beta
Hardware: All
OS: Linux
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Gordon Sim
QA Contact: Kim van der Riet
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-06-06 13:17 UTC by Gordon Sim
Modified: 2009-05-07 20:09 UTC
0 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-12-02 16:06:23 UTC
Target Upstream Version:
Embargoed:



Description Gordon Sim 2008-06-06 13:17:15 UTC
2008-jun-06 09:06:10 error Commit failed with exception: Exception: Error
commitjexception 0x0b01 txn_map::get_remove_tdata_list() threw
JERR_MAP_NOTFOUND: Key not found in map. (xid=rhm-tid\97) (TxnCtxt.h:87)

Seen during a transactional publish/consume test (from the Java client) with two
messages sent and consumed per transaction.

Comment 1 Gordon Sim 2008-06-10 08:53:12 UTC
A similar issue was found using the C++ txtest on file-18:

./src/tests/txtest --queues 5 --messages-per-tx 2 --total-messages 2000 --tx-count 100000 --size 512

led to:

terminate called after throwing an instance of 'rhm::journal::jexception'
  what():  jexception 0x0b01 wmgr::get_events() threw JERR_MAP_NOTFOUND: Key not
found in map. (_txn_pending_set: commit xid="rhm-tid�")

#0  0x000000300b030055 in raise () from /lib64/libc.so.6
#1  0x000000300b031af0 in abort () from /lib64/libc.so.6
#2  0x00000030140bec34 in __gnu_cxx::__verbose_terminate_handler ()
   from /usr/lib64/libstdc++.so.6
#3  0x00000030140bcdf6 in std::set_unexpected () from /usr/lib64/libstdc++.so.6
#4  0x00000030140bce23 in std::terminate () from /usr/lib64/libstdc++.so.6
#5  0x00000030140bcf0a in __cxa_throw () from /usr/lib64/libstdc++.so.6
#6  0x00002aaaab988524 in rhm::journal::wmgr::get_events (
    this=<value optimized out>, state=rhm::journal::pmgr::UNUSED)
    at jrnl/wmgr.cpp:848
#7  0x00002aaaab967895 in rhm::journal::jcntl::get_wr_events (this=0x1d73b078)
    at jrnl/jcntl.cpp:399
#8  0x00002aaaab9526f0 in rhm::bdbstore::JournalImpl::getEventsFire (
    this=0x1d73b070) at JournalImpl.cpp:422
#9  0x00002aaaab952782 in rhm::bdbstore::GetEventsFireEvent::fire (
    this=0x1d73ae30) at JournalImpl.cpp:45
#10 0x00002aaaaabd0e47 in qpid::broker::Timer::run ()
   from /home/gordon/qpid/cpp/src/.libs/libqpidbroker.so.0
#11 0x00002aaaaaf84e3a in qpid::sys::Thread::runRunnable ()
   from /home/gordon/qpid/cpp/src/.libs/libqpidcommon.so.0
#12 0x000000300bc062f7 in start_thread () from /lib64/libpthread.so.0
#13 0x000000300b0ce85d in clone () from /lib64/libc.so.6


Comment 2 Gordon Sim 2008-06-10 09:06:05 UTC
Two messages appeared to be lost after recovering from the above crash.

Comment 3 Gordon Sim 2008-06-10 11:17:25 UTC
There are two issues with the way the transaction ID (the 'xid' in these
messages) is constructed:

(1) it uses a static counter protected by an instance-specific lock, which means
that the increment is unsafe: two instances can each hold their own lock while
incrementing the same shared counter

(2) it tries to append a count to a string via (essentially) tid += count, which
is also not safe
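A minimal C++ sketch of the two problems listed above and one way to avoid them. It assumes count is an unsigned integer and tid is a std::string (the comment only says "essentially"); the function names are hypothetical and this is not the store's actual code. If that assumption holds, std::string's operator+=(char) overload would append a single byte instead of decimal digits, which would be consistent with the garbled characters after "rhm-tid" in the logged xids. The fix uses C++11 facilities (std::atomic, std::to_string) that postdate this 2008 report.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical illustration of problem (2): `tid += count` with an integer
// count resolves to operator+=(char), appending ONE byte, not the digits.
std::string makeXidUnsafe(unsigned count) {
    std::string tid = "rhm-tid";
    tid += static_cast<char>(count);  // what `tid += count` effectively does
    return tid;
}

// Fix for problem (1), assumed design: one process-wide atomic counter, so
// increments from different instances cannot race the way a static counter
// guarded by a per-instance mutex can.
std::uint64_t nextTxnId() {
    static std::atomic<std::uint64_t> counter{0};
    // fetch_add is an atomic read-modify-write, so every caller gets a
    // distinct value even with relaxed ordering.
    return counter.fetch_add(1, std::memory_order_relaxed) + 1;
}

// Fix for problem (2): format the number as decimal text before appending.
std::string makeXidSafe() {
    return "rhm-tid" + std::to_string(nextTxnId());
}
```

With the unsafe form and an 8-bit char, only 256 distinct suffixes exist, so xids would repeat long before the 100,000 transactions requested in the txtest run; the safe form gives each call a distinct decimal suffix within one process.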

Fixing these appears to solve the issues I have been seeing, though much more
testing is still needed to validate this.
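To show how duplicate xids could surface as a "Key not found in map" failure, here is a toy model. ToyTxnMap and its methods are hypothetical and only loosely echo the name txn_map::get_remove_tdata_list() from the log; the journal's real data structures are not described in this report, so treat this as one plausible mechanism, not a confirmed diagnosis.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy, not the rhm journal's txn_map: records are grouped per
// xid, and committing an xid removes the whole group at once.
class ToyTxnMap {
public:
    // Record that a write with record id `rid` belongs to transaction `xid`.
    void insertTdata(const std::string& xid, int rid) {
        map_[xid].push_back(rid);
    }

    // Remove and return every record for `xid`; throws if the key is absent,
    // analogous to the JERR_MAP_NOTFOUND error in the logs above.
    std::vector<int> getRemoveTdataList(const std::string& xid) {
        auto it = map_.find(xid);
        if (it == map_.end()) {
            throw std::runtime_error("Key not found in map (xid=" + xid + ")");
        }
        std::vector<int> records = std::move(it->second);
        map_.erase(it);
        return records;
    }

private:
    std::map<std::string, std::vector<int>> map_;
};
```

For example, if two transactions both received xid "rhm-tid1" and each recorded two messages, the first getRemoveTdataList call would return all four records (including the other transaction's) and the second call would throw.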

Comment 4 Mike Bonnet 2008-06-20 03:53:20 UTC
qpidc-0.2.667603-1.el5, qpidc-perftest-0.2.667603-1.el5, qpidd-0.2.667603-1.el5, and rhm-0.2.2153-1.el5 have been pushed to the staging repo for testing

