Bug 472937 - TPL recoverTplStore() failed: jexception 0x0b01 txn_map::get_tdata_list() threw JERR_MAP_NOTFOUND: Key not found in map. (xid=rhm-tid0x2aac2c9b0ee0) (MessageStoreImpl.cpp:1079)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 1.0
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: 1.1.1
Target Release: ---
Assignee: Kim van der Riet
QA Contact: Kim van der Riet
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-11-25 18:03 UTC by Gordon Sim
Modified: 2009-04-21 16:16 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-04-21 16:16:48 UTC
Target Upstream Version:
Embargoed:


Attachments
test script I was using (1.18 KB, text/plain)
2008-11-25 18:03 UTC, Gordon Sim


Links
System: Red Hat Product Errata
ID: RHEA-2009:0434
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat Enterprise MRG Messaging and Grid Version 1.1.1
Last Updated: 2009-04-21 16:15:50 UTC

Description Gordon Sim 2008-11-25 18:03:01 UTC
Created attachment 324639
test script I was using

ERROR: test_recover (tests_0-10.dtx.DtxTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/gordon/qpid/python/tests_0-10/dtx.py", line 655, in test_recover
    xids = session.dtx_recover().in_doubt
  File "/home/gordon/qpid/python/qpid/invoker.py", line 27, in <lambda>
    method = lambda *args, **kwargs: self.invoke(resolved, args, kwargs)
  File "/home/gordon/qpid/python/qpid/session.py", line 158, in invoke
    return self.do_invoke(type, args, kwargs)
  File "/home/gordon/qpid/python/qpid/session.py", line 213, in do_invoke
    return result.get(self.timeout)
  File "/home/gordon/qpid/python/qpid/datatypes.py", line 257, in get
    raise self.exception(self._error)
SessionException: (501, u'TPL recoverTplStore() failed: jexception 0x0b01 txn_map::get_tdata_list() threw JERR_MAP_NOTFOUND: Key not found in map. (xid=rhm-tid0x2aac2c9b0ee0) (MessageStoreImpl.cpp:1079)')

I ran a couple of txtests concurrently in a loop with the python tests and got the above error back after a couple of iterations on the python side.

Comment 1 Kim van der Riet 2008-12-09 17:59:57 UTC
Fixed in BZ 2954.

The load test in this script uncovered a race condition.

QA: This error is easy to reproduce using the above script, particularly if the python test is modified to just run dtx.DtxTests.test_recover. (This can be done by editing qpid/cpp/src/tests/python_tests or setting $PYTHON_TESTS appropriately.)

Comment 2 Kim van der Riet 2008-12-09 18:02:31 UTC
er... the above should have read:

Fixed in svn r.2954

Comment 4 David Sommerseth 2009-01-12 15:11:25 UTC
Ran the test on ibm-mongoose.rhts.bos.redhat.com using these packages:

python-qpid-0.4.733051-1.el5
rhm-0.4.3036-2.el5
qpidd-0.4.732838-1.el5
qpidc-perftest-0.4.732838-1.el5

Modified the test script attached in this bz by adding 

    export PYTHON_TESTS="tests_0-10.dtx.DtxTests.test_recover"

at the beginning of the script and changing only the paths for the perftest and txtest binaries.  I also checked out cpp/src/tests, python and specs from SVN to get the test files needed for running this test.  The txtest and perftest binaries are from the corresponding qpidc-perftest package.


======================================================================
ERROR: test_recover (tests_0-10.dtx.DtxTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/qpid/python/tests_0-10/dtx.py", line 655, in test_recover
    xids = session.dtx_recover().in_doubt
  File "/root/qpid/python/qpid/generator.py", line 25, in <lambda>
    method = lambda self, *args, **kwargs: self.invoke(inst, args, kwargs)
  File "/root/qpid/python/qpid/session.py", line 143, in invoke
    return self.do_invoke(type, args, kwargs)
  File "/root/qpid/python/qpid/session.py", line 198, in do_invoke
    return result.get(self.timeout)
  File "/root/qpid/python/qpid/datatypes.py", line 257, in get
    raise self.exception(self._error)
SessionException: (501, u'TPL recoverTplStore() failed: jexception 0x0b01 txn_map::get_tdata_list_nolock() threw JERR_MAP_NOTFOUND: Key not found in map. (xid=rhm-tid0x2aaab0018940) (MessageStoreImpl.cpp:1079)')
======================================================================

The test was performed on MRG/M-1.0.1 and on RC packages of MRG/M-1.1.

Comment 5 Kim van der Riet 2009-01-14 13:01:16 UTC
A further race condition was found in MessageStoreImpl::readTplStore(), in which an XID read in the while loop could be removed by another thread before execution reached the tmap.get_tdata_list(xid) call within the loop.

A pragmatic fix of catching and ignoring the error was chosen in this case, rather than attempting the complex, error-prone and possibly performance-degrading route of locking the transaction map against other threads while the entire map is read for this operation. This call is made infrequently and is not part of the regular message handling code path.
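
For illustration only, the catch-and-ignore approach can be sketched roughly as follows. This is a Python stand-in, not the C++ change made in the store: TxnMap, KeyNotFound, read_tpl_store and the between_steps hook are hypothetical names chosen to mirror txn_map, JERR_MAP_NOTFOUND and MessageStoreImpl::readTplStore(). The hook simulates another thread removing an XID between the snapshot of the XID list and the per-XID lookup.

```python
import threading


class KeyNotFound(Exception):
    """Hypothetical stand-in for the JERR_MAP_NOTFOUND jexception."""


class TxnMap:
    """Toy transaction map; NOT the real txn_map API."""

    def __init__(self):
        self._lock = threading.Lock()
        self._map = {}

    def insert(self, xid, tdata):
        with self._lock:
            self._map.setdefault(xid, []).append(tdata)

    def remove(self, xid):
        with self._lock:
            self._map.pop(xid, None)

    def xid_list(self):
        # Snapshot of the keys; the lock is released once this returns.
        with self._lock:
            return list(self._map)

    def get_tdata_list(self, xid):
        with self._lock:
            try:
                return list(self._map[xid])
            except KeyError:
                raise KeyNotFound(xid) from None


def read_tpl_store(tmap, between_steps=None):
    """Collect per-XID data, skipping XIDs that vanish during the scan."""
    recovered = {}
    for xid in tmap.xid_list():
        if between_steps is not None:
            between_steps(xid)  # test hook: another thread could run here
        try:
            recovered[xid] = tmap.get_tdata_list(xid)
        except KeyNotFound:
            # The transaction completed after the snapshot was taken,
            # so there is nothing left to recover for it: ignore it.
            continue
    return recovered


tmap = TxnMap()
tmap.insert("xid-a", "enqueue")
tmap.insert("xid-b", "dequeue")

# Simulate a concurrent removal of xid-b between snapshot and lookup.
result = read_tpl_store(tmap, between_steps=lambda xid: tmap.remove("xid-b"))
print(result)
```

The map is locked only for each individual operation, never for the whole scan, which is the trade-off described above: the scan may see a stale XID, and the lookup failure for that XID is treated as benign.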

Fixed in r.3039

QA - a long test as described above without error should prove this bug is fixed.

Comment 6 Frantisek Reznicek 2009-01-30 15:49:57 UTC
The issue has been fixed, as verified on RHEL4.7/5.3 i386/x86_64 with packages
qpidd-0.4.738274-1 and rhm-0.4.3075-3.
tests_0-10.dtx.DtxTests.test_recover passed during a long-running test.

->VERIFIED

Comment 8 errata-xmlrpc 2009-04-21 16:16:48 UTC
An advisory has been issued which should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-0434.html

