Bug 472937 - TPL recoverTplStore() failed: jexception 0x0b01 txn_map::get_tdata_list() threw JERR_MAP_NOTFOUND: Key not found in map. (xid=rhm-tid0x2aac2c9b0ee0) (MessageStoreImpl.cpp:1079)
Status: CLOSED ERRATA
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 1.0
Hardware: All Linux
Priority: urgent
Severity: urgent
Target Milestone: 1.1.1
Target Release: ---
Assigned To: Kim van der Riet
Kim van der Riet
Depends On:
Blocks:
Reported: 2008-11-25 13:03 EST by Gordon Sim
Modified: 2009-04-21 12:16 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-04-21 12:16:48 EDT


Attachments
test script I was using (1.18 KB, text/plain)
2008-11-25 13:03 EST, Gordon Sim


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2009:0434 normal SHIPPED_LIVE Red Hat Enterprise MRG Messaging and Grid Version 1.1.1 2009-04-21 12:15:50 EDT

Description Gordon Sim 2008-11-25 13:03:01 EST
Created attachment 324639
test script I was using

ERROR: test_recover (tests_0-10.dtx.DtxTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/gordon/qpid/python/tests_0-10/dtx.py", line 655, in test_recover
    xids = session.dtx_recover().in_doubt
  File "/home/gordon/qpid/python/qpid/invoker.py", line 27, in <lambda>
    method = lambda *args, **kwargs: self.invoke(resolved, args, kwargs)
  File "/home/gordon/qpid/python/qpid/session.py", line 158, in invoke
    return self.do_invoke(type, args, kwargs)
  File "/home/gordon/qpid/python/qpid/session.py", line 213, in do_invoke
    return result.get(self.timeout)
  File "/home/gordon/qpid/python/qpid/datatypes.py", line 257, in get
    raise self.exception(self._error)
SessionException: (501, u'TPL recoverTplStore() failed: jexception 0x0b01 txn_map::get_tdata_list() threw JERR_MAP_NOTFOUND:
Key not found in map. (xid=rhm-tid0x2aac2c9b0ee0) (MessageStoreImpl.cpp:1079)')

I ran a couple of txtests concurrently in a loop with the python tests and got the above error back after a couple of iterations on the python side.
Comment 1 Kim van der Riet 2008-12-09 12:59:57 EST
Fixed in BZ 2954.

A race condition was found which the load test of this script uncovered.

QA: This error is easy to reproduce using the above script, particularly if the python test is modified to just run dtx.DtxTests.test_recover. (This can be done by editing qpid/cpp/src/tests/python_tests or setting $PYTHON_TESTS appropriately.)
Comment 2 Kim van der Riet 2008-12-09 13:02:31 EST
er... the above should have read:

Fixed in svn r.2954
Comment 4 David Sommerseth 2009-01-12 10:11:25 EST
Ran the test on ibm-mongoose.rhts.bos.redhat.com using these packages:

python-qpid-0.4.733051-1.el5
rhm-0.4.3036-2.el5
qpidd-0.4.732838-1.el5
qpidc-perftest-0.4.732838-1.el5

Modified the test script attached in this bz by adding 

    export PYTHON_TESTS="tests_0-10.dtx.DtxTests.test_recover"

at the beginning of the script and changing only the paths for the perftest and txtest binaries. I also checked out cpp/src/tests, python and specs from SVN to get the test files needed for this test. The txtest and perftest binaries are from the corresponding qpidc-perftest package.


======================================================================
ERROR: test_recover (tests_0-10.dtx.DtxTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/qpid/python/tests_0-10/dtx.py", line 655, in test_recover
    xids = session.dtx_recover().in_doubt
  File "/root/qpid/python/qpid/generator.py", line 25, in <lambda>
    method = lambda self, *args, **kwargs: self.invoke(inst, args, kwargs)
  File "/root/qpid/python/qpid/session.py", line 143, in invoke
    return self.do_invoke(type, args, kwargs)
  File "/root/qpid/python/qpid/session.py", line 198, in do_invoke
    return result.get(self.timeout)
  File "/root/qpid/python/qpid/datatypes.py", line 257, in get
    raise self.exception(self._error)
SessionException: (501, u'TPL recoverTplStore() failed: jexception 0x0b01 txn_map::get_tdata_list_nolock() threw JERR_MAP_NOTFOUND: Key not found in map. (xid=rhm-tid0x2aaab0018940) (MessageStoreImpl.cpp:1079)')
======================================================================

The test was performed on MRG/M-1.0.1 and the RC packages of MRG/M-1.1.
Comment 5 Kim van der Riet 2009-01-14 08:01:16 EST
A further race condition was found in MessageStoreImpl::readTplStore(): an XID read in the while loop could be removed by another thread before execution reached the tmap.get_tdata_list(xid) call within the loop.

A pragmatic fix was chosen in this case: catch and ignore the error, rather than take the complex, error-prone and possibly performance-degrading route of locking the transaction map against other threads while the entire map is read for this operation. This call is made infrequently and is not part of the regular message handling code path.
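
The catch-and-ignore approach described above can be illustrated with a minimal, self-contained sketch. This is not the actual MessageStoreImpl or txn_map code: TxnMap, MapNotFound, xid_list(), read_all() and the hook parameter are hypothetical names invented for the illustration. Only the idea is taken from this comment: each map call locks on its own, so an XID can disappear between listing and lookup, and the reader skips such XIDs instead of failing.

```cpp
#include <map>
#include <mutex>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the journal's JERR_MAP_NOTFOUND error.
struct MapNotFound : std::runtime_error {
    explicit MapNotFound(const std::string& xid)
        : std::runtime_error("Key not found in map. (xid=" + xid + ")") {}
};

// Hypothetical, simplified transaction map. Every method takes the lock
// only for its own duration, so the map may change between two calls.
class TxnMap {
    mutable std::mutex mtx_;
    std::map<std::string, std::vector<int>> data_;
public:
    void insert(const std::string& xid, std::vector<int> records) {
        std::lock_guard<std::mutex> lock(mtx_);
        data_[xid] = std::move(records);
    }
    void erase(const std::string& xid) {
        std::lock_guard<std::mutex> lock(mtx_);
        data_.erase(xid);
    }
    std::vector<std::string> xid_list() const {
        std::lock_guard<std::mutex> lock(mtx_);
        std::vector<std::string> xids;
        for (const auto& entry : data_) xids.push_back(entry.first);
        return xids;
    }
    // Throws MapNotFound if the XID is no longer present.
    std::vector<int> get_tdata_list(const std::string& xid) const {
        std::lock_guard<std::mutex> lock(mtx_);
        auto it = data_.find(xid);
        if (it == data_.end()) throw MapNotFound(xid);
        return it->second;
    }
};

// Reads every transaction without holding the lock across the whole scan.
// `hook` runs between the XID snapshot and the lookups; it exists only so
// that the effect of a concurrent removal can be reproduced deterministically.
template <class Hook>
std::map<std::string, std::vector<int>> read_all(const TxnMap& tmap, Hook hook) {
    std::map<std::string, std::vector<int>> result;
    const std::vector<std::string> xids = tmap.xid_list();
    hook();
    for (const auto& xid : xids) {
        try {
            std::vector<int> tdata = tmap.get_tdata_list(xid);
            result.emplace(xid, std::move(tdata));
        } catch (const MapNotFound&) {
            // The transaction was removed by another thread after the
            // snapshot was taken; there is nothing left to report, so skip it.
        }
    }
    return result;
}
```

Calling read_all(tmap, [&] { tmap.erase("xid-a"); }) on a map holding "xid-a" and "xid-b" returns only "xid-b" and does not throw, which is the behaviour the fix aims for. The trade-off is that the result is a best-effort view of the map rather than a consistent snapshot.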

Fixed in r.3039

QA: a long run of the test described above that completes without error should show that this bug is fixed.
Comment 6 Frantisek Reznicek 2009-01-30 10:49:57 EST
The issue has been fixed, as verified on RHEL4.7/5.3 i386/x86_64 with packages
qpidd-0.4.738274-1, rhm-0.4.3075-3.
tests_0-10.dtx.DtxTests.test_recover passed during a long-running test.

->VERIFIED
Comment 8 errata-xmlrpc 2009-04-21 12:16:48 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-0434.html
