Created attachment 324639
test script I was using

ERROR: test_recover (tests_0-10.dtx.DtxTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/gordon/qpid/python/tests_0-10/dtx.py", line 655, in test_recover
    xids = session.dtx_recover().in_doubt
  File "/home/gordon/qpid/python/qpid/invoker.py", line 27, in <lambda>
    method = lambda *args, **kwargs: self.invoke(resolved, args, kwargs)
  File "/home/gordon/qpid/python/qpid/session.py", line 158, in invoke
    return self.do_invoke(type, args, kwargs)
  File "/home/gordon/qpid/python/qpid/session.py", line 213, in do_invoke
    return result.get(self.timeout)
  File "/home/gordon/qpid/python/qpid/datatypes.py", line 257, in get
    raise self.exception(self._error)
SessionException: (501, u'TPL recoverTplStore() failed: jexception 0x0b01 txn_map::get_tdata_list() threw JERR_MAP_NOTFOUND: Key not found in map. (xid=rhm-tid0x2aac2c9b0ee0) (MessageStoreImpl.cpp:1079)')

I ran a couple of txtests concurrently in a loop with the python tests and got the above error back after a couple of iterations on the python side.
Fixed in BZ 2954. The load test in this script uncovered a race condition.

QA: This error is easy to reproduce using the above script, particularly if the python test is modified to run only dtx.DtxTests.test_recover. (This can be done by editing qpid/cpp/src/tests/python_tests or by setting $PYTHON_TESTS appropriately.)
er... the above should have read: Fixed in svn r.2954
Ran the test on ibm-mongoose.rhts.bos.redhat.com using these packages:

python-qpid-0.4.733051-1.el5
rhm-0.4.3036-2.el5
qpidd-0.4.732838-1.el5
qpidc-perftest-0.4.732838-1.el5

Modified the test script attached in this bz by adding
export PYTHON_TESTS="tests_0-10.dtx.DtxTests.test_recover"
in the beginning of the script and just changing the paths for perftest and txtest binaries. I also checked out cpp/src/tests, python and specs from SVN to have the needed test files for running this test. txtest and perftest binaries are from the corresponding qpidc-perftest package.

======================================================================
ERROR: test_recover (tests_0-10.dtx.DtxTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/qpid/python/tests_0-10/dtx.py", line 655, in test_recover
    xids = session.dtx_recover().in_doubt
  File "/root/qpid/python/qpid/generator.py", line 25, in <lambda>
    method = lambda self, *args, **kwargs: self.invoke(inst, args, kwargs)
  File "/root/qpid/python/qpid/session.py", line 143, in invoke
    return self.do_invoke(type, args, kwargs)
  File "/root/qpid/python/qpid/session.py", line 198, in do_invoke
    return result.get(self.timeout)
  File "/root/qpid/python/qpid/datatypes.py", line 257, in get
    raise self.exception(self._error)
SessionException: (501, u'TPL recoverTplStore() failed: jexception 0x0b01 txn_map::get_tdata_list_nolock() threw JERR_MAP_NOTFOUND: Key not found in map. (xid=rhm-tid0x2aaab0018940) (MessageStoreImpl.cpp:1079)')
======================================================================

The test is performed on MRG/M-1.0.1 and RC packages of MRG/M-1.1.
A further race condition was found in MessageStoreImpl::readTplStore(): an XID read in the while loop could be removed by another thread before execution reached the tmap.get_tdata_list(xid) call within the loop.

A pragmatic fix of catching and ignoring the error was chosen in this case, rather than attempting the complex, error-prone and possibly performance-degrading route of locking the transaction map against other threads while the entire map is read for this operation. This call is made infrequently and is not part of the regular message handling code path.

Fixed in r.3039

QA - a long test as described above, completing without error, should prove this bug is fixed.
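
For illustration only, here is a minimal Python sketch of the catch-and-ignore approach described above. It is not the actual store code (which is C++ in MessageStoreImpl.cpp); the TxnMap class, its methods, read_in_doubt() and the between_steps hook are all hypothetical names, and KeyError merely stands in for JERR_MAP_NOTFOUND.

```python
import threading


class TxnMap:
    """Hypothetical stand-in for the store's transaction map (not the real API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._map = {}

    def insert(self, xid, tdata):
        with self._lock:
            self._map.setdefault(xid, []).append(tdata)

    def remove(self, xid):
        with self._lock:
            self._map.pop(xid, None)

    def xid_list(self):
        # Snapshot of the keys; entries may vanish after this returns.
        with self._lock:
            return list(self._map)

    def get_tdata_list(self, xid):
        # Raises KeyError if the xid is gone (stand-in for JERR_MAP_NOTFOUND).
        with self._lock:
            return list(self._map[xid])


def read_in_doubt(tmap, between_steps=None):
    """Collect data for every xid still present, skipping ones removed mid-scan."""
    result = {}
    for xid in tmap.xid_list():
        if between_steps is not None:
            # Hypothetical test hook: simulates another thread running here.
            between_steps(xid)
        try:
            result[xid] = tmap.get_tdata_list(xid)
        except KeyError:
            # The xid was removed by another thread after the snapshot was
            # taken; it is no longer of interest, so ignore it.
            continue
    return result
```

The map is locked only for each individual call, never for the whole scan, so other threads are not held up; the cost is that an XID can disappear between the two calls, which the except clause tolerates.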
The issue has been fixed, as verified on RHEL 4.7/5.3 i386/x86_64 with packages qpidd-0.4.738274-1 and rhm-0.4.3075-3. tests_0-10.dtx.DtxTests.test_recover passed during a long-term test.

->VERIFIED
An advisory has been issued which should help with the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-0434.html