Red Hat Bugzilla – Bug 486418
qpidd+store The extra xids encountered after qpidd recovery from journal
Last modified: 2015-11-15 19:06:51 EST
Description of problem:
During journal testing, txtest --check very rarely exits with an error (ecode=1) and the message:
The following extra ids were encountered:
<follows list of extra messages found>
It happened in part B of the transaction integrity test, where the journal is not trashed before each run. It seems that xids are reused.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Run the RHTS test qpid_test_transaction_integrity (part B)
Actual results:
Very rarely, txtest finds extra messages after qpidd recovery.
Expected results:
No extra messages should be seen after qpidd recovery.
The data are stored in the RHTS system.
Search for 'The following extra ids were encountered:' in:
https://rhts.redhat.com/testlogs/46435/157993/1320331/TESTOUT.log (rough log)
https://rhts.redhat.com/testlogs/46435/157993/1320331/qpidd_txtest.transcript.log (fine log)
corresponding journals are here:
Analysis of the journals shows that this bug occurs when a local transaction id (tid) is reused for a transaction after a transaction with the same tid was left incomplete in a previous test. The records from the previous test are discarded if there is no matching entry in the transaction prepared list (TPL). However, as soon as a transaction using the same tid is committed, recovery will also include the records from the earlier test, as the journal has no way of knowing whether these were part of the same transaction.
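The failure mode described above can be illustrated with a toy model of recovery. This is only a sketch under stated assumptions: `Record` and `recoverMessages` are hypothetical names, and the real store's journal and TPL structures are certainly more involved. The point is that filtering by tid alone cannot separate stale records from fresh ones once the tid reappears in the TPL.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified journal record: a transaction id plus a message id.
struct Record {
    std::string tid;
    int messageId;
};

// Keep only the records whose tid appears in the prepared list. The journal
// cannot tell which test run a record came from, only which tid it carries.
std::vector<int> recoverMessages(const std::vector<Record>& journal,
                                 const std::set<std::string>& preparedList) {
    std::vector<int> recovered;
    for (const Record& r : journal) {
        if (preparedList.count(r.tid) != 0) {
            recovered.push_back(r.messageId);
        }
    }
    return recovered;
}
```

If messages 1 and 2 were written under a tid by an incomplete transaction in an earlier run, and message 3 is later committed under the same tid, this model recovers all three, which is the "extra ids" symptom.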
The class TxnCtxt was using a string "tid-" followed by its own memory address as a quick and cheap (i.e. not costly in performance) tid. However, memory addresses can be reused in a pattern such that the same address is allocated across different tests.
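A minimal sketch of why an address-derived tid is not unique over time. `AddressTxn` is a hypothetical stand-in for TxnCtxt, and the exact string format the broker used is an assumption. Placement new is used here only to make the address reuse deterministic; in the broker the reuse came from the allocator handing out the same address again.

```cpp
#include <new>
#include <sstream>
#include <string>

// Hypothetical stand-in for TxnCtxt: derives its tid from its own address.
struct AddressTxn {
    std::string tid() const {
        std::ostringstream out;
        out << "tid-" << static_cast<const void*>(this);
        return out.str();
    }
};

// Construct two objects, one after the other, in the same storage and
// report whether their tids are identical.
bool addressTidsCollide() {
    alignas(AddressTxn) unsigned char storage[sizeof(AddressTxn)];

    AddressTxn* first = new (storage) AddressTxn;
    const std::string firstTid = first->tid();
    first->~AddressTxn();

    // A later, unrelated transaction that happens to occupy the same address.
    AddressTxn* second = new (storage) AddressTxn;
    const std::string secondTid = second->tid();
    second->~AddressTxn();

    return firstTid == secondTid;
}
```

The two objects never exist at the same time, yet they produce the same tid, so records persisted under the first can be attributed to the second.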
The problem was solved by using ::uuid_generate() to create a genuine xid once for each broker instance. For each transaction, a 64-bit counter is incremented and its value prepended to the uuid, giving a final tid that is guaranteed unique without the expense of generating a new uuid for each transaction.
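The scheme can be sketched roughly as follows. This is not the code from the fix: `TidGenerator` is a hypothetical name, the hex layout and the '-' separator are assumptions, and 128 bits from std::random_device stand in for ::uuid_generate() so that the example needs no libuuid. The sketch is also not thread-safe.

```cpp
#include <cstdint>
#include <iomanip>
#include <random>
#include <sstream>
#include <string>

// Hypothetical generator: one random id per instance, one counter per tid.
class TidGenerator {
public:
    TidGenerator() : counter_(0) {
        // Stand-in for ::uuid_generate(): 128 random bits rendered as hex.
        std::random_device rd;
        std::ostringstream out;
        out << std::hex << std::setfill('0');
        for (int i = 0; i < 4; ++i) {
            out << std::setw(8) << static_cast<std::uint32_t>(rd());
        }
        instanceId_ = out.str();
    }

    // Prepend the incremented 64-bit counter to the per-instance id.
    std::string next() {
        std::ostringstream out;
        out << std::hex << std::setfill('0') << std::setw(16) << ++counter_
            << '-' << instanceId_;
        return out.str();
    }

private:
    std::uint64_t counter_;
    std::string instanceId_;
};
```

Within one instance the counter keeps tids distinct, and across restarts the per-instance id differs, so a tid left in the journal by an earlier run should not match one issued later. The random part is generated once, so the per-transaction cost is an increment and a string format.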
Fixed in r.3124.
QA: This bug cannot be reliably reproduced; thorough soak testing should verify that there is no recurrence.
The issue has been fixed and validated on RHEL 4.7 / 5.3 i386 / x86_64 with packages:
Fixed and verified; closing.