Bug 486418 - qpidd+store The extra xids encountered after qpidd recovery from journal
Summary: qpidd+store The extra xids encountered after qpidd recovery from journal
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 1.1
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: 1.1.1
Target Release: ---
Assignee: Kim van der Riet
QA Contact: Frantisek Reznicek
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-02-19 17:28 UTC by Frantisek Reznicek
Modified: 2015-11-16 00:06 UTC
CC List: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-06-27 20:53:59 UTC
Target Upstream Version:
Embargoed:



Description Frantisek Reznicek 2009-02-19 17:28:58 UTC
Description of problem:

During journal testing, txtest --check very rarely exits with ecode=1 and the message:
The following extra ids were encountered:
<list of the extra ids found>

It happened in part B of the transaction integrity test, where the journal is not trashed before each run. It appears that xids are being reused.

Version-Release number of selected component (if applicable):
qpidd-0.4.743861-1.el4, rhm-0.4.3116-2.el4

How reproducible:
rarely (~3%)

Steps to Reproduce:
1. run RHTS qpid_test_transaction_integrity (part B)
  
Actual results:
Very rarely, txtest finds extra messages after qpidd recovery.

Expected results:
No extra messages should be seen after qpidd recovery.

Additional info:

The data are stored in the RHTS system. Search for 'The following extra ids were encountered:' in
https://rhts.redhat.com/testlogs/46435/157993/1320331/TESTOUT.log (rough log)
and in
https://rhts.redhat.com/testlogs/46435/157993/1320331/qpidd_txtest.transcript.log (fine log)
The corresponding journals are here:
https://rhts.redhat.com/testlogs/46435/157993/1320331/qpidd_journal_b0020-0023.tar.bz2

Comment 1 Kim van der Riet 2009-02-19 19:02:28 UTC
Analysis of the journals shows that this bug occurs when a local transaction id (tid) is reused for a transaction after it was left incomplete in a previous test. The records from the previous test are discarded if there is no matching entry in the transaction prepared list (TPL). However, as soon as a transaction using the same tid is committed, recovery will also include the records from the earlier test, as the journal has no way of knowing whether these were part of the same transaction.
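A minimal C++ sketch of this failure mode, assuming a much-simplified journal model: Record, RecType and recover() are hypothetical names for illustration, not the rhm store's API or on-disk format. Recovery holds transactional enqueues per tid and replays them only when it reads a commit for that tid. Because part B never trashes the journal, enqueues left incomplete by an earlier run are still pending under the same tid when a later run commits it.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical, much-simplified journal record; not the rhm on-disk format.
enum class RecType { Enqueue, Commit, Abort };

struct Record {
    RecType type;
    std::string tid;    // local transaction id
    std::string msgId;  // message id (empty for commit/abort records)
};

// Simplified recovery: transactional enqueues are held per tid and replayed
// only when a commit for that tid is read; an abort drops them. Anything
// still pending at the end belongs to an incomplete transaction and is
// discarded. Pending entries are keyed by tid alone, so nothing separates an
// old incomplete transaction from a newer one that reuses the same tid.
std::vector<std::string> recover(const std::vector<Record>& journal) {
    std::map<std::string, std::vector<std::string>> pending;
    std::vector<std::string> recovered;
    for (const Record& r : journal) {
        switch (r.type) {
        case RecType::Enqueue:
            pending[r.tid].push_back(r.msgId);
            break;
        case RecType::Commit: {
            auto it = pending.find(r.tid);
            if (it != pending.end()) {
                recovered.insert(recovered.end(), it->second.begin(), it->second.end());
                pending.erase(it);
            }
            break;
        }
        case RecType::Abort:
            pending.erase(r.tid);
            break;
        }
    }
    return recovered;  // entries left in 'pending' are discarded
}
```

Feeding it enqueues of m1 and m2 under "tid-A" from an abandoned run, followed by a later run that enqueues m3 under "tid-A" and commits, recovers m1, m2 and m3; m1 and m2 are the "extra ids". If the later run uses a distinct tid, only m3 is recovered.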

The class TxnCtxt was using the string "tid-" followed by its own memory address as a quick and cheap (i.e. not costly in performance) tid. However, memory addresses can be reused in a pattern such that the same address is allocated in successive tests.
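The sketch below shows why an address-derived tid is not unique over time. LegacyTxnCtxt and tidsAtReusedAddress() are hypothetical, and formatting the pointer with operator<< is an assumption; the report does not give the exact format. Placement new forces the second context into the same storage as the first. That makes deterministic what a general-purpose allocator does only sometimes, when it hands back a just-freed block.

```cpp
#include <new>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical stand-in for the old TxnCtxt scheme: "tid-" followed by the
// object's own address (pointer formatting via operator<< is an assumption).
class LegacyTxnCtxt {
public:
    LegacyTxnCtxt() {
        std::ostringstream oss;
        oss << "tid-" << static_cast<const void*>(this);
        tid_ = oss.str();
    }
    const std::string& tid() const { return tid_; }

private:
    std::string tid_;
};

// Build a context, record its tid, destroy it, then build a new context in
// the same storage. Both tids are derived from the same address.
std::pair<std::string, std::string> tidsAtReusedAddress() {
    alignas(LegacyTxnCtxt) unsigned char storage[sizeof(LegacyTxnCtxt)];

    auto* first = new (storage) LegacyTxnCtxt();
    std::string t1 = first->tid();
    first->~LegacyTxnCtxt();

    auto* second = new (storage) LegacyTxnCtxt();
    std::string t2 = second->tid();
    second->~LegacyTxnCtxt();

    return {t1, t2};
}
```

The two returned tids are identical even though they belong to two unrelated transaction contexts.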

The problem was solved by using ::uuid_generate() to create a genuine xid once for each broker instance. A 64-bit counter is incremented for each transaction and its value prepended to the uuid to create a final tid that is guaranteed unique without the expense of generating a new uuid for each transaction.
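A hedged sketch of the scheme described above. TidGenerator and the tid layout (16 hex digits of counter, '-', 32 hex digits of instance id) are assumptions for illustration; the report only says the counter value is prepended to the uuid. std::random_device stands in for ::uuid_generate() so the example builds without libuuid.

```cpp
#include <array>
#include <atomic>
#include <cstdint>
#include <iomanip>
#include <random>
#include <sstream>
#include <string>

// Hypothetical tid generator: one 128-bit id per broker instance plus a
// 64-bit counter per transaction. The real store uses ::uuid_generate() for
// the instance id; std::random_device is only a stand-in here.
class TidGenerator {
public:
    using InstanceId = std::array<std::uint8_t, 16>;

    explicit TidGenerator(const InstanceId& id) : instanceHex_(toHex(id)) {}
    TidGenerator() : TidGenerator(randomId()) {}

    // Per-transaction call: only the counter changes, no new id is generated.
    std::string next() {
        std::uint64_t n = counter_.fetch_add(1, std::memory_order_relaxed);
        std::ostringstream oss;
        oss << std::hex << std::setw(16) << std::setfill('0') << n
            << '-' << instanceHex_;
        return oss.str();
    }

private:
    // Render the 16-byte instance id as 32 lowercase hex digits.
    static std::string toHex(const InstanceId& id) {
        std::ostringstream oss;
        oss << std::hex << std::setfill('0');
        for (std::uint8_t b : id) oss << std::setw(2) << static_cast<unsigned>(b);
        return oss.str();
    }

    // Stand-in for ::uuid_generate(): 16 random bytes.
    static InstanceId randomId() {
        std::random_device rd;
        InstanceId id{};
        for (auto& b : id) b = static_cast<std::uint8_t>(rd() & 0xFF);
        return id;
    }

    const std::string instanceHex_;
    std::atomic<std::uint64_t> counter_{0};
};
```

Only next() runs per transaction, and it just bumps an atomic counter and formats a string, so the per-transaction cost stays close to the old address-based scheme while tids no longer depend on where a TxnCtxt happens to be allocated.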

Fixed in r.3124.

QA: This bug cannot be reliably reproduced; thorough soak testing should verify that there is no recurrence.

Comment 2 Frantisek Reznicek 2009-03-09 13:11:51 UTC
The issue has been fixed and validated on RHEL 4.7 / 5.3 (i386 / x86_64) with packages:
qpidd-0.4.750054-1.el5, rhm-0.4.3138-2.el5.

->VERIFIED

Comment 3 Justin Ross 2011-06-27 20:53:59 UTC
Fixed and verified; closing.

