Bug 486418

Summary: qpidd+store The extra xids encountered after qpidd recovery from journal
Product: Red Hat Enterprise MRG
Reporter: Frantisek Reznicek <freznice>
Component: qpid-cpp
Assignee: Kim van der Riet <kim.vdriet>
Status: CLOSED CURRENTRELEASE
QA Contact: Frantisek Reznicek <freznice>
Severity: high
Docs Contact:
Priority: high
Version: 1.1
CC: esammons, jross
Target Milestone: 1.1.1   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-06-27 20:53:59 UTC
Type: ---

Description Frantisek Reznicek 2009-02-19 17:28:58 UTC
Description of problem:

During journal testing, txtest --check very rarely exits with exit code 1 and the following message:
The following extra ids were encountered:
<list of the extra messages found follows>

It happened in part B of the transaction integrity test, where the journal is not trashed before each run. It seems that xids are being reused.

Version-Release number of selected component (if applicable):
qpidd-0.4.743861-1.el4, rhm-0.4.3116-2.el4

How reproducible:
rarely (~3%)

Steps to Reproduce:
1. Run the RHTS test qpid_test_transaction_integrity (part B).
  
Actual results:
Very rarely txtest finds extra messages after qpidd recovery.

Expected results:
No extra messages should be seen after qpidd recovery.

Additional info:

The data are stored in the RHTS system.
Search for 'The following extra ids were encountered:' in
https://rhts.redhat.com/testlogs/46435/157993/1320331/TESTOUT.log (rough log)
and in
https://rhts.redhat.com/testlogs/46435/157993/1320331/qpidd_txtest.transcript.log (fine log)
The corresponding journals are here:
https://rhts.redhat.com/testlogs/46435/157993/1320331/qpidd_journal_b0020-0023.tar.bz2

Comment 1 Kim van der Riet 2009-02-19 19:02:28 UTC
Analysis of the journals shows that this bug occurs when a local transaction id (tid) is reused for a new transaction after a transaction with the same tid was left incomplete in a previous test. The records from the previous test are discarded if there is no matching entry in the transaction prepared list (TPL). However, as soon as a transaction using the same tid is committed, recovery also includes the records from the earlier test, because the journal has no way of knowing whether they were part of the same transaction.
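
The recovery behaviour described above can be illustrated with a small model. This is only a sketch under loud assumptions: JournalRecordSketch and recoverSketch are hypothetical names, the journal is reduced to a flat list of (tid, message id) records, and the TPL/commit state is reduced to a plain set of tids. It is not the store's actual recovery code.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, heavily simplified journal record: one enqueued message
// tagged with the tid of the transaction that wrote it.
struct JournalRecordSketch {
    std::string tid;
    std::string messageId;
};

// Simplified recovery rule (an assumption, not the real algorithm):
// keep every record whose tid is known to be committed, discard the rest.
// Records are matched by tid only, so recovery cannot distinguish an old
// incomplete transaction from a newer committed one that reused the tid.
inline std::vector<std::string> recoverSketch(
        const std::vector<JournalRecordSketch>& journal,
        const std::set<std::string>& committedTids) {
    std::vector<std::string> recovered;
    for (const JournalRecordSketch& rec : journal) {
        if (committedTids.count(rec.tid) != 0) {
            recovered.push_back(rec.messageId);
        }
    }
    return recovered;
}
```

In this model, a stale record left by an earlier run is dropped while its tid is unknown, but it is recovered alongside the new records as soon as a later transaction commits under the same tid. That matches the "extra ids" symptom reported by txtest.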

The class TxnCtxt was using the string "tid-" followed by its own memory address as a quick and cheap (i.e. low-overhead) tid. However, memory addresses can be reused, so the same address may be allocated again in different tests.
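
A minimal sketch of why an address-derived id is not unique over time. TxnCtxtSketch is a hypothetical stand-in, not the real TxnCtxt, and the exact string format is assumed. Placement new is used here only to force the reuse deterministically; with ordinary heap allocation the reuse is likely but not guaranteed.

```cpp
#include <cassert>
#include <cstdint>
#include <new>
#include <sstream>
#include <string>

// Hypothetical stand-in for the broker's transaction context.
struct TxnCtxtSketch {
    // Builds the id from this object's own address ("tid-" + address in hex).
    std::string tid() const {
        std::ostringstream oss;
        oss << "tid-" << std::hex << reinterpret_cast<std::uintptr_t>(this);
        return oss.str();
    }
};

// Constructs two contexts, one after the other, in the same storage and
// reports whether their address-derived ids are identical.
inline bool addressTidsCollide() {
    alignas(TxnCtxtSketch) unsigned char storage[sizeof(TxnCtxtSketch)];

    TxnCtxtSketch* first = new (storage) TxnCtxtSketch();
    const std::string firstTid = first->tid();
    first->~TxnCtxtSketch();  // the first "transaction" ends

    TxnCtxtSketch* second = new (storage) TxnCtxtSketch();
    const std::string secondTid = second->tid();
    second->~TxnCtxtSketch();

    return firstTid == secondTid;
}
```

Two distinct transactions that happen to occupy the same address get the same tid, which is all the journal has to tell them apart.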

The problem was solved by using ::uuid_generate() to create a genuine uuid once for each broker instance. A 64-bit counter is incremented for each transaction and its value prepended to the uuid, giving a final tid that is guaranteed unique without the expense of generating a new uuid for each transaction.
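
The scheme can be sketched roughly as follows. This is not the code from r.3124: TidGeneratorSketch and the tid layout are hypothetical, and so that the example builds with only the standard library, std::random_device stands in for ::uuid_generate() as the source of the per-instance identifier.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <iomanip>
#include <random>
#include <sstream>
#include <string>

// Hypothetical generator: one random identifier per instance plus a
// 64-bit counter that is prepended for each new transaction.
class TidGeneratorSketch {
public:
    TidGeneratorSketch() : instanceId_(makeInstanceId()), counter_(0) {}

    // Returns "<16 hex digit counter>-<instance id>".
    std::string next() {
        const std::uint64_t n = ++counter_;
        std::ostringstream oss;
        oss << std::hex << std::setw(16) << std::setfill('0') << n
            << '-' << instanceId_;
        return oss.str();
    }

private:
    // Stand-in for ::uuid_generate(): four random words rendered as hex.
    static std::string makeInstanceId() {
        std::random_device rd;
        std::ostringstream oss;
        oss << std::hex << std::setfill('0');
        for (int i = 0; i < 4; ++i) {
            oss << std::setw(8) << rd();
        }
        return oss.str();
    }

    std::string instanceId_;              // generated once per instance
    std::atomic<std::uint64_t> counter_;  // incremented once per tid
};
```

Within one instance the counter keeps tids distinct, and across broker restarts the per-instance identifier is what keeps a new tid from matching one left in the journal by an earlier run. The expensive identifier generation happens once, not per transaction.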

Fixed in r.3124.

QA: This bug cannot be reliably reproduced; thorough soak testing should verify that there is no recurrence.

Comment 2 Frantisek Reznicek 2009-03-09 13:11:51 UTC
The issue has been fixed; validated on RHEL 4.7 / 5.3 i386 / x86_64 with packages:
qpidd-0.4.750054-1.el5, rhm-0.4.3138-2.el5.

->VERIFIED

Comment 3 Justin Ross 2011-06-27 20:53:59 UTC
Fixed and verified; closing.