Bug 460116 - txtest failures when broker killed during transfer phase (RHEL 4)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 1.0
Hardware: All
OS: Linux
Priority: urgent
Severity: high
Target Milestone: 1.0.1
Target Release: ---
Assignee: Kim van der Riet
QA Contact: Kim van der Riet
URL:
Whiteboard:
Depends On: 458053
Blocks:
Reported: 2008-08-26 09:02 UTC by Gordon Sim
Modified: 2011-08-12 16:22 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-10-06 18:58:58 UTC
Target Upstream Version:
Embargoed:




Links
System: Red Hat Product Errata
ID: RHBA-2008:0867
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat Enterprise MRG bug fix and enhancement update
Last Updated: 2008-10-06 18:58:51 UTC

Description Gordon Sim 2008-08-26 09:02:00 UTC
+++ This bug was initially created as a clone of Bug #458053 +++

Using store r2258 from release branch (and r680695 from qpid.0-10). 

1. Start broker
2. run txtest (e.g. txtest --messages-per-tx 100 --tx-count 100000 --total-messages 10000 --size 64 --queues 4)
3. after some time, 'kill -9 <broker-pid>'
4. remove lock and restart broker
5. run check phase (e.g. txtest --messages-per-tx 100 --tx-count 100000 --total-messages 10000 --size 64 --queues 4 --check yes --init no --transfer no)
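
For scripting the two txtest invocations above, a minimal sketch follows. It only builds the command lines from the options quoted in steps 2 and 5; the helper name txtest_command is hypothetical, and starting, killing and restarting the broker is left out because those details depend on the local setup.

```python
import shlex

# Options shared by both phases, copied from steps 2 and 5 of this report.
COMMON = [
    "--messages-per-tx", "100",
    "--tx-count", "100000",
    "--total-messages", "10000",
    "--size", "64",
    "--queues", "4",
]


def txtest_command(check_only=False):
    """Return the txtest argument list (hypothetical helper, not part of qpid).

    check_only=False gives the step 2 command (default phases).
    check_only=True gives the step 5 command (check phase only).
    """
    cmd = ["txtest"] + COMMON
    if check_only:
        # Step 5: verify the queues without re-initialising or transferring.
        cmd += ["--check", "yes", "--init", "no", "--transfer", "no"]
    return cmd


if __name__ == "__main__":
    print(shlex.join(txtest_command()))
    print(shlex.join(txtest_command(check_only=True)))
```

The returned lists can be passed directly to subprocess.run() once the broker is up.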

Expect all messages to be present. Sometimes messages are reported as missing, sometimes the following error occurs instead:

Queue tx-test-2: async_dequeue() failed: jexception 0x0b01 txn_map::get_tdata_list() threw JERR_MAP_NOTFOUND: Key not found in map. (xid=) (BdbMessageStore.cpp:1246)

--- Additional comment from kim.vdriet on 2008-08-12 11:35:15 EDT ---

Several problems were responsible for this error:

1. The 1PC transactions were not being handled atomically across multiple queues. This was fixed by keeping 1PC txns in the prepared list and modifying the txn recovery logic to handle 1PC txns.

2. Message recovery did not correctly predict the outcome of messages which needed to be rolled forward/back because of incomplete multi-queue commits/aborts. The message recovery logic was reworked to extract the information it needs to make this determination from the journal enqueue and transaction maps. Some new accessors were added to class jcntl to allow for these operations.
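
To illustrate the roll forward/back problem in item 2, here is a toy model, not the store's code. It assumes each queue journal can be summarised as a map from xid to a per-queue state ('pending', 'committed' or 'aborted'), and it assumes one simple rule: a commit recorded on any queue means the commit was decided, so the transaction is rolled forward everywhere; otherwise it is rolled back. The function name, the data layout and the rule itself are all illustrative assumptions.

```python
def resolve_outcomes(journals):
    """Decide one outcome per xid across all queue journals (toy model).

    journals: {queue_name: {xid: state}} where state is one of
              'pending', 'committed', 'aborted' (hypothetical layout).
    Returns:  {xid: 'commit' | 'abort'}
    """
    # Collect every state seen for each xid, across all queues.
    seen = {}
    for records in journals.values():
        for xid, state in records.items():
            seen.setdefault(xid, set()).add(state)

    outcomes = {}
    for xid, states in seen.items():
        # Assumed rule: any recorded commit means roll forward on every
        # queue; anything else is rolled back on every queue.
        outcomes[xid] = "commit" if "committed" in states else "abort"
    return outcomes


if __name__ == "__main__":
    # Broker killed after xid 'a' committed on tx-test-1 but not tx-test-2.
    journals = {
        "tx-test-1": {"a": "committed", "b": "pending"},
        "tx-test-2": {"a": "pending", "b": "pending"},
    }
    print(resolve_outcomes(journals))
```

The point of the sketch is that the decision is made once per xid using information from every queue, so a message cannot end up committed on one queue and missing on another.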

In addition, some bugs were found in the python journal file checker jfile_chk.py which was used to analyze the journal files. These were fixed, and a new -a flag now performs transactional analysis on the journal and reports open transactions and locked records.
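
As a rough illustration of what reporting open transactions involves, the sketch below scans a list of journal records and returns the xids that were never committed or aborted. The (op, xid) record layout and the function name are hypothetical; this is not jfile_chk.py's code or file format.

```python
def open_transactions(records):
    """Return the sorted xids that have no commit or abort record.

    records: iterable of (op, xid) tuples in journal order, where op is
             'enqueue', 'dequeue', 'commit' or 'abort' (hypothetical layout).
             An empty xid marks a non-transactional record.
    """
    open_xids = set()
    for op, xid in records:
        if op in ("commit", "abort"):
            # The transaction is closed; forget it.
            open_xids.discard(xid)
        elif xid:
            # A transactional enqueue/dequeue opens (or continues) the txn.
            open_xids.add(xid)
    return sorted(open_xids)


if __name__ == "__main__":
    records = [
        ("enqueue", "x1"),
        ("enqueue", "x2"),
        ("commit", "x1"),
        ("enqueue", ""),   # non-transactional, ignored
        ("dequeue", "x3"),
        ("abort", "x3"),
    ]
    print(open_transactions(records))
```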

Fixed in r.2279.

--- Additional comment from kim.vdriet on 2008-08-12 13:58:01 EDT ---

After some testing, there are still occasional cases of lost messages when txtest is run in test mode. Reassigning.

--- Additional comment from kim.vdriet on 2008-08-13 15:29:09 EDT ---

Found additional failure cases for prepared but not completed transactions, and for non-prepared transactions which were not being correctly aborted at the journal level.

r.2297

Comment 2 Frantisek Reznicek 2008-08-29 07:58:43 UTC
RHTS test developed (MRG/qpid_txtest_fails_bz458053).
Test results coming soon.

Comment 3 Frantisek Reznicek 2008-08-29 14:11:51 UTC
RHTS test MRG/qpid_txtest_fails_bz458053 proved that this issue is no longer
present. (See RHTS jobs 28113 and 28114 for details.)

Comment 4 Frantisek Reznicek 2008-09-04 09:07:34 UTC
After a few more automated test runs (MRG/qpid_txtest_fails_bz458053 and
MRG/qpid_test_transaction_integrity), fewer than one percent of cases
still fail.

MRG/qpid_test_transaction_integrity test shows it on
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=4116052.

Please find the test case log attached (on the RHEL5 clone, i.e. bz458053).


Moving VERIFIED to FAILS_QA.

P.S. Latest test case code can be found here:
http://cvs.devel.redhat.com/cgi-bin/cvsweb.cgi/tests/distribution/MRG_Messaging/

Comment 5 Kim van der Riet 2008-09-18 18:43:00 UTC
Fixed, see https://bugzilla.redhat.com/show_bug.cgi?id=458053

Comment 6 Frantisek Reznicek 2008-09-24 15:01:56 UTC
RHTS automated tests (MRG/qpid_txtest_fails_bz458053 and
MRG/qpid_test_transaction_integrity) now prove that the issue is gone. (->VERIFIED)

Comment 8 errata-xmlrpc 2008-10-06 18:58:58 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0867.html

