Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1060202

Summary: Set timeout for every DTX transaction
Product: Red Hat Enterprise MRG Reporter: Pavel Moravec <pmoravec>
Component: qpid-cpp Assignee: Pavel Moravec <pmoravec>
Status: CLOSED ERRATA QA Contact: Leonid Zhaldybin <lzhaldyb>
Severity: high Docs Contact:
Priority: high    
Version: 3.0 CC: esammons, iboverma, jross, lzhaldyb, pmoravec, sauchter, vhubeika
Target Milestone: 3.0 Keywords: EasyFix, Improvement, Patch
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: qpid-cpp-0.22-35 Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-09-24 15:10:16 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 785156    
Attachments:
Description Flags
Patch for qpid-txtest to create orphaned DTX transactions none
Proposed patch none

Description Pavel Moravec 2014-01-31 12:50:12 UTC
Description of problem:
If an external Transaction Manager (TM) prepares a DTX transaction but, for whatever reason, never commits or aborts it, the tpl journal keeps an orphaned enqueue record forever. In legacystore this eventually causes an enqueue capacity threshold exception, which prevents _any_ transaction from committing or aborting.

To prevent such orphaned XID entries in tpl, every incoming DTX transaction should have a default timeout set (the dtx.set-timeout AMQP 0-10 primitive can still change it).

The timeout should be a broker-wide parameter, configurable via the --dtx-default-timeout option.


Version-Release number of selected component (if applicable):
any (incl upstream 0.26)


How reproducible:
100%


Steps to Reproduce:
1. Mimic an external TM that prepares a DTX but never commits or aborts it - use e.g. _modified_ qpid-txtest (patch attached):
qpid-txtest --queues=1 --total-messages=1 --dtx=1 --dtx-commit=no

2. /usr/libexec/qpid/store_chk /var/lib/qpidd/rhm/tpl -b tpl


Actual results:
tpl journal keeps the unfinished transaction forever


Expected results:
After sleeping for dtx-default-timeout, the transaction should be gone.


Additional info:

Comment 1 Pavel Moravec 2014-01-31 12:51:22 UTC
Created attachment 857827 [details]
Patch for qpid-txtest to create orphaned DTX transactions

Necessary for reproducer.

Comment 2 Pavel Moravec 2014-01-31 15:13:23 UTC
Created attachment 857888 [details]
Proposed patch

--dtx-default-timeout option added with default value 3600 seconds.

Tried the reproducer on broker with --dtx-default-timeout=60, and the transaction was gone after one minute.
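
For anyone repeating this check, the steps can be sketched as a small shell function. This is only a sketch under assumptions: the broker is already running with --dtx-default-timeout set, qpid-txtest carries the attached patch (which provides --dtx-commit=no), and the variables TXTEST, STORE_CHK, TPL_DIR and GRACE are hypothetical override knobs introduced here for illustration, not part of any qpid tool.

```shell
# Sketch only. Assumes the broker was started separately, e.g.:
#   qpidd --dtx-default-timeout=60
# and that qpid-txtest is the patched build supporting --dtx-commit=no.
# TXTEST, STORE_CHK, TPL_DIR and GRACE are hypothetical knobs for this sketch.
TXTEST=${TXTEST:-qpid-txtest}
STORE_CHK=${STORE_CHK:-/usr/libexec/qpid/store_chk}
TPL_DIR=${TPL_DIR:-/var/lib/qpidd/rhm/tpl}
GRACE=${GRACE:-10}   # extra seconds to wait beyond the configured timeout

reproduce_orphaned_dtx() {
    timeout_secs=$1
    # Prepare one DTX transaction and deliberately leave it unfinished.
    "$TXTEST" --queues=1 --total-messages=1 --dtx=1 --dtx-commit=no
    # Give the broker time to expire the orphaned transaction.
    sleep "$((timeout_secs + GRACE))"
    # Inspect the tpl journal; the unfinished transaction should be gone.
    "$STORE_CHK" "$TPL_DIR" -b tpl
}
```

With a broker started with --dtx-default-timeout=60, calling `reproduce_orphaned_dtx 60` would wait about 70 seconds before running store_chk. Judging the store_chk output is left to the reader.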

Comment 4 Pavel Moravec 2014-02-05 10:00:09 UTC
Committed to upstream as r1564694.

Comment 7 Pavel Moravec 2014-03-25 17:03:13 UTC
1) store_chk returning "Operation on non-existent record: operation=unlock; rid=.." - that is a bug in store_chk, see bz1060114. Apply the fix https://bugzilla.redhat.com/attachment.cgi?id=858071&action=diff to /usr/lib64/python2.6/site-packages/qpidstore/janal.py so that store_chk passes.

Alternative reproducer on linearstore: after sending the DTX transaction without commit, prepare (& commit) further transactions and observe the number of journal files:
a) qpid-txtest --queues=1 --total-messages=1 --dtx=1 --dtx-commit=no
b) while true; do ./qpid-txtest --queues=1 --total-messages=1000 --tx-count=10 --queue-base-name=MyTestTx; done
c) After a while (i.e. some time after the DTX default timeout applies), check the number of files in the /var/lib/qpidd/qls/tpl/ directory.

Current behaviour:
The very first journal file, created to keep the DTX record from step a), persists there forever. The number of journal files keeps growing, and no file is ever deleted.

Expected behaviour:
There should be one or at most two files (if the while loop is still running). The very first file, created to keep the DTX record from step a), will already be gone.
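
The check in step c) can be sketched with two small helpers. This is an illustration only: the function names are hypothetical, and the count deliberately includes both regular files and symlinks directly inside the directory, without assuming any particular journal file-name pattern.

```shell
# Count entries directly inside a journal directory (regular files or
# symlinks), without assuming any particular file-name pattern.
count_journal_files() {
    dir=$1
    # tr strips the padding some wc implementations add around the number.
    find "$dir" -mindepth 1 -maxdepth 1 \( -type f -o -type l \) | wc -l | tr -d ' '
}

# Succeed when the directory holds at most max_files entries.
journal_count_ok() {
    dir=$1
    max_files=$2
    [ "$(count_journal_files "$dir")" -le "$max_files" ]
}
```

For example, `journal_count_ok /var/lib/qpidd/qls/tpl 2` would succeed once the file holding the expired DTX record has been removed, matching the expected behaviour described above.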


2) Wrong QMF statistics: this is a low-priority issue (especially compared to the original one, which prevents any (D)TX work completely) and is unrelated to the original. I agree with filing it as a separate bug (and I am happy to look at it from the devel perspective).

Changing back to ON_QA.

(Sorry for the missing info above; when I filed the BZ I did not know the linear/legacy store status in 3.0 and expected bz1060114 to be fixed in parallel.)

Comment 8 Leonid Zhaldybin 2014-03-26 12:02:00 UTC
Tested on RHEL6.5 (both i386 and x86_64) using both testing scenarios suggested by Pavel: the original one from comment 0, which uses the legacy store and the store_chk utility (which I copied from the stable 0.18 version), and the one for linear store from comment 7, point 1. This issue has been fixed. The new dtx-default-timeout option makes it possible to set a time interval after which the broker removes unfinished transactions from the store.

Packages used for testing:

python-qpid-0.22-12.el6.noarch
python-qpid-qmf-0.22-28.el6.i686
qpid-cpp-client-0.22-36.el6.i686
qpid-cpp-client-devel-0.22-36.el6.i686
qpid-cpp-client-devel-docs-0.22-36.el6.noarch
qpid-cpp-server-0.22-36.el6.i686
qpid-cpp-server-devel-0.22-36.el6.i686
qpid-cpp-server-linearstore-0.22-36.el6.i686
qpid-cpp-server-store-0.22-36.el6.i686
qpid-cpp-server-xml-0.22-36.el6.i686
qpid-java-client-0.22-6.el6.noarch
qpid-java-common-0.22-6.el6.noarch
qpid-java-example-0.22-6.el6.noarch
qpid-jca-0.22-2.el6.noarch
qpid-jca-xarecovery-0.22-2.el6.noarch
qpid-proton-c-0.6-1.el6.i686
qpid-qmf-0.22-28.el6.i686
qpid-tools-0.22-9.el6.noarch

Comment 9 errata-xmlrpc 2014-09-24 15:10:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1296.html