Bug 1080457 - [RFE] EAP6-17 Inconsistency for recovery being run after the CMR resource is already committed
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: JBoss Enterprise Application Platform 6
Classification: JBoss
Component: Transaction Manager
Version: 6.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ER2
Target Release: EAP 6.3.0
Assignee: Michael
QA Contact: Ondrej Chaloupka
Docs Contact: Russell Dickenson
URL:
Whiteboard:
Depends On: 1085877
Blocks: eap63-beta-blockers
Reported: 2014-03-25 13:34 UTC by Ondrej Chaloupka
Modified: 2015-02-02 12:59 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previous versions of JBoss EAP 6 contained a bug in the implementation of the Commit Markable Resource (CMR) recovery module that could cause changes to be rolled back, rather than recovered. The expected behavior is for the CMR recovery module to move the record into a different part of the recovery store so it is ignored by other recovery modules, as only the CMR module is aware of the resource changes. If the connection to the database failed then the datasource did not get added to the collection `queriedResourceManagers` and the record did not get moved. As a result, a different recovery module would attempt to recover the transaction and the recovery would not occur as expected. In this release of the product the code has been modified to ensure that the datasource is added as required, even if the connection fails.
Clone Of:
Environment:
Last Closed: 2014-06-28 15:42:07 UTC
Type: Bug


Attachments
server.log (351.78 KB, text/x-log)
2014-03-25 13:34 UTC, Ondrej Chaloupka
ds.properties file for oracle (381 bytes, text/plain)
2014-03-25 13:38 UTC, Ondrej Chaloupka


Links
System ID Priority Status Summary Last Updated
JBoss Issue Tracker JBTM-2132 Major Closed CMR Resources do not process XA exceptions correctly 2014-09-08 12:39:21 UTC

Description Ondrej Chaloupka 2014-03-25 13:34:50 UTC
Created attachment 878461 [details]
server.log

It seems that recovery with CMR as part of the transaction could cause data inconsistency. 

The test proceeds as follows:
1) enlist test xa resource
2) enlist cmr db resource
3) prepare test xa resource
4) prepare cmr db resource
5) commit cmr db resource 
6) crash app server
7) start the server with recovery suspended (Byteman waiting on a signal)
8) stop proxy
9) do recovery of test xa resource

The test XA resource is rolled back instead of being committed, even though the CMR resource has already been committed.

Comment 1 Ondrej Chaloupka 2014-03-25 13:38:24 UTC
Created attachment 878462 [details]
ds.properties file for oracle

This can be reproduced with a test case from the crash recovery test suite:

git clone -b lrco git://git.app.eng.bos.redhat.com/jbossqe/eap-tests-transactions.git
cd eap-tests-transactions/jbossts
export JBOSS_HOME=path/to/jboss-eap
mvn clean verify -Dtest=JPAProxyCMRCrashRecoveryTestCase#commitHaltRecoveryProxyHalted -Dno.cleanup.at.teardown -Djbossts.noJTS -Dds.properties=path/to/ds.properties

JBoss EAP 6.3.0.DR5 can be downloaded from:
http://download.devel.redhat.com/devel/candidates/JBEAP/JBEAP-6.3.0.DR5/jboss-eap-6.3.0.DR5.zip

Comment 2 Ondrej Chaloupka 2014-03-25 15:06:24 UTC
I've just found that the order of operations is slightly different. The difference is in the order of the prepare calls: the CMR resource is the first resource to be prepared. The correct order is:

1) enlist test xa resource
2) enlist cmr db resource
3) prepare cmr db resource
4) prepare test xa resource
5) commit cmr db resource 
... (then see #c0)

Comment 3 Ondrej Chaloupka 2014-04-02 06:01:46 UTC
Just for better explanation:
The proxy is a simple Java socket proxy that only forwards data from its input to its output and lets us kill the connection to simulate connection failures. "Stop proxy" means that the connection to the database goes down.
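
For readers who do not have the test suite at hand, a proxy of this kind could look roughly like the sketch below. It is not the proxy used by the test suite; the class name `KillableProxy` and its methods are made up for illustration. It forwards bytes in both directions between a client and a target host, and `close()` plays the role of "stop proxy" by dropping every open connection.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical illustration only; not the proxy shipped with the test suite.
final class KillableProxy implements AutoCloseable {
    private final ServerSocket listener;
    private final String targetHost;
    private final int targetPort;
    private final List<Closeable> open = new CopyOnWriteArrayList<>();

    KillableProxy(int listenPort, String targetHost, int targetPort) throws IOException {
        this.listener = new ServerSocket(listenPort); // port 0 picks a free port
        this.targetHost = targetHost;
        this.targetPort = targetPort;
        Thread acceptor = new Thread(this::acceptLoop, "proxy-accept");
        acceptor.setDaemon(true);
        acceptor.start();
    }

    int port() {
        return listener.getLocalPort();
    }

    private void acceptLoop() {
        while (!listener.isClosed()) {
            Socket client = null;
            try {
                client = listener.accept();
                Socket server = new Socket(targetHost, targetPort);
                open.add(client);
                open.add(server);
                pump(client, server); // client -> target
                pump(server, client); // target -> client
                if (listener.isClosed()) { // close() raced with this accept
                    closeQuietly(client);
                    closeQuietly(server);
                }
            } catch (IOException e) {
                closeQuietly(client); // listener closed or target unreachable
            }
        }
    }

    // Copies bytes one way until either side closes, then tears down both sockets.
    private void pump(Socket from, Socket to) {
        Thread t = new Thread(() -> {
            try (InputStream in = from.getInputStream();
                 OutputStream out = to.getOutputStream()) {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                    out.flush();
                }
            } catch (IOException ignored) {
                // connection was killed; nothing more to forward
            } finally {
                closeQuietly(from);
                closeQuietly(to);
            }
        });
        t.setDaemon(true);
        t.start();
    }

    // "Stop proxy": refuse new connections and kill the existing ones.
    @Override
    public void close() {
        closeQuietly(listener);
        for (Closeable c : open) {
            closeQuietly(c);
        }
        open.clear();
    }

    private static void closeQuietly(Closeable c) {
        if (c == null) {
            return;
        }
        try {
            c.close();
        } catch (IOException ignored) {
            // best effort
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.err.println("usage: KillableProxy <listenPort> <targetHost> <targetPort>");
            return;
        }
        try (KillableProxy proxy = new KillableProxy(
                Integer.parseInt(args[0]), args[1], Integer.parseInt(args[2]))) {
            System.out.println("Proxy listening on " + proxy.port() + ", press Enter to stop");
            System.in.read();
        }
    }
}
```

With the datasource pointed at the proxy port instead of the database port, stopping the proxy makes the database unreachable from the application server without touching the database itself.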

Comment 4 Michael 2014-04-02 11:41:27 UTC
This is a bug in the implementation. The CMR recovery module should move the record into a different part of the recovery store so that it is ignored by the other recovery modules. If the connection to the database fails, the datasource is not added to the queriedResourceManagers collection, so in "Stage 2" of the algorithm the record is not moved.

The result of this bug is that one of the next recovery modules tries to recover the transaction instead of the CMR recovery module, which is the only one that knows how to determine what the CMR resource did.

The fix is to move the line where we update the queriedResourceManagers list into the finally block so the datasource gets added even if the connection fails.
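
The shape of that change can be sketched as below. This is a simplified stand-in, not the actual recovery module code: `CmrRecoverySketch`, `CommitMarkerSource`, `fetchCommittedIds` and `shouldMoveRecord` are hypothetical names, and only the `queriedResourceManagers` collection is taken from this ticket.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of the "update in finally" fix; not the real module.
final class CmrRecoverySketch {

    /** Assumed stand-in for a CMR datasource that can be asked for commit markers. */
    interface CommitMarkerSource {
        String name();
        List<String> fetchCommittedIds() throws Exception; // fails if the database is unreachable
    }

    final Set<String> queriedResourceManagers = new HashSet<>();
    final Set<String> committedIds = new HashSet<>();

    // "Stage 1": ask every CMR datasource which transactions it committed.
    void queryResourceManagers(List<CommitMarkerSource> sources) {
        for (CommitMarkerSource source : sources) {
            try {
                committedIds.addAll(source.fetchCommittedIds());
            } catch (Exception e) {
                // Connection failed: the outcome is unknown for now.
            } finally {
                // Before the fix this update only happened on the success path,
                // so a failed connection left the datasource out of the set.
                queriedResourceManagers.add(source.name());
            }
        }
    }

    // "Stage 2": a record is moved out of the way of the other recovery modules
    // only if its datasource was registered in stage 1.
    boolean shouldMoveRecord(String resourceManagerName) {
        return queriedResourceManagers.contains(resourceManagerName);
    }

    public static void main(String[] args) {
        CommitMarkerSource unreachable = new CommitMarkerSource() {
            public String name() { return "unreachableDs"; }
            public List<String> fetchCommittedIds() throws Exception {
                throw new IOException("connection refused");
            }
        };
        CmrRecoverySketch sketch = new CmrRecoverySketch();
        sketch.queryResourceManagers(Arrays.asList(unreachable));
        System.out.println("record moved: " + sketch.shouldMoveRecord("unreachableDs"));
    }
}
```

Because the update sits in the `finally` block, a datasource whose connection fails is still registered, and stage 2 still keeps its record away from the other recovery modules.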

Comment 5 Michael 2014-04-08 15:52:52 UTC
The problem was that our code for detecting "orphaned" transactions was processing the transaction log and ignoring the "commit marker" on the embedded record. The linked external bug tracker (JBTM-2132) fixes that oversight.
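
As a rough picture of the behaviour this ticket expects, the decision for an in-doubt XA branch that shares a transaction with a CMR resource can be written as a small function of what is known about the commit marker. This is only a sketch under assumptions: the names `CmrOutcomeSketch`, `MarkerStatus` and `Decision` are invented here, and the real orphan detection works on transaction log records rather than on a three-valued flag.

```java
// Hypothetical illustration; the names and the three-valued flag are assumptions.
final class CmrOutcomeSketch {

    /** What recovery currently knows about the CMR commit marker. */
    enum MarkerStatus { FOUND, ABSENT, UNKNOWN }

    enum Decision { COMMIT, ROLLBACK, RETRY_LATER }

    static Decision decide(MarkerStatus status) {
        switch (status) {
            case FOUND:
                return Decision.COMMIT;      // the CMR resource committed, so follow it
            case ABSENT:
                return Decision.ROLLBACK;    // the CMR resource did not commit
            default:
                return Decision.RETRY_LATER; // database unreachable: do not guess
        }
    }

    public static void main(String[] args) {
        for (MarkerStatus status : MarkerStatus.values()) {
            System.out.println(status + " -> " + decide(status));
        }
    }
}
```

In the scenario from the description the proxy is stopped, so the marker status is unknown during recovery; treating the branch as an orphan at that point is what rolled back the test XA resource.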

Comment 6 JBoss JIRA Server 2014-04-09 15:03:37 UTC
Tom Jenkinson <tom.jenkinson@redhat.com> updated the status of jira JBTM-2132 to Closed

Comment 7 Ondrej Chaloupka 2014-04-23 13:45:20 UTC
Verified for EAP 6.3.0.ER2.

Comment 8 Scott Mumford 2014-05-14 00:13:33 UTC
I've added a draft release note based on the information found in this ticket.
Michael, can you please review the draft and amend as required? I found parsing the data challenging.

Comment 9 Michael 2015-02-02 12:59:36 UTC
(In reply to Scott Mumford from comment #8)
> I've added a draft release note based on the information found in this
> ticket.
> Michael, can you please review the draft and amend as required? I found
> parsing the data challenging.

Hi Scott, I am assuming this is historical and you no longer need a response.

