Bug 1080457

Summary: [RFE] EAP6-17 Inconsistency for recovery being run after the CMR resource is already committed
Product: [JBoss] JBoss Enterprise Application Platform 6 Reporter: Ondrej Chaloupka <ochaloup>
Component: Transaction Manager Assignee: Michael <mmusgrov>
Status: CLOSED CURRENTRELEASE QA Contact: Ondrej Chaloupka <ochaloup>
Severity: high Docs Contact: Russell Dickenson <rdickens>
Priority: unspecified    
Version: 6.3.0 CC: hhovsepy, kkhan, mmusgrov, smumford
Target Milestone: ER2   
Target Release: EAP 6.3.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Previous versions of JBoss EAP 6 contained a bug in the implementation of the Commit Markable Resource (CMR) recovery module that could cause changes to be rolled back, rather than recovered. The expected behavior is for the CMR recovery module to move the record into a different part of the recovery store so it is ignored by other recovery modules, as only the CMR module is aware of the resource changes. If the connection to the database failed then the datasource did not get added to the collection `queriedResourceManagers` and the record did not get moved. As a result, a different recovery module would attempt to recover the transaction and the recovery would not occur as expected. In this release of the product the code has been modified to ensure that the datasource is added as required, even if the connection fails.
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-06-28 15:42:07 UTC Type: Bug
Bug Depends On: 1085877    
Bug Blocks: 1051640    
Attachments:
Description Flags
server.log none
ds.properties file for oracle none

Description Ondrej Chaloupka 2014-03-25 13:34:50 UTC
Created attachment 878461 [details]
server.log

It seems that recovery with CMR as part of the transaction could cause data inconsistency. 

The test proceeds as follows:
1) enlist test xa resource
2) enlist cmr db resource
3) prepare test xa resource
4) prepare cmr db resource
5) commit cmr db resource 
6) crash app server
7) start the server with recovery paused (Byteman waiting on a signal)
8) stop proxy
9) do recovery of test xa resource

The test XA resource is rolled back instead of being committed, even though the CMR resource was already committed.

Comment 1 Ondrej Chaloupka 2014-03-25 13:38:24 UTC
Created attachment 878462 [details]
ds.properties file for oracle

This can be reproduced with a test case from the crash recovery test suite:

git clone -b lrco git://git.app.eng.bos.redhat.com/jbossqe/eap-tests-transactions.git
cd eap-tests-transactions/jbossts
export JBOSS_HOME=path/to/jboss-eap
mvn clean verify -Dtest=JPAProxyCMRCrashRecoveryTestCase#commitHaltRecoveryProxyHalted -Dno.cleanup.at.teardown -Djbossts.noJTS -Dds.properties=path/to/ds.properties

JBoss EAP 6.3.0.DR5 can be downloaded from:
http://download.devel.redhat.com/devel/candidates/JBEAP/JBEAP-6.3.0.DR5/jboss-eap-6.3.0.DR5.zip

Comment 2 Ondrej Chaloupka 2014-03-25 15:06:24 UTC
I've just found that the order of operations is slightly different. The difference is in the order of the prepare calls: the CMR resource is the first resource to be prepared. The correct order looks like this:

1) enlist test xa resource
2) enlist cmr db resource
3) prepare cmr db resource
4) prepare test xa resource
5) commit cmr db resource 
... (then see #c0)

Comment 3 Ondrej Chaloupka 2014-04-02 06:01:46 UTC
For clarification:
The proxy is a simple Java socket proxy that just transfers data from input to output and lets us kill the connection to simulate connection failures. "Stop proxy" means that the connection to the database goes down.
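A minimal sketch of such a proxy, for readers unfamiliar with the setup. This is not the proxy used by the test suite; the class name `KillableProxySketch` and all of its methods are hypothetical, and the sketch assumes a plain TCP relay on the loopback interface.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch of a killable TCP proxy; not the test suite's implementation.
class KillableProxySketch {
    private final ServerSocket listener;
    private final String targetHost;
    private final int targetPort;
    private final List<Socket> open = new CopyOnWriteArrayList<>();

    KillableProxySketch(int listenPort, String targetHost, int targetPort) throws IOException {
        // Port 0 lets the operating system pick a free port.
        this.listener = new ServerSocket(listenPort, 50, InetAddress.getLoopbackAddress());
        this.targetHost = targetHost;
        this.targetPort = targetPort;
    }

    int port() {
        return listener.getLocalPort();
    }

    // Accept clients in a background thread and relay each one to the target.
    void start() {
        Thread acceptor = new Thread(() -> {
            while (!listener.isClosed()) {
                try {
                    Socket client = listener.accept();
                    open.add(client);
                    Socket server = new Socket(targetHost, targetPort);
                    open.add(server);
                    pump(client, server);
                    pump(server, client);
                } catch (IOException e) {
                    // Listener closed or target unreachable; the loop re-checks isClosed().
                }
            }
        });
        acceptor.setDaemon(true);
        acceptor.start();
    }

    // Copy bytes in one direction until either side closes.
    private void pump(Socket from, Socket to) {
        Thread t = new Thread(() -> {
            byte[] buf = new byte[4096];
            try {
                InputStream in = from.getInputStream();
                OutputStream out = to.getOutputStream();
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                    out.flush();
                }
            } catch (IOException e) {
                // One of the sockets was closed.
            } finally {
                closeQuietly(from);
                closeQuietly(to);
            }
        });
        t.setDaemon(true);
        t.start();
    }

    // "Stop proxy": refuse new connections and drop the existing ones.
    void stop() {
        closeQuietly(listener);
        for (Socket s : open) {
            closeQuietly(s);
        }
    }

    private static void closeQuietly(Closeable c) {
        try {
            c.close();
        } catch (IOException ignored) {
            // Nothing useful to do in a sketch.
        }
    }
}
```

In a setup like this, the datasource would point at the proxy port rather than at the database, so calling `stop()` makes the database unreachable from the server without touching the database itself.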

Comment 4 Michael 2014-04-02 11:41:27 UTC
This is a bug in the implementation. The CMR recovery module should move the record into a different part of the recovery store so that it is ignored by the other recovery modules. If the connection to the database fails, the datasource does not get added to the collection queriedResourceManagers, so in "Stage 2" of the algorithm the record does not get moved.

The result of this bug is that one of the next recovery modules tries to recover the transaction instead of the CMR recovery module, which is the only one that knows how to determine what the CMR resource did.

The fix is to move the line where we update the queriedResourceManagers list into the finally block so the datasource gets added even if the connection fails.
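The shape of that change can be illustrated with a minimal sketch. It does not reproduce the actual recovery module source: only the collection name `queriedResourceManagers` comes from this ticket, while the class `CmrRecoveryScanSketch`, the `ConnectionSource` interface, the method names, the element type, and the JNDI name are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for querying a datasource; not a real Narayana API.
interface ConnectionSource {
    void query() throws Exception; // throws when the database is unreachable
}

class CmrRecoveryScanSketch {
    // Collection name taken from the ticket; the element type is an assumption.
    final List<String> queriedResourceManagers = new ArrayList<>();

    // Before the fix: the datasource is recorded only after a successful query.
    void scanBeforeFix(String jndiName, ConnectionSource source) {
        try {
            source.query();
            queriedResourceManagers.add(jndiName);
        } catch (Exception e) {
            // Connection failed: jndiName is never recorded.
        }
    }

    // After the fix: the datasource is recorded in finally, even on failure.
    void scanAfterFix(String jndiName, ConnectionSource source) {
        try {
            source.query();
        } catch (Exception e) {
            // Connection failed; the finally block still runs.
        } finally {
            queriedResourceManagers.add(jndiName);
        }
    }

    public static void main(String[] args) {
        ConnectionSource down = () -> { throw new Exception("connection refused"); };
        String jndiName = "java:jboss/datasources/cmr"; // hypothetical name

        CmrRecoveryScanSketch before = new CmrRecoveryScanSketch();
        before.scanBeforeFix(jndiName, down);
        System.out.println("before fix: " + before.queriedResourceManagers);

        CmrRecoveryScanSketch after = new CmrRecoveryScanSketch();
        after.scanAfterFix(jndiName, down);
        System.out.println("after fix: " + after.queriedResourceManagers);
    }
}
```

With the datasource recorded in both cases, a later stage that iterates over `queriedResourceManagers` still sees it when the connection is down.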

Comment 5 Michael 2014-04-08 15:52:52 UTC
The problem was that our code for detecting "orphaned" transactions was processing the transaction log and ignoring the "commit marker" on the embedded record. The linked external bug tracker (JBTM-2132) fixes that oversight.
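As a rough illustration of the oversight described above, the sketch below contrasts a resolution step that never consults the commit marker with one that does. Every name here (`EmbeddedRecordSketch`, `RecoveryOutcome`, `OrphanDetectionSketch`) is hypothetical and the logic is deliberately simplified; the actual change is the one tracked in JBTM-2132.

```java
// Hypothetical model of a logged record; field names are illustrative only.
class EmbeddedRecordSketch {
    final String xid;
    final boolean commitMarker; // true when the CMR resource recorded a commit

    EmbeddedRecordSketch(String xid, boolean commitMarker) {
        this.xid = xid;
        this.commitMarker = commitMarker;
    }
}

enum RecoveryOutcome { COMMIT, ROLLBACK }

class OrphanDetectionSketch {
    // Faulty behaviour: the marker is never consulted, so the transaction
    // is treated as orphaned and rolled back.
    static RecoveryOutcome resolveIgnoringMarker(EmbeddedRecordSketch record) {
        return RecoveryOutcome.ROLLBACK;
    }

    // Intended behaviour: a commit marker means the remaining resources commit.
    static RecoveryOutcome resolveHonouringMarker(EmbeddedRecordSketch record) {
        return record.commitMarker ? RecoveryOutcome.COMMIT : RecoveryOutcome.ROLLBACK;
    }
}
```

For a record whose marker is set, the first method yields `ROLLBACK` and the second yields `COMMIT`, which matches the inconsistency reported in the description.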

Comment 6 JBoss JIRA Server 2014-04-09 15:03:37 UTC
Tom Jenkinson <tom.jenkinson> updated the status of jira JBTM-2132 to Closed

Comment 7 Ondrej Chaloupka 2014-04-23 13:45:20 UTC
Verified for EAP 6.3.0.ER2.

Comment 8 Scott Mumford 2014-05-14 00:13:33 UTC
I've added a draft release note based on the information found in this ticket.
Michael, can you please review the draft and amend as required? I found parsing the data challenging.

Comment 9 Michael 2015-02-02 12:59:36 UTC
(In reply to Scott Mumford from comment #8)
> I've added a draft release note based on the information found in this
> ticket.
> Michael, can you please review the draft and amend as required? I found
> parsing the data challenging.

Hi Scott, I am assuming this is historical and you no longer need a response.