Bug 1091308

Summary: [RFE] EAP6-17 CMR test scenario fail on commit, fail on rollback does not pass
Product: [JBoss] JBoss Enterprise Application Platform 6
Reporter: Ondrej Chaloupka <ochaloup>
Component: Transaction Manager
Assignee: Stefano Maestri <smaestri>
Status: CLOSED CURRENTRELEASE
QA Contact: Ondrej Chaloupka <ochaloup>
Severity: high
Priority: unspecified
Version: 6.3.0
CC: dosoudil, hhovsepy, jkudrnac, jpederse, kkhan, mmusgrov, myarboro, tom.jenkinson
Target Milestone: ER8
Target Release: EAP 6.3.0
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2014-06-28 15:31:57 UTC
Type: Bug
Bug Depends On: 1108152, 1108251    

Description Ondrej Chaloupka 2014-04-25 10:45:45 UTC
I have the following test case:
1) enlist CMR datasource
2) enlist XA test datasource
3) prepare CMR
4) prepare XA
5) commit on CMR datasource - fails with ResourceException (simulating an unavailable database)
6) the second phase is aborted
7) rollback on XA datasource - fails with XAException.XAER_RMERR (simulating trouble on rollback)
8) run recovery
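The order of steps 1 to 7 can be sketched as a toy model. This is not Narayana's coordinator and not the actual test code: the `Participant` interface, the `FailingCmr` and `FailingXa` classes and the `run` method are hypothetical, and a plain `RuntimeException` stands in for the JCA `ResourceException`. Only `XAException.XAER_RMERR` is the real constant from `javax.transaction.xa`. The sketch just records the sequence of events; recovery (step 8) is not modelled.

```java
import java.util.ArrayList;
import java.util.List;
import javax.transaction.xa.XAException;

public class CmrScenarioSketch {

    // Hypothetical participant abstraction; not a Narayana or JCA type.
    interface Participant {
        String name();
        void prepare() throws Exception;
        void commit() throws Exception;
        void rollback() throws Exception;
    }

    // Hypothetical CMR participant whose commit fails (step 5).
    // RuntimeException stands in for javax.resource.ResourceException.
    static final class FailingCmr implements Participant {
        public String name() { return "CMR"; }
        public void prepare() { }
        public void commit() { throw new RuntimeException("simulated unavailable database"); }
        public void rollback() { }
    }

    // Hypothetical XA participant whose rollback fails with XAER_RMERR (step 7).
    static final class FailingXa implements Participant {
        public String name() { return "XA"; }
        public void prepare() { }
        public void commit() { }
        public void rollback() throws XAException {
            throw new XAException(XAException.XAER_RMERR);
        }
    }

    // Replays steps 3-7 against the enlisted participants and returns a trace of events.
    static List<String> run(List<Participant> participants) throws Exception {
        List<String> trace = new ArrayList<>();
        for (Participant p : participants) {          // steps 3-4: prepare every participant
            p.prepare();
            trace.add("prepare " + p.name());
        }
        Participant cmr = participants.get(0);
        try {
            cmr.commit();                             // step 5: commit the CMR first
            trace.add("commit " + cmr.name());
        } catch (RuntimeException e) {
            trace.add("commit " + cmr.name() + " failed");
            trace.add("second phase aborted");        // step 6
            for (Participant p : participants.subList(1, participants.size())) {
                try {
                    p.rollback();                     // step 7: roll back the remaining participants
                    trace.add("rollback " + p.name());
                } catch (XAException xe) {
                    trace.add("rollback " + p.name() + " failed, errorCode=" + xe.errorCode);
                }
            }
        }
        return trace;
    }

    public static void main(String[] args) throws Exception {
        // Steps 1-2: enlist the CMR participant first, then the XA participant.
        List<Participant> participants = List.of(new FailingCmr(), new FailingXa());
        System.out.println(run(participants));
    }
}
```

Keeping the CMR participant first in the list mirrors the enlistment order in steps 1 and 2; the toy model does not attempt to reproduce how the real transaction manager decides the outcome.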

When two XA resources (no CMR resource) are used, both of them end in a HEURISTIC state and are not recovered.
With one CMR and one XA resource, the whole system looks as if the rollback had been done: there is no row in the xid table and no information about the transaction participant in the log. The XA resource is rolled back during the recovery process.

The first strange thing is that the transaction record is moved to the AtomicAction/Expired folder.
The object store looks like this after the test:
── ShadowNoFileLockStore
   └── defaultStore
       ├── EISNAME
       │   ├── 0_ffff7f000001_6ec13611_5357d5fb_27
       │   └── 0_ffff7f000001_6ec13611_5357d5fb_35
       └── StateManager
           └── BasicAction
               └── TwoPhaseCoordinator
                   └── AtomicAction
                       └── Expired
                           └── 0_ffff7f000001_6ec13611_5357d5fb_43
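To check after a test run whether records ended up under an Expired folder, the object store can be walked with `java.nio.file`. This is a minimal sketch: the class and method names are hypothetical, the store root (the directory containing ShadowNoFileLockStore) is passed in rather than assumed, and it only relies on the directory names shown in the tree above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ExpiredRecordLister {

    // True when the file sits directly inside a directory named "Expired".
    private static boolean isInExpiredDir(Path file) {
        Path parent = file.getParent();
        return parent != null
                && parent.getFileName() != null
                && parent.getFileName().toString().equals("Expired");
    }

    // Returns the record files found in any Expired directory below the given root,
    // as sorted paths relative to that root (with '/' separators).
    static List<String> expiredRecords(Path storeRoot) throws IOException {
        try (Stream<Path> paths = Files.walk(storeRoot)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(ExpiredRecordLister::isInExpiredDir)
                    .map(p -> storeRoot.relativize(p).toString().replace('\\', '/'))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: pass the directory that contains ShadowNoFileLockStore.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        expiredRecords(root).forEach(System.out::println);
    }
}
```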

The other strange thing is that the log contains the following warning even though the connection to the database is available:
17:11:35,262 WARN  [com.arjuna.ats.jta] (Periodic Recovery) Could not restore_state< formatId=131077, gtrid_length=29, bqual_length=36, tx_uid=0:ffff7f000001:-628e9697:5357d803:43, node_name=1, branch_uid=0:ffff7f000001:-628e9697:5357d803:48, subordinatenodename=null, eis_name=java:jboss/xa-datasources/CrashRecoveryDS >: com.arjuna.ats.arjuna.exceptions.ObjectStoreException: java:jboss/xa-datasources/CrashRecoveryDS was not online

Comment 9 Jesper Pedersen 2014-05-07 12:54:38 UTC
This BZ and the associated BZs for EAP6-17 are Beta blockers if we are to have any hope of getting feedback from the customer.

Comment 10 Stefano Maestri 2014-05-15 09:14:15 UTC
PR
https://github.com/jbossas/jboss-eap/pull/1336

Comment 11 Ondrej Chaloupka 2014-06-05 14:38:00 UTC
Hi Tom, Hi Stefano,

I'm marking the BZ as failed QA because the test still does not pass.

I've checked the JBTM-2161 JIRA and, as I understand it, it was closed without merging any changes to the Narayana code base.
I've verified that Stefano's pull request 1336 is part of the ER5 code base.

I do not understand whether the fix to the EAP code base is supposed to fix the problem in this test or whether it fixes a different issue.

Do you think that this could be fixed for 6.3.0?

This test case is one of the customer's test scenarios; see the attachment at
https://issues.jboss.org/browse/PRODMGT-49.

Thanks
Ondra

Comment 12 tom.jenkinson 2014-06-06 10:13:53 UTC
Hi Ondra,

JBTM-2161 shouldn't have been required; the fix Stefano made should have been enough. I will retest the issue with the latest EAP tag.

Tom

Comment 13 Jesper Pedersen 2014-06-06 11:36:31 UTC
This BZ must be a Blocker. Somebody with the correct permissions needs to add it back and make sure it doesn't reset (BZ sux).

Comment 14 tom.jenkinson 2014-06-06 19:38:46 UTC
Hi,

As discussed with Ondra, it turns out there were/are three issues:

1. The ordering issue that Stefano has fixed. Thanks, that would definitely have been a blocker.
2. An issue in the test suite, where it checks for state that shouldn't exist. Ondra has this in hand, thanks.
3. A bug in Narayana whereby the management probe fails if the call is made before recovery has started. I wouldn't classify this as a blocker, as the call fails rather than returning erroneous data that an admin might act upon.

It's up to you. I have a pull request (https://github.com/jbosstm/narayana/pull/667) on the _component_ for issue 3 above, so I should be able to merge it early next week and raise a PR on the EAP repo shortly after. Alternatively, you can consider it not a blocker too; it depends on how you would classify this API call.

Basically, if you call probe while there is an in-doubt TX with an associated CMR resource, the call may fail.

Tom

Comment 16 Jesper Pedersen 2014-06-09 12:04:14 UTC
IMHO, we should at least document it in the release notes as a known issue if we don't get the component upgrade in for CR1.

Comment 17 tom.jenkinson 2014-06-09 14:42:10 UTC
I have merged pull request 667 into Narayana, so we can do a component upgrade if we want. Who makes that kind of call at this stage?

Comment 18 Kabir Khan 2014-06-12 08:57:57 UTC
Fixed by the component upgrade in https://bugzilla.redhat.com/show_bug.cgi?id=1108152

Comment 19 Ondrej Chaloupka 2014-06-24 12:42:31 UTC
Verified on EAP 6.3.0.ER8.