Bug 1091308

Summary:	[RFE] EAP6-17 CMR test scenario fail on commit, fail on rollback does not pass
Product:	[JBoss] JBoss Enterprise Application Platform 6	Reporter:	Ondrej Chaloupka <ochaloup>
Component:	Transaction Manager	Assignee:	Stefano Maestri <smaestri>
Status:	CLOSED CURRENTRELEASE	QA Contact:	Ondrej Chaloupka <ochaloup>
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	6.3.0	CC:	dosoudil, hhovsepy, jkudrnac, jpederse, kkhan, mmusgrov, myarboro, tom.jenkinson
Target Milestone:	ER8
Target Release:	EAP 6.3.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2014-06-28 15:31:57 UTC	Type:	Bug
Bug Depends On:	1108152, 1108251
Bug Blocks:

Description Ondrej Chaloupka 2014-04-25 10:45:45 UTC

I have the following test case:
1) enlist CMR datasource
2) enlist XA test datasource
3) prepare CMR
4) prepare XA
5) commit on CMR datasource - fails with ResourceException (simulating unavailable database)
6) abort the second phase
7) rollback on XA datasource - fails with XAException.XAER_RMERR (simulating trouble on rollback)
8) do recovery
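
The fault injection in the steps above can be sketched with stub resources. This is only an illustration, not the actual test from the test suite: the class names are hypothetical, a plain XAResource stub stands in for the CMR datasource, XAException.XAER_RMFAIL stands in for the ResourceException, and the recovery step is not modelled.

```java
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the scenario; it does not use Narayana or a real datasource.
class CmrFailureScenario {

    /** XAResource stub whose commit and rollback can be told to fail. */
    static class FaultyResource implements XAResource {
        private final String name;
        private final int commitError;   // 0 means "succeed"
        private final int rollbackError; // 0 means "succeed"
        private final List<String> events;

        FaultyResource(String name, int commitError, int rollbackError, List<String> events) {
            this.name = name;
            this.commitError = commitError;
            this.rollbackError = rollbackError;
            this.events = events;
        }

        public int prepare(Xid xid) {
            events.add(name + ":prepare");
            return XA_OK;
        }

        public void commit(Xid xid, boolean onePhase) throws XAException {
            events.add(name + ":commit");
            if (commitError != 0) throw new XAException(commitError);
        }

        public void rollback(Xid xid) throws XAException {
            events.add(name + ":rollback");
            if (rollbackError != 0) throw new XAException(rollbackError);
        }

        // The remaining XAResource methods are not needed for the sketch.
        public void start(Xid xid, int flags) {}
        public void end(Xid xid, int flags) {}
        public void forget(Xid xid) {}
        public Xid[] recover(int flag) { return new Xid[0]; }
        public boolean isSameRM(XAResource other) { return other == this; }
        public int getTransactionTimeout() { return 0; }
        public boolean setTransactionTimeout(int seconds) { return false; }
    }

    /** Runs steps 1-7 by hand and returns the ordered list of observed events. */
    static List<String> run() {
        List<String> events = new ArrayList<>();
        // Stand-in for the CMR datasource: commit fails (assumption: XAER_RMFAIL
        // replaces the ResourceException of the real test).
        FaultyResource cmr = new FaultyResource("cmr", XAException.XAER_RMFAIL, 0, events);
        // XA test datasource: rollback fails with XAER_RMERR.
        FaultyResource xa = new FaultyResource("xa", 0, XAException.XAER_RMERR, events);
        Xid xid = null; // the stubs ignore the Xid

        cmr.prepare(xid);
        xa.prepare(xid);
        try {
            cmr.commit(xid, false);
            xa.commit(xid, false);
        } catch (XAException commitFailure) {
            // Commit of the first resource failed, so the second phase is aborted.
            events.add("abort:errorCode=" + commitFailure.errorCode);
            try {
                xa.rollback(xid);
            } catch (XAException rollbackFailure) {
                events.add("rollback-failed:errorCode=" + rollbackFailure.errorCode);
            }
        }
        return events;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

After such a run the XA branch is left prepared but neither committed nor rolled back, which is the state the recovery step then has to resolve.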

When two XA resources (and no CMR resource) are used, both of them end in the HEURISTIC state and are not recovered.
With a CMR and an XA resource, the whole system looks as if the rollback had been done: there is no row in the xid table and no information about the transaction participant in the log. The XA resource is rolled back during the recovery process.

The first strange thing is that the transaction record is moved to the AtomicAction/Expired folder.
The object store looks like this after the test:
── ShadowNoFileLockStore
   └── defaultStore
       ├── EISNAME
       │   ├── 0_ffff7f000001_6ec13611_5357d5fb_27
       │   └── 0_ffff7f000001_6ec13611_5357d5fb_35
       └── StateManager
           └── BasicAction
               └── TwoPhaseCoordinator
                   └── AtomicAction
                       └── Expired
                           └── 0_ffff7f000001_6ec13611_5357d5fb_43
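
A check for such moved records can be sketched as a small directory scan. This is a hypothetical helper, not part of the test suite or of Narayana: it only assumes the file-based layout shown in the tree above, where a moved record is a regular file directly inside a directory named Expired, and it takes the object store root as an argument.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: lists records that sit directly in an "Expired" directory.
class ExpiredRecordScan {

    /** Returns all regular files whose parent directory is named "Expired", sorted by path. */
    static List<Path> findExpired(Path storeRoot) throws IOException {
        try (Stream<Path> paths = Files.walk(storeRoot)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(ExpiredRecordScan::isInExpiredDir)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    private static boolean isInExpiredDir(Path file) {
        Path parent = file.getParent();
        return parent != null
                && parent.getFileName() != null
                && "Expired".equals(parent.getFileName().toString());
    }

    public static void main(String[] args) throws IOException {
        // The object store location is passed in; "." is only a fallback for the sketch.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        findExpired(root).forEach(System.out::println);
    }
}
```

For the tree above, the scan would report only the record under AtomicAction/Expired and ignore the two EISNAME entries.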

The other strange thing is that the log contains the following warning even though the connection to the database is available:
17:11:35,262 WARN  [com.arjuna.ats.jta] (Periodic Recovery) Could not restore_state< formatId=131077, gtrid_length=29, bqual_length=36, tx_uid=0:ffff7f000001:-628e9697:5357d803:43, node_name=1, branch_uid=0:ffff7f000001:-628e9697:5357d803:48, subordinatenodename=null, eis_name=java:jboss/xa-datasources/CrashRecoveryDS >: com.arjuna.ats.arjuna.exceptions.ObjectStoreException: java:jboss/xa-datasources/CrashRecoveryDS was not online
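
When comparing such warnings with the object store contents, it can help to pull the Xid fields out of the log line. The following is a minimal sketch with a hypothetical class name; it assumes only the `< key=value, ... >` layout visible in the warning above and is not a Narayana API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the key=value pairs printed between '<' and '>'.
class XidLogFields {
    private static final Pattern BRACKETED = Pattern.compile("<\\s*(.*?)\\s*>");
    private static final Pattern FIELD = Pattern.compile("(\\w+)=([^,]+)");

    /** Returns the fields in the order they appear, or an empty map if none are found. */
    static Map<String, String> parse(String logLine) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher block = BRACKETED.matcher(logLine);
        if (!block.find()) {
            return fields;
        }
        Matcher field = FIELD.matcher(block.group(1));
        while (field.find()) {
            fields.put(field.group(1), field.group(2).trim());
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "Could not restore_state< formatId=131077, gtrid_length=29, bqual_length=36, "
                + "tx_uid=0:ffff7f000001:-628e9697:5357d803:43, node_name=1, "
                + "branch_uid=0:ffff7f000001:-628e9697:5357d803:48, subordinatenodename=null, "
                + "eis_name=java:jboss/xa-datasources/CrashRecoveryDS >: ...";
        parse(line).forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```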

Comment 9 Jesper Pedersen 2014-05-07 12:54:38 UTC

This BZ, and the associated BZs, for EAP6-17 are Beta blockers if we are to have any hope of getting feedback from the customer.

Comment 10 Stefano Maestri 2014-05-15 09:14:15 UTC

PR
https://github.com/jbossas/jboss-eap/pull/1336

Comment 11 Ondrej Chaloupka 2014-06-05 14:38:00 UTC

Hi Tom, Hi Stefano,

I'm marking the BZ as QA failed, as the test is still not passing.

I've checked the JBTM-2161 JIRA and, as I understand it, it was closed without merging any changes into the Narayana code base.
I've checked that pull request 1336 from Stefano is part of the ER5 code base.

I do not understand whether the fix to the EAP code base should fix the problem in the test, or whether it fixes a different issue.

Do you think that this could be fixed for 6.3.0?

This test case is one of the customer's test scenarios - see the attachment at
https://issues.jboss.org/browse/PRODMGT-49.

Thanks
Ondra

Comment 12 tom.jenkinson 2014-06-06 10:13:53 UTC

Hi Ondra,

JBTM-2161 shouldn't have been required; the fix Stefano made should have been enough. I will retest the issue with the latest EAP tag.

Tom

Comment 13 Jesper Pedersen 2014-06-06 11:36:31 UTC

This BZ must be a Blocker. Somebody with the correct permission needs to add it back, and make sure it doesn't reset (BZ sux)

Comment 14 tom.jenkinson 2014-06-06 19:38:46 UTC

Hi,

As discussed with Ondra, it turns out there were/are three issues:

1. The ordering issue that Stefano has fixed - thanks that would definitely have been a blocker
2. There is an issue in the test suite where it is checking for state that shouldn't exist - this is in hand by Ondra - thanks
3. There is a bug in Narayana whereby the management probe will fail if the call is made when recovery has not started. I wouldn't classify this as a blocker as the call fails rather than returning erroneous data that an admin might act upon.

It's up to you. I have a pull request (https://github.com/jbosstm/narayana/pull/667) on the _component_ for 3 above, so I should be able to merge it early next week and raise a PR on the eap repo shortly after. Alternatively, you can consider it not a blocker; it depends on how you would classify this API call.

Basically, if you call probe and there is an in-doubt TX with a CMR associated, the call may fail.

Tom

Comment 16 Jesper Pedersen 2014-06-09 12:04:14 UTC

IMHO, I think we should at least document it in the release notes as a known issue then - if we don't get the component upgrade in for CR1.

Comment 17 tom.jenkinson 2014-06-09 14:42:10 UTC

I have merged pull 667 into Narayana so can do a component upgrade if we want. Who makes the call on that kind of choice at this stage?

Comment 18 Kabir Khan 2014-06-12 08:57:57 UTC

Fixed by component upgrade https://bugzilla.redhat.com/show_bug.cgi?id=1108152

Comment 19 Ondrej Chaloupka 2014-06-24 12:42:31 UTC

Verified on EAP 6.3.0.ER8.