1064170 – Recovery problem on crash during prepare phase with TM running with JTS

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	JBoss Enterprise Application Platform 6
Classification:	JBoss
Component:	Transaction Manager
Sub Component:
Version:	6.2.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	TBD EAP 6
Assignee:	tom.jenkinson
QA Contact:	Ondrej Chaloupka
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2014-02-12 07:40 UTC by Ondrej Chaloupka
Modified:	2017-10-10 00:27 UTC
CC List:	1 user
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2014-05-07 15:10:29 UTC
Type:	Bug
Embargoed:

Attachments
Failing recovery with JTS - server.log (919.05 KB, text/x-log) 2014-02-12 07:40 UTC, Ondrej Chaloupka

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Issue Tracker	JBTM-2124	0	Major	Open	Add orphan detection for JTS interposition mode	2018-09-05 09:00:33 UTC

Description Ondrej Chaloupka 2014-02-12 07:40:56 UTC

Created attachment 862130
Failing recovery with JTS - server.log

I'm running a test with XA resources: I send a message to a queue and enlist one additional test (dummy) XAResource.

The flow of the test is as follows:
1. Enlist the JMS XAResource in the transaction
2. Send a message to the queue
3. Enlist the test XAResource
4. Prepare the JMS XAResource
5. Prepare the test XAResource
6. Crash the JVM at the end of the test XAResource.prepare()

This means that all of the resources are locked in the prepared state and should be rolled back.
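
For illustration only, a test resource of the kind described above might look roughly like the following. This is a minimal sketch, not the resource actually used in the test: the class name `CrashOnPrepareXAResource`, the injectable `crashAction`, and the `isPrepared()` helper are all hypothetical. Passing `() -> Runtime.getRuntime().halt(1)` as the crash action would kill the JVM at the end of prepare().

```java
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

// Hypothetical test resource: records the branch, then runs a "crash" action
// at the very end of prepare(), so the TM never sees the vote.
class CrashOnPrepareXAResource implements XAResource {
    private final Runnable crashAction; // e.g. () -> Runtime.getRuntime().halt(1)
    private boolean prepared;

    CrashOnPrepareXAResource(Runnable crashAction) {
        this.crashAction = crashAction;
    }

    @Override
    public int prepare(Xid xid) {
        prepared = true;   // a real resource would persist its prepared state here
        crashAction.run(); // with halt(), the JVM dies before the vote is returned
        return XA_OK;      // only reached when crashAction does not halt the JVM
    }

    boolean isPrepared() { return prepared; }

    // The remaining XAResource methods are close to no-ops in this sketch.
    @Override public void start(Xid xid, int flags) { }
    @Override public void end(Xid xid, int flags) { }
    @Override public void commit(Xid xid, boolean onePhase) { prepared = false; }
    @Override public void rollback(Xid xid) { prepared = false; }
    @Override public void forget(Xid xid) { }
    // A real resource would return its persisted in-doubt Xids here after a restart.
    @Override public Xid[] recover(int flag) { return new Xid[0]; }
    @Override public boolean isSameRM(XAResource other) { return other == this; }
    @Override public int getTransactionTimeout() { return 0; }
    @Override public boolean setTransactionTimeout(int seconds) { return false; }
}
```

Making the crash action injectable keeps the sketch usable outside a crash test; the halt itself is the only part that matters for reproducing the scenario.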

During recovery the JMS resource is processed correctly because the TM finds information about it in its JTS logs. No such information was saved for the test XA resource, so the orphan filters come into play. But all three of them vote ABSTAIN, which means no rollback.

com.arjuna.ats.internal.jta.recovery.arjunacore.JTATransactionLogXAResourceOrphanFilter voted ABSTAIN
com.arjuna.ats.internal.jta.recovery.arjunacore.JTANodeNameXAResourceOrphanFilter voted ABSTAIN
subordinate node name of < 131072, 29, 36, 0000000000-1-112700153108-69-10482-13127-60004749, 29292929292929292929282815629293082137-40-751111615623292929767829292929292929 > is null
com.arjuna.ats.internal.jta.recovery.arjunacore.SubordinateJTAXAResourceOrphanFilter voted ABSTAIN


I compared this with a JTA run: there the NodeNameXAResourceOrphanFilter votes for rollback and everything works.
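
The difference between the two runs can be modelled roughly as below. This is a simplified, self-contained sketch and not Narayana's actual code: the `Vote` enum and the `OrphanVoting.shouldRollback` helper are hypothetical names, and the rule that a LEAVE_ALONE vote vetoes the rollback is an assumption about how the votes are combined. What the log above does show is that three ABSTAIN votes result in no rollback.

```java
import java.util.List;

// Simplified, hypothetical model of orphan-filter voting; not Narayana's classes.
enum Vote { ABSTAIN, ROLLBACK, LEAVE_ALONE }

class OrphanVoting {
    // Assumed rule: roll back only if at least one filter votes ROLLBACK
    // and no filter votes LEAVE_ALONE. All-ABSTAIN therefore means no rollback.
    static boolean shouldRollback(List<Vote> votes) {
        boolean rollbackRequested = false;
        for (Vote vote : votes) {
            if (vote == Vote.LEAVE_ALONE) {
                return false; // assumed veto
            }
            if (vote == Vote.ROLLBACK) {
                rollbackRequested = true;
            }
        }
        return rollbackRequested;
    }
}
```

With the three ABSTAIN votes from the JTS run this returns false, while a single ROLLBACK vote, as in the JTA run, returns true.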

I've just found that the node name is checked here
https://github.com/jbosstm/narayana/blob/master/ArjunaJTA/jta/classes/com/arjuna/ats/internal/jta/recovery/arjunacore/SubordinateJTAXAResourceOrphanFilter.java#L50
and that Arjuna.XID() is used for the format
https://github.com/jbosstm/narayana/blob/master/ArjunaJTA/jta/classes/com/arjuna/ats/jta/xa/XATxConverter.java#L191

I'm working with EAP 6.2.0.GA.

I'm also attaching the server log. The recovery of the test XA resource happens at time '13:29:24,109'.

Note: as far as I can tell, the XAER_NOTA is returned by the second call of rollback on the JMS (HornetQ) XA resource. Under the JTS specification, rollback may be called twice on the same xid for a particular resource.
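
To make the note above concrete, here is a small sketch of why a repeated rollback produces XAER_NOTA and how a caller might tolerate it. It is only an illustration under stated assumptions: `InDoubtBranches` is a hypothetical in-memory stand-in for a resource manager's branch table (not HornetQ code), branches are identified by plain strings instead of Xids, and treating XAER_NOTA as "nothing left to do" is an assumption about sensible caller behavior rather than something confirmed in this report.

```java
import java.util.HashSet;
import java.util.Set;
import javax.transaction.xa.XAException;

// Hypothetical in-memory stand-in for a resource manager's table of prepared branches.
class InDoubtBranches {
    private final Set<String> prepared = new HashSet<>();

    void markPrepared(String branchId) {
        prepared.add(branchId);
    }

    // Like XAResource.rollback: an unknown branch is reported as XAER_NOTA.
    void rollback(String branchId) throws XAException {
        if (!prepared.remove(branchId)) {
            throw new XAException(XAException.XAER_NOTA);
        }
    }

    // Caller-side handling: true if the branch was rolled back by this call,
    // false if the resource no longer knows it (for example, a second rollback).
    boolean rollbackTolerant(String branchId) {
        try {
            rollback(branchId);
            return true;
        } catch (XAException e) {
            if (e.errorCode == XAException.XAER_NOTA) {
                return false; // assumed benign: the branch is already gone
            }
            throw new IllegalStateException("rollback failed, XA error " + e.errorCode, e);
        }
    }
}
```

The first `rollbackTolerant` call on a prepared branch removes it; a second call on the same branch hits the XAER_NOTA path and is reported as already completed.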

Comment 2 tom.jenkinson 2014-02-19 09:10:37 UTC

Hi Ondra,

Sorry for the delay in responding to you. I have now completed my research and can confirm that this is working as expected: we do not automate orphan detection in the JTS case.

There are two things to consider:
1. Is this fact documented? If not, we should raise a BZ.
2. Do we want to add this? If so, we need to get it into a PRD. It's not as straightforward as it seems, because we may be using context propagation rather than (the default) interposition, and in that scenario no log is written by the TM at that node.

Hope that clarifies,
Tom

Comment 3 Ondrej Chaloupka 2014-02-19 14:07:44 UTC

Hi Tom,

Thanks for the clarification. Still, this could perhaps be unexpected behavior for users.

There is no documentation about this "limitation" (as far as I know), so I will raise a documentation BZ to get this information into the next EAP documentation release, which will probably be the docs for EAP 6.3.0.

I see that this would not be a simple fix.
As for whether we want it, I'm not sure. I think this behavior could cause some problems. Consider a database XA resource: once prepare has been called, the database has started a transaction and prepared it. If rollback is never called, that transaction could hang there for some time. But, as I understand it, no locks are held on database content at that point, so further operations won't be affected. Is that right? And do you think there could be some other danger to the application workflow?

In general, from the point of view of how recovery works, I would expect rollback to be called, so it would be good to have this fixed. Or am I wrong?

Thanks
Ondra

Comment 4 tom.jenkinson 2014-02-19 14:27:40 UTC

Hi Ondra,

Locks are indeed potentially held. However, a DBA is able to use the tooling to manually roll back those resources.

Tom

Comment 5 tom.jenkinson 2014-03-13 07:43:20 UTC

Hi Ondra,

What would you like to do with this? I think it should be closed, as the system is working as expected.

I am happy for you to raise a PRD item to get this feature added to a future version of the software. Do you add them to the EAP6 Jira?

Tom

Comment 9 Ondrej Chaloupka 2014-03-13 10:18:14 UTC

The 6.3.0 flag was removed.
I've set the target release to some future EAP 6 release, since according to the product specification orphan detection is not supported for JTS.

I've created a product Jira feature request so that this can be added in the future:
https://issues.jboss.org/browse/JBTM-2124

Thanks

Comment 10 tom.jenkinson 2014-05-07 15:02:26 UTC

Hi Ondra,

I think we should close the BZ as it's not a product bug and just use the Jira until it makes a PRD. Are you OK with that?

Thanks,
Tom

Comment 11 Ondrej Chaloupka 2014-05-07 15:10:29 UTC

Agree.

This is not a bug, as the behavior is expected by the JTS spec.
A feature request was created (https://issues.jboss.org/browse/JBTM-2124) so that the behavior can be made consistent with JTA in the future.
