Created attachment 862130
Failing recovery with JTS - server.log
I'm running a test with XA resources: I send a message to a queue and enlist one additional (dummy) test XAResource.
The flow of the test is as follows:
1. Enlist the JMS XAResource in the transaction
2. Send a message to the queue
3. Enlist the test XAResource
4. Prepare the JMS XAResource
5. Prepare the test XAResource
6. Crash the JVM at the end of the test XAResource.prepare()
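For illustration, a test resource like the one in the flow above could look roughly like the sketch below. This is not the actual test code from this report: the class names `CrashingXAResource` and `SimpleXid` and the injectable "crash action" are hypothetical, and only the standard `javax.transaction.xa` interfaces are assumed.

```java
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

// Hypothetical test resource: records the branch as prepared, then runs a
// "crash" action at the very end of prepare().
class CrashingXAResource implements XAResource {
    private final Runnable crashAction;
    // Kept in memory only; a real test resource would have to persist this
    // so that recover() can still report the branch after the JVM restarts.
    private Xid preparedXid;

    CrashingXAResource(Runnable crashAction) { this.crashAction = crashAction; }

    // Variant that really kills the JVM abruptly, skipping shutdown hooks.
    static CrashingXAResource haltingJvm() {
        return new CrashingXAResource(() -> Runtime.getRuntime().halt(1));
    }

    @Override public void start(Xid xid, int flags) { }
    @Override public void end(Xid xid, int flags) { }

    @Override public int prepare(Xid xid) {
        preparedXid = xid;   // the branch is now in doubt
        crashAction.run();   // crash point: end of prepare()
        return XA_OK;        // only reached when the action does not halt the JVM
    }

    @Override public Xid[] recover(int flag) {
        return preparedXid == null ? new Xid[0] : new Xid[] { preparedXid };
    }

    @Override public void rollback(Xid xid) { preparedXid = null; }
    @Override public void commit(Xid xid, boolean onePhase) { preparedXid = null; }
    @Override public void forget(Xid xid) { preparedXid = null; }
    @Override public boolean isSameRM(XAResource other) { return other == this; }
    @Override public int getTransactionTimeout() { return 0; }
    @Override public boolean setTransactionTimeout(int seconds) { return false; }

    public static void main(String[] args) {
        // Harmless action for a dry run; use haltingJvm() to really crash.
        CrashingXAResource res =
                new CrashingXAResource(() -> System.out.println("crash point reached"));
        int vote = res.prepare(new SimpleXid(1, new byte[] {1}, new byte[] {2}));
        System.out.println("voted XA_OK: " + (vote == XAResource.XA_OK));
        System.out.println("in-doubt branches: " + res.recover(XAResource.TMSTARTRSCAN).length);
    }
}

// Minimal Xid holder (hypothetical helper, only for this sketch).
class SimpleXid implements Xid {
    private final int formatId;
    private final byte[] gtrid;
    private final byte[] bqual;

    SimpleXid(int formatId, byte[] gtrid, byte[] bqual) {
        this.formatId = formatId;
        this.gtrid = gtrid.clone();
        this.bqual = bqual.clone();
    }

    @Override public int getFormatId() { return formatId; }
    @Override public byte[] getGlobalTransactionId() { return gtrid.clone(); }
    @Override public byte[] getBranchQualifier() { return bqual.clone(); }
}
```

The point of crashing after the branch is recorded is that the resource has a prepared branch to report from recover(), while the TM may have no log entry for it.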
This means that all of the resources are left in the prepared state and should be rolled back.
During recovery the JMS resource is processed correctly because the TM finds information about it in its JTS logs. No such information was saved for the test XA resource, so the orphan filters come into play. But all three of them vote ABSTAIN, which means no rollback.
com.arjuna.ats.internal.jta.recovery.arjunacore.JTATransactionLogXAResourceOrphanFilter voted ABSTAIN
com.arjuna.ats.internal.jta.recovery.arjunacore.JTANodeNameXAResourceOrphanFilter voted ABSTAIN
subordinate node name of < 131072, 29, 36, 0000000000-1-112700153108-69-10482-13127-60004749, 29292929292929292929282815629293082137-40-751111615623292929767829292929292929 > is null
com.arjuna.ats.internal.jta.recovery.arjunacore.SubordinateJTAXAResourceOrphanFilter voted ABSTAIN
I compared this with a run under JTA: there the NodeNameXAResourceOrphanFilter votes for rollback and everything works fine.
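To make the difference between the two runs concrete, here is a simplified model of how I understand the filter votes to be combined. It is not the actual Narayana code: the class `OrphanVoteSketch`, the `LEAVE_ALONE` value and the exact aggregation rule are my assumptions; only ABSTAIN and a rollback vote are visible in the logs.

```java
// Simplified, hypothetical model of orphan-filter vote aggregation.
class OrphanVoteSketch {
    enum Vote { ABSTAIN, ROLLBACK, LEAVE_ALONE }

    // Assumed rule: abstentions are ignored, any LEAVE_ALONE blocks the
    // rollback, otherwise at least one ROLLBACK vote is needed.
    static boolean shouldRollback(Vote... votes) {
        boolean rollbackRequested = false;
        for (Vote v : votes) {
            if (v == Vote.LEAVE_ALONE) {
                return false;
            }
            if (v == Vote.ROLLBACK) {
                rollbackRequested = true;
            }
        }
        return rollbackRequested;
    }

    public static void main(String[] args) {
        // JTS run from this report: all three filters abstain.
        System.out.println("JTS: " + shouldRollback(Vote.ABSTAIN, Vote.ABSTAIN, Vote.ABSTAIN));
        // JTA run: the node name filter votes for rollback.
        System.out.println("JTA: " + shouldRollback(Vote.ABSTAIN, Vote.ROLLBACK));
    }
}
```

Under this model, three abstentions never lead to a rollback, which matches what I see for the test XA resource in the JTS run.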
I've just found that the node name is checked here
and that Arjuna.XID() is used for the format.
I'm working with EAP 6.2.0.GA.
I'm also attaching the server log. The recovery of the test XA resource happens at '13:29:24,109'.
Note: as far as I can tell, the XAER_NOTA is returned by the second call of rollback on the JMS (HornetQ) XA resource. Under the JTS specification, rollback may be called twice on the same xid for a particular resource.
Sorry for the delay in responding to you, I have now completed my research and can confirm that this is working as expected. We do not automate orphan detection in the JTS case.
There are two things to consider:
1. Is this fact documented? - if not we should raise a BZ
2. Do we want to add this? If so, we need to get it into a PRD. It's not as straightforward as it seems, because we may be using context propagation rather than (the default) interposition, and in that scenario no log is written by the TM at that node.
Hope that clarifies,
Thanks for the clarification. This could still be unexpected behavior for users, though.
There is no documentation of this "limitation" (at least as far as I know), so I will raise a documentation BZ to get this information into the next EAP documentation release, which will probably be the docs for EAP 6.3.0.
I see that this would not be a simple fix.
As for whether we want it, I'm not sure. I think this behavior could cause some problems. Take a database XA resource: once prepare has been called, the database has started a transaction and prepared it, so if rollback is never called the transaction could hang there for some time. But, as I understand it, no locks are held on the database content at that point, so further operations won't be affected. Is that right? And do you think there could be any other danger to the application workflow?
In general, from the point of view of how recovery works, I would expect rollback to be called, so it would be good to have this fixed. Or am I wrong?
Locks are indeed potentially held. However, a DBA is able to use the tooling to manually roll back those resources.
What would you like to do with this? I think it should be closed, as the system is working as expected.
I am happy for you to raise a PRD item to get this feature added to a future version of the software. Do you add them to the EAP6 Jira?
The 6.3.0 flag was removed.
I've set the target release to a future EAP 6 release, since per the product specification orphan detection is not supported for JTS.
I've created a product Jira feature request so that this could be added in the future:
I think we should close the BZ, as it's not a product bug, and just use the Jira until it makes a PRD. Are you OK with that?
This is not a bug, as the behavior is expected under the JTS spec.
A feature request was created (https://issues.jboss.org/browse/JBTM-2124) so that the behavior could be made consistent with JTA in the future.