Bug 1169671 - Recovery scenario where db connection is halted after prepare phase does not rollback resource
Summary: Recovery scenario where db connection is halted after prepare phase does not ...
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: JBoss Enterprise Application Platform 6
Classification: JBoss
Component: Transaction Manager
Version: 6.4.0
Hardware: Unspecified
OS: Unspecified
unspecified
urgent
Target Milestone: ---
: EAP 6.4.0
Assignee: Gytis Trikleris
QA Contact: Ondrej Chaloupka
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2014-12-02 07:04 UTC by Ondrej Chaloupka
Modified: 2015-01-09 17:13 UTC (History)
5 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2014-12-08 10:32:57 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
server.log for proxy prepareHalt scenario with JTS running on PostgreSQL (1.85 MB, text/plain)
2014-12-02 07:06 UTC, Ondrej Chaloupka
no flags Details

Description Ondrej Chaloupka 2014-12-02 07:04:37 UTC
We started to hit a problem with proxy prepare halt scenario with DR11 (Narayana 4.17.24.Final) release with JTS. This scenario halts connection to database after the XA resource is prepared (prepare phase passes succesfully) and commit throws an XAException that resource is unavailable.
The type of XAException code depends on database jdbc driver. For Oracle and DB2 it's XAER_RMFAIL (there was a bug about it bz#1080035). For PosgtgreSQL, MSSQL and others it returns XAER_RMERR. And scenario stopped to work when XAER_RMERR is returned. It was working correctly with DR10 and Narayana 4.17.23.Final.
Consistently hitting for PostgreSQL and MSSQL.

Test flow
1) prepare db XA
2) stop db connection
3) prepare test XA
4) commit db XA 
   - connection is down - XAException.XAER_RMERR thrown by jdbc driver
   - client gets javax.transaction.TransactionRolledbackException
5) transaction is aborted
6) rolback test XA
7) recovery: rollback db XA (this is correct and expected behaviour)

Currently TM does not run rollback but it tries to commit the resource. As the whole transaction is aborted the all tries to commit resource ends with 
"Recovery of resource failed when trying to call commit got exception: org.omg.CosTransactions.HeuristicRollback: IDL:omg.org/CosTransactions/HeuristicRollback:1.0"

Please, see details in attached server log.

Comment 1 Ondrej Chaloupka 2014-12-02 07:06:40 UTC
Created attachment 963569 [details]
server.log for proxy prepareHalt scenario with JTS running on PostgreSQL

Comment 3 tom.jenkinson 2014-12-02 10:21:17 UTC
Hi Gytis,

As you resolved https://bugzilla.redhat.com/show_bug.cgi?id=1080035, please can you take a look and check what Ondra is reporting?

Thanks,
Tom

Comment 4 Gytis Trikleris 2014-12-03 11:53:36 UTC
Before introducing AssumedCompleteHeursticTransaction type, after 3 recovery iterations, record used to be moved to AssumedCompleteTransaction store. Even though, the record in this test is of state HeuristicRollback, GenericRecoveryCoordinator would not find the record and tell resource manager to roll back.

After introducing AssumedCompleteHeursticTransaction type, record is moved to AssumedCompleteHeursticTransaction store. Status checker checks this store, therefore GenericRecoveryCoordinator still gets the status org.omg.CosTransactions.Status.StatusCommitted when replaying completion. Therefore, resource manager is never rolled back.

At the end of the test, assertion expects there to be 0 transactions in doubt, which used to work before. However, now there is always one transaction in doubt because of the reasons explained above.

Comment 5 Gytis Trikleris 2014-12-03 11:57:15 UTC
The test needs to be updated to comply with this change. If you're ok with this, I'll reassign this to you, Ondra.

Comment 6 Ondrej Chaloupka 2014-12-03 16:54:58 UTC
Hi Gytis,

thank you for explanation. I haven't realize the fact with heuristic state.

But still I have few doubts. Can you, please, clarify it to me?

Why this does have different behaviour from JTA implementation? The transaction is rollbacked after reconnection to db in jta. 
If I take a look on object store after the transaction is aborted it seems that it's clean (in fact this could not be true as I use jboss-cli and its command). If I understand it correctly when transaction is aborted then XAResourceRecord.forget for XAResourceRecord is called and the rollback is called by bottom-up phase of recovery.

For JTS it just put the transaction to another part of object store being as transaction in heuristic state.
Why is not ensured that whole transaction will be rollbacked when doAbort on transaction is called? Whole I mean with all participants.
Especially when client seems to get TransactionRolledbackException which suppose that transaction was really rollbacked.
If I admit that this is ok and user is expected to rollback the participant in heuristic state on his own then jboss-cli in EAP has to show it. But it's not happening as AssumedCompleteTransaction is not shown by default (I know about bz#1168973). Nevertheless the client has to get different exception type for user would be notified that has such responsibility.

I can't help to myself but behaviour in 4.17.24 seems not being correct.

Comment 7 Ondrej Chaloupka 2014-12-03 16:56:51 UTC
I've quickly look on server log and I've spotted exception [1]. Could this mean that forget for JTS is called as well but there is some error in code that causes that it's not effective? Or that exception is correct/harmless in fact?

[1]
WARN  [jacorb.poa.controller] (RequestController-1) rid: 118 opname: forget cannot process request, because object doesn't exist
WARN  [jacorb.poa.controller] (RequestController-1) rid: 118 opname: forget request rejected with exception: null
WARN  [com.arjuna.ats.jts] (RequestProcessor-20) ARJUNA022223: ExtendedResourceRecord.forgetHeuristic caught exception: org.omg.CORBA.OBJECT_NOT_EXIST: Server-side Exception: null
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) [rt.jar:1.7.0_71]
 at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) [rt.jar:1.7.0_71]
 at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) [rt.jar:1.7.0_71]
 at java.lang.reflect.Constructor.newInstance(Constructor.java:526) [rt.jar:1.7.0_71]
 at org.jacorb.orb.SystemExceptionHelper.read(SystemExceptionHelper.java:223) [jacorb-2.3.2.redhat-6.jar:2.3.2.redhat-6]
 at org.jacorb.orb.ReplyReceiver.getReply(ReplyReceiver.java:319) [jacorb-2.3.2.redhat-6.jar:2.3.2.redhat-6]
 at org.jacorb.orb.Delegate.invoke_internal(Delegate.java:1090) [jacorb-2.3.2.redhat-6.jar:2.3.2.redhat-6]
 at org.jacorb.orb.Delegate.invoke(Delegate.java:957) [jacorb-2.3.2.redhat-6.jar:2.3.2.redhat-6]
 at org.omg.CORBA.portable.ObjectImpl._invoke(ObjectImpl.java:80) [jacorb-2.3.2.redhat-6.jar:2.3.2.redhat-6]
 at com.arjuna.ArjunaOTS._ArjunaSubtranAwareResourceStub.forget(_ArjunaSubtranAwareResourceStub.java:342) [jbossjts-jacorb-4.17.23.Final.jar:4.17.23.Final]
 at com.arjuna.ats.internal.jts.resources.ExtendedResourceRecord.forgetHeuristic(ExtendedResourceRecord.java:647) [jbossjts-jacorb-4.17.23.Final.jar:4.17.23.Final]
 at com.arjuna.ats.arjuna.coordinator.BasicAction.doForget(BasicAction.java:3446) [jbossjts-jacorb-4.17.23.Final.jar:4.17.23.Final]
 at com.arjuna.ats.arjuna.coordinator.BasicAction.forgetHeuristics(BasicAction.java:1329) [jbossjts-jacorb-4.17.23.Final.jar:4.17.23.Final]
 at com.arjuna.ats.internal.jts.orbspecific.coordinator.ArjunaTransactionImple.forgetHeuristics(ArjunaTransactionImple.java:1365) [jbossjts-jacorb-4.17.23.Final.jar:4.17.23.Final]
 at com.arjuna.ats.arjuna.coordinator.BasicAction.phase2Abort(BasicAction.java:1936) [jbossjts-jacorb-4.17.23.Final.jar:4.17.23.Final]
 at com.arjuna.ats.arjuna.coordinator.BasicAction.doCommit(BasicAction.java:2712) [jbossjts-jacorb-4.17.23.Final.jar:4.17.23.Final]
 at com.arjuna.ats.arjuna.coordinator.BasicAction.phase2Commit(BasicAction.java:1820) [jbossjts-jacorb-4.17.23.Final.jar:4.17.23.Final]
 at com.arjuna.ats.arjuna.coordinator.BasicAction.End(BasicAction.java:1504) [jbossjts-jacorb-4.17.23.Final.jar:4.17.23.Final]
 at com.arjuna.ats.internal.jts.orbspecific.coordinator.ArjunaTransactionImple.commit(ArjunaTransactionImple.java:375) [jbossjts-jacorb-4.17.23.Final.jar:4.17.23.Final]
 at com.arjuna.ats.internal.jts.ControlWrapper.commit(ControlWrapper.java:243) [jbossjts-jacorb-4.17.23.Final.jar:4.17.23.Final]
 at com.arjuna.ats.internal.jts.orbspecific.CurrentImple.commit(CurrentImple.java:247) [jbossjts-jacorb-4.17.23.Final.jar:4.17.23.Final]
 at com.arjuna.ats.jts.extensions.AtomicTransaction.commit(AtomicTransaction.java:276) [jbossjts-jacorb-4.17.23.Final.jar:4.17.23.Final]
 at com.arjuna.ats.internal.jta.transaction.jts.TransactionImple.commitAndDisassociate(TransactionImple.java:1296) [jbossjts-jacorb-4.17.23.Final.jar:4.17.23.Final]
 at com.arjuna.ats.internal.jta.transaction.jts.BaseTransaction.commit(BaseTransaction.java:130) [jbossjts-jacorb-4.17.23.Final.jar:4.17.23.Final]
 at com.arjuna.ats.jbossatx.BaseTransactionManagerDelegate.commit(BaseTransactionManagerDelegate.java:75)
 at org.jboss.as.ejb3.tx.CMTTxInterceptor.endTransaction(CMTTxInterceptor.java:92) [jboss-as-ejb3-7.5.0.Final-redhat-12.jar:7.5.0.Final-redhat-12]
 at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInOurTx(CMTTxInterceptor.java:278) [jboss-as-ejb3-7.5.0.Final-redhat-12.jar:7.5.0.Final-redhat-12]
 at org.jboss.as.ejb3.tx.CMTTxInterceptor.required(CMTTxInterceptor.java:341) [jboss-as-ejb3-7.5.0.Final-redhat-12.jar:7.5.0.Final-redhat-12]
 at org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:240) [jboss-as-ejb3-7.5.0.Final-redhat-12.jar:7.5.0.Final-redhat-12]
 at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.2.Final-redhat-1.jar:1.1.2.Final-redhat-1]
 at org.jboss.as.ejb3.remote.EJBRemoteTransactionPropagatingInterceptor.processInvocation(EJBRemoteTransactionPropagatingInterceptor.java:79) [jboss-as-ejb3-7.5.0.Final-redhat-12.jar:7.5.0.Final-redhat-12]
 at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.2.Final-redhat-1.jar:1.1.2.Final-redhat-1]
 at org.jboss.as.ejb3.component.interceptors.CurrentInvocationContextInterceptor.processInvocation(CurrentInvocationContextInterceptor.java:41) [jboss-as-ejb3-7.5.0.Final-redhat-12.jar:7.5.0.Final-redhat-12]
 at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.2.Final-redhat-1.jar:1.1.2.Final-redhat-1]
 at org.jboss.as.ejb3.component.invocationmetrics.WaitTimeInterceptor.processInvocation(WaitTimeInterceptor.java:43) [jboss-as-ejb3-7.5.0.Final-redhat-12.jar:7.5.0.Final-redhat-12]
 at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.2.Final-redhat-1.jar:1.1.2.Final-redhat-1]
 at org.jboss.as.ejb3.component.interceptors.ShutDownInterceptorFactory$1.processInvocation(ShutDownInterceptorFactory.java:64) [jboss-as-ejb3-7.5.0.Final-redhat-12.jar:7.5.0.Final-redhat-12]
 at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.2.Final-redhat-1.jar:1.1.2.Final-redhat-1]
 at org.jboss.as.ejb3.component.interceptors.LoggingInterceptor.processInvocation(LoggingInterceptor.java:59) [jboss-as-ejb3-7.5.0.Final-redhat-12.jar:7.5.0.Final-redhat-12]
 at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.2.Final-redhat-1.jar:1.1.2.Final-redhat-1]
 at org.jboss.as.ejb3.component.interceptors.EjbExceptionTransformingInterceptorFactories$1.processInvocation(EjbExceptionTransformingInterceptorFactories.java:75) [jboss-as-ejb3-7.5.0.Final-redhat-12.jar:7.5.0.Final-redhat-12]
 at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.2.Final-redhat-1.jar:1.1.2.Final-redhat-1]
 at org.jboss.as.ee.component.NamespaceContextInterceptor.processInvocation(NamespaceContextInterceptor.java:50) [jboss-as-ee-7.5.0.Final-redhat-12.jar:7.5.0.Final-redhat-12]
 at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.2.Final-redhat-1.jar:1.1.2.Final-redhat-1]
 at org.jboss.as.ejb3.component.interceptors.AdditionalSetupInterceptor.processInvocation(AdditionalSetupInterceptor.java:55) [jboss-as-ejb3-7.5.0.Final-redhat-12.jar:7.5.0.Final-redhat-12]
 at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.2.Final-redhat-1.jar:1.1.2.Final-redhat-1]
 at org.jboss.as.ee.component.TCCLInterceptor.processInvocation(TCCLInterceptor.java:45) [jboss-as-ee-7.5.0.Final-redhat-12.jar:7.5.0.Final-redhat-12]
 at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.2.Final-redhat-1.jar:1.1.2.Final-redhat-1]
 at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61) [jboss-invocation-1.1.2.Final-redhat-1.jar:1.1.2.Final-redhat-1]
 at org.jboss.as.ee.component.ViewService$View.invoke(ViewService.java:185) [jboss-as-ee-7.5.0.Final-redhat-12.jar:7.5.0.Final-redhat-12]
 at org.jboss.as.ejb3.iiop.EjbCorbaServant._invoke(EjbCorbaServant.java:322) [jboss-as-ejb3-7.5.0.Final-redhat-12.jar:7.5.0.Final-redhat-12]
 at org.jacorb.poa.RequestProcessor.invokeOperation(RequestProcessor.java:306) [jacorb-2.3.2.redhat-6.jar:2.3.2.redhat-6]
 at org.jacorb.poa.RequestProcessor.process(RequestProcessor.java:626) [jacorb-2.3.2.redhat-6.jar:2.3.2.redhat-6]
 at org.jacorb.poa.RequestProcessor.run(RequestProcessor.java:769) [jacorb-2.3.2.redhat-6.jar:2.3.2.redhat-6]

17:52:42,083 WARN  [com.arjuna.ats.jts] (RequestProcessor-10) ARJUNA022261: ServerTopLevelAction detected that the transaction was inactive

Comment 8 Ondrej Chaloupka 2014-12-04 00:04:52 UTC
OK, I was thinking about my previous comment and that is probably not the cause (or directly related) as the forget is called against the database resource where connection is down. Because of that the exception.

But maybe still could be relevant to ask why forget on resource passes for JTA and not for JTS.

Comment 9 tom.jenkinson 2014-12-04 10:33:56 UTC
Hi Ondra,

I just want to confirm something with you about the eventual states of the XAResources

JTA:
test XAR: rollback
db XAR: rollback

JTS:
test XAR: rollback
db XAR: left in doubt

Is this correct? And was the only difference before that the JTS db XAR was rolledback too?

Thanks!
Tom

Comment 10 Ondrej Chaloupka 2014-12-04 10:46:18 UTC
Hi Tom,

yes, correct, that's the point of this bz. 

I would expect that both resources should be rollbacked and not left in indoubt state. Or user should be announced that there is some participant in heuristic state and he should be able to recover it by hand.
That's my idea about this.

Thank you
Ondra

Comment 11 tom.jenkinson 2014-12-04 13:54:10 UTC
Hi Ondra,

This is one of these inconsistencies between JTA and JTS that can require manual intervention. Its in a particularly complex area and so I am not surprised you query it - I had to check quite a lot of this myself!

Consider the following scenario:
prepare XAR1
prepare XAR2
commit XAR1 -> XAER_RMERR
rollback XAR2

As discussed on the linked BZ (or maybe not but its related to that change) we know to treat the XAER_RMERR as a heuristic and not clear it from the log  - except in the case where we have managed to make the outcome of the transaction the same as the heuristic (which is true in this case, they are both rolled back). In that scenario we are free to clear the heuristic outcome using xa_forget:
https://github.com/jbosstm/narayana/blob/master/ArjunaCore/arjuna/classes/com/arjuna/ats/arjuna/coordinator/BasicAction.java#L3539
    	/*
       	* If the user has selected to maintain all heuristic information, then
       	* we never explicitly tell resources to forget. We assume that the user
       	* (or some management tool) will do this, and simply save as much
       	* information as we can into the action state to allow them to do so.
       	*
       	* However, if we had a resource that returned a heuristic outcome and
       	* we managed to make the outcome of this transaction the same as that
       	* outcome, we removed the heuristic. So, we need to tell the resource
       	* regardless, or it'll never be able to tidy up.
       	*/

In both JTA and JTS this point of the code is reached and with JTA we see the xa_forget successfully called on the XAR. Where the difference starts is that with JTS we are seeing the call to xa_forget fail trying to contact the "remote" XAResourceRecord side. This is because the XAResourceRecord POA was already shut down during removeConnection() in com.arjuna.ats.internal.jta.resources.jts.orbspecific.XAResourceRecord::commit().

You might say why not keep the remote side "open" still to receive this xa_forget call, unfortunately it is required to shutdown the POA during commit because the "remote" side cannot know whether or not it is the first resource (unlike in the in-memory JTA case) and so has to shutdown the POA when it gets the chance. It does mean that then the transaction is left in the log (as a heuristic) and will need manually clearing and that is the difference and where your test will need to be updated.

If you think the difference rather unexpected, you might also raise a docs issue to get this added to the documentation (if its not already).

My suggestions:
1. blocker-
2. Move to testsuite component and change assignee to yourself/Hayk
3. [Optional] Raise a documentation BZ (if its not covered already)

Thanks for the brain-teaser - it was a very interesting problem!
Tom

Comment 12 Ondrej Chaloupka 2014-12-05 08:15:29 UTC
Hi Tom,

thanks a lot for the detailed explanation. It all makes sense and now I finally understand the reason of the  "org.omg.CORBA.OBJECT_NOT_EXIST: Server-side Exception: null" exception.

From this point I completely agree that the test should be changed to count with the heuristic state of the resource in the log store.

But there are still two issues that I would like to discuss.

First is fact that client obtains TransactionRollbackException in this scenario [1]. I think it's in contrary to fact that the transaction is in heuristic state. I'm not sure if I do interpret this right but from what I understand from the java api [2] and the exception message is that me, as user, could be calm and transaction is rollbacked by itself. I do need to care any heuristic state.
I must admit that I'm not certain with this interpretation. This is based on my subjective logic and on Naryana behaviour in other cases. Narayana uses HeuristicMixedException [3] to be thrown to client when something in heuristic state should be expected in object store log. I have experienced that in similar test cases as this one is.
I understand the type of exception states for a kind of contract with the client side for it knowing what happened during the transaction processing.
Could be (should be) the type of exception changed for this?

Plus, sorry for that. I hope I won't be annoying but this scenario works fine on Narayana 4.17.23.Final (JBTM-2255 was not fixed there). I mean that for JTS the transaction was rollbacked in the same way as for JTA. I understand that the change of behaviour was needed because of the issue bz1080035 but I just wonder if these two issues (this bz1169671 and mentioned bz1080035) could not be fixed somelike coherently.

Second issue comes from the fact that AssumedCompleteHeursticTransaction is not shown in EAP jboss cli. This means that user is not able to recover such transaction. I understand that this issue is "a different one" and maybe will be fixed by completition of bz1168973. But maybe not as the bz1168973 talks about participants and I can't see either the transaction itself in jboss cli.
Just to complete my observation I'm adding the cli output before transaction is moved to AssumedComplete [4] and the content of store after that move [5].

Finally I have one more doubt. This could be taken as regression from user point of view. Behaviour of EAP 6.3.0 and EAP 6.4.0 will be different. As for this particular scenario the EAP 6.3.0 will rollback "correctly" and 6.4.0 will put transaction to heuristic state.
My personal feeling is that that this last doubt could be diminished if HeuristicMixedException will be thrown instead of TransactionRollbackException.

Thanks a again
Ondra



[1]
javax.transaction.TransactionRolledbackException: CORBA TRANSACTION_ROLLEDBACK 0 No; nested exception is: 
	org.omg.CORBA.TRANSACTION_ROLLEDBACK: ----------BEGIN server-side stack trace----------
javax.transaction.TransactionRolledbackException: Transaction rolled back
----------END server-side stack trace----------  vmcid: 0x0  minor code: 0  completed: No

[2] 
https://docs.oracle.com/javase/7/docs/api/javax/transaction/TransactionRolledbackException.html
[3]
https://docs.oracle.com/javaee/7/api/javax/transaction/HeuristicMixedException.html

[4]
/subsystem=transactions/log-store=log-store:read-resource(recursive=true)
{
    "outcome" => "success",
    "result" => {
        "expose-all-logs" => false,
        "type" => "default",
        "transactions" => {"0:ffff0a280468:-72fa2e73:548160fe:3e" => {
            "age-in-seconds" => "75",
            "id" => "0:ffff0a280468:-72fa2e73:548160fe:3e",
            "jmx-name" => undefined,
            "type" => "StateManager/BasicAction/TwoPhaseCoordinator/ArjunaTransactionImple",
            "participants" => {"1" => {
                "eis-product-name" => undefined,
                "eis-product-version" => undefined,
                "jmx-name" => undefined,
                "jndi-name" => "1",
                "status" => "HEURISTIC",
                "type" => "/StateManager/AbstractRecord/ExtendedResourceRecord"
            }}
        }}
    }
}


[5]
├── ShadowNoFileLockStore
│   └── defaultStore
│       ├── CosTransactions
│       │   └── XAResourceRecord
│       │       └── 0_ffff0a280468_-72fa2e73_548160fe_46
│       ├── Recovery
│       │   └── FactoryContact
│       │       ├── 0_ffff0a280468_-1fb48758_548160da_1d
│       │       └── 0_ffff0a280468_-72fa2e73_548160fe_14
│       ├── RecoveryCoordinator
│       │   └── 0_ffff52e38d0c_c91_4140398c_0
│       └── StateManager
│           └── BasicAction
│               └── TwoPhaseCoordinator
│                   └── ArjunaTransactionImple
│                       └── AssumedCompleteHeuristicTransaction
│                           └── 0_ffff0a280468_-72fa2e73_548160fe_3e

Comment 13 Ondrej Chaloupka 2014-12-05 08:18:55 UTC
Plus I think that I could add (just for illustration) the output of cli after the transaction is moved to AssumedCompleteHeuristicTransaction. I've mentioned that there is no information about such transaction but as addition this is the output.

/subsystem=transactions/log-store=log-store:read-resource(recursive=true)
{
    "outcome" => "success",
    "result" => {
        "expose-all-logs" => false,
        "type" => "default",
        "transactions" => undefined
    }
}

Comment 14 tom.jenkinson 2014-12-05 14:18:40 UTC
Thanks for the feedback Ondra.

I think there are broadly three points you raise:

1. Why is rollback exception OK?
It is OK because all the participants have rolled back - as this was the first resource we managed to get a single outcome from all participants even given the heuristic outcome of the first one (it was HeuristicRollback). So therefore you can see that the outcome is not mixed, both branches are rolled back.

2. As you say, we need to get AssumedCompleteHeursticTransaction in the other BZ

3. Entry in the object store that was not there in 6.4.
The entry in the object store is just a relic of the transaction, the branch is actually rolled backed in the resource manager when commit failed and the locks were released at that point. So although there is now a relic, it has not material effect on the users application. It was wrong to clean it up before and that got fixed. This links to your fourth point about a regression, which is not the case. The branches are rolled back in both 6.3 and 6.4, the thing that is different is there is still a relic in 6.4 (as there should be) and that is not a regression as the system is now operating correctly.

Basically the XA spec says that if this is called:
http://docs.oracle.com/javaee/1.3/api/javax/transaction/xa/XAResource.html#commit(javax.transaction.xa.Xid, boolean)
And it returns XAER_RMERR, the branch is rolled back (so although its in our tooling - it doesn't need manual intervention except to clear it from the log).

This is from the XA spec:
"An error occurred in committing the work performed on behalf of the transaction
branch and the branch’s work has been rolled back. Note that returning this error
signals a catastrophic event to a transaction manager since other resource
managers may successfully commit their work on behalf of this branch. This error
should be returned only when a resource manager concludes that it can never
commit the branch and that it cannot hold the branch’s resources in a prepared
state. Otherwise, [XA_RETRY] should be returned."

I think the reason you think there is a regression is you think that with 6.3 both branches are rolled back and with 6.4 you think one branch is rolled back and one is in heuristic state. To clarify that in 6.4 both branches are rolled back, but we still have a record of the one that heuristically rolled back during the commit (i.e. it rolled back when it returned XAER_RMERR).

I hope that helps clarify,
Tom

Comment 15 Ondrej Chaloupka 2014-12-05 15:23:57 UTC
Hi Tom,

yes, I understand your point. 
But I'm sorry that I haven't put it here clear enough but the issue is that the second branch is not rollback. Or not effectively from data consistency point of view (if I can call it so). There is called rollback against the branch but as the branch is 'offline' it just throws another XAER_RMERR.

What is happenings right now
1) start db XA
2) start test XA
3) prepare db XA
4) connection to database is put down
5) prepare test XA
6) a try to commit db XA
7) jdbc driver throws XAException.XAER_RMERR as connection is down
8) TM calls doAbort on the transaction
9) rollback test XA
10) a try to rollback db XA
11) jdbc driver throws a second XAException.XAER_RMERR as connection is down
12) TM returns TransactionRolledbackException to client
13) the log contains transaction with participant in HEURISTIC state
14) database contains not finished transaction (which possibly locks some resources)
15) recovery manager does not call rollback as after its third run (I expect 3rd but it's somelike that) moved the participant to AssumedCompleteHeuristicTransaction


result: client was informed about transaction should be rollbacked but transaction is still pending (in TM log under AssumedComplete directory and in database as well)

I understand that JTA behaves differently but I would like to mention here to point out that's recovery manager who solves the situation.
It's the same up to time of number 11)
12) forget is called on db XA object store record
13) TM returns TransactionRolledbackException to client
14) the object store log is clean
15) database contains not finished transaction
16) recovery manager does bottom-up recovery to call rollback on unfinished txn

From my point of view there is problem of inconsistency as:
client gets TransactionRolledbackException but database is in-doubt

Ondra

Comment 16 tom.jenkinson 2014-12-05 15:27:41 UTC
Ah, it looks like the DB has a bug then. It shouldn't be returned XAER_RMERR.

Definitely worth a release note and a report against the DB vendor.

From the XA spec (http://pubs.opengroup.org/onlinepubs/009680699/toc.pdf):
"An error occurred in committing the work performed on behalf of the transaction
branch and the branch’s work has been rolled back. Note that returning this error
signals a catastrophic event to a transaction manager since other resource
managers may successfully commit their work on behalf of this branch. This error
should be returned only when a resource manager concludes that it can never
commit the branch and that it cannot hold the branch’s resources in a prepared
state. Otherwise, [XA_RETRY] should be returned."

Thanks for persevering!

Comment 18 Ondrej Chaloupka 2014-12-08 08:43:29 UTC
Hi Tom,

I understand this point of view and the fact that it's not fully spec compliant but the change of behaviour influences most of the databases that we support.

For this scenarior XAER_RMERR is returned by PostgreSQL (Plus), MSSQL and Sybase. XAER_RMFAIL is returned by Oracle, XA_RETRY is returned by DB2, XAER_RMFAIL is returned by MySQL (for mysql there is a bug that mysql is not capable to handle such situation and it loses information about txn when client is disconnected).

To sum up the situation a bit more - this scenario does not work for Oracle and DB2 in 4.17.23.Final (fixed by bz#1080035 and works in 4.17.24.Final). This scenario stopped to work for PostgreSQL, MSSQL, Sybase in 4.17.24.Final.

I agree that from spec point of view it's not correct but should not be possible to make the behaviour a bit more closer to what JTA does? I mean e.g. left bottom-up recovery to work with records placed to AssumedCompleteHeursticTransaction (at least from my testing it seems that records place under AssumedCompleteHeursticTransaction are blocked to be grabbed by recovery process).

Thanks
Ondra

Comment 21 Ondrej Chaloupka 2014-12-08 10:32:57 UTC
Per discussion with Tom I'm closing this as a not a bug. Narayana is now spec compliant and we need to ask DB vendors to change behaviour of db drivers during 2PC on calling commit.

We agreed that a section in documentation will be created where we gather information on XA troubles for EAP 6.4.0.

Comment 23 Ondrej Chaloupka 2015-01-09 17:13:41 UTC
I've changed the title as the incorrect behaviour occurs for JTA as well (not only for JTS) when XAER_RMERR code is returned by jdbc driver.


Note You need to log in before you can comment on or make changes to this bug.