Created attachment 889781 [details] ServerLog Description of problem: When trying to execute crashrec scenario with 1 CMR and 2 XA respurces (working with postgresql), there are some errors in log which is attached. Version-Release number of selected component (if applicable): jboss-eap-6.3.0.ER2 It is covered in test case "JPAMultiXACMRCrashRecoveryTestCase#commitHaltFirstXA" of "lrco" branch of "git://git.app.eng.bos.redhat.com/jbossqe/eap-tests-transactions.git" repo. You can reproduce it by running "mvn clean verify -Dtest=JPAMultiXACMRCrashRecoveryTestCase#commitHaltFirstXA -Dmaven.test.failure.ignore=true -fae -Djbossts.noJTS" in directory "~/eap-tests-transactions/integration/jbossts". You need postgresql92 database with 3 crashrec users. And "postgresql92-jdbc-driver.jar" put in some directory and mentioned in "ds.properties" Please find some hints and examples to configure ds.properies file with postgresql in attached file. How reproducible: always Steps to Reproduce: 1) enlist CMR datasource 2) enlist TestXAResource 3) enlist 2 XA datasources 4) prepare all 5) crash on "commit" method exit of TestXAResource 6) reboot the server and call recovery 7) at this point only the CMR is committed, and the 2 XA resources are not committed, but in fact they should Actual results: only CMR is committed Expected results: all resources must be committed Additional info: the same when it is 1 CMR 1 TestXAResource and 1 XAResource
Created attachment 889783 [details] ConfigurationHints
Created attachment 891730 [details] NewServerLog Adding the full log file once again.
Created attachment 891754 [details] TraceServerLog Attached new log with TRACE level.
The error is coming from the Postgresql resource manager not the CMR implementation. When recovery attempts to commit the resource an XAER_RMERR error is thrown which the TM reports as a rollback. So the question that needs resolving is why does the RM return XAER_RMERR. The highlights from the log are: The recovery attempt is here: 11:24:52,644 DEBUG [com.arjuna.ats.arjuna] (Periodic Recovery) RecoverAtomicAction.replayPhase2 recovering 0:ffff7f000001:-3e3426a8:5363643d:4a ActionStatus is ActionStatus.COMMITTED Byteman triggers another crash: 11:24:52,647 INFO [stdout] (Periodic Recovery) org.jboss.byteman.contrib.dtest.Instrumentor_com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord_topLevelCommit_crashatEXIT execute() Server restarts: 11:24:54,919 INFO [org.jboss.as] (MSC service thread 1-6) JBAS015899: JBoss EAP 6.3.0.Beta1 (AS 7.4.0.Final-redhat-10) starting There is a byteman rule parsing error: 11:25:06,642 WARN [com.arjuna.ats.jta] (Periodic Recovery) ARJUNA016009: Caught:: org.jboss.byteman.rule.exception.ExecuteException: MethodExpression.interpret : exception invoking method remoteTrace file /tmp/jbosstscrashrectests9039010908941169760 line 23 Caused by: java.rmi.NotBoundException: org.jboss.as.test.jbossts.common.TestXAResourceRecovered There is another recovery attempt: 11:25:16,664 DEBUG [com.arjuna.ats.arjuna] (Periodic Recovery) RecoverAtomicAction.replayPhase2 recovering 0:ffff7f000001:-3e3426a8:5363643d:4a ActionStatus is ActionStatus.COMMITTED The Postgresql resource manager throws an XAER_RMERR error: 11:25:16,677 WARN [com.arjuna.ats.jta] (Periodic Recovery) ARJUNA016036: commit on < formatId=131077, gtrid_length=29, bqual_length=36, tx_uid=0:ffff7f000001:-3e3426a8:5363643d:4a, node_name=1, branch_uid=0:ffff7f000001:-3e3426a8:5363643d:5e, subordinatenodename=null, eis_name=java:jboss/xa-datasources/CrashRecoveryDS2 > (XAResourceWrapperImpl@5d0a8ec4[xaResource=org.jboss.jca.adapters.jdbc.xa.XAManagedConnection@60231594 pad=false overrideRmValue=null productName=PostgreSQL productVersion=9.1.7 jndiName=java:jboss/xa-datasources/CrashRecoveryDS1]) failed with exception $XAException.XAER_RMERR: org.postgresql.xa.PGXAException: Error committing prepared transaction This means it rolled back
Created attachment 892854 [details] MySql55Log
Created attachment 892856 [details] Oracle12cLog
Michael, thanks for your research, I run the same scenario for other databases and it fails for other databases as well, with different errors in logs. please find the server trace logs for the same scenario run on mysql55 and oracle12c, there is NPE there.
The oracle problem is: ORA-24774: cannot switch to specified transaction Cause: The transaction specified in the call refers to a transaction created by a different user. Action: Create transactions with the same authentication so that they can be switched. which looks like a configuration error. The NPE in the mysql log is triggered by your helper org.jboss.as.test.jbossts.common.TestXAResourceRecoveryHelper.preDestroy are you sure the test is configured correctly.
Created attachment 894002 [details] PGSQLServerLog Hi Michael, There was a problem with configuration and it is fixed now. For MySQL and Oracle there are separate bugs which are already logged. So now the actual problem is related to PostgreSQL. Please find the new attachment of trace log: PGSQLServerLog. You can see that exception there "org.postgresql.util.PSQLException: ERROR: prepared transaction belongs to another database". One more strange thing is that it tries to rollback XA resource, but should commit as crash happened in second phase. There are also exceptions with "TestXAResource" (test framework problem), but you can ignore it as it does not affect on real XA resources.
That postgressql error and the rollback attempt is coming from your test harness (JPACrashHelper#wipeOutTxsInDoubt()) not from the TM. How come you are attempting to rollback the transaction branch. [Note: I do see later on in the log a recovery attempt on the same branch (131077_AAAAAAAAAAAAAP//fwAAAQVYtNhTbLRmAAAASjk=_AAAAAAAAAAAAAP//fwAAAQVYtNhTbLRmAAAAWQAAAAIAAAAA) which fails because the branch does not exist].
Created attachment 904618 [details] Oracle12cTrace Hi Michael, Please find attached the log of running under Oracle12c. Data sources are configured in a way that only one XA datasource is recoverable. Seems like the cause is the same as https://bugzilla.redhat.com/show_bug.cgi?id=1104227. /Hayk.
*** Bug 1075008 has been marked as a duplicate of this bug. ***
Hi Mike, Thanks for your investigations. The problem of this issue is JCA related, when it does not take the correct datasource for recovery. The problem is described more detailed in: https://bugzilla.redhat.com/show_bug.cgi?id=1104227 As this BZ is duplicate I am going to close it. Sorry for inconvenience. /Hayk. *** This bug has been marked as a duplicate of bug 1104227 ***