Red Hat Bugzilla – Bug 981338
Crash recovery running against Oracle11g leaves ugly error messages in server log
Last modified: 2013-12-15 08:31:56 EST
After crash recovery happens to be run against Oracle 11g (it does not matter whether 11gR1 or 11gR2) you can see following erros to be printed in server log.
[stderr] (Periodic Recovery) oracle.jdbc.xa.OracleXAException
[stderr] (Periodic Recovery) at oracle.jdbc.xa.OracleXAResource.checkError(OracleXAResource.java:1135)
[stderr] (Periodic Recovery) at oracle.jdbc.xa.client.OracleXAResource.rollback(OracleXAResource.java:941)
[stderr] (Periodic Recovery) at org.jboss.jca.adapters.jdbc.xa.XAManagedConnection.rollback(XAManagedConnection.java:342)
[stderr] (Periodic Recovery) at org.jboss.jca.core.tx.jbossts.XAResourceWrapperImpl.rollback(XAResourceWrapperImpl.java:174)
[stderr] (Periodic Recovery) at com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.handleOrphan(XARecoveryModule.java:738)
[stderr] (Periodic Recovery) at com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass(XARecoveryModule.java:644)
[stderr] (Periodic Recovery) at com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.bottomUpRecovery(XARecoveryModule.java:417)
[stderr] (Periodic Recovery) at com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass(XARecoveryModule.java:194)
[stderr] (Periodic Recovery) at com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal(PeriodicRecovery.java:789)
[stderr] (Periodic Recovery) at com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run(PeriodicRecovery.java:371)
[stderr] (Periodic Recovery) Caused by: java.sql.SQLSyntaxErrorException: ORA-01031: insufficient privileges
[stderr] (Periodic Recovery) at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:389)
[stderr] (Periodic Recovery) at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:382)
[stderr] (Periodic Recovery) at oracle.jdbc.driver.T4CTTIfun.processError(T4CTTIfun.java:600)
[stderr] (Periodic Recovery) at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:450)
[stderr] (Periodic Recovery) at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:192)
[stderr] (Periodic Recovery) at oracle.jdbc.driver.T4CTTIOtxen.doOTXEN(T4CTTIOtxen.java:168)
[stderr] (Periodic Recovery) at oracle.jdbc.driver.T4CXAResource.doTransaction(T4CXAResource.java:746)
[stderr] (Periodic Recovery) at oracle.jdbc.driver.T4CXAResource.kputxrec(T4CXAResource.java:809)
[stderr] (Periodic Recovery) at oracle.jdbc.driver.T4CXAResource.doRollback(T4CXAResource.java:645)
[stderr] (Periodic Recovery) at oracle.jdbc.xa.client.OracleXAResource.rollback(OracleXAResource.java:936)
[stderr] (Periodic Recovery) ... 8 more
As I've checked the settings of DB with https://access.redhat.com/site/documentation/en-US/JBoss_Enterprise_Application_Platform/6.1/html-single/Administration_and_Configuration_Guide/index.html#Configure_XA_Recovery_Modules1
and the permissions are appropriate for XA usage I'm creating this bz.
There is some info on this in at forum here as well:
How to reproduce:
1) Checkout crash recovery tests
git clone git://git.app.eng.bos.redhat.com/jbossqe/eap-tests-transactions.git
2) cd eap-tests-transactions/integration/jbossts
3) export JBOSS_HOME=.../jboss-eap-6.1.0.GA
4) Run the test
mvn clean verify -Djboss.dist=$JBOSS_HOME -Dtimeout.factor=400 -Dtest=JPACrashRecoveryTestCase#prepareHalt -Dds=oracle11gR2 -Djbossts.noJTS
5) Check server log for errors
I do not think this is a problem with Narayana, I think it is an environmental issue.
To attempt to replicate the issue I have installed:
Oracle Linux 6
Oracle 11g Enterprise
I then did the following:
CREATE USER crashrec IDENTIFIED BY crashrec
DEFAULT TABLESPACE users
TEMPORARY TABLESPACE temp
QUOTA UNLIMITED ON users;
GRANT connect TO crashrec;
GRANT create table, create procedure TO crashrec;
I then checked I could log in as that user:
I then ran the test program you have submitted. This failed in XAResource::recover(), _not_ rollback as you are seeing. It had an XAException without a root cause which I have attached to this bug report.
I then ran the following commands on my server:
GRANT SELECT ON sys.dba_pending_transactions TO crashrec;
GRANT SELECT ON sys.pending_trans$ TO crashrec;
GRANT SELECT ON sys.dba_2pc_pending TO crashrec;
GRANT EXECUTE ON sys.dbms_xa TO crashrec;
I re-ran the test and it worked just fine.
I will reassign this back to you as clearly there is an issue but it appears to be with the resource manager. Maybe someone from engops can help you better as Oracle configuration is not my area of expertise. I tried to grep the internet for that ORA-01031 for you and there are some hits, I don't know if it helps: https://forums.oracle.com/message/10081726#10081726
Created attachment 768891 [details]
This shows an expected exception when XA is not configured correctly in terms of SQL grants etc
Ondra, the only other thing I can think of is I am using master which includes the JBTM-1810 fix, I wonder if that is the problem? Can you try using 4.17 branch?
I can confirm that using the branch with the JBTM-1810 fix works for me against db04.mw.lab.eng.bos.redhat.com
unfortunately I'm still getting error messages in log for db04.mw.lab.eng.bos.redhat.com. I'm struggling with it here, trying several combinations of branches/tags and cherry-picking of your fix and I'm not able to get the non-error result.
What I'm doing:
I cloned your branch - https://github.com/tomjenkinson/narayana/tree/JBTM-1810, compiled narayana, patched narayana-jts-integration and narayana-jts-jacorb, get crash recovery testsuite in the master branch and there are still the error messages occuring in the log (integration/jbossts/target/jbossas-jbossts/standalone/log/server.log).
Now I did change to crash recovery testsuite to check server.log for strings like error (https://issues.jboss.org/browse/JBQA-8154). Now the test should also fail in case of occurrence of text like that.
Your steps look the same as mine and it appears that now when I run it I get the same failure.
Stepping through it in the debugger I can see that it isn't 1810 (at least not any more should that have been an issue).
Oracle is returning 120 Xids at this point in time for rollback. This database is effectively not clean. I don't know why those Xids cannot be rolled back, possibly there is a data issue? I can see that the first Xid (i.e. when there were 121 Xids returned from XAResource::recover()) _can_ be rolled back. Subsequent ones cannot.
thanks for checking this.
I'll take a look at it at the start of the next week. It could be some data issue but it's strange that the test runs in the same way on all other databases and it works fine.
I would rather say that there will be some problem with DB settings when test on your local machine with db on virtualbox works fine.
I have replicated this issue on my own VM only when creating the transactions by a different user than the one performing recovery and not setting the appropriate "FORCE ANY TRANSACTION" permissions as per: http://docs.oracle.com/cd/B28359_01/appdev.111/b28419/d_xa.htm#BABCIHJD
"If a user needs to rollback a transaction branch that created by other users, the privilege FORCE ANY TRANSACTION must be granted to the user."
Is your test using a different user to create the transaction than the one that you have configured for recovery? If it is and you expect that to be the case then you must consider applying that setting.
However, if it is not the case that you have different users in play, I suspect the 120 Xid that are returned from the database to be an indicator as to the problem. Is it possible that another test is ran against the same database that had the same Narayana node identifier but used a different Oracle user to create the transaction? In that case you would have the same problem as we would be trying to rollback those Xid also. You will then need to clean the database before you start and I recommend you must then allocate different node identifiers to the different tests to ensure you do not get this issue in future.
Hope this explains the problem,
thanks a lot for explanation. I've found out that the problem was in the DB and data inconsistency. After cleaning the recovery works nice.
I'm putting this as not a bug.