Bug 988724 - Crash recovery against PostgreSQL getting exception on rollback attempt
Summary: Crash recovery against PostgreSQL getting exception on rollback attempt
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: JBoss Enterprise Application Platform 6
Classification: JBoss
Component: Transaction Manager
Version: 6.1.0
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
: ---
Assignee: tom.jenkinson
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2013-07-26 08:56 UTC by Ondrej Chaloupka
Modified: 2013-09-19 10:29 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: Known Issue
Doc Text:
An issue has been identified in which the PostgreSQL JDBC driver returns an incorrect return code in specific recovery situations. This issue occurs when a transaction manager running Java Transaction Service (JTS) performs a recovery following a crash during a transaction prepare phase. This results in `XAException` messages being written to the server log. This issue is caused by a second call against a non-existent transaction `Xid`, however this behaviour is defined to be valid according to the JTS specification. This issue has been identified as an incorrect behaviour within the PostreSQL JDBC driver, and the driver issue and its status can be monitored at https://github.com/pgjdbc/pgjdbc/issues/78
Clone Of:
Environment:
Last Closed: 2013-09-19 10:29:08 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)

Description Ondrej Chaloupka 2013-07-26 08:56:38 UTC
There is a bit strange error after crashrecovery on prepare halt test on PostgreSQL database.
After the rollback attempt the PostgreSQL returns incorrect code XAException.XA_HEURHAZ and throws exception.
It happens currently on all versions of Postgresql dbs that we are testing and are installed in lab.

This happens just for transaction manager running on JTS.

Tom remarks following: "It turns out that stack trace is coming out of postgres so it may be an issue you need to discuss with them? I can tell you they are returning XAException.XA_HEURHAZ [1] when asked to rollback. When we ask them for Xids they are returning the xid, we then call rollback and it gets that exception and prints that stack trace."

[1]
WARN  [com.arjuna.ats.jtax] (RequestProcessor-9) ARJUNA024023: XAResourceRecord.rollback caused an XA error: XAException.XAER_RMERR from resource XAResourceWrapperImpl@5062a5[xaResource=org.jboss.jca.adapters.jdbc.xa.XAManagedConnection@13ea022 pad=false overrideRmValue=null productName=PostgreSQL productVersion=9.0.8 jndiName=java:jboss/xa-datasources/VerificationDS] in transaction < 131072, 29, 36, 0000000000-1-1127001-34-4592-3281-1456-70004049, 292929292929292929292828156292930-5-16121-3110158522292929697829292929292929 >: org.postgresql.xa.PGXAException: Error rolling back prepared transaction
 at org.postgresql.xa.PGXAConnection.rollback(PGXAConnection.java:416)
 at org.jboss.jca.adapters.jdbc.xa.XAManagedConnection.rollback(XAManagedConnection.java:342)
 at org.jboss.jca.core.tx.jbossts.XAResourceWrapperImpl.rollback(XAResourceWrapperImpl.java:174)
 at com.arjuna.ats.internal.jta.resources.jts.orbspecific.XAResourceRecord.rollback(XAResourceRecord.java:350) [jbossjts-jacorb-4.17.4.Final-redhat-2.jar:4.17.4.Final-redhat-2]
 at com.arjuna.ArjunaOTS.OTSAbstractRecordPOA._invoke(OTSAbstractRecordPOA.java:230) [jbossjts-jacorb-4.17.4.Final-redhat-2.jar:4.17.4.Final-redhat-2]
 at org.jacorb.poa.RequestProcessor.invokeOperation(RequestProcessor.java:306) [jacorb-2.3.2-redhat-4.jar:2.3.2-redhat-4]
 at org.jacorb.poa.RequestProcessor.process(RequestProcessor.java:626) [jacorb-2.3.2-redhat-4.jar:2.3.2-redhat-4]
 at org.jacorb.poa.RequestProcessor.run(RequestProcessor.java:769) [jacorb-2.3.2-redhat-4.jar:2.3.2-redhat-4]
Caused by: org.postgresql.util.PSQLException: ERROR: prepared transaction with identifier "131072_AAAAAAAAAAAAAP//fwAAAd7TXOBR8jj5AAAAKDE=_AAAAAAAAAAAAAP//fwAAAd7TXOBR8jj5AAAALQAAAAAAAAAA" does not exist
 at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2157)
 at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1886)
 at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:255)
 at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:555)
 at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:403)
 at org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:331)
 at org.postgresql.xa.PGXAConnection.rollback(PGXAConnection.java:406)
 ... 7 more

Comment 1 Ondrej Chaloupka 2013-07-26 13:38:54 UTC
I assume that the error is on the side of the AS and not in the PostgreSQL. There are two rollback calls for the same TX. 

The stack trace of the first rollback call:
Thread [Periodic Recovery] (Suspended (breakpoint at line 350 in XAResourceRecord)) 
 XARecoveryResourceImple(XAResourceRecord).rollback() line: 350 
 XARecoveryResourceImple(XAResourceRecord).doRecovery(boolean) line: 1335 
 XARecoveryResourceImple(XAResourceRecord).recover() line: 1158 
 XARecoveryResourceImple.recover() line: 88 
 XARecoveryModule(XARecoveryModule).xaRecoverySecondPass(XAResource) line: 661 
 XARecoveryModule(XARecoveryModule).bottomUpRecovery() line: 419 
 XARecoveryModule(XARecoveryModule).periodicWorkSecondPass() line: 194 
 PeriodicRecovery.doWorkInternal() line: 789 
 PeriodicRecovery.run() line: 371 


The stack trace of the second rollback call which causes the error is printed to the log:
Daemon Thread [RequestProcessor-10] (Suspended (breakpoint at line 350 in XAResourceRecord)) 
 XARecoveryResourceImple(XAResourceRecord).rollback() line: 350 
 XARecoveryResourceImple(OTSAbstractRecordPOA)._invoke(String, InputStream, ResponseHandler) line: 230 
 RequestProcessor.invokeOperation() line: 306 
 RequestProcessor.process() line: 626 
 RequestProcessor.run() line: 769 

Small part the log from PostgreSQL.
LOG:  execute <unnamed>: SET extra_float_digits = 3
LOG:  execute <unnamed>: SHOW TRANSACTION ISOLATION LEVEL
LOG:  execute <unnamed>: SELECT 1
LOG:  execute <unnamed>: SET extra_float_digits = 3
LOG:  execute <unnamed>: SHOW TRANSACTION ISOLATION LEVEL
LOG:  execute <unnamed>: SELECT 1
LOG:  execute <unnamed>: SELECT gid FROM pg_prepared_xacts
LOG:  execute <unnamed>: SELECT gid FROM pg_prepared_xacts
LOG:  execute <unnamed>: ROLLBACK PREPARED '131072_AAAAAAAAAAAAAP//fwAAAdN2921R8nmwAAAAKDE=_AAAAAAAAAAAAAP//fwAAAdN2921R8nmwAAAALQAAAAAAAAAA'
LOG:  execute <unnamed>: SELECT 1
LOG:  execute <unnamed>: SELECT 1
LOG:  execute <unnamed>: SELECT gid FROM pg_prepared_xacts
LOG:  execute <unnamed>: SELECT gid FROM pg_prepared_xacts
LOG:  execute <unnamed>: ROLLBACK PREPARED '131072_AAAAAAAAAAAAAP//fwAAAdN2921R8nmwAAAAKDE=_AAAAAAAAAAAAAP//fwAAAdN2921R8nmwAAAALQAAAAAAAAAA'
ERROR:  prepared transaction with identifier "131072_AAAAAAAAAAAAAP//fwAAAdN2921R8nmwAAAAKDE=_AAAAAAAAAAAAAP//fwAAAdN2921R8nmwAAAALQAAAAAAAAAA" does not exist
STATEMENT:  ROLLBACK PREPARED '131072_AAAAAAAAAAAAAP//fwAAAdN2921R8nmwAAAAKDE=_AAAAAAAAAAAAAP//fwAAAdN2921R8nmwAAAALQAAAAAAAAAA'

Comment 2 tom.jenkinson 2013-07-29 12:28:32 UTC
Hi Ondra,

I looked at the code and it is expected that a resource can have rollback called on it twice with JTS due to the way the OTS specification works where it requires both bottom up and top down recovery to be triggered by the recovering resource. The initial value returned by replay_completion is used by the resource to try to recover itself, but the call triggers the transaction to call recovery on everything in the intentions list which will be at least the resource as well but could be others. This means that all resources must be able to have rollback called on them multiple times.

It will not cause data integrity problems so I would mark this as expected behavior.

Thanks,
Tom

Comment 3 Ondrej Chaloupka 2013-07-29 12:44:23 UTC
Hi Tom,

does it mean that PostgreSQL should be able to handle it? Or is it responsibility of JCA or resource adapter? Do you have an idea?

Thank you
Ondra

Comment 4 tom.jenkinson 2013-07-29 12:44:59 UTC
Hi Ondra,

It means PostGres should be able to handle it.

Thanks,
Tom

Comment 5 Ondrej Chaloupka 2013-08-08 09:29:42 UTC
It was confirmed that this is problem of the PostgreSQL jdbc driver (there was a discussion on pgsql-bugs mailing list starting 2013-07-29 under subject 'Incorrect response code after XA recovery').

The pull request was put to fix this:
https://github.com/pgjdbc/pgjdbc/pull/76

Comment 6 Jeremy Whiting 2013-08-08 11:27:55 UTC
Hi Ondra,
 An issue has also been created in the bug tracking system for pgjdbc using GitHub. It is issue #78.

https://github.com/pgjdbc/pgjdbc/issues/78

 I have sent updates to the pgjdbc and bugs mailing list to inform the community the background to the testcase provided and the pull request.

Regards,
Jeremy

Comment 7 Ondrej Chaloupka 2013-08-19 11:47:29 UTC
(In reply to Jeremy Whiting from comment #6)
> Hi Ondra,
>  An issue has also been created in the bug tracking system for pgjdbc using
> GitHub. It is issue #78.
> 
> https://github.com/pgjdbc/pgjdbc/issues/78
> 
>  I have sent updates to the pgjdbc and bugs mailing list to inform the
> community the background to the testcase provided and the pull request.
> 
> Regards,
> Jeremy

Hi Jeremy,

thank you for fulfilling info.

Ondra

Comment 8 Ondrej Chaloupka 2013-08-19 12:00:56 UTC
Hi Tom,

I would like ask you how to process with this bugzilla. I think that it could be set to CLOSED/NOT A BUG (or something similar) as it was not an issue of app server.

Information about this would be nice to have in release notes of EAP.
I've added some doc info. Would you be so kind to check my text?

Thank you
Ondra

Comment 9 tom.jenkinson 2013-08-19 12:12:33 UTC
Hi Ondra, in terms of your text, would it make sense to reference the specific issue in the pgsql issue tracking system? Also, can you confirm it actually was fixed as your text implies it was but I don't think Jeremey's pull request was merged yet?

Comment 10 Ondrej Chaloupka 2013-08-19 12:23:19 UTC
Hi Tom,

I think that it makes sense to refer to the issue. I've changed the doc text.

Ouch, my mistake. I haven't checked the status of the issue properly. 
But for the current state is important to inform users about reason of the exception messages in the log.
I'll check the driver behaviour when it will be a built with the fix available. I've changed the doc text in this way.

Ondra

Comment 11 tom.jenkinson 2013-08-19 14:12:19 UTC
Thanks Ondra, looks good to me

Comment 14 Jeremy Whiting 2013-08-21 12:11:27 UTC
(In reply to Ondrej Chaloupka from comment #10)
> Hi Tom,
> 
> I think that it makes sense to refer to the issue. I've changed the doc text.
> 
> Ouch, my mistake. I haven't checked the status of the issue properly. 
> But for the current state is important to inform users about reason of the
> exception messages in the log.
> I'll check the driver behaviour when it will be a built with the fix
> available. I've changed the doc text in this way.
> 
> Ondra
Hi Ondra,
 My patch to the driver was merged into Git master branch of the postgresql jdbc driver.
 Customers can either use a driver built from the latest master branch or wait for the next release. The current one is '9.2-1002' so anything after that.

Regards,
Jeremy

Comment 16 tom.jenkinson 2013-09-12 09:06:20 UTC
Hi,

To clarify, this should be a docs issue I think? There is no fix to the Transaction Manager required.

Maybe @Lucas can be the assignee?

Tom

Comment 17 Ondrej Chaloupka 2013-09-12 10:52:55 UTC
Or maybe this could be set to closed/not fixed.
As the doc test was correctly updated (http://documentation-devel.engineering.redhat.com/docs/en-US/JBoss_Enterprise_Application_Platform/6.1/html/6.1.1_Release_Notes/ar01s07s03.html) and there is currently no other work from our side?

Comment 18 tom.jenkinson 2013-09-19 07:41:51 UTC
I wonder, who can close the BZ? Is it ondra as the reporter?

Comment 19 Ondrej Chaloupka 2013-09-19 10:29:08 UTC
I'm not sure with the bz workflow but I hope that I can do it. 
In case that this would not be right, please, set another correct status. 
Thanks.

Summary: the problem is not in the Narayana TM but on the side of the PostgreSQL resource adapter. The adapter was changed and the fix should be available in some future release of RAR. Check the comments above.


Note You need to log in before you can comment on or make changes to this bug.