Bug 536473 (RHQ-81)

Summary: avail report fails due to "Can't commit because the transaction is in aborted state"
Product: [Other] RHQ Project
Reporter: John Mazzitelli <mazz>
Component: Performance
Assignee: John Mazzitelli <mazz>
Status: CLOSED NEXTRELEASE
Severity: medium
Priority: high    
Version: unspecified   
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: All   
URL: http://jira.rhq-project.org/browse/RHQ-81
Fixed In Version: 1.0
Doc Type: Bug Fix

Description John Mazzitelli 2008-03-13 13:08:00 UTC
On jon04, with our 90 agents attached, availability reports are failing to get processed because the transaction has been marked for rollback. So far these log messages have appeared only immediately after our hourly purge job, and not every time (it has happened just a couple of times). See below for sample logs. There are no stack traces anywhere in the logs; this is the only place where any kind of error is mentioned:

2008-03-12 21:00:02,625 INFO  [org.rhq.enterprise.server.measurement.AvailabilityManagerBean] Purging availabilities that are older than Tue Mar 13 21:00:02 EDT 2007
2008-03-12 21:00:02,640 INFO  [org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob] Availability data purged [0] - completed in [15]ms
2008-03-12 21:00:02,640 INFO  [org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob] Performing database maintenance (ANALYZE)
2008-03-12 21:00:02,656 INFO  [org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob] Database maintenance (ANALYZE) completed in [16]ms
2008-03-12 21:00:02,656 INFO  [org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob] Data Purge Job FINISHED [2641]ms

2008-03-12 21:11:57,218 INFO  [org.rhq.enterprise.server.discovery.DiscoveryServerServiceImpl] Error processing availability report from [spawn_36105]: java.lang.RuntimeException:javax.transaction.RollbackException: [com.arjuna.ats.internal.jta.transaction.arjunacore.commitwhenaborted] [com.arjuna.ats.internal.jta.transaction.arjunacore.commitwhenaborted] Can't commit because the transaction is in aborted state -> javax.transaction.RollbackException:[com.arjuna.ats.internal.jta.transaction.arjunacore.commitwhenaborted] [com.arjuna.ats.internal.jta.transaction.arjunacore.commitwhenaborted] Can't commit because the transaction is in aborted state

2008-03-12 21:25:11,937 INFO  [org.rhq.enterprise.server.discovery.DiscoveryServerServiceImpl] Error processing availability report from [spawn_36100]: java.lang.RuntimeException:javax.transaction.RollbackException: [com.arjuna.ats.internal.jta.transaction.arjunacore.commitwhenaborted] [com.arjuna.ats.internal.jta.transaction.arjunacore.commitwhenaborted] Can't commit because the transaction is in aborted state -> javax.transaction.RollbackException:[com.arjuna.ats.internal.jta.transaction.arjunacore.commitwhenaborted] [com.arjuna.ats.internal.jta.transaction.arjunacore.commitwhenaborted] Can't commit because the transaction is in aborted state

...and several more "tx is in aborted state" messages...
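
To check whether the failures really cluster after the purge job, the server log can be scanned for both kinds of message. The following is a minimal sketch, not part of RHQ: the function name failures_after_purge and both regular expressions are assumptions based only on the log lines quoted above, and other log layouts may need different patterns.

```python
import re
from datetime import datetime

# Timestamp prefix as it appears in the quoted log lines (milliseconds dropped).
TS = r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}"
TS_FORMAT = "%Y-%m-%d %H:%M:%S"

# Assumed patterns, derived from the sample log lines in this report.
PURGE_DONE = re.compile(TS + r" .*DataPurgeJob\] Data Purge Job FINISHED")
AVAIL_FAIL = re.compile(
    TS + r" .*Error processing availability report from \[([^\]]+)\]"
    r".*transaction is in aborted state"
)


def failures_after_purge(lines):
    """Return one (agent, seconds_since_last_purge) pair per failed report.

    seconds_since_last_purge is None when no purge job has been seen yet.
    """
    last_purge = None
    results = []
    for line in lines:
        purge = PURGE_DONE.match(line)
        if purge:
            last_purge = datetime.strptime(purge.group(1), TS_FORMAT)
            continue
        failure = AVAIL_FAIL.match(line)
        if failure:
            when = datetime.strptime(failure.group(1), TS_FORMAT)
            delta = (when - last_purge).total_seconds() if last_purge else None
            results.append((failure.group(2), delta))
    return results
```

Calling failures_after_purge(open("server.log")) (the file name is a placeholder) gives, for each failed report, the agent name and how long after the last completed purge the failure was logged.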

After this happens, some of the agents are backfilled because their avail reports failed to get committed. Later, the next avail reports come in successfully and we go green again. So we do recover, but we need to find out why our transactions get marked for rollback; otherwise, we erroneously backfill some agents.

Comment 1 Charles Crouch 2008-03-17 21:55:47 UTC
I think I was too hasty in making these Blockers. They're not good, but the side effects so far appear minimal, or we eventually recover.

Comment 2 John Mazzitelli 2008-03-20 20:13:08 UTC
I think (can't prove it or say for sure) that this was a side effect of RHQ-46.

After increasing my Oracle's SESSION config setting to 250, I no longer got spurious Oracle errors. Coincidentally, I am not seeing these "tx is in aborted state" messages either.

I think we can close this, or otherwise put it on the back burner until (or unless) we see this again.

Comment 3 John Mazzitelli 2008-03-20 20:16:25 UTC
Resolving for now; we can reopen if we see this again.

Comment 4 Red Hat Bugzilla 2009-11-10 21:17:33 UTC
This bug was previously known as http://jira.rhq-project.org/browse/RHQ-81