Bug 536473 - (RHQ-81) avail report fails due to "Can't commit because the transaction is in aborted state"
Status: CLOSED NEXTRELEASE
Product: RHQ Project
Classification: Other
Component: Performance
Version: unspecified
Hardware: All
OS: All
Priority: high
Severity: medium
Assigned To: John Mazzitelli
http://jira.rhq-project.org/browse/RH...
Reported: 2008-03-13 09:08 EDT by John Mazzitelli
Modified: 2008-07-02 14:36 EDT
Fixed In Version: 1.0
Doc Type: Bug Fix
Description John Mazzitelli 2008-03-13 09:08:00 EDT
On jon04, with our 90 agents attached, availability reports are failing to get processed because a tx has been marked for rollback. So far these log messages have only appeared immediately after our hourly purge job, and not every time; it has happened just a couple of times. See below for sample logs (there are no stack traces anywhere in the logs; this is the only place where any kind of error is mentioned):

2008-03-12 21:00:02,625 INFO  [org.rhq.enterprise.server.measurement.AvailabilityManagerBean] Purging availabilities that are older than Tue Mar 13 21:00:02 EDT 2007
2008-03-12 21:00:02,640 INFO  [org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob] Availability data purged [0] - completed in [15]ms
2008-03-12 21:00:02,640 INFO  [org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob] Performing database maintenance (ANALYZE)
2008-03-12 21:00:02,656 INFO  [org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob] Database maintenance (ANALYZE) completed in [16]ms
2008-03-12 21:00:02,656 INFO  [org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob] Data Purge Job FINISHED [2641]ms

2008-03-12 21:11:57,218 INFO  [org.rhq.enterprise.server.discovery.DiscoveryServerServiceImpl] Error processing availability report from [spawn_36105]: java.lang.RuntimeException:javax.transaction.RollbackException: [com.arjuna.ats.internal.jta.transaction.arjunacore.commitwhenaborted] [com.arjuna.ats.internal.jta.transaction.arjunacore.commitwhenaborted] Can't commit because the transaction is in aborted state -> javax.transaction.RollbackException:[com.arjuna.ats.internal.jta.transaction.arjunacore.commitwhenaborted] [com.arjuna.ats.internal.jta.transaction.arjunacore.commitwhenaborted] Can't commit because the transaction is in aborted state

2008-03-12 21:25:11,937 INFO  [org.rhq.enterprise.server.discovery.DiscoveryServerServiceImpl] Error processing availability report from [spawn_36100]: java.lang.RuntimeException:javax.transaction.RollbackException: [com.arjuna.ats.internal.jta.transaction.arjunacore.commitwhenaborted] [com.arjuna.ats.internal.jta.transaction.arjunacore.commitwhenaborted] Can't commit because the transaction is in aborted state -> javax.transaction.RollbackException:[com.arjuna.ats.internal.jta.transaction.arjunacore.commitwhenaborted] [com.arjuna.ats.internal.jta.transaction.arjunacore.commitwhenaborted] Can't commit because the transaction is in aborted state

...and several more "tx is in aborted state" messages...
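
To check whether these errors really cluster after the purge job, the logs can be scanned for the delay between the last "Data Purge Job FINISHED" line and each aborted-tx error. The following is only a sketch: the function name `aborted_tx_delays` and the regular expressions are assumptions based on the log format quoted above, and `SAMPLE_LOG` holds abridged copies of those lines (the long exception text is shortened to "...").

```python
import re
from datetime import datetime

# Abridged copies of log lines quoted in this report (exception text shortened).
SAMPLE_LOG = """\
2008-03-12 21:00:02,656 INFO  [org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob] Data Purge Job FINISHED [2641]ms
2008-03-12 21:11:57,218 INFO  [org.rhq.enterprise.server.discovery.DiscoveryServerServiceImpl] Error processing availability report from [spawn_36105]: ... Can't commit because the transaction is in aborted state
2008-03-12 21:25:11,937 INFO  [org.rhq.enterprise.server.discovery.DiscoveryServerServiceImpl] Error processing availability report from [spawn_36100]: ... Can't commit because the transaction is in aborted state
"""

# timestamp, level, [logger], message
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \w+\s+\[([^\]]+)\] (.*)$"
)
AGENT_RE = re.compile(r"availability report from \[([^\]]+)\]")


def parse_ts(text):
    """Parse a timestamp such as '2008-03-12 21:00:02,656'."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S,%f")


def aborted_tx_delays(log_text):
    """Return (agent, seconds since the last purge finished) per aborted-tx error.

    The delay is None if no purge-finished line precedes the error.
    """
    last_purge = None
    results = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        ts = parse_ts(match.group(1))
        logger, message = match.group(2), match.group(3)
        if logger.endswith("DataPurgeJob") and "Data Purge Job FINISHED" in message:
            last_purge = ts
        elif "transaction is in aborted state" in message:
            agent = AGENT_RE.search(message)
            delay = (ts - last_purge).total_seconds() if last_purge else None
            results.append((agent.group(1) if agent else None, delay))
    return results


for agent, delay in aborted_tx_delays(SAMPLE_LOG):
    print(agent, delay)
```

Running the same function over a full server log would show whether the errors stay within the hour following a purge or also occur at unrelated times.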

After this happens, some of the agents are backfilled because the avail reports failed to get committed. But later, the next avail reports come in successfully and we go green again. So we do recover, but we need to find out why our tx's get marked for rollback; otherwise, we erroneously backfill some agents.
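
For readers unfamiliar with the symptom, here is a toy model of how a commit can fail without any trace of the original cause. It is not JTA or RHQ code: `ToyTransaction`, `RollbackError`, `process_report` and `failing_store` are hypothetical names, and the swallowed database error is an assumed scenario, not something diagnosed in this report.

```python
class RollbackError(Exception):
    """Toy stand-in for javax.transaction.RollbackException."""


class ToyTransaction:
    """Minimal model of a transaction that can be marked rollback-only."""

    def __init__(self):
        self.rollback_only = False
        self.committed = False

    def set_rollback_only(self):
        self.rollback_only = True

    def commit(self):
        # Once marked for rollback, the transaction can no longer commit.
        if self.rollback_only:
            raise RollbackError(
                "Can't commit because the transaction is in aborted state"
            )
        self.committed = True


def process_report(tx, store):
    """Run store() inside tx; a swallowed failure only marks tx rollback-only."""
    try:
        store()
    except Exception:
        # Assumed scenario: the real error is dropped here, so the caller
        # later sees only the commit failure and no original stack trace.
        tx.set_rollback_only()
    tx.commit()


def failing_store():
    raise RuntimeError("hypothetical database error")


tx = ToyTransaction()
try:
    process_report(tx, failing_store)
except RollbackError as err:
    print("report failed:", err)
```

In this model the only visible error is the one raised at commit time, which is consistent with logs that show the "aborted state" message but no earlier stack trace.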
Comment 1 Charles Crouch 2008-03-17 17:55:47 EDT
I think I was too hasty in making these Blockers. They're not good, but the side effects so far appear minimal, or we eventually recover.
Comment 2 John Mazzitelli 2008-03-20 16:13:08 EDT
I think (can't prove it or say for sure) that this was a side effect of RHQ-46.

After increasing my Oracle's SESSION config setting to 250, I no longer get spurious Oracle errors. Coincidentally, I am no longer seeing these "tx is in aborted state" messages either.

I think we can close this, or otherwise put it on the back burner unless we see this again.
Comment 3 John Mazzitelli 2008-03-20 16:16:25 EDT
Resolving for now; we can reopen if we see this again.
Comment 4 Red Hat Bugzilla 2009-11-10 16:17:33 EST
This bug was previously known as http://jira.rhq-project.org/browse/RHQ-81
