On jon04 with our 90 agents attached, for some reason we're seeing availability reports fail to get processed because a tx is being marked for rollback. These log messages seem to have appeared only immediately after our hourly purge job (and not every time; it has happened only a couple of times so far). See below for sample logs (there are no stack traces anywhere in the logs - this is the only place where any kind of error is mentioned):

2008-03-12 21:00:02,625 INFO [org.rhq.enterprise.server.measurement.AvailabilityManagerBean] Purging availabilities that are older than Tue Mar 13 21:00:02 EDT 2007
2008-03-12 21:00:02,640 INFO [org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob] Availability data purged [0] - completed in [15]ms
2008-03-12 21:00:02,640 INFO [org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob] Performing database maintenance (ANALYZE)
2008-03-12 21:00:02,656 INFO [org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob] Database maintenance (ANALYZE) completed in [16]ms
2008-03-12 21:00:02,656 INFO [org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob] Data Purge Job FINISHED [2641]ms
2008-03-12 21:11:57,218 INFO [org.rhq.enterprise.server.discovery.DiscoveryServerServiceImpl] Error processing availability report from [spawn_36105]: java.lang.RuntimeException:javax.transaction.RollbackException: [com.arjuna.ats.internal.jta.transaction.arjunacore.commitwhenaborted] [com.arjuna.ats.internal.jta.transaction.arjunacore.commitwhenaborted] Can't commit because the transaction is in aborted state -> javax.transaction.RollbackException:[com.arjuna.ats.internal.jta.transaction.arjunacore.commitwhenaborted] [com.arjuna.ats.internal.jta.transaction.arjunacore.commitwhenaborted] Can't commit because the transaction is in aborted state
2008-03-12 21:25:11,937 INFO [org.rhq.enterprise.server.discovery.DiscoveryServerServiceImpl] Error processing availability report from [spawn_36100]: java.lang.RuntimeException:javax.transaction.RollbackException: [com.arjuna.ats.internal.jta.transaction.arjunacore.commitwhenaborted] [com.arjuna.ats.internal.jta.transaction.arjunacore.commitwhenaborted] Can't commit because the transaction is in aborted state -> javax.transaction.RollbackException:[com.arjuna.ats.internal.jta.transaction.arjunacore.commitwhenaborted] [com.arjuna.ats.internal.jta.transaction.arjunacore.commitwhenaborted] Can't commit because the transaction is in aborted state

...and several more "tx is in aborted state" messages...

After this happens, some of the agents are backfilled because the avail reports failed to get committed. But later, the next avail reports come in successfully and we go green again. So we do recover, but we need to find out why our tx's get marked for rollback; otherwise, we erroneously backfill some agents.
I think I was too hasty in making these Blockers. They're not good, but the side effects so far appear minimal, or at least we eventually recover.
I think (can't prove it or say for sure) that this was a side effect of RHQ-46. After increasing my Oracle's SESSION config setting to 250, I no longer got spurious Oracle errors. Coincidentally, I am not seeing these "tx is in aborted state" messages either. I think we can close this or otherwise put it on the backburner unless we see this again.
Resolving for now - we can reopen if we see this again.
This bug was previously known as http://jira.rhq-project.org/browse/RHQ-81