Bug 750393

Summary: ServerManagerBean EJB3 timer fails to reschedule if transaction expires due to database connectivity issues
Product: [Other] RHQ Project Reporter: Larry O'Leary <loleary>
Component: High AvailabilityAssignee: RHQ Project Maintainer <rhq-maint>
Status: NEW --- QA Contact: Mike Foley <mfoley>
Severity: high Docs Contact:
Priority: medium    
Version: 3.0.0CC: hrupp
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
JON 2.4.0 using Oracle 10g database
Last Closed: Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Description Larry O'Leary 2011-10-31 19:13:48 EDT
Description of problem:
If the JON server loses the ability to connect to its database for 20 minutes, the result can be a transaction rollback resulting in the ServerManagerBean EJB3 timer never being rescheduled even after the database returns to normal operation. The result is that any server who experienced the limited database connectivity may have its heartbeat thread/timer stopped resulting in a permanent down state.

In the case that this had been reported one of the RHQ servers in the HA configuration showed a operation mode of DOWN and a last update time server days in the past. 


Server Name: server1
Port: 7080  
Operation Mode: NORMAL  
Last Update Time: 10/27/11, 2:06:24 PM, EDT

Server Name: server2
Port: 7180  
Operation Mode: DOWN  
Last Update Time: 10/16/11, 9:10:20 AM, EDT  


The state of the ServerManagerBean on server2 was:

[target=jboss.j2ee:service=EJB3,ear=rhq.ear,jar=rhq-enterprise-server-ejb3.jar,name=ServerManagerBean]
   handle: [id=1315500819592,target=[target=jboss.j2ee:service=EJB3,ear=rhq.ear,jar=rhq-enterprise-server-ejb3.jar,name=ServerManagerBean],first=16-Oct-2011 10:30:13.874,periode=0]
      [id=1315500819592,target=[target=jboss.j2ee:service=EJB3,ear=rhq.ear,jar=rhq-enterprise-server-ejb3.jar,name=ServerManagerBean],remaining=-1302751239,periode=0,started_in_tx]


The time remaining in the timer is negative meaning that it occurred in the past. Further more, the state of the timer is "started_in_tx" however, it does not appear that anything explicitly set the time for this timer.

The relevant server2 log messages:

2011-10-16 09:10:50,576 ERROR [org.rhq.enterprise.server.cloud.instance.ServerManagerBean] Failed to handle cloud heartbeat timer - will try again later. Cause: javax.ejb.EJBTransactionRolledbackException: org.hibernate.exception.GenericJDBCException: Cannot open connection
2011-10-16 09:10:50,706 WARN  [org.jboss.resource.connectionmanager.JBossManagedConnectionPool] Throwable while attempting to get a new connection: null
org.jboss.resource.JBossResourceException: Could not create connection; - nested throwable: (java.sql.SQLException: ORA-00257: archiver error. Connect internal only, until freed.
)
...
2011-10-16 09:26:09,453 ERROR [org.rhq.enterprise.server.cloud.instance.ServerManagerBean] Failed to handle cloud heartbeat timer - will try again later. Cause: javax.ejb.EJBTransactionRolledbackException: org.hibernate.exception.GenericJDBCException: Cannot open connection
2011-10-16 09:26:09,545 WARN  [org.jboss.resource.connectionmanager.JBossManagedConnectionPool] Throwable while attempting to get a new connection: null
org.jboss.resource.JBossResourceException: Could not create connection; - nested throwable: (java.sql.SQLException: ORA-00257: archiver error. Connect internal only, until freed.
)
2011-10-16 09:26:55,542 ERROR [org.rhq.enterprise.server.cloud.instance.ServerManagerBean] Failed to handle cloud heartbeat timer - will try again later. Cause: javax.ejb.EJBTransactionRolledbackException: org.hibernate.exception.GenericJDBCException: Cannot open connection
2011-10-16 09:26:57,364 WARN  [org.jboss.resource.connectionmanager.JBossManagedConnectionPool] Throwable while attempting to get a new connection: null
org.jboss.resource.JBossResourceException: Could not create connection; - nested throwable: (java.sql.SQLException: ORA-00257: archiver error. Connect internal only, until freed.
)
2011-10-16 10:29:43,874 ERROR [org.rhq.enterprise.server.cloud.instance.ServerManagerBean] Failed to handle cloud heartbeat timer - will try again later. Cause: java.lang.IllegalStateException: [com.arjuna.ats.internal.jta.transaction.arjunacore.inactive] [com.arjuna.ats.internal.jta.transaction.arjunacore.inactive] The transaction is not active!
2011-10-16 10:29:43,760 WARN  [com.arjuna.ats.arjuna.logging.arjLoggerI18N] [com.arjuna.ats.arjuna.coordinator.BasicAction_40] - Abort called on already aborted atomic action a0a0250:8484:4e68f201:5b7f54a