Hide Forgot
Description of problem: If the JON server loses the ability to connect to its database for 20 minutes, the result can be a transaction rollback resulting in the ServerManagerBean EJB3 timer never being rescheduled even after the database returns to normal operation. The result is that any server who experienced the limited database connectivity may have its heartbeat thread/timer stopped resulting in a permanent down state. In the case that this had been reported one of the RHQ servers in the HA configuration showed a operation mode of DOWN and a last update time server days in the past. Server Name: server1 Port: 7080 Operation Mode: NORMAL Last Update Time: 10/27/11, 2:06:24 PM, EDT Server Name: server2 Port: 7180 Operation Mode: DOWN Last Update Time: 10/16/11, 9:10:20 AM, EDT The state of the ServerManagerBean on server2 was: [target=jboss.j2ee:service=EJB3,ear=rhq.ear,jar=rhq-enterprise-server-ejb3.jar,name=ServerManagerBean] handle: [id=1315500819592,target=[target=jboss.j2ee:service=EJB3,ear=rhq.ear,jar=rhq-enterprise-server-ejb3.jar,name=ServerManagerBean],first=16-Oct-2011 10:30:13.874,periode=0] [id=1315500819592,target=[target=jboss.j2ee:service=EJB3,ear=rhq.ear,jar=rhq-enterprise-server-ejb3.jar,name=ServerManagerBean],remaining=-1302751239,periode=0,started_in_tx] The time remaining in the timer is negative meaning that it occurred in the past. Further more, the state of the timer is "started_in_tx" however, it does not appear that anything explicitly set the time for this timer. The relevant server2 log messages: 2011-10-16 09:10:50,576 ERROR [org.rhq.enterprise.server.cloud.instance.ServerManagerBean] Failed to handle cloud heartbeat timer - will try again later. Cause: javax.ejb.EJBTransactionRolledbackException: org.hibernate.exception.GenericJDBCException: Cannot open connection 2011-10-16 09:10:50,706 WARN [org.jboss.resource.connectionmanager.JBossManagedConnectionPool] Throwable while attempting to get a new connection: null org.jboss.resource.JBossResourceException: Could not create connection; - nested throwable: (java.sql.SQLException: ORA-00257: archiver error. Connect internal only, until freed. ) ... 2011-10-16 09:26:09,453 ERROR [org.rhq.enterprise.server.cloud.instance.ServerManagerBean] Failed to handle cloud heartbeat timer - will try again later. Cause: javax.ejb.EJBTransactionRolledbackException: org.hibernate.exception.GenericJDBCException: Cannot open connection 2011-10-16 09:26:09,545 WARN [org.jboss.resource.connectionmanager.JBossManagedConnectionPool] Throwable while attempting to get a new connection: null org.jboss.resource.JBossResourceException: Could not create connection; - nested throwable: (java.sql.SQLException: ORA-00257: archiver error. Connect internal only, until freed. ) 2011-10-16 09:26:55,542 ERROR [org.rhq.enterprise.server.cloud.instance.ServerManagerBean] Failed to handle cloud heartbeat timer - will try again later. Cause: javax.ejb.EJBTransactionRolledbackException: org.hibernate.exception.GenericJDBCException: Cannot open connection 2011-10-16 09:26:57,364 WARN [org.jboss.resource.connectionmanager.JBossManagedConnectionPool] Throwable while attempting to get a new connection: null org.jboss.resource.JBossResourceException: Could not create connection; - nested throwable: (java.sql.SQLException: ORA-00257: archiver error. Connect internal only, until freed. ) 2011-10-16 10:29:43,874 ERROR [org.rhq.enterprise.server.cloud.instance.ServerManagerBean] Failed to handle cloud heartbeat timer - will try again later. Cause: java.lang.IllegalStateException: [com.arjuna.ats.internal.jta.transaction.arjunacore.inactive] [com.arjuna.ats.internal.jta.transaction.arjunacore.inactive] The transaction is not active! 2011-10-16 10:29:43,760 WARN [com.arjuna.ats.arjuna.logging.arjLoggerI18N] [com.arjuna.ats.arjuna.coordinator.BasicAction_40] - Abort called on already aborted atomic action a0a0250:8484:4e68f201:5b7f54a