Bug 1003191 - Transaction timeout occurs when there is a storage node deployment error
Keywords:
Status: ON_QA
Alias: None
Product: RHQ Project
Classification: Other
Component: Core Server
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Nobody
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 951619
Reported: 2013-08-31 17:00 UTC by John Sanda
Modified: 2022-03-31 04:27 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Embargoed:



Description John Sanda 2013-08-31 17:00:03 UTC
Description of problem:
If deployment fails at the start of the add node maintenance phase, a deadlock occurs that prevents an error message from being stored on the storage node entity. Consequently, the cluster status remains JOINING when it should change to DOWN. The following errors from a server log show the issue:

12:33:05,541 ERROR [org.rhq.enterprise.server.storage.StorageNodeOperationsHandlerBean] (Reconnection-0) Aborting storage node deployment due to unexpected error while performing add node maintenance.: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (no host was tried)
        at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:64) [cassandra-driver-core-1.0.2-rhq-1.2.4.jar:]
        at com.datastax.driver.core.ResultSetFuture.extractCauseFromExecutionException(ResultSetFuture.java:214) [cassandra-driver-core-1.0.2-rhq-1.2.4.jar:]
        at com.datastax.driver.core.ResultSetFuture.getUninterruptibly(ResultSetFuture.java:169) [cassandra-driver-core-1.0.2-rhq-1.2.4.jar:]
        at com.datastax.driver.core.Session.execute(Session.java:110) [cassandra-driver-core-1.0.2-rhq-1.2.4.jar:]
        at com.datastax.driver.core.Session.execute(Session.java:79) [cassandra-driver-core-1.0.2-rhq-1.2.4.jar:]
        at org.rhq.server.metrics.StorageSession.execute(StorageSession.java:36) [rhq-server-metrics-4.9.0-SNAPSHOT.jar:4.9.0-SNAPSHOT]
        at org.rhq.enterprise.server.storage.StorageNodeOperationsHandlerBean.updateReplicationFactor(StorageNodeOperationsHandlerBean.java:850) [rhq-server.jar:4.9.0-SNAPSHOT]
        at org.rhq.enterprise.server.storage.StorageNodeOperationsHandlerBean.updateSchemaIfNecessary(StorageNodeOperationsHandlerBean.java:836) [rhq-server.jar:4.9.0-SNAPSHOT]
        at org.rhq.enterprise.server.storage.StorageNodeOperationsHandlerBean.performAddNodeMaintenance(StorageNodeOperationsHandlerBean.java:223) [rhq-server.jar:4.9.0-SNAPSHOT]
        at org.rhq.enterprise.server.storage.StorageNodeOperationsHandlerBean.performAddNodeMaintenanceIfNecessary(StorageNodeOperationsHandlerBean.java:200) [rhq-server.jar:4.9.0-SNAPSHOT]
...
12:43:05,416 WARN  [com.arjuna.ats.arjuna] (Reconnection-0) ARJUNA012077: Abort called on already aborted atomic action 0:ffff0a101777:-1b97486a:5220c752:7292
12:43:05,417 ERROR [org.jboss.as.ejb3.invocation] (Reconnection-0) JBAS014134: EJB Invocation failed on component StorageNodeOperationsHandlerBean for method public abstract void org.rhq.enterprise.server.storage.StorageNodeOperationsHandlerLocal.performAddNodeMaintenanceIfNecessary(java.net.InetAddress): javax.ejb.EJBTransactionRolledbackException: Transaction rolled back



Comment 1 John Sanda 2013-08-31 17:02:35 UTC
I have pushed a fix to master.

commit hash: 9b3c7ffa8ce

There was a deadlock issue that could manifest itself in the
performAddNodeMaintenanceIfNecessary and in the
performRemoveNodeMaintenanceIfNecessary methods when an error occurred. Both
methods had nested transactions in which both the outer and inner transactions
tried to update the same storage node entity. The transactions are no longer
nested.
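
The deadlock described above can be illustrated with a small standalone sketch. It is not the RHQ code and does not use EJB or a database: a non-reentrant Semaphore stands in for the row lock on the storage node entity, and the class and method names (NestedTxDeadlockDemo, updateInNewTransaction, nested, sequential) are hypothetical. It assumes the inner transaction ran independently of the outer one, so it could not see or share the outer transaction's lock.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class NestedTxDeadlockDemo {
    // Stands in for the database row lock on one storage node entity.
    // A Semaphore is not reentrant, so a second "transaction" started on
    // the same thread cannot take the lock the first one still holds.
    private static final Semaphore rowLock = new Semaphore(1);

    // Last status successfully written to the simulated entity.
    static volatile String status = "JOINING";

    /** Updates the entity in its own "transaction"; returns false on timeout. */
    public static boolean updateInNewTransaction(String newStatus, long timeoutMillis) {
        try {
            if (!rowLock.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return false; // models the transaction timeout
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        try {
            status = newStatus;
            return true;
        } finally {
            rowLock.release(); // commit releases the row lock
        }
    }

    /** Nested case: the outer transaction still holds the row lock. */
    public static boolean nested(long timeoutMillis) {
        rowLock.acquireUninterruptibly(); // outer transaction touches the entity
        try {
            // The inner transaction waits for a lock that is only released
            // after this call returns, so it can only time out.
            return updateInNewTransaction("DOWN", timeoutMillis);
        } finally {
            rowLock.release(); // outer transaction ends
        }
    }

    /** Non-nested case: each update finishes before the next one starts. */
    public static boolean sequential(long timeoutMillis) {
        boolean first = updateInNewTransaction("JOINING", timeoutMillis);
        boolean second = updateInNewTransaction("DOWN", timeoutMillis);
        return first && second;
    }

    public static void main(String[] args) {
        System.out.println("nested: error state stored = " + nested(200));
        System.out.println("sequential: error state stored = " + sequential(200));
    }
}
```

In the nested case the status stays JOINING because the inner update never acquires the lock, which mirrors the symptom in the description. Once the two updates run one after the other, the DOWN status is stored.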

