1003191 – Transaction timeout occurs when there is a storage node deployment error

Summary: Transaction timeout occurs when there is a storage node deployment error

Keywords:
Status:	ON_QA
Alias:	None
Product:	RHQ Project
Classification:	Other
Component:	Core Server
Sub Component:
Version:	4.9
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Nobody
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	951619

Reported:	2013-08-31 17:00 UTC by John Sanda
Modified:	2022-03-31 04:27 UTC
CC List:	1 user
Fixed In Version:
Clone Of:
Environment:
Last Closed:
Embargoed:

Description John Sanda 2013-08-31 17:00:03 UTC

Description of problem:
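If deployment fails at the start of the add node maintenance phase, a deadlock occurs that prevents an error message from being stored on the storage node entity. Consequently, the cluster status remains JOINING when it should change to DOWN, and the transaction is only rolled back when it times out, about ten minutes after the initial error (compare the 12:33:05 and 12:43:05 timestamps below). The following errors from the server log show the issue: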

12:33:05,541 ERROR [org.rhq.enterprise.server.storage.StorageNodeOperationsHandlerBean] (Reconnection-0) Aborting storage node deployment due to unexpected error while performing add node maintenance.: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (no host was tried)
        at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:64) [cassandra-driver-core-1.0.2-rhq-1.2.4.jar:]
        at com.datastax.driver.core.ResultSetFuture.extractCauseFromExecutionException(ResultSetFuture.java:214) [cassandra-driver-core-1.0.2-rhq-1.2.4.jar:]
        at com.datastax.driver.core.ResultSetFuture.getUninterruptibly(ResultSetFuture.java:169) [cassandra-driver-core-1.0.2-rhq-1.2.4.jar:]
        at com.datastax.driver.core.Session.execute(Session.java:110) [cassandra-driver-core-1.0.2-rhq-1.2.4.jar:]
        at com.datastax.driver.core.Session.execute(Session.java:79) [cassandra-driver-core-1.0.2-rhq-1.2.4.jar:]
        at org.rhq.server.metrics.StorageSession.execute(StorageSession.java:36) [rhq-server-metrics-4.9.0-SNAPSHOT.jar:4.9.0-SNAPSHOT]
        at org.rhq.enterprise.server.storage.StorageNodeOperationsHandlerBean.updateReplicationFactor(StorageNodeOperationsHandlerBean.java:850) [rhq-server.jar:4.9.0-SNAPSHOT]
        at org.rhq.enterprise.server.storage.StorageNodeOperationsHandlerBean.updateSchemaIfNecessary(StorageNodeOperationsHandlerBean.java:836) [rhq-server.jar:4.9.0-SNAPSHOT]
        at org.rhq.enterprise.server.storage.StorageNodeOperationsHandlerBean.performAddNodeMaintenance(StorageNodeOperationsHandlerBean.java:223) [rhq-server.jar:4.9.0-SNAPSHOT]
        at org.rhq.enterprise.server.storage.StorageNodeOperationsHandlerBean.performAddNodeMaintenanceIfNecessary(StorageNodeOperationsHandlerBean.java:200) [rhq-server.jar:4.9.0-SNAPSHOT]
...
12:43:05,416 WARN  [com.arjuna.ats.arjuna] (Reconnection-0) ARJUNA012077: Abort called on already aborted atomic action 0:ffff0a101777:-1b97486a:5220c752:7292
12:43:05,417 ERROR [org.jboss.as.ejb3.invocation] (Reconnection-0) JBAS014134: EJB Invocation failed on component StorageNodeOperationsHandlerBean for method public abstract void org.rhq.enterprise.server.storage.StorageNodeOperationsHandlerLocal.performAddNodeMaintenanceIfNecessary(java.net.InetAddress): javax.ejb.EJBTransactionRolledbackException: Transaction rolled back
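To illustrate why the lost error message matters, here is a minimal Java sketch of a status rule in which a joining node is reported as DOWN only once an error message has been recorded. The StorageNodeStatusSketch class, its StorageNode stand-in, and the status rule itself are hypothetical simplifications, not the actual RHQ entity code; the point is only that a rolled-back update leaves errorMessage null, so the node keeps reporting JOINING.

```java
/** Hypothetical model: status is derived from whether an error message was persisted. */
public class StorageNodeStatusSketch {

    /** Simplified, hypothetical subset of storage node statuses. */
    enum Status { JOINING, DOWN }

    /** Hypothetical stand-in for the storage node entity. */
    static final class StorageNode {
        String errorMessage; // stays null if the transaction that sets it is rolled back
    }

    /** Hypothetical rule: a joining node is reported DOWN once an error is recorded. */
    static Status status(StorageNode node) {
        return node.errorMessage == null ? Status.JOINING : Status.DOWN;
    }

    public static void main(String[] args) {
        StorageNode node = new StorageNode();
        System.out.println(status(node)); // JOINING: the error update was rolled back
        node.errorMessage = "example error";
        System.out.println(status(node)); // DOWN: the error update was committed
    }
}
```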


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 John Sanda 2013-08-31 17:02:35 UTC

I have pushed a fix to master.

commit hash: 9b3c7ffa8ce

There was a deadlock issue that could manifest itself in the
performAddNodeMaintenanceIfNecessary and
performRemoveNodeMaintenanceIfNecessary methods when an error occurred. Both methods
had nested transactions in which both the outer and inner transactions tried to
update the same storage node entity. The transactions are no longer nested.
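A minimal, single-threaded Java model of why nested transactions that update the same entity can deadlock is sketched below. It assumes (the commit message does not say this) that the inner transaction has REQUIRES_NEW-style semantics: it runs on the same thread while the outer transaction is suspended but still holds the row lock on the storage node, so the inner transaction can only wait until it times out. The RowLock class and the transaction IDs are hypothetical, and the Semaphore timeout stands in for the transaction timeout.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Hypothetical model of a self-deadlock between an outer and an inner transaction. */
public class NestedTxDeadlockDemo {

    /** A row lock owned by a transaction rather than a thread (single-threaded demo). */
    static final class RowLock {
        private final Semaphore permit = new Semaphore(1);
        private String owner;

        /** Returns false if the lock could not be taken before the timeout expired. */
        boolean lock(String txId, long timeoutMillis) throws InterruptedException {
            if (txId.equals(owner)) {
                return true; // the owning transaction may update the row again
            }
            if (permit.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
                owner = txId;
                return true;
            }
            return false; // in a real server this wait ends in a transaction timeout
        }

        /** Commit or rollback releases the lock held by the given transaction. */
        void unlock(String txId) {
            if (txId.equals(owner)) {
                owner = null;
                permit.release();
            }
        }
    }

    static final RowLock storageNodeRow = new RowLock();

    /** Old shape: the outer tx updates the row, then an inner tx updates the same row. */
    static boolean nested(long timeoutMillis) throws InterruptedException {
        if (!storageNodeRow.lock("outer", timeoutMillis)) {
            return false;
        }
        try {
            // The outer tx is suspended but still holds the lock; the inner tx waits on it.
            boolean innerGotLock = storageNodeRow.lock("inner", timeoutMillis);
            storageNodeRow.unlock("inner");
            return innerGotLock;
        } finally {
            storageNodeRow.unlock("outer");
        }
    }

    /** Fixed shape: the two transactions run one after the other, not nested. */
    static boolean sequential(long timeoutMillis) throws InterruptedException {
        boolean first = storageNodeRow.lock("first", timeoutMillis);
        storageNodeRow.unlock("first"); // commit releases the lock
        boolean second = storageNodeRow.lock("second", timeoutMillis);
        storageNodeRow.unlock("second");
        return first && second;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("nested: inner tx acquired lock = " + nested(200));
        System.out.println("sequential: both tx acquired lock = " + sequential(200));
    }
}
```

Running main prints false for the nested case and true for the sequential case. This mirrors the shape of the fix: once the transactions are no longer nested, the first one commits and releases the lock before the second needs it.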
