Bug 1003191 - Transaction timeout occurs when there is a storage node deployment error
Status: ON_QA
Product: RHQ Project
Classification: Other
Component: Core Server
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Assigned To: John Sanda
QA Contact: Mike Foley
Blocks: 951619
Reported: 2013-08-31 13:00 EDT by John Sanda
Modified: 2015-09-02 20:04 EDT
Doc Type: Bug Fix
Type: Bug
Description John Sanda 2013-08-31 13:00:03 EDT
Description of problem:
If deployment fails at the start of the add maintenance phase, a deadlock occurs that prevents an error message from being stored on the storage node entity; consequently, the cluster status remains JOINING when it should change to DOWN. Here are the errors from the server log that show the issue:

12:33:05,541 ERROR [org.rhq.enterprise.server.storage.StorageNodeOperationsHandlerBean] (Reconnection-0) Aborting storage node deployment due to unexpected error while performing add node maintenance.: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (no host was tried)
        at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:64) [cassandra-driver-core-1.0.2-rhq-1.2.4.jar:]
        at com.datastax.driver.core.ResultSetFuture.extractCauseFromExecutionException(ResultSetFuture.java:214) [cassandra-driver-core-1.0.2-rhq-1.2.4.jar:]
        at com.datastax.driver.core.ResultSetFuture.getUninterruptibly(ResultSetFuture.java:169) [cassandra-driver-core-1.0.2-rhq-1.2.4.jar:]
        at com.datastax.driver.core.Session.execute(Session.java:110) [cassandra-driver-core-1.0.2-rhq-1.2.4.jar:]
        at com.datastax.driver.core.Session.execute(Session.java:79) [cassandra-driver-core-1.0.2-rhq-1.2.4.jar:]
        at org.rhq.server.metrics.StorageSession.execute(StorageSession.java:36) [rhq-server-metrics-4.9.0-SNAPSHOT.jar:4.9.0-SNAPSHOT]
        at org.rhq.enterprise.server.storage.StorageNodeOperationsHandlerBean.updateReplicationFactor(StorageNodeOperationsHandlerBean.java:850) [rhq-server.jar:4.9.0-SNAPSHOT]
        at org.rhq.enterprise.server.storage.StorageNodeOperationsHandlerBean.updateSchemaIfNecessary(StorageNodeOperationsHandlerBean.java:836) [rhq-server.jar:4.9.0-SNAPSHOT]
        at org.rhq.enterprise.server.storage.StorageNodeOperationsHandlerBean.performAddNodeMaintenance(StorageNodeOperationsHandlerBean.java:223) [rhq-server.jar:4.9.0-SNAPSHOT]
        at org.rhq.enterprise.server.storage.StorageNodeOperationsHandlerBean.performAddNodeMaintenanceIfNecessary(StorageNodeOperationsHandlerBean.java:200) [rhq-server.jar:4.9.0-SNAPSHOT]
...
12:43:05,416 WARN  [com.arjuna.ats.arjuna] (Reconnection-0) ARJUNA012077: Abort called on already aborted atomic action 0:ffff0a101777:-1b97486a:5220c752:7292
12:43:05,417 ERROR [org.jboss.as.ejb3.invocation] (Reconnection-0) JBAS014134: EJB Invocation failed on component StorageNodeOperationsHandlerBean for method public abstract void org.rhq.enterprise.server.storage.StorageNodeOperationsHandlerLocal.performAddNodeMaintenanceIfNecessary(java.net.InetAddress): javax.ejb.EJBTransactionRolledbackException: Transaction rolled back


Comment 1 John Sanda 2013-08-31 13:02:35 EDT
I have pushed a fix to master.

commit hash: 9b3c7ffa8ce

There was a deadlock issue that could manifest itself in the
performAddNodeMaintenanceIfNecessary and
performRemoveNodeMaintenanceIfNecessary methods when an error occurred. Both
methods had nested transactions in which both the outer and inner transactions
tried to update the same storage node entity. The transactions are no longer
nested.
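
For anyone reading this later, here is a rough, self-contained sketch of why
two nested transactions touching the same row can stall until the transaction
timeout. It is not the RHQ code. RowLock, nestedUpdate and sequentialUpdate
are hypothetical names, and a Semaphore stands in for the database row lock,
on the assumption that the inner update ran in its own transaction while the
outer one was still open.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class NestedTxSketch {

    // Hypothetical stand-in for the database row lock on one storage node entity.
    // A Semaphore is used because it is not owned by a thread, much like a row
    // lock is owned by a transaction rather than by the calling thread.
    static final class RowLock {
        private final Semaphore permit = new Semaphore(1);

        // Returns false if the lock is not obtained within the timeout,
        // which models a transaction timing out while waiting for the row.
        boolean acquire(long timeoutMillis) {
            try {
                return permit.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }

        void release() {
            permit.release();
        }
    }

    // Nested shape: the outer transaction holds the row lock while the inner
    // transaction tries to update the same row. The inner one can only time out.
    static boolean nestedUpdate(RowLock row, long timeoutMillis) {
        if (!row.acquire(timeoutMillis)) {
            return false;
        }
        try {
            boolean innerGotLock = row.acquire(timeoutMillis);
            if (innerGotLock) {
                row.release();
            }
            return innerGotLock;
        } finally {
            row.release(); // outer transaction ends
        }
    }

    // Non-nested shape: the first transaction finishes and releases the row
    // before the second transaction starts, so the second update succeeds.
    static boolean sequentialUpdate(RowLock row, long timeoutMillis) {
        if (!row.acquire(timeoutMillis)) {
            return false;
        }
        row.release(); // first transaction ends
        boolean secondGotLock = row.acquire(timeoutMillis);
        if (secondGotLock) {
            row.release();
        }
        return secondGotLock;
    }

    public static void main(String[] args) {
        System.out.println("nested: " + nestedUpdate(new RowLock(), 200));
        System.out.println("sequential: " + sequentialUpdate(new RowLock(), 200));
    }
}
```

In the nested case the inner update waits for the full timeout and then fails,
which is the same shape as the ten minute gap between the two log entries in
the description. In the sequential case both updates go through.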
