Bug 1003191
| Summary: | Transaction timeout occurs when there is a storage node deployment error | ||
|---|---|---|---|
| Product: | [Other] RHQ Project | Reporter: | John Sanda <jsanda> |
| Component: | Core Server | Assignee: | Nobody <nobody> |
| Status: | ON_QA --- | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.9 | CC: | hrupp |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | Type: | Bug | |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 951619 | ||
I have pushed a fix to master. commit hash: 9b3c7ffa8ce There was a deadlock issue that could manifest itself in the performAddNodeMaintenanceIfNecessary and in the performRemoveNodeMaintenanceIfNecessary methods when an occurred. Both methods had nested transactions in which both the outer and inner transactions tried to update the same storage node entity. The transactions aren no longer nested. |
Description of problem: If deployment fails at the start of the add maintenance phase, a deadlock occurs that prevents an error message stored on the storage node entity; consequently, the cluster status will remain JOINING when it should change to DOWN. Here are the errors from a server log that shows the issue, 12:33:05,541 ERROR [org.rhq.enterprise.server.storage.StorageNodeOperationsHandlerBean] (Reconnection-0) Aborting storage node deployment due to unexpected error while performing add node maintenance.: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (no host was tried) at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:64) [cassandra-driver-core-1.0.2-rhq-1.2.4.jar:] at com.datastax.driver.core.ResultSetFuture.extractCauseFromExecutionException(ResultSetFuture.java:214) [cassandra-driver-core-1.0.2-rhq-1.2.4.jar:] at com.datastax.driver.core.ResultSetFuture.getUninterruptibly(ResultSetFuture.java:169) [cassandra-driver-core-1.0.2-rhq-1.2.4.jar:] at com.datastax.driver.core.Session.execute(Session.java:110) [cassandra-driver-core-1.0.2-rhq-1.2.4.jar:] at com.datastax.driver.core.Session.execute(Session.java:79) [cassandra-driver-core-1.0.2-rhq-1.2.4.jar:] at org.rhq.server.metrics.StorageSession.execute(StorageSession.java:36) [rhq-server-metrics-4.9.0-SNAPSHOT.jar:4.9.0-SNAPSHOT] at org.rhq.enterprise.server.storage.StorageNodeOperationsHandlerBean.updateReplicationFactor(StorageNodeOperationsHandlerBean.java:850) [rhq-server.jar:4.9.0-SNAPSHOT] at org.rhq.enterprise.server.storage.StorageNodeOperationsHandlerBean.updateSchemaIfNecessary(StorageNodeOperationsHandlerBean.java:836) [rhq-server.jar:4.9.0-SNAPSHOT] at org.rhq.enterprise.server.storage.StorageNodeOperationsHandlerBean.performAddNodeMaintenance(StorageNodeOperationsHandlerBean.java:223) [rhq-server.jar:4.9.0-SNAPSHOT] at org.rhq.enterprise.server.storage.StorageNodeOperationsHandlerBean.performAddNodeMaintenanceIfNecessary(StorageNodeOperationsHandlerBean.java:200) [rhq-server.jar:4.9.0-SNAPSHOT] ... 12:43:05,416 WARN [com.arjuna.ats.arjuna] (Reconnection-0) ARJUNA012077: Abort called on already aborted atomic action 0:ffff0a101777:-1b97486a:5220c752:7292 12:43:05,417 ERROR [org.jboss.as.ejb3.invocation] (Reconnection-0) JBAS014134: EJB Invocation failed on component StorageNodeOperationsHandlerBean for method public abstract void org.rhq.enterprise.server.storage.StorageNodeOperationsHandlerLocal.performAddNodeMaintenanceIfNecessary(java.net.InetAddress): javax.ejb.EJBTransactionRolledbackException: Transaction rolled back Version-Release number of selected component (if applicable): How reproducible: Steps to Reproduce: 1. 2. 3. Actual results: Expected results: Additional info: