Bug 1079027 - Storage Node (un)deployment can cause deadlock in rhq_storage_node table
Summary: Storage Node (un)deployment can cause deadlock in rhq_storage_node table
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: RHQ Project
Classification: Other
Component: Core Server, Storage Node
Version: 4.10,4.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHQ 4.11
Assignee: RHQ Project Maintainer
QA Contact: Mike Foley
URL:
Whiteboard:
Depends On:
Blocks: 1084651
 
Reported: 2014-03-20 19:16 UTC by John Sanda
Modified: 2014-07-21 10:13 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1084651 (view as bug list)
Environment:
Last Closed: 2014-07-21 10:13:30 UTC
Embargoed:



Description John Sanda 2014-03-20 19:16:01 UTC
Description of problem:
The (un)deployment process involves a series of transactions, some of which are nested. Users have reported the deployment or undeployment process stalling indefinitely, particularly with Oracle.

The first step in either process resets the errorMessage and failedOperation fields of all StorageNodes. This is done as part of the transaction that initiates the (un)deployment process. It should be done in a separate transaction.

Several of the methods in StorageNodeOperationsHandlerBean involve updating the state of one or more StorageNodes and then scheduling a resource operation. Some of those updates are done in the same transaction in which the resource operations are scheduled. Those StorageNode updates should be done in a separate transaction, followed by another transaction for scheduling the resource operation.

Separating out the transactions will reduce the nesting and further reduce the chance of locking.
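
The proposed separation can be sketched roughly as follows. This is a minimal, self-contained illustration and not the RHQ code: TxLog, StorageNode, undeploy and the "DECOMMISSIONING" mode string are hypothetical stand-ins, and the toy TxLog only records transaction boundaries. In the real server each step would presumably be an EJB method with its own transaction boundary.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names are taken from the RHQ code base.
class SeparateTransactionsSketch {

    /** Toy stand-in for a transaction manager: it only records boundaries and nesting depth. */
    static class TxLog {
        final List<String> events = new ArrayList<>();
        int maxDepth = 0;
        private int depth = 0;

        void inNewTransaction(String name, Runnable work) {
            depth++;
            maxDepth = Math.max(maxDepth, depth);
            events.add("begin " + name);
            try {
                work.run();
                events.add("commit " + name);
            } catch (RuntimeException e) {
                events.add("rollback " + name);
                throw e;
            } finally {
                depth--;
            }
        }
    }

    /** Simplified storage node with only the fields mentioned in this report. */
    static class StorageNode {
        String errorMessage = "old error";
        String failedOperation = "old operation";
        String operationMode = "NORMAL"; // illustrative value, an assumption
    }

    /** Runs each step in its own short transaction instead of one umbrella transaction. */
    static void undeploy(TxLog tx, List<StorageNode> allNodes, StorageNode target) {
        // Step 1: reset the error fields on all nodes and commit.
        tx.inNewTransaction("resetErrors", () -> {
            for (StorageNode node : allNodes) {
                node.errorMessage = null;
                node.failedOperation = null;
            }
        });
        // Step 2: update the target node's state and commit.
        tx.inNewTransaction("updateState", () -> target.operationMode = "DECOMMISSIONING");
        // Step 3: schedule the resource operation in a third transaction.
        tx.inNewTransaction("scheduleOperation", () -> tx.events.add("schedule operation"));
    }

    public static void main(String[] args) {
        TxLog tx = new TxLog();
        StorageNode a = new StorageNode();
        StorageNode b = new StorageNode();
        List<StorageNode> nodes = new ArrayList<>();
        nodes.add(a);
        nodes.add(b);
        undeploy(tx, nodes, b);
        for (String event : tx.events) {
            System.out.println(event);
        }
        System.out.println("max nesting depth: " + tx.maxDepth);
    }
}
```

Because each step commits before the next begins, no transaction holds row locks on rhq_storage_node while another step waits, which is the effect the description is aiming for.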



Comment 1 Jay Shaughnessy 2014-03-31 21:44:53 UTC
Tested on Postgres and Oracle:
1) create a dev-container with multiple servers:

  modules/enterprise/server/appserver> mvn -o -Pdev -Poracle -DskipTests -Drhq.storage.num-nodes=4 clean install

2) Full server install for server-1
3) StorageNode only for server-2,3,4
4) start everything
5) run discovery

   storage nodes should get discovered and auto imported

6) validate resource links and SN status in topology->storage nodes
7) test undeploy via GUI

==========================================================================


https://github.com/rhq-project/rhq/pull/15

commit b6e70d104756db633df7b4ff642c29e3fad81d2b
Author: Jay Shaughnessy <jshaughn>
Date:   Fri Mar 21 15:09:17 2014 -0400

 Transaction delimiters have been updated in StorageNode beans to try and prevent
 db-locking issues.  This caused a problem for storage node deployment
 during mergeInventory, because it relied on an umbrella transaction providing
 a persisted (but not yet committed) Resource.

 Added a general mechanism for performing post-commit actions on newly
 merged resources and leveraged it to perform linking a StorageNode to a
 Resource.
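
The post-commit idea in the commit message above could look something like the following sketch. It is an assumption-laden illustration, not the actual RHQ mechanism: ToyTransaction, afterCommit and mergeAndLink are hypothetical names, and the entities are plain strings. In a container, a javax.transaction.Synchronization callback might play a similar role, but the commit message does not say how it was implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the class and method names are not from the RHQ code base.
class PostCommitSketch {

    /** Toy transaction: buffers writes and runs registered actions only after a commit. */
    static class ToyTransaction {
        private final List<String> committedStore;
        private final List<String> pending = new ArrayList<>();
        private final List<Runnable> postCommitActions = new ArrayList<>();

        ToyTransaction(List<String> committedStore) {
            this.committedStore = committedStore;
        }

        void persist(String entity) {
            pending.add(entity);
        }

        void afterCommit(Runnable action) {
            postCommitActions.add(action);
        }

        void commit() {
            // Make the buffered writes visible first, then run the deferred actions.
            committedStore.addAll(pending);
            pending.clear();
            for (Runnable action : postCommitActions) {
                action.run();
            }
            postCommitActions.clear();
        }

        void rollback() {
            // Discard both the writes and the deferred actions.
            pending.clear();
            postCommitActions.clear();
        }
    }

    /** Merges a resource and defers linking the storage node until the resource is committed. */
    static void mergeAndLink(List<String> store, List<String> links, boolean commit) {
        ToyTransaction tx = new ToyTransaction(store);
        tx.persist("resource-1");
        tx.afterCommit(() -> {
            // The link step no longer relies on an uncommitted resource.
            if (!store.contains("resource-1")) {
                throw new IllegalStateException("resource not committed");
            }
            links.add("storageNode-1 -> resource-1");
        });
        if (commit) {
            tx.commit();
        } else {
            tx.rollback();
        }
    }

    public static void main(String[] args) {
        List<String> store = new ArrayList<>();
        List<String> links = new ArrayList<>();
        mergeAndLink(store, links, true);
        System.out.println("store=" + store + " links=" + links);
    }
}
```

If the merge is rolled back, the link action is dropped along with the buffered resource, so a StorageNode is never linked to a Resource that was not committed.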


commit 710c93131f44dc32ab7397dafdec23c086dc2758
Author: Jay Shaughnessy <jshaughn>
Date:   Fri Mar 21 15:52:48 2014 -0400

 Make sure StorageNode is committed before we try and deploy it.


commit b8523c3bc2bc09eb7a9ce09005ded4a4a9cd364a
Author: Jay Shaughnessy <jshaughn>
Date:   Fri Mar 21 16:10:56 2014 -0400

 More changes to LinkStorageNode to try and get the transactioning right.


commit 919604ca697644ecf9fc8109395ecac9d21d781c
Author: Jay Shaughnessy <jshaughn>
Date:   Mon Mar 31 16:29:06 2014 -0400

 Fix issue with new transactioning that left some entity changes uncommitted.


commit a8ba17c7e52518a3086d1e94a099e281c799eeca
Merge: ca99ddf 919604c
Author: Jay Shaughnessy <jshaughn>
Date:   Mon Mar 31 17:24:03 2014 -0400

 Merge branch 'jshaughn/storage'

 Conflicts:
   modules/enterprise/server/jar/src/main/java/org/rhq/enterprise/server/cloud/StorageNodeManagerBean.java

Comment 2 Heiko W. Rupp 2014-07-21 10:13:30 UTC
Bulk closing of RHQ 4.11 issues, now that RHQ 4.12 is out.

If you find an issue with those, please open a new BZ, linking to the old one.

