Bug 1084651 - Storage Node (un)deployment can cause deadlock in rhq_storage_node table
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: JBoss Operations Network
Classification: JBoss
Component: Core Server, Storage Node
Version: JON 3.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: DR02
Target Release: JON 3.2.2
Assignee: Jay Shaughnessy
QA Contact: Mike Foley
URL:
Whiteboard:
Depends On: 1079027
Blocks:
Reported: 2014-04-05 01:59 UTC by John Sanda
Modified: 2018-12-05 18:03 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 1079027
Environment:
Last Closed: 2014-07-29 00:17:10 UTC
Type: Bug
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1106505 0 unspecified CLOSED Cannot handshake version when deploying two nodes parallely 2021-02-22 00:41:40 UTC
Red Hat Knowledge Base (Solution) 767333 0 None None None Never

Internal Links: 1106505

Description John Sanda 2014-04-05 01:59:00 UTC
+++ This bug was initially created as a clone of Bug #1079027 +++

Description of problem:
The (un)deployment process involves a series of transactions, some of which are nested. Users have reported either the deployment or the undeployment process stalling indefinitely, particularly with Oracle.

The first step in either process resets the errorMessage and failedOperation fields of all StorageNodes. This is currently done as part of the transaction that initiates the (un)deployment process. It should be done in a separate transaction.

Several of the methods in StorageNodeOperationsHandlerBean involve updating the state of one or more StorageNodes and then scheduling a resource operation. Some of those updates are done in the same transaction in which the resource operations are scheduled. Those StorageNode updates should be done in a separate transaction, followed by another transaction for scheduling the resource operation.

Separating out the transactions will reduce the nesting and further reduce the chance of locking.
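
As a rough illustration of the proposed split, the sketch below runs each step in its own transaction so that every StorageNode update is committed before the resource operation is scheduled. It is plain Java, not the actual StorageNodeOperationsHandlerBean code: `inNewTransaction`, `resetErrorState`, `markStatus`, `scheduleOperation`, and the status and operation names are hypothetical placeholders. In an EJB container the same boundary would typically be expressed with `@TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)` on a method invoked through the bean's business interface.

```java
import java.util.ArrayList;
import java.util.List;

class SplitTransactionSketch {
    /** Records transaction boundaries and the work done inside them, in order. */
    final List<String> log = new ArrayList<>();

    /** Hypothetical helper: runs the work in its own transaction and commits before returning. */
    void inNewTransaction(Runnable work) {
        log.add("begin");
        try {
            work.run();
            log.add("commit");
        } catch (RuntimeException e) {
            log.add("rollback");
            throw e;
        }
    }

    // Placeholder steps; the real ones would update rhq_storage_node or schedule an operation.
    void resetErrorState(String node) { log.add("reset errors: " + node); }
    void markStatus(String node, String status) { log.add("status " + status + ": " + node); }
    void scheduleOperation(String node, String op) { log.add("schedule " + op + ": " + node); }

    /** Each step commits before the next begins, so no row lock spans the scheduling call. */
    void deploy(String node) {
        inNewTransaction(() -> resetErrorState(node));
        inNewTransaction(() -> markStatus(node, "ANNOUNCE"));
        inNewTransaction(() -> scheduleOperation(node, "announce"));
    }

    public static void main(String[] args) {
        SplitTransactionSketch sketch = new SplitTransactionSketch();
        sketch.deploy("node-2");
        sketch.log.forEach(System.out::println);
    }
}
```

The trade-off is that the steps are no longer atomic as a group: if scheduling fails, the earlier StorageNode updates stay committed and have to be handled explicitly.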



--- Additional comment from Jay Shaughnessy on 2014-03-31 17:44:53 EDT ---


Tested on Postgres and Oracle:
1) create a dev-container with multiple servers:

  modules/enterprise/server/appserver> mvn -o -Pdev -Poracle -DskipTests -Drhq.storage.num-nodes=4 clean install

2) Full server install for server-1
3) StorageNode only for server-2,3,4
4) start everything
5) run discovery

   storage nodes should get discovered and auto imported

6) validate resource links and SN status in topology->storage nodes
7) test undeploy via GUI

==========================================================================


https://github.com/rhq-project/rhq/pull/15

commit b6e70d104756db633df7b4ff642c29e3fad81d2b
Author: Jay Shaughnessy <jshaughn>
Date:   Fri Mar 21 15:09:17 2014 -0400

 Transaction delimiters have been updated in StorageNode beans to try and prevent
 db-locking issues.  This caused a problem for storage node deployment
 during mergeInventory, because it relied on an umbrella transaction providing
 a persisted (but not yet committed) Resource.

 Added a general mechanism for performing post-commit actions on newly
 merged resources and leveraged it to perform linking a StorageNode to a
 Resource.


commit 710c93131f44dc32ab7397dafdec23c086dc2758
Author: Jay Shaughnessy <jshaughn>
Date:   Fri Mar 21 15:52:48 2014 -0400

 Make sure StorageNode is committed before we try and deploy it.


commit b8523c3bc2bc09eb7a9ce09005ded4a4a9cd364a
Author: Jay Shaughnessy <jshaughn>
Date:   Fri Mar 21 16:10:56 2014 -0400

 More changes to LinkStorageNode to try and get the transactioning right.


commit 919604ca697644ecf9fc8109395ecac9d21d781c
Author: Jay Shaughnessy <jshaughn>
Date:   Mon Mar 31 16:29:06 2014 -0400

 Fix issue with new transactioning that left some entity changes uncommitted.


commit a8ba17c7e52518a3086d1e94a099e281c799eeca
Merge: ca99ddf 919604c
Author: Jay Shaughnessy <jshaughn>
Date:   Mon Mar 31 17:24:03 2014 -0400

 Merge branch 'jshaughn/storage'

 Conflicts:
   modules/enterprise/server/jar/src/main/java/org/rhq/enterprise/server/cloud/StorageNodeManagerBean.java
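
The post-commit mechanism mentioned in commit b6e70d1 above can be pictured roughly as follows. This is only a sketch of the general idea in plain Java, not the code from the pull request: `PostCommitSketch`, `afterCommit`, `runInTransaction`, and `mergeResource` are hypothetical names. Actions are queued while the merge transaction is open, run only once it has committed, and are dropped on rollback, so the linking step never sees an uncommitted Resource.

```java
import java.util.ArrayList;
import java.util.List;

class PostCommitSketch {
    /** Records what happened, in order. */
    final List<String> log = new ArrayList<>();
    private final List<Runnable> postCommitActions = new ArrayList<>();

    /** Queues an action to run only after the surrounding transaction commits. */
    void afterCommit(Runnable action) {
        postCommitActions.add(action);
    }

    /** Hypothetical transaction wrapper: commit, then run queued actions; drop them on rollback. */
    void runInTransaction(Runnable work) {
        postCommitActions.clear();
        try {
            work.run();
        } catch (RuntimeException e) {
            log.add("rollback");
            postCommitActions.clear();
            throw e;
        }
        log.add("commit");
        // Copy first so an action that queues further actions cannot disturb this loop.
        List<Runnable> pending = new ArrayList<>(postCommitActions);
        postCommitActions.clear();
        pending.forEach(Runnable::run);
    }

    /** Stand-in for merging a newly discovered resource during inventory merge. */
    void mergeResource(String resource) {
        log.add("persist " + resource);
        afterCommit(() -> log.add("link storage node to " + resource));
    }

    public static void main(String[] args) {
        PostCommitSketch sketch = new PostCommitSketch();
        sketch.runInTransaction(() -> sketch.mergeResource("storage-resource-1"));
        sketch.log.forEach(System.out::println);
    }
}
```

In a container-managed setting, a comparable hook could be built on the JTA `Synchronization.afterCompletion` callback. Whether the actual fix uses it is not stated in this report.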

Comment 3 John Sanda 2014-05-23 15:36:01 UTC
I am reassigning to Jay since he did the work on this. I started to cherry pick the first commit. There are some merge conflicts. I would feel more comfortable with Jay resolving those conflicts since he is more familiar with the changes.

Comment 4 Jay Shaughnessy 2014-05-27 20:33:35 UTC
Missing in the list above was:

master commit 4921949e2e203015cf09f699b5d7a1352daab3a0:
    Initial commit that attempts to remove umbrella transactions starting at
    many entry points, such that scheduling operations happens in their own
    trans context.  Needs review and testing...

---------------------------------------------------------------------------

So, backport of fixes includes the following release/jon3.2.x commits:

commit e2363d24122bcf734ccf00e2b55c12a9b17cee29
Author: Jay Shaughnessy <jshaughn>
Date:   Tue May 27 16:26:31 2014 -0400
    Fix issue with new transactioning that left some entity changes uncommitted.

    Cherry-Pick master 919604ca697644ecf9fc8109395ecac9d21d781c

commit 7a711090fbcc8effa86dc918e1e543568ad4d28f
Author: Jay Shaughnessy <jshaughn>
Date:   Tue May 27 16:25:05 2014 -0400
    More changes to LinkStorageNode to try and get the transactioning right.

    Cherry-Pick master b8523c3bc2bc09eb7a9ce09005ded4a4a9cd364a

commit 5c4dc2d09b20d1eb73dbc2191920759d660b7c16
Author: Jay Shaughnessy <jshaughn>
Date:   Fri Mar 21 15:52:48 2014 -0400
    Make sure StorageNode is committed before we try and deploy it.

    Cherry-Pick master 710c93131f44dc32ab7397dafdec23c086dc2758

commit d413e07e6c129e7bafcfd468f0890eff50ee50ca
Author: Jay Shaughnessy <jshaughn>
Date:   Tue May 27 16:23:54 2014 -0400
    Transaction delimiters have been updated in StorageNode beans to try and
    prevent db-locking issues.  This caused a problem for storage node
    deployment during mergeInventory, because it relied on an umbrella
    transaction providing a persisted (but not yet committed) Resource.

    Added a general mechanism for performing post-commit actions on newly
    merged resources and leveraged it to perform linking a StorageNode to a
    Resource.

    Conflicts:
        modules/enterprise/server/jar/src/main/java/org/rhq/enterprise/server/discovery/DiscoveryBossBean.java

    Cherry-Pick master b6e70d104756db633df7b4ff642c29e3fad81d2b

commit a6597c38bbe77fda9df9dd9a831b91aeed459fd5
Author: Jay Shaughnessy <jshaughn>
Date:   Thu Mar 20 17:17:17 2014 -0400
    Initial commit that attempts to remove umbrella transactions starting at
    many entry points, such that scheduling operations happens in their own
    trans context.  Needs review and testing...

    Conflicts: modules/enterprise/server/jar/src/main/java/org/rhq/enterprise/server/cloud/StorageNodeManagerBean.java

    Cherry-Pick master 4921949e2e203015cf09f699b5d7a1352daab3a0

Comment 5 Simeon Pinder 2014-05-30 02:43:35 UTC
Moving to ON_QA as available for test in the latest cumulative patch build (DR01):
http://jon01.mw.lab.eng.bos.redhat.com:8042/dist/release/jon/3.2.2.GA/5-29-2014/

Comment 6 Filip Brychta 2014-06-10 09:20:33 UTC
Verified on:
Version: 3.2.0.GA Update 02
Build Number: 055b880:0620403

I verified the following on Oracle DB:
1- there was no deadlock during deployment (all storage nodes were discovered and became UP with Cluster status NORMAL)
2- parallel undeployment works, with one exception: bz 1107579
3- sequential deployment/undeployment works correctly, with one exception: bz 1104647

However, this does not mean that parallel deployment works. See bz 1106505.

Comment 7 Larry O'Leary 2014-07-29 00:17:10 UTC
This has been verified and released in Red Hat JBoss Operations Network 3.2 Update 02 (3.2.2) available from the Red Hat Customer Portal[1].



[1]: https://access.redhat.com/jbossnetwork/restricted/softwareDetail.html?softwareId=31783

