Bug 713744

Summary: ResourceMetadataManagerBean Hangs on Oracle 11g
Product: [Other] RHQ Project
Reporter: Stefan Negrea <snegrea>
Component: Core Server
Assignee: John Sanda <jsanda>
Status: CLOSED CURRENTRELEASE
QA Contact: Mike Foley <mfoley>
Severity: high
Priority: high
Version: 4.0.1
CC: hrupp, mazz
Hardware: Unspecified
OS: Linux
Doc Type: Bug Fix
Last Closed: 2012-02-07 19:28:33 UTC
Bug Blocks: 625146

Description Stefan Negrea 2011-06-16 13:08:51 UTC
ResourceMetadataManagerBean hangs on line 401 (operationMetadataMgr.updateMetadata(...)) in method mergeExistingType when connected to Oracle 11g.

How reproducible:
The problem can be reproduced easily by running enterprise/server/jar tests.

Steps to Reproduce:
1. Go to modules/enterprise/server/jar module.
2. Run ResourceMetadataManagerBeanTest tests using an Oracle database test connection.
3. Observe the behaviour of the upgradePlugin test method.
  
Actual results:
Integration tests hang indefinitely.

Expected results:
Integration tests pass.

Additional info:
Tested the code locally and the code hangs on a socket read (connected to the Oracle 11g server) used by the JDBC driver. The CPU activity on the Oracle 11g server is negligible, which suggests the server is not under heavy load processing the request just sent.

Comment 1 John Mazzitelli 2011-06-30 21:00:11 UTC
this also happens on postgres when I tweaked the dbunit .xml files.

hangs here:

org.rhq.enterprise.server.resource.metadata.ResourceMetadataManagerBeanTest.upgradePlugin()

Comment 2 John Mazzitelli 2011-06-30 21:21:47 UTC
If I uncomment this in plugin_v1.xml:

<!-- THIS CAUSES A DEADLOCK DURING PLUGIN REGISTRATION!
        <bundle-target>
           <destination-base-dir name="bundleTarget1">
               <value-context>pluginConfiguration</value-context>
               <value-name>connectionPropertyY</value-name>
           </destination-base-dir>
        </bundle-target>
-->

ResourceMetadataManagerBeanTest hangs during plugin update.

Comment 3 John Mazzitelli 2011-06-30 21:22:15 UTC
raising to high priority - this may be a problem with plugin updates for plugins that define bundle-target elements.

Comment 4 John Sanda 2011-07-01 16:11:00 UTC
There were a few XXXMetadataManagerBean methods that were missing the REQUIRES_NEW transaction attribute and were not performing their updates in their own, separate transactions. I was able to reproduce the deadlock described in comment 2, and putting in REQUIRES_NEW so that those methods execute in their own transactions resolved the issue.

commit hash: 0ef2b576b995d39ecc31d6b84d52c9d57d9fa498
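
As a rough illustration of the difference REQUIRES_NEW makes, the toy model below (plain Java, no EJB container) tracks which transaction a call runs in. The class TxAttributeSketch and its methods are hypothetical and are not RHQ code; in a real session bean the behaviour comes from annotating the method with @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

/** Toy model of container-managed transaction attributes; not the EJB or RHQ implementation. */
class TxAttributeSketch {
    private static int nextId = 1;
    // Transaction ids bound to the calling thread; the top of the stack is the active one.
    private static final Deque<Integer> active = new ArrayDeque<>();

    /** REQUIRED: join the caller's transaction if one exists, otherwise start one. */
    static int required(Supplier<Integer> body) {
        if (!active.isEmpty()) {
            return body.get();
        }
        return requiresNew(body);
    }

    /** REQUIRES_NEW: suspend the caller's transaction and run the body in a fresh one. */
    static int requiresNew(Supplier<Integer> body) {
        active.push(nextId++);
        try {
            return body.get();
        } finally {
            active.pop(); // the new transaction ends; the caller's transaction resumes
        }
    }

    /** Id of the transaction the current call runs in (only valid inside a transaction). */
    static int currentTx() {
        return active.peek();
    }

    /** Returns {outer id, id seen by a REQUIRED callee, id seen by a REQUIRES_NEW callee}. */
    static int[] demo() {
        int[] ids = new int[3];
        required(() -> {
            ids[0] = currentTx();
            ids[1] = required(TxAttributeSketch::currentTx);
            ids[2] = requiresNew(TxAttributeSketch::currentTx);
            return ids[0];
        });
        return ids;
    }

    public static void main(String[] args) {
        int[] ids = demo();
        System.out.println("outer=" + ids[0] + " joined=" + ids[1] + " separate=" + ids[2]);
    }
}
```

The REQUIRED callee reports the same id as its caller, while the REQUIRES_NEW callee reports a different one, which is the separation the fix relies on.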

Comment 5 John Sanda 2011-07-01 16:54:22 UTC
Looks like the code changes in 0ef2b576b995d39ecc31d6b84d52c9d57d9fa498 did not resolve the issue on oracle. Moving back to ON_DEV for further investigation.

Comment 6 John Sanda 2011-07-01 18:13:12 UTC
My previous comment was based on the tests hanging in the oracle hudson job. I proceeded to test locally against oracle 10g. Without the fix, I can reproduce the deadlock. With the fix, there is no deadlock and the tests pass. It could be that the oracle instance used by hudson is not in a good state. Going to move this back to ON_QA as I have tested this against both oracle and postgresql.

Comment 7 John Sanda 2011-07-08 12:57:42 UTC
Found another place through a test where ResourceMetadataManagerBean is deadlocking. The method, testAddDeleteTemplate, in the class UpdateConfigurationSubsystemTest triggers the deadlock which is happening in ResourceMetadataManagerBean.mergeExistingType at the following line (399ish):

resourceConfigMetadataMgr.updateResourceConfigurationDefinition(existingType, resourceType)

I do not think that the above method call is causing the deadlock. I think it has to do with transaction boundaries established prior to mergeExistingType being called. Moving back to ON_DEV for further investigation.

Comment 8 John Sanda 2011-07-09 01:56:23 UTC
This deadlock (described in comment 7) was discovered and reproduced by running the test UpdateConfigurationSubsystemTest.testAddDeleteTemplate. The deadlock occurs in ResourceMetadataManagerBean.mergeExistingType on line 397, which is:

  resourceConfigMetadataMgr.updateResourceConfigurationDefinition(existingType,
      resourceType);

The problem, as with all of these deadlock issues, involves the table rhq_config. In the test method, a plugin that declares a config template is installed and then updated. In an initial transaction that is started in PluginManagerBean, a write lock is acquired to update rhq_plugin. updateResourceConfigurationDefinition executes in its own transaction, and because the test involves some config templates, a write lock is needed for rhq_config since there is a one-to-one association between rhq_config_template and rhq_config. That lock cannot be obtained because rhq_plugin has a FK with rhq_config for the ServerPlugin class, hence the deadlock.

With this commit, PluginManagerBean.registerPlugin has been refactored to avoid the deadlock. It first calls a new method, installPluginJar, which executes in its own transaction and handles updating the Plugin object. After that method is called, registerPluginTypes is called. That call starts a new transaction, and registerPluginTypes now only handles the meta data updates.

commit hash: 3cc1ede3a509763de99633876bf90891204890ee

I will wait to move this back to ON_QA until we get a passing hudson build on the oracle job.
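
The lock conflict and the refactoring described in this comment can be sketched in plain Java. This is only a simplified model under stated assumptions: the class NestedTxLockSketch and its method names are hypothetical, a single Semaphore stands in for the database write lock, and the rhq_plugin/rhq_config foreign-key relationship is collapsed into that one lock. A 200 ms timeout replaces the indefinite wait seen in the real tests.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Toy model of the lock conflict; not RHQ code and not how the database implements locks. */
class NestedTxLockSketch {
    // One permit models one exclusive write lock. A Semaphore is used because it is
    // not reentrant: like a database lock, it is owned by a transaction, not a thread.
    private final Semaphore configLock = new Semaphore(1);

    /** Inner work running in its own (REQUIRES_NEW-style) transaction. */
    private boolean updateConfigDefinition() {
        try {
            // The real database waits indefinitely; the timeout keeps this sketch finite.
            if (!configLock.tryAcquire(200, TimeUnit.MILLISECONDS)) {
                return false; // lock still held by the suspended outer transaction
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        configLock.release(); // inner transaction commits
        return true;
    }

    /** Before the refactoring: the plugin update's transaction is still open. */
    boolean registerNested() {
        configLock.acquireUninterruptibly(); // outer transaction takes the lock
        try {
            return updateConfigDefinition(); // outer is suspended but keeps its lock
        } finally {
            configLock.release(); // outer transaction ends too late to help
        }
    }

    /** After the refactoring: the first step commits before the second one starts. */
    boolean registerSequential() {
        configLock.acquireUninterruptibly(); // installPluginJar-style step, own transaction
        configLock.release();                // committed, lock released
        return updateConfigDefinition();     // registerPluginTypes-style step, new transaction
    }

    public static void main(String[] args) {
        NestedTxLockSketch sketch = new NestedTxLockSketch();
        System.out.println("nested update completed: " + sketch.registerNested());
        System.out.println("sequential update completed: " + sketch.registerSequential());
    }
}
```

In this model the nested variant can never complete, because the only holder of the lock is the suspended caller, whereas the sequential variant completes because no lock survives the first step.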

Comment 9 Mike Foley 2011-07-25 20:29:37 UTC
documenting passing tests on Oracle

https://hudson.qa.jboss.com/hudson/view/RHQ%20Core/job/rhq-master/220/

additionally, i use oracle as my rhq repo on a daily basis ... including running the ui automation.

Comment 10 Mike Foley 2012-02-07 19:28:33 UTC
changing status of VERIFIED BZs for JON 2.4.2 and JON 3.0 to CLOSED/CURRENTRELEASE