Bug 1033858 - ResourceManagerBean.enableResources blocks indefinitely while holding database transaction open
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: RHQ Project
Classification: Other
Component: Core Server
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: GA
Target Release: RHQ 4.10
Assignee: Jay Shaughnessy
QA Contact: Mike Foley
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-11-23 20:26 UTC by Elias Ross
Modified: 2014-04-23 12:32 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-04-23 12:32:43 UTC



Description Elias Ross 2013-11-23 20:26:39 UTC
Description of problem:

While a database transaction is open, RHQ can hold it for a long time, up to 5 minutes, while waiting for the agent to reply to the requestFullAvailabilityReport request. This can cause database locking issues. Stack trace:

"http-/0.0.0.0:7080-53" daemon prio=10 tid=0x00007f3314064000 nid=0x6d22 waiting on condition [0x00007f32627e2000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x000000042c0e69e8> (a java.util.concurrent.FutureTask$Sync)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:196)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1011)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1303)
        at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:227)
        at java.util.concurrent.FutureTask.get(FutureTask.java:91)
        at org.rhq.enterprise.communications.command.client.ClientCommandSenderTask.run(ClientCommandSenderTask.java:143)
        at org.rhq.enterprise.communications.command.client.ClientCommandSender.sendSynch(ClientCommandSender.java:647)
        at org.rhq.enterprise.communications.command.client.ClientRemotePojoFactory$RemotePojoProxyHandler.invoke(ClientRemotePojoFactory.java:418)
        at $Proxy1587.requestFullAvailabilityReport(Unknown Source)
        at org.rhq.enterprise.server.resource.ResourceManagerBean.enableResources(ResourceManagerBean.java:2885)
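
The trace shows the request thread parked in FutureTask.get with a timeout, waiting for a reply task to finish. A minimal Java sketch of that pattern follows. BlockingSendDemo and sendSynch here are hypothetical stand-ins written for illustration, not the RHQ ClientCommandSender code, and the short timeout replaces the multi-minute wait described above.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BlockingSendDemo {
    // Hypothetical stand-in for a synchronous agent call.
    // Returns true if a reply arrived, false if the wait timed out or failed.
    static boolean sendSynch(ExecutorService pool, long timeoutMillis) {
        Future<String> reply = pool.submit(() -> {
            // Simulates an agent that accepts the request but never answers.
            new CountDownLatch(1).await();
            return "reply";
        });
        try {
            // The caller parks here (TIMED_WAITING) until the timeout expires.
            reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            reply.cancel(true); // interrupt the stuck task so the pool thread is freed
            return false;
        } catch (ExecutionException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        long start = System.nanoTime();
        boolean replied = sendSynch(pool, 2000);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println("replied=" + replied + " after ~" + elapsedMs + " ms");
        pool.shutdownNow();
    }
}
```

The calling thread is unavailable for the whole timeout, which is what makes an in-band call costly when the caller is an HTTP request thread.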

The server should send the request asynchronously and never block. Ideally, the request could be made in a separate thread as well.

Similar issues exist in other methods: Uninventory comes to mind.


    @Override
    @TransactionAttribute(TransactionAttributeType.NEVER)
    public List<Integer> enableResources(Subject subject, int[] resourceIds) {
        // On a best effort basis, ask the relevant agents that their next avail report be full, so that we get
        // the current avail type for the newly enabled resources.  If we can't contact the agent don't worry about
        // it; if it's down we'll get a full report when it comes up.
        // TODO: This may need to be made out of band if perf becomes an issue.
        for (Agent agent : reports.keySet()) {

Version-Release number of selected component (if applicable): 4.9


How reproducible: Always.


Steps to Reproduce:
1. Set up an agent that is running but does not reply to requests (for example, a process that listens on the agent's port but never answers the server).
2. Call enableResources for resources on this agent.
3. Watch the request thread hang.
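
One way to approximate the unresponsive agent in step 1 is a stub that accepts TCP connections and never writes anything back. The sketch below is an assumption about how such a stub could look: SilentAgent is a hypothetical class, the port is whatever the registered agent is configured to use, and it assumes the server reaches the agent over a plain TCP socket.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class SilentAgent implements AutoCloseable {
    private final ServerSocket server;
    // Accepted connections are kept open so the peer sees no reply and no close.
    private final List<Socket> held = new CopyOnWriteArrayList<>();

    SilentAgent(int port) throws IOException {
        server = new ServerSocket(port); // port 0 picks any free port
        Thread acceptor = new Thread(this::acceptLoop, "silent-agent");
        acceptor.setDaemon(true);
        acceptor.start();
    }

    int port() {
        return server.getLocalPort();
    }

    private void acceptLoop() {
        try {
            while (true) {
                held.add(server.accept()); // accept, then never read or write
            }
        } catch (IOException closed) {
            // Server socket was closed: stop accepting.
        }
    }

    @Override
    public void close() throws IOException {
        server.close();
        for (Socket s : held) {
            s.close();
        }
    }

    public static void main(String[] args) throws Exception {
        int port = Integer.parseInt(args[0]); // assumption: the registered agent's port
        try (SilentAgent agent = new SilentAgent(port)) {
            System.out.println("Silent agent listening on port " + agent.port());
            Thread.currentThread().join(); // run until the process is killed
        }
    }
}
```

The real agent must be stopped first so that the stub can bind to its port.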

Actual results: Thread hangs, database transactions hang.


Expected results: Should exit immediately.


Additional info:

Comment 1 Jay Shaughnessy 2014-01-09 22:34:56 UTC
master commit a7310e6262f142eeb6af1c84651f761826d3e6ce
Author: Jay Shaughnessy <jshaughn>
Date:   Thu Jan 9 17:33:00 2014 -0500

 I don't think this code held a DB transaction given the NEVER transaction
 attribute on the SLSB method. But, it could still hang a thread while trying
 to contact agents in-band to request full avail checks.  Move the agent
 requests out of band to assure a faster return and better scalability.
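
For readers unfamiliar with the pattern, here is a minimal sketch of what "out of band" can mean in Java. It is not the code from the commit above: OutOfBandAvailRequests, requestFullAvailOutOfBand and the agent names are hypothetical, and a plain ExecutorService stands in for whatever mechanism the server actually uses.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class OutOfBandAvailRequests {
    // Hypothetical helper: queues one best-effort request per agent and returns
    // immediately, so the caller never waits on a slow or silent agent.
    static <A> void requestFullAvailOutOfBand(Collection<A> agents,
                                              Consumer<A> requestFullAvail,
                                              ExecutorService pool) {
        for (A agent : agents) {
            pool.execute(() -> {
                try {
                    requestFullAvail.accept(agent);
                } catch (RuntimeException e) {
                    // Best effort: if the agent cannot be reached, a full report
                    // is expected anyway once it comes back up.
                }
            });
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        requestFullAvailOutOfBand(
            Arrays.asList("agent-a", "agent-b"), // hypothetical agent names
            agent -> System.out.println("requesting full avail report from " + agent),
            pool);
        System.out.println("caller returned without waiting");
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

A bounded pool keeps a burst of unresponsive agents from creating an unbounded number of threads, at the cost of later requests queuing behind stuck ones.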

Comment 2 Heiko W. Rupp 2014-04-23 12:32:43 UTC
Bulk closing of 4.10 issues.

If an issue is not solved for you, please open a new BZ (or clone the existing one) with a version designator of 4.10.

