Bug 724212 (BRMS-155)

Summary: CLONE -Deadlock when RuleAgent thread refreshes rules while another thread creates a statefulSession
Product: [JBoss] JBoss Enterprise BRMS Platform 5 Reporter: nwallace <nwallace>
Component: unspecified Assignee: Mark Proctor <mark.proctor>
Status: CLOSED NEXTRELEASE QA Contact:
Severity: unspecified Docs Contact:
Priority: high    
Version: unspecified   
Target Milestone: ---   
Target Release: 5.0.1   
Hardware: Unspecified   
OS: Unspecified   
URL: http://jira.jboss.org/jira/browse/BRMS-155
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Windows XP SP3 Sun JRE 1.5.0_14
Last Closed: 2009-09-01 12:18:47 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---

Description nwallace 2009-07-07 12:59:44 UTC
Date of First Response: 2009-09-10 00:43:16
securitylevel_name: Public

I believe I have found a deadlock that can occur in Drools 4.0.7 if the RuleAgent refreshes its associated RuleBase while another thread is creating a new stateful session on that RuleBase.

The thread dump below shows the deadlock between the two threads "Timer-15" and "DataSource(com.acme.Source)-2". "Timer-15" is the timer thread created by the RuleAgent's refresh mechanism to check whether the rule files have changed and to refresh the rules when a change is found. When it finds changes, it obtains a lock (<0x189a8d88> (a java.util.HashMap)) and proceeds to remove the old version of the changed package from the RuleBase. To do this it needs to obtain a second lock (<0x189a2ba0> (a org.drools.reteoo.ReteooRuleBase)) before it can call removeRule.

However, another thread, "DataSource(com.acme.Source)-2", is creating a new stateful session on the same RuleBase. As the dump shows, it has already locked <0x189a2ba0> (a org.drools.reteoo.ReteooRuleBase), which the Timer thread is waiting for, and is itself waiting for <0x189a8d88> (a java.util.HashMap), which the Timer thread already holds. Each thread holds the lock the other needs, hence the deadlock.

In my own application I eventually worked around this by not using the RuleAgent's built-in refresh mechanism, and instead periodically calling refreshRuleBase() on the RuleAgent in the SAME thread that creates the stateful session, thus avoiding any deadlock.
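
The workaround described above can be sketched roughly as follows. This is only an illustration under stated assumptions: Agent and Work are hypothetical stand-in interfaces (Agent mirrors the refreshRuleBase() call named above, Work stands for whatever creates the stateful session), and SameThreadRefresh, runBatch and the interval handling are invented for the example rather than taken from Drools or from the reporter's application.

```java
import java.util.ArrayList;
import java.util.List;

class SameThreadRefresh {

    /** Hypothetical stand-in for the one RuleAgent method the workaround needs. */
    interface Agent { void refreshRuleBase(); }

    /** Hypothetical stand-in for "create a stateful session and run the rules". */
    interface Work { void createSessionAndFire(); }

    private final Agent agent;
    private final long intervalMillis;
    private long lastRefresh;

    SameThreadRefresh(Agent agent, long intervalMillis, long now) {
        this.agent = agent;
        this.intervalMillis = intervalMillis;
        this.lastRefresh = now;
    }

    /**
     * Called by the worker thread for each batch. The refresh and the session
     * creation happen one after the other on this single thread, so they can
     * never compete with each other for the rule base locks.
     */
    void runBatch(Work work, long now) {
        if (now - lastRefresh >= intervalMillis) {
            agent.refreshRuleBase();
            lastRefresh = now;
        }
        work.createSessionAndFire();
    }

    public static void main(String[] args) {
        final List<String> log = new ArrayList<String>();
        // Assumed refresh interval of 60 seconds; timestamps are passed in explicitly.
        SameThreadRefresh runner =
                new SameThreadRefresh(() -> log.add("refresh"), 60000L, 0L);
        runner.runBatch(() -> log.add("session"), 1000L);   // too early to refresh
        runner.runBatch(() -> log.add("session"), 61000L);  // refresh, then session
        System.out.println(log);
    }
}
```

In a real application the worker loop would pass System.currentTimeMillis() as the timestamp, and the agent's own polling timer would not be used, so that no second thread touches the RuleBase during a refresh.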


"Timer-15" daemon prio=6 tid=0x02f4b0e8 nid=0x1864 waiting for monitor entry [0x038cf000..0x038cfa68]
	at org.drools.reteoo.ReteooRuleBase.removeRule(ReteooRuleBase.java:270)
	- waiting to lock <0x189a2ba0> (a org.drools.reteoo.ReteooRuleBase)
	at org.drools.common.AbstractRuleBase.removeRule(AbstractRuleBase.java:656)
	at org.drools.common.AbstractRuleBase.removePackage(AbstractRuleBase.java:570)
	- locked <0x189a8d88> (a java.util.HashMap)
	at org.drools.agent.PackageProvider.removePackage(PackageProvider.java:45)
	at org.drools.agent.PackageProvider.applyChanges(PackageProvider.java:63)
	at org.drools.agent.RuleAgent.refreshRuleBase(RuleAgent.java:320)
	at org.drools.agent.RuleAgent$2.run(RuleAgent.java:438)
	at java.util.TimerThread.mainLoop(Unknown Source)
	at java.util.TimerThread.run(Unknown Source)



"DataSource(com.acme.Source)-2" daemon prio=4 tid=0x03027610 nid=0xfac waiting for monitor entry [0x0378f000..0x0378fb68]
	at org.drools.reteoo.ReteooRuleBase.newStatefulSession(ReteooRuleBase.java:225)
	- waiting to lock <0x189a8d88> (a java.util.HashMap)
	- locked <0x189a2ba0> (a org.drools.reteoo.ReteooRuleBase)
	at org.drools.common.AbstractRuleBase.newStatefulSession(AbstractRuleBase.java:284)
	at com.acme.RunRules.flush(RunRules.java:3337)
	at com.acme.ControlThread.run(ControlThread.java:465)
	at java.lang.Thread.run(Unknown Source)
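
The cycle in the two stack traces above is a lock-order inversion: one path takes the HashMap monitor and then the ReteooRuleBase monitor, while the other takes them in the opposite order. A minimal sketch of one common remedy, acquiring both monitors in the same order on every path, is given here. It is not the actual Drools fix (this report does not describe that), and LockOrderDemo, OrderedRuleBase and their methods are hypothetical stand-ins, not Drools classes.

```java
import java.util.HashMap;
import java.util.Map;

class LockOrderDemo {

    /** Hypothetical stand-in for a rule base; NOT the Drools ReteooRuleBase. */
    static class OrderedRuleBase {
        // Plays the role of the java.util.HashMap monitor seen in the dump.
        private final Map<String, String> pkgs = new HashMap<String, String>();
        private int sessions = 0;

        // Every method locks the rule base first and the package map second.
        void addPackage(String name) {
            synchronized (this) {
                synchronized (pkgs) {
                    pkgs.put(name, name);
                }
            }
        }

        void removePackage(String name) {
            synchronized (this) {
                synchronized (pkgs) {
                    pkgs.remove(name);
                }
            }
        }

        int newSession() {
            synchronized (this) {
                synchronized (pkgs) {
                    return ++sessions;
                }
            }
        }

        int sessionCount() {
            synchronized (this) {
                return sessions;
            }
        }
    }

    /** Runs a refresher and a session creator concurrently; returns sessions created. */
    static int run(final int iterations) throws InterruptedException {
        final OrderedRuleBase base = new OrderedRuleBase();
        Thread refresher = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                base.addPackage("pkg");
                base.removePackage("pkg");
            }
        }, "Timer-demo");
        Thread creator = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                base.newSession();
            }
        }, "DataSource-demo");
        refresher.start();
        creator.start();
        refresher.join();
        creator.join();
        return base.sessionCount();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("sessions created: " + run(10000));
    }
}
```

Because both paths request the monitors in the same order, neither thread can hold one lock while waiting for a lock the other thread holds, so the circular wait seen in the dump cannot form.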

Comment 1 nwallace 2009-07-07 13:01:00 UTC
Link: Added: This issue is related to JBRULES-1876


Comment 2 nwallace 2009-09-01 12:18:47 UTC
Fix in place.

Comment 3 David Le Sage 2009-09-10 04:43:16 UTC
For documenting this in the Release Notes, can you please confirm the following and fill in the missing information. Dot point explanations are fine:

The CAUSE (what was actually broken)
 * A deadlock would occur in Drools 4.0.7 if the RuleAgent refreshed its
   associated RuleBase whilst a new stateful session was being created by
   another thread on that same RuleBase.

CONSEQUENCES of the bug (how it impacts users.)
 * This would result in a deadlock. 


The FIX (what was changed to eliminate this bug)
 *

RESULTS of the fix (what now happens for users.)
 * The error no longer occurs???



Comment 4 David Le Sage 2009-09-23 05:31:17 UTC
We are still awaiting the outstanding information for the Release Notes on this one.  Please provide it as soon as possible. Thanks.

Comment 5 Dana Mison 2009-10-05 05:47:18 UTC
Added to the 5.0.CP01 release notes as resolved:

JBRULES-1876
The RuleAgent can now safely refresh its associated RuleBase whilst a new stateful session is being created on the RuleBase by another thread. Previously this could result in a deadlock.