Bug 1218291 - [ENG] (6.1.z) Different threads accessing LeftTuple fields that reference LeftTupleSet
Summary: [ENG] (6.1.z) Different threads accessing LeftTuple fields that reference Lef...
Keywords:
Status: CLOSED EOL
Alias: None
Product: JBoss BRMS Platform 6
Classification: Retired
Component: BRE
Version: 6.1.0
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: CR1
: 6.1.0
Assignee: Mario Fusco
QA Contact: Tibor Zimanyi
URL:
Whiteboard:
Depends On: 1207436
Blocks: 1220521 1220522 1230818 1230825 1259378 1259382
TreeView+ depends on / blocked
 
Reported: 2015-05-04 14:08 UTC by Alessandro Lazarotti
Modified: 2020-03-27 19:08 UTC (History)
10 users (show)

Fixed In Version:
Clone Of: 1207436
Environment:
Last Closed: 2020-03-27 19:08:43 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker DROOLS-751 0 Critical Resolved Different threads accessing LeftTuple fields that reference LeftTupleSet 2017-04-06 13:16:02 UTC
Red Hat Issue Tracker DROOLS-921 0 Major Resolved Drools fires no rules with this Drools/CEP test 2017-04-06 13:16:02 UTC

Description Alessandro Lazarotti 2015-05-04 14:08:24 UTC
+++ This bug was initially created as a clone of Bug #1207436 +++

Clone of https://issues.jboss.org/browse/DROOLS-751

Comment 3 Alessandro Lazarotti 2015-05-07 13:45:37 UTC
Hi Mario, this BZ is for 6.2.x branch (product 6.1). Could you please backport it ?
Thanks

Comment 4 Mario Fusco 2015-05-07 14:02:28 UTC
Cherry-picked to 6.2.x with the following commits

drools:

https://github.com/droolsjbpm/drools/commit/8b0ee2e17

jbpm:

https://github.com/droolsjbpm/jbpm/commit/885e2f4cc

Comment 5 JBoss JIRA Server 2015-05-13 13:06:29 UTC
Mario Fusco <mario.fusco> updated the status of jira DROOLS-751 to Resolved

Comment 6 Marek Winkler 2015-05-20 13:04:25 UTC
Mario, please see this PR: https://github.com/droolsjbpm/drools/pull/436

There is a test which started to fail after adding the propagation queue. I ran it also against 6.2.x in the state just before adding the propagation queue commit, and the test succeeded.

Tibor is looking for other possible issues, so I am not asetting the BZ to ASSIGNED yet.

Comment 7 Marek Winkler 2015-05-21 07:51:19 UTC
I have found that if you insert another fireAllRules() call into the test (after event is inserted and before the clock is advanced), the tests passes.

Is this intended? If so, we should probably mention it in release notes because it might break users' code.

Comment 8 Tibor Zimanyi 2015-05-21 11:39:46 UTC
Mario, I have another PR: https://github.com/droolsjbpm/drools/pull/437

It's a test that ends with NPE when many threads insert and delete same fact(s) concurrently. Something is still wrong with session concurrency. I tried also synchronizing whole flush method in SynchronizedPropagationQueue, but it didn't help.

Comment 11 Marek Winkler 2015-05-22 11:48:59 UTC
Mario,

I have found another suspicious test related to fireUntilHalt() with timer and accumulate, seems a concurrency issue as it fails roughly 50 % of runs, could you please take a look at [1]? It usually fails when running the same test code several times (2 or 3 times), I have updated the test in [1] to make it more frequent.

Is there anything that I have overlooked? Thanks!

[1] https://github.com/droolsjbpm/drools/pull/440

Comment 14 Tibor Zimanyi 2015-05-25 10:23:09 UTC
Because of found issues (see comments before), changing status of this ticket to ASSIGNED.

Comment 15 Mario Fusco 2015-05-25 15:32:15 UTC
Fix cherry-picked to 6.2.x branch by https://github.com/droolsjbpm/drools/commit/d2842bff0

Comment 16 Mario Fusco 2015-05-26 13:53:41 UTC
The test case in this PR https://github.com/droolsjbpm/drools/pull/440/files is still not working. I need to investigate why.

Comment 17 Mario Fusco 2015-05-26 17:47:18 UTC
Marek, 

I chatted about this with Mark and we agreed that the test is non-deterministic given how it is written. 

I am setting the status of this ticket back to MODIFIED, but feel free to ping me on IRC if you need further clarifications about this.

If it's ok for your I'll also close that PR. Let me know.

Comment 18 Marek Winkler 2015-05-27 08:57:05 UTC
We have spoken with Mario and found a way to make the test work - the single advancing of pseudo-clock 

  clock.advanceTime(40, TimeUnit.SECONDS);

needs to be replaced with 

  for (int count=0; count < 40; count++) {
    clock.advanceTime(1, TimeUnit.SECONDS);
  }

and the test passes.

To sum it up, the test is ok for me, users just need to be aware of this behaviour when using pseudo-clock.

Comment 19 Tibor Zimanyi 2015-06-03 10:25:11 UTC
Verified running community tests suite, performance tests suite and all QE regression tests suites.

Comment 20 Alessandro Lazarotti 2015-06-17 21:51:42 UTC
Assgined back to Mario backport the addtional fix described at:

https://issues.jboss.org/browse/DROOLS-751?focusedCommentId=13079092&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13079092

... to be part of BxMS 6.1 Update 3

Comment 21 Alessandro Lazarotti 2015-06-17 21:53:28 UTC
Sorry, I mean Update "2".

Comment 22 Mario Fusco 2015-06-29 08:47:45 UTC
Additional fix cherry-picked to 6.2.x branch with https://github.com/droolsjbpm/drools/commit/dc222710e

Comment 23 Marek Winkler 2015-07-10 10:57:09 UTC
Verified that the additional fix is present in 6.1.2.CR1 and there is no regression caught by regression tests.

Leaving the final verification up to Tibor, since he had been investigating this BZ in detail before.

Comment 24 Tibor Zimanyi 2015-07-13 12:15:42 UTC
Thanks Marek. I will leave verification of this issue to CR2 build, because this needs that our QE community engine tests are rerun in full range with backported code (these are tests from community, but they need to be run in our different test environments). But the changes look all right.

Comment 25 Tibor Zimanyi 2015-07-17 12:46:58 UTC
Verified running QE community tests suite and regression tests suite.

Comment 26 Tibor Zimanyi 2015-07-20 11:52:28 UTC
I'm sorry, but I must reopen this issue, because after further examination I found a possible problem with backported changes. 

Reproducer can be found here [1].

It looks like that there's some bad race condition when Timers are involved. The test sometimes fails on second test iteration. From what I examined, the iteration calls SynchronizedPropagationList.halt() method and there's no call to notifyAll() later, so the queue processing is halted till the end of the test. However, it fails not in deterministic way, so sometimes the test passes. 

[1] https://github.com/droolsjbpm/drools/pull/462

Comment 27 Mario Fusco 2015-07-21 12:30:00 UTC
Fixed on master by https://github.com/droolsjbpm/drools/commit/4ed10fac9 and cherry-picked to 6.2.x branch with https://github.com/droolsjbpm/drools/commit/9686624cf

Comment 28 Tibor Zimanyi 2015-07-28 10:11:03 UTC
Found another possible problem... here's the reproducer [1]

There are two options for obtaining KieSession in the reproducer. When KieSession is obtained from RuntimeEngine, no rules are fired. When it is obtained from KieContainer, everything works fine. The problem is only with fireUntilHalt, fireAllRules works correctly. 

It is highly possible that this is related to changes in this BZ, but I'm keeping this ON_QA so Mario can investigate this a bit and if it is really a problem with changes from this BZ, I'll return this as ASSIGNED. 

[1] http://pastebin.com/3n1TP4Vn

Comment 29 Tibor Zimanyi 2015-07-28 14:06:29 UTC
Assigning back. Mario found that there is still some sync issue (force eager activation + fireUntilHalt).

Comment 31 Alessandro Lazarotti 2015-07-28 15:37:16 UTC
Tibor I talked to Mario and Kris and such use case does not seem valid.
jBPM engine is not prepared to run in fireUntilHalt and many results can be unpredictable.

Starting executions by RuntimeEngine is only relevant when using process flow, and as it plus fireUntilHalt is not a good combination so this use case seems not valid.

Is there any other consideration about this issue, or we are fine to get it as VERIFIED so ?

Comment 32 Tibor Zimanyi 2015-07-29 07:18:39 UTC
If it is so that fireUntilHalt is not relevant in this case, it should somehow inform the user about it. It should throw UnsupportedOperationException or something similar. I know that it is part of KieSession impl, but I hope this can be somehow done. What do you think? If you need to talk about it more, please ping me on IRC (tzimanyi).

Comment 33 Tibor Zimanyi 2015-07-29 08:42:29 UTC
Verifying this issue, see new BZ [1] that contains things from comment #32. 

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1247913

Comment 34 Tibor Zimanyi 2015-07-29 08:51:42 UTC
Sorry, I made a mistake, I wanted to verify it when I check our community tests execution. It will be today.. In previous comment I wanted to state that the issue with fireUntilHalt is not a blocker for this issue so I created separate BZ for that. Sorry once more for a mistake, putting this to ON_QA.

Comment 35 Tibor Zimanyi 2015-07-30 14:39:24 UTC
I found 2 issues running community tests on different certified environments that can be on first sight related to this issue [1] [2]. But related is only the second one [2]. It is in ReteOO part of the engine, so after discussion with Mario I decided that it is not a blocker for this issue, but still needs investigating. So I'm verifying this issue, the other issue should be fixed separately. 

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1248661
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1248667

Comment 36 Mario Fusco 2015-07-31 09:33:09 UTC
I fixed the force eager activation + fireUntilHalt problem with this commit https://github.com/droolsjbpm/drools/commit/0d93b3b2d

For now this is only on master. 
Alessandro, let me know if it should be backported to 6.2.x branch.

Comment 38 Mario Fusco 2015-09-25 14:19:19 UTC
Alessandro, the fix reported at comment 36 has been already cherry-picked to 6.2.x branch for what requested here https://bugzilla.redhat.com/show_bug.cgi?id=1258909

Comment 39 Mario Fusco 2015-09-25 14:48:37 UTC
Fix to regression reported in https://issues.jboss.org/browse/DROOLS-921 cherry-picked to 6.2.x branch with this commit https://github.com/droolsjbpm/drools/commit/cca1ba061

Comment 40 Tibor Zimanyi 2015-10-09 13:22:34 UTC
Verified by running community tests from mentioned commits on supported environments. All new tests passed.


Note You need to log in before you can comment on or make changes to this bug.