+++ This bug was initially created as a clone of Bug #1207436 +++ Clone of https://issues.jboss.org/browse/DROOLS-751
Fixed on master by the following commits drools: https://github.com/droolsjbpm/drools/commit/f79eadd91 https://github.com/droolsjbpm/drools/commit/a8d64d1871150b95dcc6d3fc3e2353a2b7d226d5 https://github.com/droolsjbpm/drools/commit/bd17466011f84a2572771c04d0d1152a7a7b0313 jbpm: https://github.com/droolsjbpm/jbpm/commit/e1bc031028759714edf30d41fce7dcda1118f3cb https://github.com/droolsjbpm/jbpm/commit/87c5b6f122f84022f90d3fb2b595aaef8269fb34
Hi Mario, this BZ is for 6.2.x branch (product 6.1). Could you please backport it ? Thanks
Cherry-picked to 6.2.x with the following commits drools: https://github.com/droolsjbpm/drools/commit/8b0ee2e17 jbpm: https://github.com/droolsjbpm/jbpm/commit/885e2f4cc
Mario Fusco <mario.fusco> updated the status of jira DROOLS-751 to Resolved
Mario, please see this PR: https://github.com/droolsjbpm/drools/pull/436 There is a test which started to fail after adding the propagation queue. I ran it also against 6.2.x in the state just before adding the propagation queue commit, and the test succeeded. Tibor is looking for other possible issues, so I am not asetting the BZ to ASSIGNED yet.
I have found that if you insert another fireAllRules() call into the test (after event is inserted and before the clock is advanced), the tests passes. Is this intended? If so, we should probably mention it in release notes because it might break users' code.
Mario, I have another PR: https://github.com/droolsjbpm/drools/pull/437 It's a test that ends with NPE when many threads insert and delete same fact(s) concurrently. Something is still wrong with session concurrency. I tried also synchronizing whole flush method in SynchronizedPropagationQueue, but it didn't help.
Mario, I have found another suspicious test related to fireUntilHalt() with timer and accumulate, seems a concurrency issue as it fails roughly 50 % of runs, could you please take a look at [1]? It usually fails when running the same test code several times (2 or 3 times), I have updated the test in [1] to make it more frequent. Is there anything that I have overlooked? Thanks! [1] https://github.com/droolsjbpm/drools/pull/440
Because of found issues (see comments before), changing status of this ticket to ASSIGNED.
Fix cherry-picked to 6.2.x branch by https://github.com/droolsjbpm/drools/commit/d2842bff0
The test case in this PR https://github.com/droolsjbpm/drools/pull/440/files is still not working. I need to investigate why.
Marek, I chatted about this with Mark and we agreed that the test is non-deterministic given how it is written. I am setting the status of this ticket back to MODIFIED, but feel free to ping me on IRC if you need further clarifications about this. If it's ok for your I'll also close that PR. Let me know.
We have spoken with Mario and found a way to make the test work - the single advancing of pseudo-clock clock.advanceTime(40, TimeUnit.SECONDS); needs to be replaced with for (int count=0; count < 40; count++) { clock.advanceTime(1, TimeUnit.SECONDS); } and the test passes. To sum it up, the test is ok for me, users just need to be aware of this behaviour when using pseudo-clock.
Verified running community tests suite, performance tests suite and all QE regression tests suites.
Assgined back to Mario backport the addtional fix described at: https://issues.jboss.org/browse/DROOLS-751?focusedCommentId=13079092&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13079092 ... to be part of BxMS 6.1 Update 3
Sorry, I mean Update "2".
Additional fix cherry-picked to 6.2.x branch with https://github.com/droolsjbpm/drools/commit/dc222710e
Verified that the additional fix is present in 6.1.2.CR1 and there is no regression caught by regression tests. Leaving the final verification up to Tibor, since he had been investigating this BZ in detail before.
Thanks Marek. I will leave verification of this issue to CR2 build, because this needs that our QE community engine tests are rerun in full range with backported code (these are tests from community, but they need to be run in our different test environments). But the changes look all right.
Verified running QE community tests suite and regression tests suite.
I'm sorry, but I must reopen this issue, because after further examination I found a possible problem with backported changes. Reproducer can be found here [1]. It looks like that there's some bad race condition when Timers are involved. The test sometimes fails on second test iteration. From what I examined, the iteration calls SynchronizedPropagationList.halt() method and there's no call to notifyAll() later, so the queue processing is halted till the end of the test. However, it fails not in deterministic way, so sometimes the test passes. [1] https://github.com/droolsjbpm/drools/pull/462
Fixed on master by https://github.com/droolsjbpm/drools/commit/4ed10fac9 and cherry-picked to 6.2.x branch with https://github.com/droolsjbpm/drools/commit/9686624cf
Found another possible problem... here's the reproducer [1] There are two options for obtaining KieSession in the reproducer. When KieSession is obtained from RuntimeEngine, no rules are fired. When it is obtained from KieContainer, everything works fine. The problem is only with fireUntilHalt, fireAllRules works correctly. It is highly possible that this is related to changes in this BZ, but I'm keeping this ON_QA so Mario can investigate this a bit and if it is really a problem with changes from this BZ, I'll return this as ASSIGNED. [1] http://pastebin.com/3n1TP4Vn
Assigning back. Mario found that there is still some sync issue (force eager activation + fireUntilHalt).
Tibor I talked to Mario and Kris and such use case does not seem valid. jBPM engine is not prepared to run in fireUntilHalt and many results can be unpredictable. Starting executions by RuntimeEngine is only relevant when using process flow, and as it plus fireUntilHalt is not a good combination so this use case seems not valid. Is there any other consideration about this issue, or we are fine to get it as VERIFIED so ?
If it is so that fireUntilHalt is not relevant in this case, it should somehow inform the user about it. It should throw UnsupportedOperationException or something similar. I know that it is part of KieSession impl, but I hope this can be somehow done. What do you think? If you need to talk about it more, please ping me on IRC (tzimanyi).
Verifying this issue, see new BZ [1] that contains things from comment #32. [1] https://bugzilla.redhat.com/show_bug.cgi?id=1247913
Sorry, I made a mistake, I wanted to verify it when I check our community tests execution. It will be today.. In previous comment I wanted to state that the issue with fireUntilHalt is not a blocker for this issue so I created separate BZ for that. Sorry once more for a mistake, putting this to ON_QA.
I found 2 issues running community tests on different certified environments that can be on first sight related to this issue [1] [2]. But related is only the second one [2]. It is in ReteOO part of the engine, so after discussion with Mario I decided that it is not a blocker for this issue, but still needs investigating. So I'm verifying this issue, the other issue should be fixed separately. [1] https://bugzilla.redhat.com/show_bug.cgi?id=1248661 [2] https://bugzilla.redhat.com/show_bug.cgi?id=1248667
I fixed the force eager activation + fireUntilHalt problem with this commit https://github.com/droolsjbpm/drools/commit/0d93b3b2d For now this is only on master. Alessandro, let me know if it should be backported to 6.2.x branch.
Alessandro, the fix reported at comment 36 has been already cherry-picked to 6.2.x branch for what requested here https://bugzilla.redhat.com/show_bug.cgi?id=1258909
Fix to regression reported in https://issues.jboss.org/browse/DROOLS-921 cherry-picked to 6.2.x branch with this commit https://github.com/droolsjbpm/drools/commit/cca1ba061
Verified by running community tests from mentioned commits on supported environments. All new tests passed.