Bug 1249261

Summary: [QE](6.1.z) NullPointerException on LeftTupleIndexHashTable.remove()
Product: [Retired] JBoss BRMS Platform 6 Reporter: Alessandro Lazarotti <alazarot>
Component: BRE Assignee: Mario Fusco <mfusco>
Status: CLOSED INSUFFICIENT_DATA QA Contact: Tibor Zimanyi <tzimanyi>
Severity: unspecified Docs Contact:
Priority: high    
Version: 6.1.0 CC: kverlaen, mfusco, rrajasek, tzimanyi
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1248661 Environment:
Last Closed: 2015-09-28 17:56:43 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1248661    
Bug Blocks:    

Description Alessandro Lazarotti 2015-08-01 01:02:57 UTC
+++ This bug was initially created as a clone of Bug #1248661 +++

Description of problem: 

The LeftTupleIndexHashTable.remove() operation ended with an NPE when called in the test DinnerPartyPerformanceTest.solveModel_wedding01() from the optaplanner community repo. I cannot reproduce it, so it is probably triggered by some specific combination of add and remove operations. We can probably exclude concurrency as a cause because optaplanner is single-threaded. I'll try to create a reproducer for this, but it's tricky, so it will take longer. Here's the stack trace:

java.lang.NullPointerException: null
	at org.drools.core.util.index.LeftTupleIndexHashTable.remove(LeftTupleIndexHashTable.java:386)
	at org.drools.core.phreak.PhreakNotNode.doRightUpdates(PhreakNotNode.java:334)
	at org.drools.core.phreak.PhreakNotNode.doNode(PhreakNotNode.java:57)
	at org.drools.core.phreak.RuleNetworkEvaluator.switchOnDoBetaNode(RuleNetworkEvaluator.java:526)
	at org.drools.core.phreak.RuleNetworkEvaluator.evalBetaNode(RuleNetworkEvaluator.java:507)
	at org.drools.core.phreak.RuleNetworkEvaluator.innerEval(RuleNetworkEvaluator.java:319)
	at org.drools.core.phreak.RuleNetworkEvaluator.outerEval(RuleNetworkEvaluator.java:149)
	at org.drools.core.phreak.RuleNetworkEvaluator.evaluateNetwork(RuleNetworkEvaluator.java:106)
	at org.drools.core.phreak.RuleExecutor.reEvaluateNetwork(RuleExecutor.java:161)
	at org.drools.core.phreak.RuleExecutor.evaluateNetworkAndFire(RuleExecutor.java:57)
	at org.drools.core.common.DefaultAgenda.fireNextItem(DefaultAgenda.java:987)
	at org.drools.core.common.DefaultAgenda.fireAllRules(DefaultAgenda.java:1301)
	at org.drools.core.impl.StatefulKnowledgeSessionImpl.fireAllRules(StatefulKnowledgeSessionImpl.java:1286)
	at org.drools.core.impl.StatefulKnowledgeSessionImpl.fireAllRules(StatefulKnowledgeSessionImpl.java:1259)
	at org.optaplanner.core.impl.score.director.drools.DroolsScoreDirector.calculateScore(DroolsScoreDirector.java:87)
	at org.optaplanner.core.impl.solver.scope.DefaultSolverScope.calculateScore(DefaultSolverScope.java:110)
	at org.optaplanner.core.impl.phase.scope.AbstractPhaseScope.calculateScore(AbstractPhaseScope.java:123)
	at org.optaplanner.core.impl.localsearch.decider.LocalSearchDecider.processMove(LocalSearchDecider.java:162)
	at org.optaplanner.core.impl.localsearch.decider.LocalSearchDecider.doMove(LocalSearchDecider.java:149)
	at org.optaplanner.core.impl.localsearch.decider.LocalSearchDecider.decideNextStep(LocalSearchDecider.java:121)
	at org.optaplanner.core.impl.localsearch.DefaultLocalSearchPhase.solve(DefaultLocalSearchPhase.java:66)
	at org.optaplanner.core.impl.solver.DefaultSolver.runPhases(DefaultSolver.java:213)
	at org.optaplanner.core.impl.solver.DefaultSolver.solve(DefaultSolver.java:176)
	at org.optaplanner.examples.common.app.SolverPerformanceTest.solve(SolverPerformanceTest.java:79)
	at org.optaplanner.examples.common.app.SolverPerformanceTest.runSpeedTest(SolverPerformanceTest.java:63)
	at org.optaplanner.examples.common.app.SolverPerformanceTest.runSpeedTest(SolverPerformanceTest.java:58)
	at org.optaplanner.examples.dinnerparty.app.DinnerPartyPerformanceTest.solveModel_wedding01(DinnerPartyPerformanceTest.java:46)


Version-Release number of selected component (if applicable): 6.2.0.Final-redhat-9

Additional info: 

Test failed running on aix7 environment with ibm jdk 7

--- Additional comment from JBoss Product and Program Management on 2015-07-30 10:30:17 EDT ---

Since this issue was entered in Red Hat Bugzilla, the release flag has been
set to ? to ensure that it is properly evaluated for this release.

Comment 2 Mario Fusco 2015-08-12 13:16:29 UTC
The optaplanner-based reproducer is non-deterministic, very likely because the domain objects used in it do not override the hashCode() method, so the objects are hashed and iterated in an order that depends on their memory location.

In other words, two subsequent runs of that reproducer always have different outcomes, which makes investigating this problem impossible. Tibor is working on providing a deterministic reproducer.
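
The effect described in Comment 2 can be illustrated with a minimal sketch. It is not taken from the optaplanner test; the classes GuestIdentity and GuestValue and the guest names are hypothetical and exist only for this illustration. A class that does not override hashCode() uses the identity hash from java.lang.Object, which may differ between JVM runs, so the iteration order of a HashSet holding such objects may differ as well. A value-based hashCode() makes that order repeatable for the same JDK and the same insertion sequence.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class HashOrderDemo {

    // Hypothetical domain object WITHOUT equals()/hashCode():
    // it falls back to Object's identity hash code.
    static final class GuestIdentity {
        final String name;
        GuestIdentity(String name) { this.name = name; }
        @Override public String toString() { return name; }
    }

    // The same hypothetical object WITH value-based equals()/hashCode().
    static final class GuestValue {
        final String name;
        GuestValue(String name) { this.name = name; }
        @Override public boolean equals(Object o) {
            return o instanceof GuestValue && ((GuestValue) o).name.equals(name);
        }
        @Override public int hashCode() { return name.hashCode(); }
        @Override public String toString() { return name; }
    }

    // Builds a HashSet of five guests and returns the names in iteration order.
    static List<String> iterationOrder(boolean valueBased) {
        Set<Object> guests = new HashSet<>();
        for (String n : new String[] {"Ann", "Bob", "Cid", "Dee", "Eve"}) {
            guests.add(valueBased ? new GuestValue(n) : new GuestIdentity(n));
        }
        List<String> order = new ArrayList<>();
        for (Object guest : guests) {
            order.add(guest.toString());
        }
        return order;
    }

    public static void main(String[] args) {
        // May change from one JVM run to the next.
        System.out.println("identity hash: " + iterationOrder(false));
        // Repeatable, because String.hashCode() is specified by the platform.
        System.out.println("value hash:    " + iterationOrder(true));
    }
}
```

Running the program several times and comparing the two printed lines shows whether the identity-hash order is stable on a given JVM. An unstable order would be consistent with the non-determinism described above, but it does not by itself explain the NPE.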

Comment 3 Tibor Zimanyi 2015-08-14 14:25:27 UTC
Adding appropriate hashCode() methods to the POJOs used in the failing Optaplanner test didn't help us obtain a reproducer for this issue. Further investigation also didn't help (using TruthMaintenanceTest and other test cases that produce the same stack trace as the one in the bug description).

Because this issue occurred only in a test in the Jenkins environment (aix7 with ibm jdk7) and we still weren't able to reproduce it in local environments, I propose that we move this bug to a later patch. From my investigation it looks like a very rarely occurring bug that is connected with some hashing problem. From my point of view, the main point of focus should be the LeftTupleIndexHashTable class or its parent class AbstractHashTable. But this should be investigated further.

Comment 4 Mario Fusco 2015-09-28 17:56:43 UTC
We are still unable to reproduce the reported problem so I'm closing this issue for now. Feel free to reopen it if you can provide a reproducer.

Comment 5 Tibor Zimanyi 2015-10-15 12:01:08 UTC
It looks like this may be related to an IBM JDK 7 garbage collection issue, because we experienced another strange NPE during testing on the same machine on which this failure occurred (an AIX 7 machine with IBM JDK 7.0). See here [1]. The NPE occurred on line [2], where it shouldn't be possible: when you look into the code, the map iterated there is initialized only in the constructor and null is never assigned to it. The map also allows null keys, so the get() method used on that line cannot produce an NPE.

A suspicious IBM JDK bug that could cause this strange behaviour is [3]. A new IBM JDK 7 version will be installed on that machine in the coming days, so we will see whether these kinds of failures still occur there after that.

[1] http://pastebin.com/UkvH5J1T
[2] https://github.com/droolsjbpm/optaplanner/blob/6.2.x/optaplanner-core/src/main/java/org/optaplanner/core/impl/domain/variable/listener/VariableListenerSupport.java#L144
[3] http://www-01.ibm.com/support/docview.wss?uid=swg1IV74133
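
The null-key argument in Comment 5 can be checked with a minimal sketch. It assumes the map is a java.util.HashMap; the comment only states that the map allows null keys, so the concrete type is an assumption. The class NullKeyLookupDemo and its members are hypothetical and are not the code at [2].

```java
import java.util.HashMap;
import java.util.Map;

public class NullKeyLookupDemo {

    // Assigned once and never reassigned, like the map described in Comment 5.
    // HashMap is an assumption made for this sketch.
    private final Map<String, String> listenersByKey = new HashMap<>();

    void register(String key, String listener) {
        listenersByKey.put(key, listener);
    }

    String lookup(String key) {
        // For a HashMap, null is a legal key, so this call does not throw.
        return listenersByKey.get(key);
    }

    public static void main(String[] args) {
        NullKeyLookupDemo demo = new NullKeyLookupDemo();
        System.out.println(demo.lookup(null)); // null key not present
        demo.register(null, "fallback");
        System.out.println(demo.lookup(null)); // null key present
    }
}
```

With a non-null HashMap, neither lookup throws, which supports the comment's point that an NPE on such a line is unexpected under normal JVM behaviour. Map types that reject null keys, such as java.util.concurrent.ConcurrentHashMap, would behave differently.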