Customer is getting a NullPointerException from JDG during startup of a new node.

Caused by: java.lang.NullPointerException
    at org.infinispan.container.InternalEntryFactoryImpl.create(InternalEntryFactoryImpl.java:62)

See my comment on 8/24/2015 9:10 PM EDT for more details. It's only happening rarely in production, and extremely rarely in QA.

=================================================================

When a node joins and data is rebalanced, in-flight transactions are sent to the new owner, but the transfer does not include all of the transaction data. org.infinispan.interceptors.TxInterceptor#visitCommitCommand checks for this case and replays the transaction prepare if needed.

The only way I've found for this NPE to occur is when the transaction needs to be replayed but, for some reason, is not. The latest logs show that it is *not* replaying any prepare before the exception occurs. I suspect there's something wrong with the check that decides whether the prepare needs to be replayed.

The following warning is occurring in close proximity to the exception, and may be related:

ISPN000071: Caught exception when handling command CacheTopologyControlCommand{cache=dataMap, type=REBALANCE_START ...
java.lang.IllegalArgumentException: Received a rebalance start topology ... while there already was a rebalance in progress: ...

I believe we'll need to trace the topology IDs for the prepare and commit commands, and for the state transfer, to see whether a discrepancy is causing the topology ID check in TxInterceptor#visitCommitCommand to not work correctly.
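To make the suspected failure mode concrete, here is a minimal, self-contained sketch. It is not Infinispan code: ReplaySketch, RemoteTx, needsReplay, replayPrepare and commit are all hypothetical names, and the "compare topology IDs" rule is only an assumption about what the real check might look like. It shows how a commit for a transaction that arrived via state transfer can hit an NPE when the replay check wrongly concludes that no replay is needed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model only; none of these types exist in Infinispan.
public class ReplaySketch {

    /** Stand-in for a transaction handed to the joining node by state transfer. */
    static final class RemoteTx {
        final int topologyId;                     // topology the tx is recorded under
        final Map<String, String> modifications;  // transferred with the tx
        Map<String, String> lookedUpEntries;      // null: prepare never ran on this node

        RemoteTx(int topologyId, Map<String, String> modifications) {
            this.topologyId = topologyId;
            this.modifications = modifications;
        }
    }

    /** Assumed check: replay only if the commit carries a newer topology ID. */
    static boolean needsReplay(RemoteTx tx, int commitTopologyId) {
        return tx.topologyId < commitTopologyId;
    }

    /** Replaying the prepare repopulates the entries the commit relies on. */
    static void replayPrepare(RemoteTx tx) {
        tx.lookedUpEntries = new HashMap<>(tx.modifications);
    }

    /** Returns the number of entries committed; throws NPE if the replay was skipped. */
    static int commit(RemoteTx tx, int commitTopologyId) {
        if (needsReplay(tx, commitTopologyId)) {
            replayPrepare(tx);
        }
        return tx.lookedUpEntries.size();
    }

    public static void main(String[] args) {
        Map<String, String> mods = new HashMap<>();
        mods.put("k", "v");

        // Topology IDs differ as expected: the prepare is replayed and the commit succeeds.
        System.out.println(commit(new RemoteTx(5, mods), 6));

        // Topology IDs unexpectedly match: the replay is skipped and the commit fails.
        try {
            commit(new RemoteTx(6, mods), 6);
        } catch (NullPointerException e) {
            System.out.println("NPE: prepare was not replayed");
        }
    }
}
```

If the real check behaves anything like this model, logging the topology ID of the transferred transaction next to the topology ID on the commit command should show whether the two unexpectedly match at the time of the failure.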
That part of the code has changed after 6.3.2, so we'd need the customer and GSS to validate this with a current release (6.5.x).
The issue is not reproducible on JDG 6.5.1, and there is no further information on how to reproduce it. There are also no log files with TRACE/DEBUG detail from the time of the failure.