Bug 1199210
Summary: | Backup segments may not be removed after failed state transfer | ||||||
---|---|---|---|---|---|---|---|
Product: | [JBoss] JBoss Data Grid 6 | Reporter: | Tristan Tarrant <ttarrant> | ||||
Component: | Infinispan | Assignee: | Dan Berindei <dberinde> | ||||
Status: | CLOSED CURRENTRELEASE | QA Contact: | Martin Gencur <mgencur> | ||||
Severity: | unspecified | Docs Contact: | |||||
Priority: | unspecified | ||||||
Version: | 6.3.0 | CC: | dberinde, jdg-bugs, mcimbora, onagano, pzapataf, tkimura | ||||
Target Milestone: | ER3 | ||||||
Target Release: | 6.5.0 | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2015-06-23 12:24:19 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 1214804, 1203121 | ||||||
Attachments: |
|
Description
Tristan Tarrant
2015-03-05 16:18:26 UTC
Osamu, can you confirm if the problem still occurs with 6.4.0.Final? Based on the stack trace, it looks like the IllegalArgumentException was eliminated with the fix for bug 1155606 (ISPN-4852). (Filtered stack trace below) 2014/07/31:11:16:11,110 WARN [org.infinispan.topology.CacheTopologyControlCommand] (remote-thread-931) ISPN000071: Caught exception when handling command CacheTopologyControlCommand{cache=cache, type=CH_UPDATE, sender=node01, joinInfo=null, topologyId=61, currentCH=DefaultConsistentHash{numSegments=240, numOwners=2, members=[...]}, pendingCH=null, throwable=null, viewId=32}: java.lang.IllegalArgumentException: Node node04 is not a member at org.infinispan.distribution.ch.DefaultConsistentHash.getSegmentsForOwner(DefaultConsistentHash.java:87) [infinispan-core-6.1.0.Final-redhat-4.jar:6.1.0.Final-redhat-4] at org.infinispan.statetransfer.StateConsumerImpl.onTopologyUpdate(StateConsumerImpl.java:457) [infinispan-core-6.1.0.Final-redhat-4.jar:6.1.0.Final-redhat-4] at org.infinispan.statetransfer.StateTransferManagerImpl.doTopologyUpdate(StateTransferManagerImpl.java:199) [infinispan-core-6.1.0.Final-redhat-4.jar:6.1.0.Final-redhat-4] at org.infinispan.statetransfer.StateTransferManagerImpl.access$000(StateTransferManagerImpl.java:39) [infinispan-core-6.1.0.Final-redhat-4.jar:6.1.0.Final-redhat-4] at org.infinispan.statetransfer.StateTransferManagerImpl$1.updateConsistentHash(StateTransferManagerImpl.java:104) [infinispan-core-6.1.0.Final-redhat-4.jar:6.1.0.Final-redhat-4] at org.infinispan.topology.LocalTopologyManagerImpl.handleConsistentHashUpdate(LocalTopologyManagerImpl.java:201) [infinispan-core-6.1.0.Final-redhat-4.jar:6.1.0.Final-redhat-4] at org.infinispan.topology.CacheTopologyControlCommand.doPerform(CacheTopologyControlCommand.java:152) [infinispan-core-6.1.0.Final-redhat-4.jar:6.1.0.Final-redhat-4] at org.infinispan.topology.CacheTopologyControlCommand.perform(CacheTopologyControlCommand.java:124) [infinispan-core-6.1.0.Final-redhat-4.jar:6.1.0.Final-redhat-4] at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher$4.run(CommandAwareRpcDispatcher.java:270) [infinispan-core-6.1.0.Final-redhat-4.jar:6.1.0.Final-redhat-4] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_65] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [rt.jar:1.7.0_65] at java.lang.Thread.run(Thread.java:745) [rt.jar:1.7.0_65] Testing in JDG 6.4.0 will take some time because they need to set up a large cluster and this is not easily reproducible. Instead, I created Bug 1203121 which is focusing on the customer's real use case using getBulk. They will test at the same time with the new JDG after Bug 1203121 gets fixed. The ISPN-4852 fix avoids the IllegalArgumentException by replacingthe direct call to DefaultConsistentHash.getSegmentsForOwner() with a call to getOwnedSegments(). Created attachment 1004278 [details]
another_error.log
Thanks, Dan. IllegalArgumentException case seems to be actually fixed. We went through the log files of failed state transfer again, and found another case, OutdatedTopologyException. This is very rare than IllegalArgumentException case, but on the same critical path of this issue. I attached the full stack trace as another_error.log. Note that the code path after org.infinispan.statetransfer.StateConsumerImpl.onTopologyUpdate() is different. Is this fixed in JDG 6.4.0 too? ~~~ 2015-02-02 17:45:10.461 ERROR [org.infinispan.interceptors.InvocationContextInterceptor] (remote-thread-180934) ISPN000136: Execution error: org.infinispan.remoting.RemoteException: ISPN000217: Received exception from AAAA/clustered(s1), see cause for remote stack trace at org.infinispan.remoting.transport.AbstractTransport.checkResponse(AbstractTransport.java:41) [infinispan-core-6.1.1.Final-redhat-5.jar:6.1.1.Final-redhat-5] ... at org.infinispan.statetransfer.StateConsumerImpl.onTopologyUpdate(StateConsumerImpl.java:433) [infinispan-core-6.1.1.Final-redhat-5.jar:6.1.1.Final-redhat-5] at org.infinispan.statetransfer.StateTransferManagerImpl.doTopologyUpdate(StateTransferManagerImpl.java:199) [infinispan-core-6.1.1.Final-redhat-5.jar:6.1.1.Final-redhat-5] at org.infinispan.statetransfer.StateTransferManagerImpl.access$000(StateTransferManagerImpl.java:39) [infinispan-core-6.1.1.Final-redhat-5.jar:6.1.1.Final-redhat-5] at org.infinispan.statetransfer.StateTransferManagerImpl$1.updateConsistentHash(StateTransferManagerImpl.java:104) [infinispan-core-6.1.1.Final-redhat-5.jar:6.1.1.Final-redhat-5] at org.infinispan.topology.LocalTopologyManagerImpl.handleConsistentHashUpdate(LocalTopologyManagerImpl.java:201) [infinispan-core-6.1.1.Final-redhat-5.jar:6.1.1.Final-redhat-5] at org.infinispan.topology.CacheTopologyControlCommand.doPerform(CacheTopologyControlCommand.java:152) [infinispan-core-6.1.1.Final-redhat-5.jar:6.1.1.Final-redhat-5] at org.infinispan.topology.CacheTopologyControlCommand.perform(CacheTopologyControlCommand.java:124) [infinispan-core-6.1.1.Final-redhat-5.jar:6.1.1.Final-redhat-5] at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher$4.run(CommandAwareRpcDispatcher.java:270) [infinispan-core-6.1.1.Final-redhat-5.jar:6.1.1.Final-redhat-5] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_55] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [rt.jar:1.7.0_55] at java.lang.Thread.run(Thread.java:744) [rt.jar:1.7.0_55] Caused by: org.infinispan.statetransfer.OutdatedTopologyException: Cache topology changed while the command was executing: expected 68, got 69 ... ~~~ Osamu, that exception is harmless, it shouldn't really be logged as ERROR. The stack trace does point to another problem with the topology update being blocked in the BlockingTaskAwareExecutorService.checkForReadyTasks() call: ISPN-5317. This can delay the rebalance confirmation, thus making the rebalance longer, and even a lot longer if there were a lot of commands waiting for the new topology. The workaround, as always, is to increase the number of thread in the remote-executor thread pool... Dan Berindei <dberinde> updated the status of jira ISPN-5268 to Resolved The IllegalArgumentException appeared because the node had a previous consistent hash, but it was not a member of it (perhaps after a merge). The ISPN-4852 fix avoids the exception by replacing the direct call to DefaultConsistentHash.getSegmentsForOwner() with a call to StateConsumerImpl.getOwnedSegments(). |