Bug 1190001

Summary: Avoid invalid topology
Product: [JBoss] JBoss Data Grid 6
Reporter: Takayoshi Kimura <tkimura>
Component: Server
Assignee: Galder Zamarreño <galder.zamarreno>
Status: CLOSED CURRENTRELEASE
QA Contact: Martin Gencur <mgencur>
Severity: high
Priority: medium
Version: 6.3.1
CC: anistor, chuffman, dmehra, dstahl, jdg-bugs, ksuzumur, mcimbora, onagano, pzapataf, rmarwaha, slaskawi, ttarrant, wfink
Target Milestone: CR1
Target Release: 6.4.1
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Doc Text:
In previous versions of Red Hat JBoss Data Grid, when clients made requests to the server while a topology view change was in progress, the server could send back partial topology updates. As a result, clients operated with a suboptimal view in which some segments had no owners. Now, if a segment is found to have no owners, the first node of the topology is sent as the segment owner, and clients receive the fully formed topology once the cluster has stabilized.
Last Closed: 2015-04-02 12:14:09 UTC
Type: Bug
Attachments: Reproducer

Description Takayoshi Kimura 2015-02-06 03:31:18 UTC
We've seen invalid topology propagated to clients, and it causes an ArrayIndexOutOfBoundsException:

Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
  	at org.infinispan.client.hotrod.impl.transport.tcp.RoundRobinBalancingStrategy.getServerByIndex(RoundRobinBalancingStrategy.java:68) [infinispan-client-hotrod-6.1.0.Final-redhat-4.jar:6.1.0.Final-redhat-4]
  	at org.infinispan.client.hotrod.impl.transport.tcp.RoundRobinBalancingStrategy.nextServer(RoundRobinBalancingStrategy.java:44) [infinispan-client-hotrod-6.1.0.Final-redhat-4.jar:6.1.0.Final-redhat-4]
  	at org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory.nextServer(TcpTransportFactory.java:220) [infinispan-client-hotrod-6.1.0.Final-redhat-4.jar:6.1.0.Final-redhat-4]
  	at org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory.getTransport(TcpTransportFactory.java:194) [infinispan-client-hotrod-6.1.0.Final-redhat-4.jar:6.1.0.Final-redhat-4]
  	at org.infinispan.client.hotrod.impl.operations.FaultTolerantPingOperation.getTransport(FaultTolerantPingOperation.java:27) [infinispan-client-hotrod-6.1.0.Final-redhat-4.jar:6.1.0.Final-redhat-4]
  	at org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation.execute(RetryOnFailureOperation.java:48) [infinispan-client-hotrod-6.1.0.Final-redhat-4.jar:6.1.0.Final-redhat-4]
  	at org.infinispan.client.hotrod.impl.RemoteCacheImpl.ping(RemoteCacheImpl.java:535) [infinispan-client-hotrod-6.1.0.Final-redhat-4.jar:6.1.0.Final-redhat-4]
  	at org.infinispan.client.hotrod.RemoteCacheManager.ping(RemoteCacheManager.java:635) [infinispan-client-hotrod-6.1.0.Final-redhat-4.jar:6.1.0.Final-redhat-4]
  	at org.infinispan.client.hotrod.RemoteCacheManager.createRemoteCache(RemoteCacheManager.java:616) [infinispan-client-hotrod-6.1.0.Final-redhat-4.jar:6.1.0.Final-redhat-4]
  	at org.infinispan.client.hotrod.RemoteCacheManager.getCache(RemoteCacheManager.java:527) [infinispan-client-hotrod-6.1.0.Final-redhat-4.jar:6.1.0.Final-redhat-4]
    at org.infinispan.client.hotrod.RemoteCacheManager.getCache(RemoteCacheManager.java:523) [infinispan-client-hotrod-6.1.0.Final-redhat-4.jar:6.1.0.Final-redhat-4]

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
    at org.infinispan.client.hotrod.impl.consistenthash.SegmentConsistentHash.getServer(SegmentConsistentHash.java:33)
    at org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory.getTransport(TcpTransportFactory.java:204)
    at org.infinispan.client.hotrod.impl.operations.AbstractKeyOperation.getTransport(AbstractKeyOperation.java:40)
    at org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation.execute(RetryOnFailureOperation.java:48)
    at org.infinispan.client.hotrod.impl.RemoteCacheImpl.put(RemoteCacheImpl.java:237)
    at org.infinispan.client.hotrod.impl.RemoteCacheSupport.put(RemoteCacheSupport.java:79)
    at sample.Main.main(Main.java:16)

It happens with both Hot Rod 2 and 1.3 clients.

This state is very hard to reach and we don't have a consistent way to reproduce it. However, whenever it happens there is always a view change in progress, so it appears to be related to view changes.

Judging from the stack trace, the client receives a topology with numOwners=0 or numSegments=0 from the server.
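To see why such a topology produces ArrayIndexOutOfBoundsException: 0, here is a minimal sketch of round-robin server selection. This is a hypothetical illustration, not the actual RoundRobinBalancingStrategy code: the point is that once a topology update installs an empty server array, the very first lookup indexes position 0 of a zero-length array.

```java
import java.net.InetSocketAddress;

// Sketch only (hypothetical class, not Infinispan source): a round-robin
// balancer over a server array. If a topology update installs an empty
// server list, nextServer() reads servers[0] on a zero-length array and
// throws ArrayIndexOutOfBoundsException with index 0.
public class RoundRobinSketch {
    private final InetSocketAddress[] servers;
    private int index = 0;

    public RoundRobinSketch(InetSocketAddress[] servers) {
        this.servers = servers;
    }

    public InetSocketAddress nextServer() {
        // Throws AIOOBE: 0 when servers.length == 0
        InetSocketAddress server = servers[index];
        index = (index + 1) % servers.length;
        return server;
    }
}
```

With a non-empty array the rotation is well defined; only a zero-owner topology breaks it, which matches the traces above, where both the round-robin path and the segment-hash path fail at index 0.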

We have also been unable to recover from this situation. Rebooting random nodes doesn't help, and the client keeps getting these exceptions.

Until we can find the root cause, I think it's better to add a guard that prevents this kind of invalid topology from being stored on the server side and propagated to the clients.
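A guard along these lines could be sketched as follows. All names here are hypothetical (this is not the actual fix, which landed via the PR linked in comment 10): before the per-segment owner lists are written into the Hot Rod topology response, any segment with no owners is given the first cluster member as a stand-in owner, so clients never see an empty array.

```java
import java.net.InetSocketAddress;
import java.util.List;

// Hypothetical sketch of the proposed server-side guard (names invented
// for illustration): substitute the first cluster member for any segment
// that would otherwise be sent to clients with zero owners.
public class TopologyGuard {
    public static List<InetSocketAddress>[] guardOwners(
            List<InetSocketAddress>[] segmentOwners,
            List<InetSocketAddress> members) {
        for (int i = 0; i < segmentOwners.length; i++) {
            if (segmentOwners[i] == null || segmentOwners[i].isEmpty()) {
                // Fall back to the first node of the topology; clients
                // receive the fully formed view once the cluster stabilizes.
                segmentOwners[i] = List.of(members.get(0));
            }
        }
        return segmentOwners;
    }
}
```

The fallback owner may be suboptimal (requests for those segments go to a node that may have to fetch the data remotely), but it keeps the client-side consistent-hash arrays non-empty during the view change.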

Comment 6 JBoss JIRA Server 2015-03-13 13:01:55 UTC
Galder Zamarreño <galder.zamarreno> updated the status of jira ISPN-5208 to Coding In Progress

Comment 8 Matej Čimbora 2015-03-23 07:42:43 UTC
The problem still persists, i.e. I'm getting the exception described in the linked JIRA.

java.lang.ArrayIndexOutOfBoundsException: 0
	at org.infinispan.client.hotrod.impl.consistenthash.SegmentConsistentHash.getServer(SegmentConsistentHash.java:33)
...

I managed to create a reproducer for this (attached).

1. In clustered.xml, change the numOwners attribute of the 'default' cache to 1.
2. Start 2 servers with the clustered.xml configuration.
3. Start the attached reproducer (client).
4. Kill one of the servers.
5. Start it again. The exception should appear in the client log.

This probably relates to losing segments for a subset of keys.
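Step 1 corresponds to a change along these lines in clustered.xml. This is a sketch, not a verbatim excerpt: in the server configuration schema the attribute is named `owners` (the programmatic equivalent of numOwners), and the surrounding elements are assumed.

```xml
<!-- clustered.xml, Infinispan subsystem (sketch): give each segment a
     single owner so that killing one server leaves some segments with
     no owner at all, triggering the invalid topology on the client -->
<distributed-cache name="default" mode="SYNC" owners="1">
    <!-- ... -->
</distributed-cache>
```

With owners="1" there is no replica to fall back on, which is why the restart window in steps 4–5 reliably exposes the zero-owner topology.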

Comment 9 Matej Čimbora 2015-03-23 07:44:44 UTC
Created attachment 1005229 [details]
Reproducer

Comment 10 Sebastian Łaskawiec 2015-03-23 15:07:51 UTC
PR: https://github.com/infinispan/jdg/pull/575