Bug 1274155

Summary: Client keeps using old view after merging of split brains
Product: [JBoss] JBoss Data Grid 6
Reporter: Osamu Nagano <onagano>
Component: Infinispan
Assignee: Tristan Tarrant <ttarrant>
Status: VERIFIED
QA Contact: Martin Gencur <mgencur>
Severity: high
Priority: high
Version: 6.4.1
CC: chuffman, galder.zamarreno, jdg-bugs, ksuzumur, myoshida, onagano, wfink
Target Milestone: ER3   
Target Release: 6.6.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Doc Type: Bug Fix
Doc Text:
When using the HotRod client in JBoss Data Grid's Remote Client-Server mode, an outdated view could remain in use after the cluster healed from a network partition. This could lead to data inconsistencies when clients that had received different views performed operations. The issue is resolved as of Red Hat JBoss Data Grid 6.6.0: the HotRod client now correctly receives an updated view after the partition is healed.
: 1288354 (view as bug list) Environment:
Type: Bug
Bug Blocks: 1288354    
Attachments:
Description Flags
hotrodclient.zip none

Description Osamu Nagano 2015-10-22 07:10:03 UTC
Description of problem:
After a split-brain merge, a client does not recognize the new cluster view and keeps using the old view.


Version-Release number of selected component (if applicable):
JDG 6.4.1 server and Java client (JDG 6.5.1 as well)


How reproducible:
Always


Steps to Reproduce:
1. Start a 2-node cluster, 127.0.0.1:11222 and 127.0.1.1:11222 in this example.

2. Connect to 127.0.0.1:11222 and confirm that the current view has 2 members.
~~~
15:06:43,636 INFO  [org.infinispan.client.hotrod.impl.protocol.Codec21] (main) ISPN004006: /127.0.0.1:11222 sent new topology view (id=2) containing 2 addresses: [/127.0.1.1:11222, /127.0.0.1:11222]
15:06:43,637 INFO  [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory] (main) ISPN004014: New server added(/127.0.1.1:11222), adding to the pool.
~~~
You can use the attached client as follows.
$ make compile download run
hoge> connect 127.0.0.1
hoge> get hoge (goes to the first server)
hoge> get buzz (goes to the second server)

3. Suspend (Ctrl-z) 127.0.1.1:11222 and wait until this member has been dropped from the cluster.

4. Access the cluster and confirm that a new view with 1 member is received.
~~~
15:08:08,087 INFO  [org.infinispan.client.hotrod.impl.protocol.Codec21] (main) ISPN004006: /127.0.0.1:11222 sent new topology view (id=3) containing 1 addresses: [/127.0.0.1:11222]
15:08:08,088 INFO  [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory] (main) ISPN004016: Server not in cluster anymore(/127.0.1.1:11222), removing from the pool.
~~~

5. Resume (type "fg") 127.0.1.1:11222 and wait for the merge to complete.

6. Access the cluster from the client again. No new view is received.


Expected results:
The new merged view should be received at step 6.
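
One way to check step 6 without reading the log by eye is to scan the client output for ISPN004006 lines and look at the last view reported. The following is only a sketch: the class and method names (TopologyLogParser, parse, lastView) are hypothetical and not part of the attached client, and the pattern assumes the message format shown in the log excerpts above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the Hot Rod client or the attached test client.
class TopologyLogParser {
    // Assumes the ISPN004006 message format seen in the logs in this report.
    private static final Pattern VIEW_LINE = Pattern.compile(
            "ISPN004006: (\\S+) sent new topology view \\(id=(\\d+)\\) "
            + "containing \\d+ addresses: \\[(.*)\\]");

    static final class View {
        final String sender;
        final int id;
        final List<String> members;

        View(String sender, int id, List<String> members) {
            this.sender = sender;
            this.id = id;
            this.members = members;
        }
    }

    // Returns the view described by the line, or null if it is not an ISPN004006 line.
    static View parse(String line) {
        Matcher m = VIEW_LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        List<String> members = new ArrayList<>();
        for (String address : m.group(3).split(",")) {
            String trimmed = address.trim();
            if (!trimmed.isEmpty()) {
                members.add(trimmed);
            }
        }
        return new View(m.group(1), Integer.parseInt(m.group(2)), members);
    }

    // Returns the most recent view found in the log, or null if none was logged.
    static View lastView(List<String> lines) {
        View last = null;
        for (String line : lines) {
            View v = parse(line);
            if (v != null) {
                last = v;
            }
        }
        return last;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
                "(main) ISPN004006: /127.0.0.1:11222 sent new topology view (id=2) "
                        + "containing 2 addresses: [/127.0.1.1:11222, /127.0.0.1:11222]",
                "(main) ISPN004006: /127.0.0.1:11222 sent new topology view (id=3) "
                        + "containing 1 addresses: [/127.0.0.1:11222]");
        View last = lastView(log);
        System.out.println("last view id=" + last.id + " members=" + last.members);
    }
}
```

If the last view still lists a single member after step 6, the client is in the state described in this report.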


Additional info:
With the following log setting, you can observe which member is reached by a client.
~~~
            <console-handler name="CONSOLE">
                <level name="TRACE"/>
                ...
            <logger category="org.infinispan.interceptors.CallInterceptor">
                <level name="TRACE"/>
            </logger>
~~~

Comment 2 Osamu Nagano 2015-10-22 07:11:50 UTC
Created attachment 1085419
hotrodclient.zip

Comment 3 Galder Zamarreño 2015-10-27 09:57:51 UTC
I have tried this case with community Infinispan 8.1.0.Alpha1 and the issue is not there, so it was probably already fixed. Have you tried with the latest JDG version? :|

10:54:37,805 INFO  [com.example.HotRodClient] (main) connect called: + serverList
10:54:38,026 INFO  [org.infinispan.client.hotrod.impl.protocol.Codec21] (main) ISPN004006: /127.0.0.1:11222 sent new topology view (id=5) containing 2 addresses: [/127.0.0.1:11222, /127.0.0.1:12222]
10:54:38,027 INFO  [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory] (main) ISPN004014: New server added(/127.0.0.1:12222), adding to the pool.
10:54:38,029 INFO  [org.infinispan.client.hotrod.RemoteCacheManager] (main) ISPN004021: Infinispan version: 8.1.0.Alpha1
10:54:38,029 INFO  [com.example.HotRodClient] (main) Connected.
10:54:38,111 INFO  [com.example.HotRodClient] (main) Selected cache:
hoge> get hoge
null
hoge> get buzz
null
hoge> get hoge
10:55:44,306 INFO  [org.infinispan.client.hotrod.impl.protocol.Codec21] (main) ISPN004006: /127.0.0.1:11222 sent new topology view (id=6) containing 1 addresses: [/127.0.0.1:11222]
10:55:44,307 INFO  [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory] (main) ISPN004016: Server not in cluster anymore(/127.0.0.1:12222), removing from the pool.
null
hoge> get buzz
null
hoge> get hoge
10:56:10,619 INFO  [org.infinispan.client.hotrod.impl.protocol.Codec21] (main) ISPN004006: /127.0.0.1:11222 sent new topology view (id=8) containing 2 addresses: [/127.0.0.1:11222, /127.0.0.1:12222]
10:56:10,620 INFO  [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory] (main) ISPN004014: New server added(/127.0.0.1:12222), adding to the pool.
null
hoge> get buzz
null
hoge>

Comment 4 Galder Zamarreño 2015-10-27 12:39:50 UTC
I've re-run the test and I've been able to replicate it. I mixed up suspend and kill commands.

Comment 5 JBoss JIRA Server 2015-10-27 13:38:24 UTC
Galder Zamarreño <galder.zamarreno> updated the status of jira ISPN-5889 to Coding In Progress

Comment 6 Osamu Nagano 2015-10-28 01:41:46 UTC
@Galder, I've tested with JDG 6.4.1, JDG 6.5.1, and Infinispan 8.1.0.Alpha2, and all show the same behaviour. Suspending with Ctrl-z, rather than killing the process, is important because it imitates a long GC pause.

This issue results in data inconsistency. For example, a client which connects to the first server always receives a 1-member view after the merge. All of its put operations, including those for keys which were originally directed to the second server, are directed to the first server. Meanwhile, a client which connects to the second server receives a 2-member view after the merge. This client cannot read the value of a key put by the former client.
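
The routing divergence can be illustrated with a small, self-contained sketch. It is not the Hot Rod client's real consistent hash: the routing rule (hash code modulo view size) and the class name ViewRoutingSketch are assumptions made only to show that two clients holding different views can pick different servers for the same key.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; the real client uses a consistent hash, not this rule.
class ViewRoutingSketch {
    // Picks a server for a key from the client's current view.
    static String route(String key, List<String> view) {
        int index = Math.floorMod(key.hashCode(), view.size());
        return view.get(index);
    }

    public static void main(String[] args) {
        // View held by the client stuck on the pre-merge topology.
        List<String> staleView = Collections.singletonList("127.0.0.1:11222");
        // View held by a client that received the merged topology.
        List<String> mergedView = Arrays.asList("127.0.0.1:11222", "127.0.1.1:11222");

        for (String key : Arrays.asList("hoge", "buzz")) {
            System.out.println(key + ": stale view -> " + route(key, staleView)
                    + ", merged view -> " + route(key, mergedView));
        }
    }
}
```

With the stale 1-member view every key goes to the first server, while the merged view spreads keys over both members, so the two clients can contact different servers for the same key.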

Comment 8 Osamu Nagano 2015-11-13 01:20:41 UTC
PR #3798 has been merged into infinispan:master. I built and tested it, but the issue in the description still remains. Is there more work planned for this issue?

Comment 12 JBoss JIRA Server 2015-11-26 14:00:01 UTC
Dan Berindei <dberinde> updated the status of jira ISPN-5889 to Reopened

Comment 14 Dan Berindei 2015-11-27 18:38:57 UTC
PR: https://github.com/infinispan/jdg/pull/805

I've added a test method that does an "overlapping" merge. I've also tested with Ctrl+Z, and the client receives the 2nd node's address when it is resumed.

Comment 16 Matej Čimbora 2015-12-03 10:32:05 UTC
Tested using the provided application and reproduced the issue with ER2. The problem is no longer present in ER3. Marking as verified.