Bug 1274155 - Client keeps using old view after merging of split brains
Status: VERIFIED
Product: JBoss Data Grid 6
Classification: JBoss
Component: Infinispan
Version: 6.4.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ER3
Target Release: 6.6.0
Assigned To: Dan Berindei
QA Contact: Martin Gencur
Depends On:
Blocks: 1288354
Reported: 2015-10-22 03:10 EDT by Osamu Nagano
Modified: 2016-01-22 10:54 EST
CC List: 12 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
When using the HotRod client in JBoss Data Grid's Remote Client-Server mode, there was a possibility that an outdated view was used when the cluster healed from a network partition. This could lead to data inconsistencies when clients that had received different views performed operations. The issue is resolved as of Red Hat JBoss Data Grid 6.6.0: the HotRod client now correctly receives an updated view after the partition is healed.
Story Points: ---
Clone Of:
Clones: 1288354
Environment:
Last Closed:
Type: Bug
Regression: ---
Documentation: ---
CRM:
Verified Versions:


Attachments
hotrodclient.zip (6.28 KB, application/zip)
2015-10-22 03:11 EDT, Osamu Nagano


External Trackers
Tracker: JBoss Issue Tracker ISPN-5889
Priority: Major
Status: Resolved
Summary: Merge views not dealt with in Hot Rod server
Last Updated: 2017-03-21 12:08 EDT

Description Osamu Nagano 2015-10-22 03:10:03 EDT
Description of problem:
After a split-brain merge, the client does not recognize the new cluster view and keeps using the old one.


Version-Release number of selected component (if applicable):
JDG 6.4.1 server and Java client (JDG 6.5.1 as well)


How reproducible:
Always


Steps to Reproduce:
1. Start a 2-node cluster, 127.0.0.1:11222 and 127.0.1.1:11222 in this example.

2. Connect to 127.0.0.1:11222 and confirm that the current view has 2 members.
~~~
15:06:43,636 INFO  [org.infinispan.client.hotrod.impl.protocol.Codec21] (main) ISPN004006: /127.0.0.1:11222 sent new topology view (id=2) containing 2 addresses: [/127.0.1.1:11222, /127.0.0.1:11222]
15:06:43,637 INFO  [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory] (main) ISPN004014: New server added(/127.0.1.1:11222), adding to the pool.
~~~
You can use the attached client as follows (a minimal sketch of such a client is shown after these steps):
$ make compile download run
hoge> connect 127.0.0.1
hoge> get hoge (goes to the first server)
hoge> get buzz (goes to the second server)

3. Stop (Ctrl-Z) 127.0.1.1:11222 and wait until this member has been dropped from the view.

4. Access the cluster and confirm that the new view with 1 member is received.
~~~
15:08:08,087 INFO  [org.infinispan.client.hotrod.impl.protocol.Codec21] (main) ISPN004006: /127.0.0.1:11222 sent new topology view (id=3) containing 1 addresses: [/127.0.0.1:11222]
15:08:08,088 INFO  [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory] (main) ISPN004016: Server not in cluster anymore(/127.0.1.1:11222), removing from the pool.
~~~

5. Restart (type "fg") 127.0.1.1:11222 and wait for the views to merge.

6. Access the cluster from the client again; no new view is received.
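
For reference, here is a minimal sketch of a Hot Rod client along the lines of the attached hotrodclient.zip (illustrative only; the class name, key names, and exact behaviour of the attached client may differ):
~~~
// Minimal Hot Rod client sketch (illustrative, not the attached code).
// Uses the standard Infinispan Hot Rod Java client API shipped with JDG.
import org.infinispan.client.hotrod.RemoteCache;
import org.infinispan.client.hotrod.RemoteCacheManager;
import org.infinispan.client.hotrod.configuration.ConfigurationBuilder;

public class HotRodViewCheck {
    public static void main(String[] args) {
        ConfigurationBuilder builder = new ConfigurationBuilder();
        builder.addServer().host("127.0.0.1").port(11222);

        RemoteCacheManager rcm = new RemoteCacheManager(builder.build());
        RemoteCache<String, String> cache = rcm.getCache();

        // Each get is routed according to the topology view the client
        // currently holds; the ISPN004006/ISPN004014 log lines shown above
        // indicate when that view is updated.
        System.out.println("hoge = " + cache.get("hoge"));
        System.out.println("buzz = " + cache.get("buzz"));

        rcm.stop();
    }
}
~~~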


Expected results:
The new merged view should be received at step 6.


Additional info:
With the following log setting, you can observe which member is reached by a client.
~~~
            <console-handler name="CONSOLE">
                <level name="TRACE"/>
                ...
            <logger category="org.infinispan.interceptors.CallInterceptor">
                <level name="TRACE"/>
            </logger>
~~~
Comment 2 Osamu Nagano 2015-10-22 03:11 EDT
Created attachment 1085419 [details]
hotrodclient.zip
Comment 3 Galder Zamarreño 2015-10-27 05:57:51 EDT
I have tried this case with community Infinispan 8.1.0.Alpha1 and the issue is not there, so this was probably already fixed? Have you tried with the latest JDG version? :|

10:54:37,805 INFO  [com.example.HotRodClient] (main) connect called: + serverList
10:54:38,026 INFO  [org.infinispan.client.hotrod.impl.protocol.Codec21] (main) ISPN004006: /127.0.0.1:11222 sent new topology view (id=5) containing 2 addresses: [/127.0.0.1:11222, /127.0.0.1:12222]
10:54:38,027 INFO  [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory] (main) ISPN004014: New server added(/127.0.0.1:12222), adding to the pool.
10:54:38,029 INFO  [org.infinispan.client.hotrod.RemoteCacheManager] (main) ISPN004021: Infinispan version: 8.1.0.Alpha1
10:54:38,029 INFO  [com.example.HotRodClient] (main) Connected.
10:54:38,111 INFO  [com.example.HotRodClient] (main) Selected cache:
hoge> get hoge
null
hoge> get buzz
null
hoge> get hoge
10:55:44,306 INFO  [org.infinispan.client.hotrod.impl.protocol.Codec21] (main) ISPN004006: /127.0.0.1:11222 sent new topology view (id=6) containing 1 addresses: [/127.0.0.1:11222]
10:55:44,307 INFO  [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory] (main) ISPN004016: Server not in cluster anymore(/127.0.0.1:12222), removing from the pool.
null
hoge> get buzz
null
hoge> get hoge
10:56:10,619 INFO  [org.infinispan.client.hotrod.impl.protocol.Codec21] (main) ISPN004006: /127.0.0.1:11222 sent new topology view (id=8) containing 2 addresses: [/127.0.0.1:11222, /127.0.0.1:12222]
10:56:10,620 INFO  [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory] (main) ISPN004014: New server added(/127.0.0.1:12222), adding to the pool.
null
hoge> get buzz
null
hoge>
Comment 4 Galder Zamarreño 2015-10-27 08:39:50 EDT
I've re-run the test and have been able to replicate it. I had mixed up the suspend and kill commands.
Comment 5 JBoss JIRA Server 2015-10-27 09:38:24 EDT
Galder Zamarreño <galder.zamarreno@redhat.com> updated the status of jira ISPN-5889 to Coding In Progress
Comment 6 Osamu Nagano 2015-10-27 21:41:46 EDT
@Galder, I've tested with JDG 6.4.1, JDG 6.5.1, and Infinispan 8.1.0.Alpha2, and all show the same behaviour. Ctrl-Z (suspend), not killing, is important in order to imitate a long GC pause.

This issue results in data inconsistency. For example, a client connected to the first server keeps receiving a 1-member view after the merge, so any put operation, including one for a key that was originally directed to the second server, goes to the first server. A client connected to the second server, however, receives the 2-member view after the merge and therefore cannot read the value of the key put by the former client.
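
A rough illustration of how this inconsistency can be observed (a sketch only; the server addresses and key names follow the reproduction steps, and the class and helper names are hypothetical):
~~~
// Illustrative only: two clients bootstrapped against different servers.
// After the merge, the client pointed at the first server may still hold
// the stale 1-member view while the other client holds the 2-member view,
// so the same key can be routed to different owners.
import org.infinispan.client.hotrod.RemoteCache;
import org.infinispan.client.hotrod.RemoteCacheManager;
import org.infinispan.client.hotrod.configuration.ConfigurationBuilder;

public class SplitBrainInconsistencyDemo {
    static RemoteCache<String, String> connect(String host) {
        ConfigurationBuilder b = new ConfigurationBuilder();
        b.addServer().host(host).port(11222);
        return new RemoteCacheManager(b.build()).getCache();
    }

    public static void main(String[] args) {
        RemoteCache<String, String> viaFirst = connect("127.0.0.1");
        RemoteCache<String, String> viaSecond = connect("127.0.1.1");

        // Run after the partition has healed (step 5 above).
        viaFirst.put("buzz", "written-via-first");  // routed with the stale view
        System.out.println(viaSecond.get("buzz"));  // may print null
    }
}
~~~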
Comment 8 Osamu Nagano 2015-11-12 20:20:41 EST
PR #3798 has been merged into infinispan:master. I built and tested it, but the issue in the description still remains. Is there more work planned on this issue?
Comment 12 JBoss JIRA Server 2015-11-26 09:00:01 EST
Dan Berindei <dberinde@redhat.com> updated the status of jira ISPN-5889 to Reopened
Comment 14 Dan Berindei 2015-11-27 13:38:57 EST
PR: https://github.com/infinispan/jdg/pull/805

I've added a test method that does an "overlapping" merge. I've also tested with Ctrl+Z, and the client receives the 2nd node's address when it is resumed.
Comment 16 Matej Čimbora 2015-12-03 05:32:05 EST
Tested using the provided application and reproduced the issue with ER2. The problem is no longer present in ER3. Marking as verified.
