1274155 – Client keeps using old view after merging of split brains

Summary: Client keeps using old view after merging of split brains

Keywords:
Status:	CLOSED UPSTREAM
Alias:	None
Product:	JBoss Data Grid 6
Classification:	JBoss
Component:	Infinispan
Sub Component:
Version:	6.4.1
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	ER3
Target Release:	6.6.0
Assignee:	Tristan Tarrant
QA Contact:	Martin Gencur
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1288354

Reported:	2015-10-22 07:10 UTC by Osamu Nagano
Modified:	2025-02-10 03:48 UTC
CC List:	7 users
Fixed In Version:
Clone Of:
Clones:	1288354
Environment:
Last Closed:	2025-02-10 03:48:22 UTC
Type:	Bug
Embargoed:

Attachments
hotrodclient.zip (6.28 KB, application/zip) 2015-10-22 07:11 UTC, Osamu Nagano	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Issue Tracker	ISPN-5889	0	Major	Resolved	Merge views not dealt with in Hot Rod server	2018-04-10 12:15:58 UTC

Description Osamu Nagano 2015-10-22 07:10:03 UTC

Description of problem:
After a split brain is merged, the client doesn't recognize the new merged cluster view and keeps using the old one.


Version-Release number of selected component (if applicable):
JDG 6.4.1 server and Java client (also reproducible with JDG 6.5.1)


How reproducible:
Always


Steps to Reproduce:
1. Start a 2-node cluster, 127.0.0.1:11222 and 127.0.1.1:11222 in this example.

2. Connect to 127.0.0.1:11222 and confirm that the current view has 2 members.
~~~
15:06:43,636 INFO  [org.infinispan.client.hotrod.impl.protocol.Codec21] (main) ISPN004006: /127.0.0.1:11222 sent new topology view (id=2) containing 2 addresses: [/127.0.1.1:11222, /127.0.0.1:11222]
15:06:43,637 INFO  [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory] (main) ISPN004014: New server added(/127.0.1.1:11222), adding to the pool.
~~~
You can use the attached client as follows:
$ make compile download run
hoge> connect 127.0.0.1
hoge> get hoge (goes to the first server)
hoge> get buzz (goes to the second server)

3. Suspend 127.0.1.1:11222 with Ctrl-Z and wait until this member has been dropped from the view.

4. Access the cluster and confirm that the client receives the new 1-member view.
~~~
15:08:08,087 INFO  [org.infinispan.client.hotrod.impl.protocol.Codec21] (main) ISPN004006: /127.0.0.1:11222 sent new topology view (id=3) containing 1 addresses: [/127.0.0.1:11222]
15:08:08,088 INFO  [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory] (main) ISPN004016: Server not in cluster anymore(/127.0.1.1:11222), removing from the pool.
~~~

5. Resume 127.0.1.1:11222 (type "fg") and wait for the partitions to merge.

6. Access the cluster from the client. No new view is received.
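The ISPN004014 and ISPN004016 log lines in steps 2 and 4 show the client reconciling its server pool with each view it receives. A minimal sketch of that reconciliation follows, assuming the pool is just a set of address strings; `PoolReconciler` and `applyView` are hypothetical names, and this is not the actual `TcpTransportFactory` code.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of a Hot Rod client's server pool being reconciled with each
// topology view it receives. The messages mirror the ISPN004014/ISPN004016 log lines;
// this is not Infinispan's TcpTransportFactory implementation.
class PoolReconciler {
    private final Set<String> pool;

    PoolReconciler(Collection<String> initialServers) {
        this.pool = new LinkedHashSet<>(initialServers);
    }

    /** Applies a new view and returns log-style messages for every change to the pool. */
    List<String> applyView(List<String> viewAddresses) {
        List<String> messages = new ArrayList<>();
        // Servers in the new view that are not pooled yet get added.
        for (String addr : viewAddresses) {
            if (pool.add(addr)) {
                messages.add("New server added(" + addr + "), adding to the pool.");
            }
        }
        // Pooled servers missing from the new view get removed.
        Set<String> gone = new LinkedHashSet<>(pool);
        gone.removeAll(viewAddresses);
        for (String addr : gone) {
            pool.remove(addr);
            messages.add("Server not in cluster anymore(" + addr + "), removing from the pool.");
        }
        return messages;
    }

    Set<String> pool() {
        return pool;
    }
}
```

Starting from a pool of `/127.0.0.1:11222` and feeding it the views from steps 2 and 4 yields one "added" and one "removed" message, matching the log excerpts. In the failing run no view arrives after step 5, so in this model the pool never regains `/127.0.1.1:11222`.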


Expected results:
The client should receive the new merged (2-member) view at step 6.
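In the Hot Rod protocol, each request carries the topology id the client last received, and the server returns its current topology with the response when the ids differ; roughly, that is how the views with id=2 and id=3 reached the client in steps 2 and 4. The simplified sketch below models that exchange with hypothetical class names and a plain integer id. In this model, the expected result at step 6 only happens if the server installs a new topology when the partitions merge, which is the side the linked ISPN-5889 summary ("Merge views not dealt with in Hot Rod server") points at.

```java
import java.util.List;
import java.util.Optional;

// Simplified, hypothetical model of the Hot Rod topology-id exchange; the names and the
// single integer id are assumptions for illustration, not the server's real code.
class TopologyHandshake {

    /** A topology view: an id plus the member addresses. */
    static final class View {
        final int id;
        final List<String> members;

        View(int id, List<String> members) {
            this.id = id;
            this.members = List.copyOf(members);
        }
    }

    /** Server side: holds the current view and answers client requests. */
    static final class Server {
        private View current;

        Server(View initial) {
            this.current = initial;
        }

        /** Installs a new view with the next id; a merge has to go through here too. */
        void installView(List<String> members) {
            current = new View(current.id + 1, members);
        }

        /** Piggybacks the current view only when the client's topology id is out of date. */
        Optional<View> handleRequest(int clientTopologyId) {
            return clientTopologyId == current.id ? Optional.empty() : Optional.of(current);
        }
    }

    /** Client side: sends the id it knows and adopts any view returned with the response. */
    static final class Client {
        private View known;

        Client(View initial) {
            this.known = initial;
        }

        void request(Server server) {
            server.handleRequest(known.id).ifPresent(view -> known = view);
        }

        View known() {
            return known;
        }
    }
}
```

If the server never calls `installView` with the merged member list, every request from the client carries an id the server considers current, so no view is piggybacked and the client stays on the 1-member view, as observed at step 6.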


Additional info:
With the following logging configuration, you can observe which member a client request reaches:
~~~
            <console-handler name="CONSOLE">
                <level name="TRACE"/>
                ...
            <logger category="org.infinispan.interceptors.CallInterceptor">
                <level name="TRACE"/>
            </logger>
~~~

Comment 2 Osamu Nagano 2015-10-22 07:11:50 UTC

Created attachment 1085419
hotrodclient.zip

Comment 3 Galder Zamarreño 2015-10-27 09:57:51 UTC

I tried this case with community Infinispan 8.1.0.Alpha1 and the issue is not present, so perhaps it was already fixed? Have you tried with the latest JDG version? :|

10:54:37,805 INFO  [com.example.HotRodClient] (main) connect called: + serverList
10:54:38,026 INFO  [org.infinispan.client.hotrod.impl.protocol.Codec21] (main) ISPN004006: /127.0.0.1:11222 sent new topology view (id=5) containing 2 addresses: [/127.0.0.1:11222, /127.0.0.1:12222]
10:54:38,027 INFO  [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory] (main) ISPN004014: New server added(/127.0.0.1:12222), adding to the pool.
10:54:38,029 INFO  [org.infinispan.client.hotrod.RemoteCacheManager] (main) ISPN004021: Infinispan version: 8.1.0.Alpha1
10:54:38,029 INFO  [com.example.HotRodClient] (main) Connected.
10:54:38,111 INFO  [com.example.HotRodClient] (main) Selected cache:
hoge> get hoge
null
hoge> get buzz
null
hoge> get hoge
10:55:44,306 INFO  [org.infinispan.client.hotrod.impl.protocol.Codec21] (main) ISPN004006: /127.0.0.1:11222 sent new topology view (id=6) containing 1 addresses: [/127.0.0.1:11222]
10:55:44,307 INFO  [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory] (main) ISPN004016: Server not in cluster anymore(/127.0.0.1:12222), removing from the pool.
null
hoge> get buzz
null
hoge> get hoge
10:56:10,619 INFO  [org.infinispan.client.hotrod.impl.protocol.Codec21] (main) ISPN004006: /127.0.0.1:11222 sent new topology view (id=8) containing 2 addresses: [/127.0.0.1:11222, /127.0.0.1:12222]
10:56:10,620 INFO  [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory] (main) ISPN004014: New server added(/127.0.0.1:12222), adding to the pool.
null
hoge> get buzz
null
hoge>

Comment 4 Galder Zamarreño 2015-10-27 12:39:50 UTC

I've re-run the test and was able to reproduce the issue. I had mixed up the suspend and kill commands.

Comment 5 JBoss JIRA Server 2015-10-27 13:38:24 UTC

Galder Zamarreño <galder.zamarreno> updated the status of jira ISPN-5889 to Coding In Progress

Comment 6 Osamu Nagano 2015-10-28 01:41:46 UTC

@Galder, I've tested with JDG 6.4.1, JDG 6.5.1, and Infinispan 8.1.0.Alpha2, and all show the same behaviour. Suspending with Ctrl-Z rather than killing the process is important, because it imitates a long GC pause.

This issue results in data inconsistency. For example, a client that connects to the first server keeps receiving the 1-member view after the merge, so all of its put operations, including those for keys originally directed to the second server, go to the first server. Meanwhile, a client that connects to the second server receives the 2-member view after the merge, and this client cannot read the values put by the first client.
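The inconsistency above comes from two clients choosing the target server for the same key from different member lists. The toy router below illustrates only that routing divergence, under an assumed modulo hash over the view's members; `ViewRouter` is hypothetical, and the Hot Rod 2.x client actually routes with a segment-based consistent hash, so real key placement will differ.

```java
import java.util.List;

// Toy router (hypothetical): picks a server by hashing the key over the client's
// current member list. Infinispan's client uses a segment-based consistent hash,
// so real placement differs; only the divergence between views matters here.
class ViewRouter {
    static String route(String key, List<String> members) {
        if (members.isEmpty()) {
            throw new IllegalArgumentException("view has no members");
        }
        // floorMod keeps the index non-negative even for negative hash codes.
        return members.get(Math.floorMod(key.hashCode(), members.size()));
    }

    /** True if clients holding these two views would send the key to different servers. */
    static boolean diverges(String key, List<String> staleView, List<String> freshView) {
        return !route(key, staleView).equals(route(key, freshView));
    }
}
```

With the stale one-member view every key routes to `/127.0.0.1:11222`, while under the merged two-member view some keys (for example `"buzz"` with this toy hash) route to `/127.0.1.1:11222`, so the two clients disagree about where such keys live.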

Comment 8 Osamu Nagano 2015-11-13 01:20:41 UTC

PR #3798 has been merged into infinispan:master. I built and tested it, but the issue in the description still occurs. Is there more work to be done on this issue?

Comment 12 JBoss JIRA Server 2015-11-26 14:00:01 UTC

Dan Berindei <dberinde> updated the status of jira ISPN-5889 to Reopened

Comment 14 Dan Berindei 2015-11-27 18:38:57 UTC

PR: https://github.com/infinispan/jdg/pull/805

I've added a test method that does an "overlapping" merge. I've also tested with Ctrl+Z, and the client receives the 2nd node's address when it is resumed.
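An "overlapping" merge is presumably the situation Ctrl-Z creates: while 127.0.1.1:11222 is suspended, 127.0.0.1:11222 installs a view without it, but the suspended node never sees that view and still lists both members, so the subgroups being merged share a member. That reading is an assumption and is not taken from the PR. The hypothetical helper below checks subgroup views for overlap and computes the merged membership.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: inspects the subgroup views taking part in a merge.
// Not JGroups or Infinispan code; it only illustrates what "overlapping" means.
class MergeInspector {
    /** True if any member appears in more than one subgroup view. */
    static boolean isOverlapping(List<List<String>> subgroups) {
        Set<String> seen = new HashSet<>();
        for (List<String> group : subgroups) {
            // Deduplicate within a group so only cross-group repeats count.
            for (String member : new LinkedHashSet<>(group)) {
                if (!seen.add(member)) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Union of all subgroup members in first-seen order: the merged view's membership. */
    static List<String> mergedMembers(List<List<String>> subgroups) {
        Set<String> union = new LinkedHashSet<>();
        for (List<String> group : subgroups) {
            union.addAll(group);
        }
        return new ArrayList<>(union);
    }
}
```

For the Ctrl-Z scenario the subgroups would be `[/127.0.0.1:11222]` and `[/127.0.0.1:11222, /127.0.1.1:11222]`, which overlap; a merge of disjoint partitions, by contrast, has no shared members.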

Comment 16 Matej Čimbora 2015-12-03 10:32:05 UTC

Tested using the provided application and reproduced the issue with ER2. The problem is no longer present in ER3. Marking as verified.

Comment 21 Red Hat Bugzilla 2025-02-10 03:48:22 UTC

This product has been discontinued or is no longer tracked in Red Hat Bugzilla.