Bug 1158451 - Servers are not forming a cluster on solaris when the JDG instance is bound to localhost
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: JBoss Data Grid 6
Classification: JBoss
Component: Server
Version: 6.4.0,6.6.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 6.4.0
Assignee: Bela Ban
QA Contact: Martin Gencur
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-10-29 12:23 UTC by Jakub Markos
Modified: 2025-02-10 03:43 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2025-02-10 03:43:25 UTC
Type: Bug
Embargoed:


Attachments
Server logs (1.91 MB, application/zip)
2014-11-07 11:46 UTC, Jakub Markos
Server config (14.47 KB, application/xml)
2014-11-07 11:46 UTC, Jakub Markos

Description Jakub Markos 2014-10-29 12:23:07 UTC
Multiple servers on Solaris do not form a cluster when started with clustered.sh (the second server uses a port offset of 100).

Comment 3 Martin Gencur 2014-11-05 14:58:18 UTC
Jakub, we need more information. Are there any server logs? What is the configuration? Is there a link to the Jenkins job where it fails? Thanks

Comment 4 Jakub Markos 2014-11-07 11:46:32 UTC
Created attachment 954906
Server logs

Attached are server logs from ER1 (er11.log, er12.log), where the servers cluster properly, and from ER3 (er31.log, er32.log), where they do not. (Logging was broken in ER2, which is why I used ER3.)
The main change between the builds is the JGroups version: it went from 3.4.5.Final-redhat-2 in ER1 to 3.5.1.Final-redhat-1 in ER2, and to 3.6.0.Final-redhat-1 in ER3.
The clustered.xml configuration didn't change, so it probably needs some adjustments. I can try to find out what needs changing, but Bela would probably be faster.

Comment 5 Jakub Markos 2014-11-07 11:46:58 UTC
Created attachment 954907
Server config

Comment 6 Jakub Markos 2014-11-10 13:04:07 UTC
When using -Djboss.bind.address=<not localhost>, the servers cluster properly.
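
A minimal sketch of a pre-start sanity check for the observation above: it resolves the value of jboss.bind.address and warns when it is a loopback address. The class name BindAddressCheck and the fallback to 127.0.0.1 when the property is unset are assumptions made for illustration, not part of JDG.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class BindAddressCheck {
    // Returns true when the given host name or IP literal resolves to a loopback address.
    static boolean isLoopbackBind(String bindAddress) throws UnknownHostException {
        return InetAddress.getByName(bindAddress).isLoopbackAddress();
    }

    public static void main(String[] args) throws UnknownHostException {
        // Assumption: treat an unset property as 127.0.0.1 for this check.
        String addr = System.getProperty("jboss.bind.address", "127.0.0.1");
        if (isLoopbackBind(addr)) {
            System.out.println("WARNING: " + addr + " is a loopback address; clustering may fail on this host");
        } else {
            System.out.println(addr + " is not a loopback address");
        }
    }
}
```

It can be run with the same property that is passed to the server, for example java -Djboss.bind.address=<address> BindAddressCheck.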

Comment 7 Jakub Markos 2014-11-11 11:55:48 UTC
@gsheldon The workaround is to use a different bind address than localhost.

Comment 8 Martin Gencur 2014-11-11 12:15:48 UTC
Gemma, 
I changed the title and release notes text. Please review. Thanks

Comment 9 Bela Ban 2014-11-13 15:47:10 UTC
I suggest trying this out with a standalone JGroups program (e.g. ChatDemo) and the *same configuration* as in JDG.
Or give me access to a Solaris box and I can try this out myself.

Comment 10 Bela Ban 2014-11-14 15:08:34 UTC
I had issues editing the config files on the Solaris box dev32-01, so I cannot test what I wrote below.
The most likely cause is that there is no multicast route in the routing table. 

The default route points to net0:
default              10.16.95.254         UG        3  166558737 net0

If a node doesn't bind to a 10.16.x.x address, then it won't receive the multicasts sent via net0.

Solutions:

1: Add a multicast route for a given range of mcast addresses to the routing table

2: Bind to a 10.16.x.x address (bind_addr) instead of to 127.0.0.1

3: Join all multicast routes by using either bind_interfaces="net0,lo0" or bind_to_all_interfaces="true" in UDP.

Again, I haven't been able to confirm this, so please verify it.
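
To help verify the analysis above, the following sketch lists every network interface the JVM can see, with its up, loopback and multicast flags and its addresses. It uses only java.net.NetworkInterface from the standard library; the class name InterfaceReport is hypothetical. It does not inspect the routing table, so a missing multicast route still has to be checked with the operating system's own tools.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

public class InterfaceReport {
    // Builds one summary line per network interface visible to the JVM.
    static List<String> describe() throws SocketException {
        List<String> lines = new ArrayList<>();
        Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
        if (nics == null) {
            return lines; // no interfaces reported on this host
        }
        for (NetworkInterface nic : Collections.list(nics)) {
            StringBuilder sb = new StringBuilder(nic.getName())
                    .append(" up=").append(nic.isUp())
                    .append(" loopback=").append(nic.isLoopback())
                    .append(" multicast=").append(nic.supportsMulticast())
                    .append(" addrs=");
            for (InetAddress a : Collections.list(nic.getInetAddresses())) {
                sb.append(a.getHostAddress()).append(' ');
            }
            lines.add(sb.toString().trim());
        }
        return lines;
    }

    public static void main(String[] args) throws SocketException {
        describe().forEach(System.out::println);
    }
}
```

On the Solaris box, the lines for net0 and lo0 would show whether both interfaces are up and multicast-capable before changing bind_addr or the interface settings in UDP.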

Comment 12 Tomas Sykora 2015-01-07 16:40:53 UTC
This is running OK now with ER8.
Also, note that the ticket was resolved.

Comment 13 Jakub Markos 2015-01-08 09:42:21 UTC
ER8 probably passed because the workaround that I applied for ER7 is still in place, so I'm reopening this.

Comment 15 Red Hat Bugzilla 2025-02-10 03:43:25 UTC
This product has been discontinued or is no longer tracked in Red Hat Bugzilla.

