EAP's JChannelFactory tries to set a custom socket factory on the JGroups transport. This is not the correct API to use, and it gets overwritten when the JGroups channel starts. A custom socket factory should be set on the JChannel. The only time the custom socket factory is currently used is if there's a race condition where two channels are started at the same time, and the custom factory is set just before the other channel uses it.
This can be fixed either by simply removing the custom socket factory (which is currently unused anyway, except when the race condition triggers), or by setting it in the correct place. If it's fixed by setting it in the correct place, BZ 1268186 is also required.
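To illustrate the overwrite described above, here is a minimal, self-contained sketch. It does not use the JGroups API: the Transport, Channel and NamedFactory classes are hypothetical stand-ins, and the assumption that the channel pushes its own factory down to the transport on start is taken only from the description in this report.

```java
public class SocketFactoryOverwriteDemo {

    /** Hypothetical stand-in for a socket factory; only its name matters here. */
    static final class NamedFactory {
        final String name;

        NamedFactory(String name) {
            this.name = name;
        }
    }

    /** Hypothetical transport: it simply holds whichever factory was set last. */
    static final class Transport {
        NamedFactory socketFactory = new NamedFactory("default");
    }

    /**
     * Hypothetical channel. Assumption (from the bug description): on start,
     * the channel pushes its own factory down to the transport, replacing
     * whatever was set on the transport directly.
     */
    static final class Channel {
        final Transport transport;
        NamedFactory socketFactory = new NamedFactory("default");

        Channel(Transport transport) {
            this.transport = transport;
        }

        void start() {
            transport.socketFactory = this.socketFactory;
        }
    }

    /** Returns the name of the factory the transport ends up with after start. */
    static String factoryUsedAfterStart(boolean setOnChannel) {
        Transport transport = new Transport();
        Channel channel = new Channel(transport);
        NamedFactory custom = new NamedFactory("managed");

        if (setOnChannel) {
            channel.socketFactory = custom;   // the place the report calls correct
        } else {
            transport.socketFactory = custom; // what the report says EAP was doing
        }

        channel.start();
        return transport.socketFactory.name;
    }

    public static void main(String[] args) {
        System.out.println("set on transport -> " + factoryUsedAfterStart(false));
        System.out.println("set on channel   -> " + factoryUsedAfterStart(true));
    }
}
```

In this toy model the factory set on the transport is replaced by the channel's default on start, while the factory set on the channel survives. The race condition with a second channel is not modeled here.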
PRs sent upstream (not yet merged): https://github.com/wildfly/wildfly-core/pull/1145 https://github.com/wildfly/wildfly/pull/8227
Update on this issue: the WildFly Core PR (upstream) has been merged: https://github.com/wildfly/wildfly-core/pull/1145 The WildFly PR was closed without being merged: https://github.com/wildfly/wildfly/pull/8227
Related upstream PRs: https://github.com/wildfly/wildfly/pull/8297 https://github.com/wildfly/wildfly-core/pull/1145 https://github.com/wildfly/wildfly-core/pull/1181
Hi Paul, could you clarify whether BZ 1268186 needs to be fixed?
This commit will be reverted upstream once WildFly is upgraded to a wildfly-core version containing the fixes: https://github.com/wildfly/wildfly/commit/36f5bd99c8893a75ae0338708a5bd41263676060
https://github.com/wildfly/wildfly/pull/8297 also fixes BZ 1268186.
(In reply to dereed from comment #11) > https://github.com/wildfly/wildfly/pull/8297 also fixes BZ 1268186. More specifically, the calls where BZ 1268186 was being triggered in this use case are removed in PR 8297.
*** Bug 1268186 has been marked as a duplicate of this bug. ***
https://github.com/jbossas/jboss-eap/pull/2599 (backport of the 3 PRs listed in #7)
Using the default UDP stack and the default TCP stack in the default standalone-ha.xml (with a single trivial change in the TCP case), I've verified that with this fix, the majority of sockets are created via the ManagedSocketFactory. However, the FD_SOCK protocol still creates its sockets via JGroups' own DefaultSocketFactory. I will attach a Byteman script that shows this. How do we want to proceed?
Created attachment 1088972 [details] Byteman script to show ManagedSocketFactory vs. DefaultSocketFactory usage
I just tried with WildFly 10.0.0.CR4 and it works perfectly there, even the FD_SOCK sockets are created via the ManagedSocketFactory. This means that the backport is wrong.
Created attachment 1089013 [details] Node1(UDP) log
Created attachment 1089014 [details] Node2(UDP) log
Created attachment 1089015 [details] Node1(TCP) log
Created attachment 1089016 [details] Node2(TCP) log
I've attached logs from both servers of a 2-node cluster (with the default UDP stack and with the default TCP stack), running with the attached Byteman script, i.e. the logs show the stack traces. Look for the string "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! JGroups default".
The shared attribute (on the transport tag) has a different default value. In WildFly it is false, so the transport is not a singleton: https://github.com/wildfly/wildfly/blob/master/clustering/jgroups/extension/src/main/java/org/jboss/as/clustering/jgroups/subsystem/TransportResourceDefinition.java#L121 In EAP it is true, so the transport is a singleton (point 2 in comment 28 is executed): https://github.com/jbossas/jboss-eap/blob/6.x/clustering/jgroups/src/main/java/org/jboss/as/clustering/jgroups/subsystem/TransportResource.java#L65 This is why the issue cannot be reproduced upstream with default values. I'm reopening the issue upstream as well.
Enrique González Martínez <elguardian> updated the status of jira WFLY-5449 to Reopened
Dennis Reed <dereed> updated the status of jira WFLY-5449 to Resolved
I've confirmed Enrique's findings in #29. The issue with FD_SOCK's socket factory is a bug in JGroups related to singletons. It's a separate issue from this BZ -- EAP is setting the factory correctly now, and it's no longer breaking when used, but JGroups isn't using it for FD_SOCK.
Upstream issue for the FD_SOCK problem: https://issues.jboss.org/browse/JGRP-1974 It's a separate bug in a different component, and shouldn't block this BZ.
OK, makes sense. Verified with EAP 6.4.5.CP.CR1.
Retroactively bulk-closing issues from released EAP 6.4 cumulative patches.