Bug 794144 (JBEPP-1210)

Summary: Support for shared transport in JGroups configuration
Product: [JBoss] JBoss Enterprise Portal Platform 5
Reporter: mposolda
Component: Portal
Assignee: mposolda
Status: CLOSED NEXTRELEASE
QA Contact:
Severity: high
Docs Contact:
Priority: high
Version: 5.2.0.ER01
CC: epp-bugs, mposolda, theute
Target Milestone: ---   
Target Release: 5.2.0.ER06   
Hardware: Unspecified   
OS: Unspecified   
URL: http://jira.jboss.org/jira/browse/JBEPP-1210
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment: EPP 5.2 started with: ./run.sh -c production -g portalPerffDebugger -u 239.255.14.16 -Dexo.profiles=cluster
Last Closed: 2011-10-25 14:38:04 UTC
Type: Feature Request
Attachments:
Description          Flags
probe-output.txt     none
JBEPP-1210.patch     none

Description mposolda 2011-09-27 13:05:22 UTC
project_key: JBEPP

It would be good to have support for shared transport in our JGroups configuration. The current JGroups configuration in EPP 5.2 ER1 still isn't good because every service uses its own JChannel, so we need many open network connections and many JGroups threads. Using the multiplexer doesn't seem to be an option, as it needs code changes that won't be applied in eXo kernel. The advantages of shared transport over the multiplexer are described at http://community.jboss.org/wiki/MigrationFromMultiplexerToSharedTransport .

I am attaching the output of the probe.sh script as attachment probe-output.txt (Probe is part of EAP and is located at $EPP_HOME/bin/probe.sh; more info at http://community.jboss.org/wiki/Probe ). It shows that the EAP clusters (SessionCache, HAPartitionCache, opt-entity, ...) have shared transport available and share the same JGroups channel, but each of our clusters (idm, MOPSessionManager, ...) uses a different JChannel.

Some points from my investigation:
- There are 3 JBoss Cache instances created by eXo kernel (gatein-portal-DescriptionService, gatein-portal-NavigationService, gatein-portal-MOPSessionManager). Each instance uses its own JChannel. There is a JIRA for multiplexer support, https://issues.jboss.org/browse/JBEPP-779 , but eXo is not going to provide support for the multiplexer.

- There are 2 JBoss Cache instances created by PicketlinkIDMService (idm-api-cluster, idm-store-cluster). There is support for the multiplexer, but I verified that the multiplexer is not working because of two issues:
a) The line "Cache cache = factory.createCache(configStream);", called from the PicketlinkIDMServiceIMPL constructor, immediately creates and starts the JBoss Cache instance. A later call of the method "applyJGroupsConfig" has no effect because the cache is already started. We should use "factory.createCache(configStream, false)" if we want to make additional config changes and start the cache later.
b) To reuse the "jcr.stack" multiplexer, it needs to use the same JChannelFactory instance that is used for JCR. This may be hard to achieve because the variable CHANNEL_FACTORY is private inside org.exoplatform.services.jcr.jbosscache.ExoJBossCacheFactory (it holds the JChannelFactory used to create the JCR stack).

- JCR uses the multiplexer, so it has one multiplexed JChannel instance (jcr.stack), which is shared among the 18 JBoss Cache instances needed by JCR.
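
The ordering problem from point (a) can be illustrated with a small, self-contained sketch. Note that SketchCache and SketchCacheFactory are hypothetical stand-in types written for this illustration only; they are not the JBoss Cache or PicketLink IDM classes, and the config names "per-cache-udp" and "shared-stack" are made up. The sketch only assumes what point (a) states: a cache picks up its JGroups settings at start time, so changes applied after start are ignored.

```java
/**
 * Stand-in types only: SketchCache and SketchCacheFactory are hypothetical and
 * merely imitate the "create, optionally start" shape described in point (a).
 */
public class DeferredStartSketch {

    /** A cache that copies its configuration into the running state when started. */
    static final class SketchCache {
        private String jgroupsConfig = "per-cache-udp"; // value loaded from the config stream
        private String activeJGroupsConfig;             // what the running channel uses
        private boolean started;

        void applyJGroupsConfig(String config) {
            // The configuration object can always be changed, but a channel
            // that is already running never sees the new value.
            jgroupsConfig = config;
        }

        void start() {
            if (!started) {
                activeJGroupsConfig = jgroupsConfig;
                started = true;
            }
        }

        String activeJGroupsConfig() {
            return activeJGroupsConfig;
        }
    }

    /** Factory with an explicit "start" flag, like createCache(configStream, false). */
    static final class SketchCacheFactory {
        SketchCache createCache(boolean start) {
            SketchCache cache = new SketchCache();
            if (start) {
                cache.start();
            }
            return cache;
        }
    }

    /** Problem case: the cache starts inside the factory, so the later change is ignored. */
    static String createStartedThenConfigure(String config) {
        SketchCache cache = new SketchCacheFactory().createCache(true);
        cache.applyJGroupsConfig(config);
        return cache.activeJGroupsConfig();
    }

    /** Suggested order: create stopped, adjust the configuration, then start. */
    static String createConfigureThenStart(String config) {
        SketchCache cache = new SketchCacheFactory().createCache(false);
        cache.applyJGroupsConfig(config);
        cache.start();
        return cache.activeJGroupsConfig();
    }

    public static void main(String[] args) {
        System.out.println(createStartedThenConfigure("shared-stack")); // prints per-cache-udp
        System.out.println(createConfigureThenStart("shared-stack"));   // prints shared-stack
    }
}
```

In the real service the same order would presumably apply: create the cache without starting it, apply the JGroups settings, and only then call start.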

I think it should be possible to configure shared transport so that all our clusters (JCR, MOPSessionManager, NavigationService, DescriptionService, idm-api, idm-store) use the same transport channel.

Comment 1 mposolda 2011-09-27 13:06:17 UTC
BTW: When I change the JGroups logging level to WARN in jboss-log4j.xml (we use the ERROR level in EPP by default), my server log is full of WARN messages because all our clusters are using the same multicast address and port:

4:45:38,253 WARN  [UDP] discarded message from different group "portalPerffDebugger-gatein-portal-NavigationService" (our group is "portalPerffDebugger-gatein-portal-MOPSessionManager"). Sender was 127.0.0.1:45075
14:45:38,253 WARN  [UDP] discarded message from different group "portalPerffDebugger-gatein-portal-NavigationService" (our group is "portalPerffDebugger-idm-store-cluster"). Sender was 127.0.0.1:45075
14:45:38,258 WARN  [UDP] discarded message from different group "portalPerffDebugger-gatein-portal-NavigationService" (our group is "portalPerffDebugger-idm-api-cluster"). Sender was 127.0.0.1:45075
14:45:38,255 WARN  [UDP] discarded message from different group "portalPerffDebugger-gatein-portal-NavigationService" (our group is "portalPerffDebugger-gatein-portal-DescriptionService"). Sender was 127.0.0.1:45075
14:45:38,890 WARN  [UDP] discarded message from different group "portalPerffDebugger-gatein-portal-NavigationService" (our group is "portalPerffDebugger-idm-store-cluster"). Sender was 127.0.0.1:45075
14:45:38,891 WARN  [UDP] discarded message from different group "portalPerffDebugger-gatein-portal-NavigationService" (our group is "portalPerffDebugger-gatein-portal-DescriptionService"). Sender was 127.0.0.1:45075
14:45:38,892 WARN  [UDP] discarded message from different group "portalPerffDebugger-gatein-portal-NavigationService" (our group is "portalPerffDebugger-idm-api-cluster"). Sender was 127.0.0.1:45075
14:45:38,893 WARN  [UDP] discarded message from different group "portalPerffDebugger-gatein-portal-NavigationService" (our group is "portalPerffDebugger-gatein-portal-MOPSessionManager"). Sender was 127.0.0.1:45075
14:45:45,224 WARN  [UDP] discarded message from different group "portalPerffDebugger-gatein-portal-DescriptionService" (our group is "portalPerffDebugger-idm-store-cluster"). Sender was 127.0.0.1:56618
14:45:45,224 WARN  [UDP] discarded message from different group "portalPerffDebugger-gatein-portal-DescriptionService" (our group is "portalPerffDebugger-idm-api-cluster"). Sender was 127.0.0.1:56618


Comment 2 mposolda 2011-09-27 13:07:31 UTC
Attachment: Added: probe-output.txt


Comment 3 mposolda 2011-09-27 13:14:14 UTC
Link: Added: This issue is related to JBQA-5399


Comment 4 mposolda 2011-09-27 13:15:03 UTC
Link: Added: This issue relates to JBEPP-778


Comment 5 mposolda 2011-09-27 13:15:03 UTC
Link: Added: This issue relates to JBEPP-779


Comment 9 mposolda 2011-10-24 21:18:02 UTC
Attaching a patch that sets up shared transport across all JBoss Cache instances used for portal services (JCR, IDM, MOPSessionManager, DescriptionService, NavigationService), which means disabling the deprecated multiplexer. It also adds configuration for a TCP cluster, which can easily be switched with UDP by using system properties: -Djboss.default.jgroups.stack=tcp -Dgatein.default.jgroups.stack=tcp . JCR is configured to use shareable caches to avoid creating many JBoss Cache instances and to reduce the resources needed.
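
As a rough illustration of how a property-driven switch of this kind could be resolved in code, the sketch below reads the two system properties named in this comment and falls back to "udp" when they are unset. The "udp" fallback, the resolveStack helper and the class name are assumptions made for illustration; the patch itself may resolve these properties differently, for example through property substitution in the configuration files.

```java
/**
 * Hypothetical helper: resolves a JGroups stack name from a system property.
 * Only the two property names come from the comment above; everything else
 * (class name, method name, "udp" default) is assumed for this sketch.
 */
public class StackSelectionSketch {

    /** Returns the configured stack name, or "udp" (assumed default) if unset or blank. */
    static String resolveStack(String propertyName) {
        String value = System.getProperty(propertyName);
        if (value == null || value.trim().isEmpty()) {
            return "udp";
        }
        return value.trim();
    }

    public static void main(String[] args) {
        // Run with: java -Djboss.default.jgroups.stack=tcp -Dgatein.default.jgroups.stack=tcp StackSelectionSketch
        System.out.println("EAP stack:    " + resolveStack("jboss.default.jgroups.stack"));
        System.out.println("Portal stack: " + resolveStack("gatein.default.jgroups.stack"));
    }
}
```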

Comment 10 mposolda 2011-10-24 21:18:02 UTC
Attachment: Added: JBEPP-1210.patch


Comment 11 mposolda 2011-10-26 08:53:12 UTC
Link: Added: This issue is related to GTNPORTAL-2234


Comment 12 mposolda 2011-10-26 10:16:32 UTC
Link: Added: This issue is related to JBQA-4181


Comment 13 mposolda 2011-10-26 10:28:40 UTC
Link: Added: This issue is related to JBEPP-736


Comment 14 mposolda 2011-10-26 10:38:44 UTC
Link: Added: This issue is related to JBEPP-778


Comment 15 Thomas Heute 2011-11-17 14:22:42 UTC
Release Notes Docs Status: Added: Not Yet Documented
Release Notes Text: Added: Cluster transportation of data has been optimized to reduce the number of network connections and threads


Comment 16 Jared MORGAN 2011-11-28 04:37:27 UTC
Release Notes Docs Status: Removed: Not Yet Documented Added: Documented as Feature Request
Primary SME: Added: theute


Comment 17 Jared MORGAN 2011-12-04 23:07:36 UTC
Release Notes Text: Removed: Cluster transportation of data has been optimized to reduce the number of network connections and threads Added: Cluster data transportation has been optimized to reduce the number of network connections and threads.