Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1085927

Summary: org.jgroups.TimeoutException when starting two nodes in cluster
Product: [JBoss] JBoss Enterprise Portal Platform 6
Reporter: vramik
Component: Portal
Assignee: Lucas Ponce <lponce>
Status: CLOSED UPSTREAM
QA Contact: vramik
Severity: urgent
Priority: unspecified
Version: 6.2.0
CC: epp-bugs, ppalaga, tkyjovsk
Target Milestone: ER02
Target Release: 6.2.0
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Doc Text:
It was discovered that some JGroups and Infinispan settings (ack_on_delivery and clusterName) were not properly defined. The improperly defined settings were causing TimeoutException errors in a clustered environment. The JGroups and Infinispan configurations have been updated to define ack_on_delivery and clusterName correctly, which fixes the TimeoutException errors originally encountered.
Last Closed: 2025-02-10 03:35:35 UTC
Type: Bug
Attachments:
Description                     Flags
perf13.log                      none
perf13_01.log                   none
perf14_pages                    none
No repeated pages screenshot    none

Description vramik 2014-04-09 16:24:59 UTC
Created attachment 884568
perf13.log

Description of problem:
A TimeoutException is thrown when starting the portal in cluster mode. Follow the steps below to reproduce it.

Version-Release number of selected component (if applicable):
rhjp6.2.dr02

Steps to Reproduce:
1. I used two machines in the lab (perf13, perf14).
2. Start the H2 database on both machines: java -cp modules/system/layers/base/com/h2database/h2/main/h2-1.3.168-redhat-2.jar org.h2.tools.Server
3. Start the portal on perf13: sh standalone.sh -b perf13.mw.lab.eng.bos.redhat.com -c standalone-ha.xml -Djboss.node.name=perf13
4. Start the portal on perf14: sh standalone.sh -b perf14.mw.lab.eng.bos.redhat.com -c standalone-ha.xml -Djboss.node.name=perf14
5. Exceptions appear in the log on perf13 while perf14 is starting (see the attached perf13.log).
6. When I tried a quick sanity check on perf13, I got further exceptions (see perf13_01.log).

Additional info:
Additional pages appear on perf14 (see perf14_pages.png).

Comment 1 vramik 2014-04-09 16:25:32 UTC
Created attachment 884569
perf13_01.log

Comment 2 vramik 2014-04-09 16:25:59 UTC
Created attachment 884570
perf14_pages

Comment 4 Lucas Ponce 2014-04-22 09:44:02 UTC
Hi,

I have one important question about the steps that may affect this case:

"2. start h2 db on both machines: java -cp modules/system/layers/base/com/h2database/h2/main/h2-1.3.168-redhat-2.jar org.h2.tools.Server"

In a cluster environment the database should be unique and shared by all nodes; having two different databases at the same time can cause unexpected behaviour.

Could you please try to reproduce the steps using a shared database for both nodes?

The issue will probably remain, but we will then have a more precise trace of it.

Meanwhile, I'm going to set up a similar environment to reproduce it.

Thanks,
Lucas

Comment 5 Lucas Ponce 2014-04-22 11:38:04 UTC
I can confirm that I reproduced the issue with a single H2 database shared by both nodes.

Investigating.

Comment 7 Lucas Ponce 2014-04-22 12:34:43 UTC
Issue found:

- clusterName="" is not set up properly because the ${infinispan-cluster-name} system property is not defined.

Workaround:

- Start the HA configuration with this property defined, for example:

bin/standalone.sh -b node1 -c standalone-ha.xml -Djboss.node.name=node1 -Dinfinispan-cluster-name=gatein-portal

bin/standalone.sh -b node2 -c standalone-ha.xml -Djboss.node.name=node2 -Dinfinispan-cluster-name=gatein-portal
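Since both nodes have to join the same cluster, the value passed as -Dinfinispan-cluster-name should be identical on every node. A minimal sketch, assuming the bin/standalone.sh layout and the node1/node2 host names from the example above; the start_node function and the CLUSTER_NAME variable are hypothetical and the function only prints the command instead of running it:

```shell
#!/bin/sh
# Hypothetical shared value: every node must be started with the same cluster name.
CLUSTER_NAME="gatein-portal"

# Print (do not run) the start command for one node.
# $1 = bind address, also used as jboss.node.name
start_node() {
  echo "bin/standalone.sh -b $1 -c standalone-ha.xml -Djboss.node.name=$1 -Dinfinispan-cluster-name=${CLUSTER_NAME}"
}

start_node node1
start_node node2
```

Keeping the name in a single variable avoids the kind of mismatch where one node is started with a different cluster name than the other.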


I'm going to prepare a fix to define this property by default.

Comment 8 Lucas Ponce 2014-04-22 15:00:14 UTC
Another issue found:

- RSVP.ack_on_delivery=true is set in the JGroups configuration, whereas Infinispan recommends setting it to false to avoid deadlocks.

Some preliminary tests suggest that this may also contribute to the overall issue.

[1] https://issues.jboss.org/browse/ISPN-2612
[2] https://issues.jboss.org/browse/ISPN-2713

Comment 9 Tomas Kyjovsky 2014-04-22 16:40:12 UTC
I tried applying the workaround suggested by Dan Berindei [1] and it removed the TimeoutException. However, the redundant pages (which shouldn't be visible) are still there, even after I added the "infinispan-cluster-name" property.


Additional Steps:

- Set RSVP.ack_on_delivery=false in gatein/gatein.ear/portal.war/WEB-INF/classes/jgroups/gatein-udp.xml on both nodes, as suggested in [1]
   (all the other GateIn JGroups configurations already have it set)

- Set -Dinfinispan-cluster-name=gatein-portal on both nodes


[1] https://bugzilla.redhat.com/show_bug.cgi?id=1087244#c2
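The first step above can be scripted. A minimal sketch, assuming the RSVP element in gatein-udp.xml carries the setting as an ack_on_delivery="true" attribute; the disable_ack_on_delivery function and the sample RSVP line used for the demo are hypothetical, so check the real file before running it against a node:

```shell
#!/bin/sh
# Switch RSVP ack_on_delivery from true to false in a JGroups stack file.
# $1 = path to the JGroups XML, e.g. .../WEB-INF/classes/jgroups/gatein-udp.xml
disable_ack_on_delivery() {
  file="$1"
  # Keep a backup, then rewrite only the attribute value (portable, no sed -i).
  cp "$file" "$file.bak" &&
    sed 's/ack_on_delivery="true"/ack_on_delivery="false"/g' "$file.bak" > "$file"
}

# Demo on a throw-away file; this RSVP element is a hypothetical example.
demo=$(mktemp)
echo '<RSVP resend_interval="2000" timeout="10000" ack_on_delivery="true"/>' > "$demo"
disable_ack_on_delivery "$demo"
cat "$demo"
```

Writing through a backup copy instead of sed -i keeps the script working with both GNU and BSD sed and leaves the original configuration recoverable.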

Comment 10 Lucas Ponce 2014-04-22 17:11:04 UTC
Created attachment 888599
No repeated pages screenshot

I've sent a PR for master:

https://github.com/gatein/gatein-portal/pull/832

With this fix and a clean database I can't reproduce repeated pages.

Could you please repeat the test with a clean database?

Maybe there is an issue with the initial database; this would help us narrow it down.

Thanks,
Lucas

Comment 11 Tomas Kyjovsky 2014-04-23 17:02:25 UTC
Yes, it was the database. The additional pages were caused by data from portal 6.1, which we used for comparison. I don't see them with a clean database.

(The standalone H2 server stores the databases in ${user.home} instead of /data, and I forgot to clean them between tests.)
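To avoid stale data between runs, the leftover database files can be listed before each test. A minimal sketch, assuming H2 1.3.x page-store naming (*.h2.db, with *.lock.db and *.trace.db companion files) and that the files sit directly in the home directory; the list_h2_files function is hypothetical and only lists files, so review the output before deleting anything:

```shell
#!/bin/sh
# List H2 database files left in a directory (default: $HOME).
# H2 1.3.x page-store databases use *.h2.db; *.lock.db and *.trace.db are companions.
list_h2_files() {
  dir="${1:-$HOME}"
  find "$dir" -maxdepth 1 -type f \( -name '*.h2.db' -o -name '*.lock.db' -o -name '*.trace.db' \)
}

list_h2_files
```

Listing rather than deleting keeps the step safe when other H2 databases share the same home directory.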

Comment 12 Peter Palaga 2014-04-23 19:40:50 UTC
https://github.com/gatein/gatein-portal/pull/832 was merged in upstream.

Comment 13 vramik 2014-05-13 21:50:53 UTC
Verified in ER02

Comment 15 Red Hat Bugzilla 2025-02-10 03:35:35 UTC
This product has been discontinued or is no longer tracked in Red Hat Bugzilla.