Bug 1085927
| Field | Value |
|---|---|
| Summary | org.jgroups.TimeoutException when starting two nodes in cluster |
| Product | [JBoss] JBoss Enterprise Portal Platform 6 |
| Component | Portal |
| Version | 6.2.0 |
| Status | CLOSED UPSTREAM |
| Severity | urgent |
| Priority | unspecified |
| Reporter | vramik |
| Assignee | Lucas Ponce <lponce> |
| QA Contact | vramik |
| CC | epp-bugs, ppalaga, tkyjovsk |
| Target Milestone | ER02 |
| Target Release | 6.2.0 |
| Hardware | Unspecified |
| OS | Unspecified |
| Doc Type | Bug Fix |
| Type | Bug |
| Last Closed | 2025-02-10 03:35:35 UTC |

Doc Text: It was discovered that some variables in JGroups and Infinispan (ack_on_delivery and clusterName) were not properly defined. The improperly defined variables were causing TimeoutException errors in a clustered environment. The JGroups and Infinispan configurations have been updated to define the ack_on_delivery and clusterName variables correctly, which fixes the TimeoutException errors originally encountered.
Created attachment 884569 [details]
perf13_01.log
Created attachment 884570 [details]
perf14_pages
Hi, I have one important doubt about the steps that can affect this case:

"2. start h2 db on both machines: java -cp modules/system/layers/base/com/h2database/h2/main/h2-1.3.168-redhat-2.jar org.h2.tools.Server"

In a clustered environment the database should be unique; running two separate databases at the same time can cause unexpected behaviour. Please, could you try to reproduce the steps using a single shared database for both nodes? The issue will probably remain, but we would then have a cleaner trace of it. Meanwhile I'm going to set up a similar environment to reproduce it.

Thanks, Lucas

I confirm I could reproduce the issue with a single h2 database for both nodes. Investigating.

Issue found:
- clusterName="" is not set up properly because the ${infinispan-cluster-name} system variable is not defined.
Workaround:
- Start the ha configuration with the property defined, for example:
bin/standalone.sh -b node1 -c standalone-ha.xml -Djboss.node.name=node1 -Dinfinispan-cluster-name=gatein-cluster
bin/standalone.sh -b node2 -c standalone-ha.xml -Djboss.node.name=node2 -Dinfinispan-cluster-name=gatein-portal
I'm going to prepare a fix to define this property by default.
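A default could be supplied directly in the configuration via a JBoss-style property expression with an inline fallback, so the cluster name resolves even when -Dinfinispan-cluster-name is not passed. The sketch below is illustrative only; the exact file and surrounding elements in the GateIn Infinispan configuration may differ, and the actual fix is in the PR referenced later:

```xml
<!-- Illustrative sketch, not the exact shipped configuration.
     The expression falls back to "gatein-cluster" when the
     infinispan-cluster-name system property is undefined. -->
<global>
  <transport clusterName="${infinispan-cluster-name:gatein-cluster}" />
</global>
```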
Another issue found:
- RSVP.ack_on_delivery=true in the JGroups configuration, where Infinispan recommends setting it to false to avoid deadlocks. Preliminary tests suggest this may also be related to the overall issue.

[1] https://issues.jboss.org/browse/ISPN-2612
[2] https://issues.jboss.org/browse/ISPN-2713

I tried applying the workaround suggested by Dan Berindei [1] and it removed the TimeoutException. However the redundant pages (which shouldn't be visible) are still there, even after I added the "infinispan-cluster-name" property.

Additional Steps:
- set RSVP.ack_on_delivery=false in gatein/gatein.ear/portal.war/WEB-INF/classes/jgroups/gatein-udp.xml as suggested in [1] for both nodes (all the other gatein jgroups configs already have this set)
- set -Dinfinispan-cluster-name=gatein-portal for both nodes

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1087244#c2

Created attachment 888599 [details]
No repeated pages screenshot

I've sent a PR for master: https://github.com/gatein/gatein-portal/pull/832

With this fix and a clean database I can't reproduce the repeated pages. Please, could you repeat the test with a clean database? There may be an issue with the initial database; this could help us scope it.

Thanks, Lucas

Yes, it was the db. The additional pages were caused by data from Portal 6.1 which we used for comparison. I don't see them with a clean db.
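The RSVP change above amounts to flipping one attribute on the RSVP protocol entry in gatein-udp.xml. A minimal sketch of what that entry might look like (the timeout and resend_interval values shown are illustrative, not the exact shipped values):

```xml
<!-- Sketch of the RSVP protocol entry in
     gatein/gatein.ear/portal.war/WEB-INF/classes/jgroups/gatein-udp.xml.
     ack_on_delivery="false" is the setting Infinispan recommends to
     avoid deadlocks (see ISPN-2612 / ISPN-2713). -->
<RSVP timeout="60000"
      resend_interval="500"
      ack_on_delivery="false"/>
```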
(Standalone h2 stores the dbs in ${user.home} instead of /data and I forgot to clean them between the tests.)
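Since standalone H2 resolves relative database paths against ${user.home}, stale database files there can carry data from a previous portal version into the next test run. A minimal cleanup sketch, assuming file-based H2 databases in $HOME with the *.h2.db suffix (the default naming for H2 1.3.x page-store databases):

```shell
# Assumption: file-based H2 databases live in $HOME as *.h2.db files.
H2_DIR="${H2_DIR:-$HOME}"
if ls "$H2_DIR"/*.h2.db >/dev/null 2>&1; then
  echo "stale H2 databases found in $H2_DIR:"
  ls -1 "$H2_DIR"/*.h2.db
  # rm -f "$H2_DIR"/*.h2.db   # uncomment to delete them between test runs
else
  echo "no stale H2 databases in $H2_DIR"
fi
```

The deletion line is left commented out so the script is safe to run as an inspection step first.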
https://github.com/gatein/gatein-portal/pull/832 was merged upstream.

Verified in ER02.

This product has been discontinued or is no longer tracked in Red Hat Bugzilla.
Created attachment 884568 [details]
perf13.log

Description of problem:
There is a TimeoutException when starting the portal in cluster mode. Follow the steps to reproduce.

Version-Release number of selected component (if applicable): rhjp6.2.dr02

Steps to Reproduce:
1. I've used two machines in the lab (perf13, perf14)
2. start the h2 db on both machines: java -cp modules/system/layers/base/com/h2database/h2/main/h2-1.3.168-redhat-2.jar org.h2.tools.Server
3. start the portal (on perf13): sh standalone.sh -b perf13.mw.lab.eng.bos.redhat.com -c standalone-ha.xml -Djboss.node.name=perf13
4. start the portal (on perf14): sh standalone.sh -b perf14.mw.lab.eng.bos.redhat.com -c standalone-ha.xml -Djboss.node.name=perf14
5. There are exceptions in the log on perf13 when perf14 is being started (see the attached perf13.log)
6. When I tried to do a quick sanity check on perf13, I got other exceptions (perf13_01.log)

Additional info:
There are additional pages on perf14 (see perf14_pages.png)