Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1048695

Summary: Add eviction to indexer-config.xml
Product: [JBoss] JBoss Enterprise Portal Platform 6
Reporter: Toshiya Kobayashi <tkobayas>
Component: Portal
Assignee: Lucas Ponce <lponce>
Status: CLOSED DEFERRED
QA Contact: Tomas Kyjovsky <tkyjovsk>
Severity: high
Docs Contact:
Priority: unspecified
Version: 6.1.0
CC: bdawidow, epp-bugs, jpallich, lponce, ppalaga, theute, tkyjovsk
Target Milestone: DR01
Target Release: 6.2.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: A bulk import operation into a JBoss Portal cluster can lead to increased memory consumption due to overhead in the unmarshalling task between nodes. The eXo JCR master node can reuse QPath/QPathEntry objects, while slave nodes cannot reuse those objects, creating a large number of objects in the portal-system cache.
Consequence: The internal portal-system cache can be filled with a large number of objects.
Fix: As a workaround, clean the cache before and after the bulk import operation using the JMX administrative interface: MBean = { exo:portal=portal,repository=repository,workspace=portal-system,service=Cache }. This operation only has to be performed on one of the nodes of the cluster; replication will propagate the clean operation to the other nodes.
Result: Caches will be cleaned, releasing memory on the JBoss Portal cluster.
Story Points: ---
Clone Of: 896393
Environment:
Last Closed: 2014-07-03 08:38:22 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments (flags: none for all):
imported data - 100 portals cloned from classic
20 biggest objects on node1 after import
20 biggest objects on node2 after import
nodes memory
6.2.0.ER3 node1 stacktrace
6.2.0.ER3 node2 stacktrace
6.2.0.ER3 node1 memory snapshot (after import)
6.2.0.ER3 node2 memory snapshot (after import)
imported data - 200 portal sites cloned from classic
nodes with increased memory
Nodes after applying workaround of cleaning portal-system caches.

Description Toshiya Kobayashi 2014-01-06 07:10:28 UTC
+++ This bug was initially created as a clone of Bug #896393 +++

Description of problem:

Platform BZ for https://issues.jboss.org/browse/GTNPORTAL-2801

Steps to Reproduce:
1. Add many portals via MOP import in clustered EPP

Actual results:

Heap usage of slave node will go up and never be released (occupied by indexer cache)

Expected results:

indexer cache is evicted as configured in indexer-config.xml so no memory leak

Comment 4 Tomas Kyjovsky 2014-05-28 19:20:47 UTC
@Toshiya, according to GTNPORTAL-2801 it seems the eviction configuration is no longer relevant. How should I proceed with this BZ?

I tried to reproduce the leak with a 2-node cluster of 6.2.0.ER2 using 100 portals (cloned from the classic portal), however I can't say for sure whether it's there.

Before import:
- both node1 and node2 ~500 / 1300 MB

After import via node1 and browsing through several imported portals pages on both nodes:
- node1: 500-800 / 1300 MB
- node2: 900-1000 / 1300 MB
(the lower values are after GC)

Comment 5 Tomas Kyjovsky 2014-05-28 19:22:12 UTC
Created attachment 900108 [details]
imported data - 100 portals cloned from classic

Comment 6 Tomas Kyjovsky 2014-05-28 19:23:24 UTC
Created attachment 900109 [details]
20 biggest objects on node1 after import

Comment 7 Tomas Kyjovsky 2014-05-28 19:23:54 UTC
Created attachment 900110 [details]
20 biggest objects on node2 after import

Comment 8 Toshiya Kobayashi 2014-05-29 01:15:08 UTC
Hi Tomas,

According to GTNPORTAL-2801, we don't need to change indexer-config.xml. Instead, the issue should have been fixed with the exo-JCR JIRA.

https://jira.exoplatform.org/browse/JCR-2271

However, I wonder about your test result. Assuming node2 is the slave node, the result shows that the slave node retains a much larger cache, so the issue "Heap usage of slave node will go up and never be released" is still there. Am I wrong? If you add more portals, the slave node would face an OOME.

Peter,

Is it an expected result even after applying JCR-2271?

Regards,
Toshiya

Comment 9 Tomas Kyjovsky 2014-06-24 12:17:20 UTC
Toshiya, indeed I was able to trigger an OOM on node2 after importing 200 portal sites on node1. Tested on 6.2.0.ER3.

Comment 10 Tomas Kyjovsky 2014-06-24 15:50:35 UTC
Created attachment 911797 [details]
nodes memory

I did some more testing with 6.2.0.ER3.

First I tested an import of 200 portal sites into a single-node out-of-the-box installation. The import took about 5 minutes and was successful. After a manually triggered GC the heap usage was ~700 MB (out of the 1300 MB limit).

Second, I tried with the 2-node cluster (with the JCR cluster settings enabled).
- The import took about 15 minutes.
- After ~5 minutes node2 exceeded the GC overhead limit. Around that time node1 started reporting JGroups TimeoutExceptions.
- After a manually triggered GC, heap usage on node1 was ~950 MB; node2 was just under the 1300 MB limit.

So it seems to me there is definitely something wrong here. Should I create a new BZ for this and close this one, or can we continue with the issue in this BZ?

Comment 11 Tomas Kyjovsky 2014-06-24 16:01:00 UTC
Created attachment 911800 [details]
6.2.0.ER3 node1 stacktrace

Comment 12 Tomas Kyjovsky 2014-06-24 16:01:21 UTC
Created attachment 911801 [details]
6.2.0.ER3 node2 stacktrace

Comment 13 Tomas Kyjovsky 2014-06-24 16:01:58 UTC
Created attachment 911802 [details]
6.2.0.ER3 node1 memory snapshot (after import)

Comment 14 Tomas Kyjovsky 2014-06-24 16:02:20 UTC
Created attachment 911804 [details]
6.2.0.ER3 node2 memory snapshot (after import)

Comment 15 Tomas Kyjovsky 2014-06-25 13:03:10 UTC
Created attachment 912090 [details]
imported data - 200 portal sites cloned from classic

Steps to reproduce the issue:
1) setup a 2-node cluster
  - in particular, enable the JCR cluster options in "configuration/gatein/configuration.properties"
2) start the H2 database and both nodes
3) import the data (via the Sites Redirect portlet, or gatein-management REST/CLI)

Note: The heap stats can be monitored with jvisualvm, which ships with Oracle JDK 1.7.

Comment 16 Tomas Kyjovsky 2014-06-25 14:08:44 UTC
Adding info requested by Lucas:

JAVA_OPTS: -server -XX:+UseCompressedOops -verbose:gc -Xloggc:"/home/tkyjovsk/workspace/portal6/installs/jboss-portal-6.2/standalone/log/gc.log" -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=3M -XX:-TraceClassUnloading -Xms1303m -Xmx1303m -XX:MaxPermSize=256m -Djava.net.preferIPv4Stack=true -Djboss.modules.system.pkgs=org.jboss.byteman -Djava.awt.headless=true

(same for both nodes)

The index is shared: gatein.jcr.index.changefilterclass=org.exoplatform.services.jcr.impl.core.query.ispn.ISPNIndexChangesFilter

Import policy: "merge", but I got a very similar result with "overwrite" (only 1 site gets overwritten; the other 200 are newly created, same as in "merge" mode)

JDK: OpenJDK 1.6 (but I just retested with Oracle JDK 1.7 with the same result)

Comment 17 Lucas Ponce 2014-06-25 15:36:39 UTC
I've performed a test increasing the heap by 512 MB using:

JAVA_OPTS="-Xms1815m -Xmx1815m -XX:MaxPermSize=256m -Xss512k -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv6Addresses=false"


and the test passed.

Comment 19 Tomas Kyjovsky 2014-06-25 17:22:40 UTC
Created attachment 912190 [details]
nodes with increased memory

I retested with increased memory (+512m / 1815m):

- node1: During the import operation the heap usage on node1 _decreased_ from 950 to 700 MB (index eviction?). This didn't happen with the default heap size settings.

- node2: Heap usage was just under the limit. The GC overhead limit wasn't exceeded, but GC frequency was very high (and increasing) toward the end of the import operation. (see the attachment)

Comment 22 Lucas Ponce 2014-06-30 10:39:46 UTC
Created attachment 913300 [details]
Nodes after applying workaround of cleaning portal-system caches.

Comment 23 Lucas Ponce 2014-06-30 10:42:51 UTC
Pointed by Nicolas:

"One possible workaround for this special operation could be to clear the JCR cache of portal-system before and after the import"

This operation can be performed via JMX under MBean:

exo:portal=portal,repository=repository,workspace=portal-system,service=Cache

Be careful of two points:

- The operation only has to be performed on one node; it will be propagated to the rest of the nodes.

- It can take several seconds, so don't start a second clean operation before the first one has finished.
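The JMX workaround above can also be scripted instead of done by hand in jconsole. Below is a minimal sketch of a remote JMX client that invokes a clean operation on the MBean named in this comment. The service URL, port, and the operation name "clean" are assumptions not confirmed in this BZ; list the MBean's actual operations in jconsole first and substitute the real values.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ClearPortalSystemCache {
    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint: adjust protocol/host/port to the node's
        // JMX settings (EAP 6 typically exposes remoting-jmx on port 9999).
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:remoting-jmx://localhost:9999");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            // MBean name exactly as given in comment 23.
            ObjectName cache = new ObjectName(
                    "exo:portal=portal,repository=repository,"
                    + "workspace=portal-system,service=Cache");
            // Operation name "clean" is an assumption; use whatever
            // operation the MBean actually exposes.
            mbsc.invoke(cache, "clean", new Object[0], new String[0]);
        } finally {
            connector.close();
        }
    }
}
```

Run it against only one node; per the notes above, the clean is replicated to the rest of the cluster, and a second invocation should wait until the first finishes.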

Comment 24 Lucas Ponce 2014-07-03 08:38:22 UTC
Deferred for the current version as there is an operational workaround for this scenario.

An eXo JIRA has been filed against the eXo JCR component for future versions:

https://jira.exoplatform.org/browse/JCR-2316

Comment 25 Tomas Kyjovsky 2014-08-13 18:17:18 UTC
I think this should be included in the Release Notes as a known issue.

Comment 26 Peter Palaga 2014-10-20 11:42:39 UTC
I hope the needinfo sent to me was solved by others.