Bug 1048695
| Summary: | Add eviction to indexer-config.xml | ||
|---|---|---|---|
| Product: | [JBoss] JBoss Enterprise Portal Platform 6 | Reporter: | Toshiya Kobayashi <tkobayas> |
| Component: | Portal | Assignee: | Lucas Ponce <lponce> |
| Status: | CLOSED DEFERRED | QA Contact: | Tomas Kyjovsky <tkyjovsk> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 6.1.0 | CC: | bdawidow, epp-bugs, jpallich, lponce, ppalaga, theute, tkyjovsk |
| Target Milestone: | DR01 | ||
| Target Release: | 6.2.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: |
Cause:
A bulk import operation into a JBoss Portal cluster can lead to a rise in memory consumption due to the overhead of the unmarshalling task between nodes.
The eXo JCR master node can reuse QPath/QPathEntry objects, whereas slave nodes cannot reuse those objects, creating a large number of objects in the portal-system cache.
Consequence:
The internal portal-system cache can be filled with a large number of objects.
Fix:
There is a workaround: clean the cache before and after the bulk import operation using the JMX administrative interface:
MBean = { exo:portal=portal,repository=repository,workspace=portal-system,service=Cache }
This operation only has to be performed on one node of the cluster; replication will propagate the clean operation to the other nodes.
Result:
Caches will be cleaned, releasing memory on the JBoss Portal cluster.
|
Story Points: | --- |
| Clone Of: | 896393 | Environment: | |
| Last Closed: | 2014-07-03 08:38:22 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Attachments: | |||
|
Description
Toshiya Kobayashi
2014-01-06 07:10:28 UTC
@Toshiya, according to GTNPORTAL-2801 it seems the eviction configuration is not relevant anymore. How should I proceed with this BZ?

I tried to reproduce the leak with a 2-node cluster of 6.2.0.ER2 using 100 portals (cloned from the classic portal), however I can't say if it's there.

Before import:
- both node1 and node2 ~500 / 1300 MB

After import via node1 and browsing through several imported portal pages on both nodes:
- node1: 500-800 / 1300 MB
- node2: 900-1000 / 1300 MB
(the lower values are after GC)

Created attachment 900108 [details]
imported data - 100 portals cloned from classic
Created attachment 900109 [details]
20 biggest objects on node1 after import
Created attachment 900110 [details]
20 biggest objects on node2 after import
Hi Tomas,

According to GTNPORTAL-2801, we don't need to change indexer-config.xml. Instead, the issue should have been fixed with the eXo JCR JIRA:
https://jira.exoplatform.org/browse/JCR-2271

However, I wonder about your test result. Assuming node2 is the slave node, the result suggests that the slave node retains a much larger cache, so the issue is still there: "Heap usage of slave node will go up and never be released". Am I wrong? If you add more portals, the slave node would face an OOME.

Peter, is this an expected result even after applying JCR-2271?

Regards,
Toshiya

Toshiya, indeed I was able to trigger an OOM on node2 after I imported 200 portal sites on node1. Tested on 6.2.0.ER3.

Created attachment 911797 [details]
nodes memory
I did some more testing with 6.2.0.ER3.
First I tested an import of 200 portal sites into a single-node out-of-the-box installation. The import took about 5 minutes and was successful. After a manually triggered GC the heap usage was ~700 MB (out of the 1300 MB limit).
Second I tried with the 2-node cluster (with the JCR cluster settings enabled).
- The import took about 15 minutes.
- After ~5 minutes node2 exceeded GC overhead limit. About that time node1 started reporting JGroups TimeoutExceptions.
- After a manually triggered GC, heap usage on node1 was ~950 MB; node2 was just under the 1300 MB limit.
So it seems to me there is definitely something wrong here. Should I create a new BZ for this and close this one, or can we continue with the issue in this BZ?
Created attachment 911800 [details]
6.2.0.ER3 node1 stacktrace
Created attachment 911801 [details]
6.2.0.ER3 node2 stacktrace
Created attachment 911802 [details]
6.2.0.ER3 node1 memory snapshot (after import)
Created attachment 911804 [details]
6.2.0.ER3 node2 memory snapshot (after import)
Created attachment 912090 [details]
imported data - 200 portal sites cloned from classic
Steps to reproduce the issue:
1) setup a 2-node cluster
- especially enable the JCR cluster options in "configuration/gatein/configuration.properties"
2) start the H2 database and both nodes
3) import the data (via the Sites Redirect portlet, or gatein-management REST/CLI)
Note: The heap stats can be monitored by jvisualvm which is shipped with Oracle JDK 1.7.
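As an alternative to attaching jvisualvm, heap usage can also be polled from code via the standard java.lang.management API. A minimal sketch (the class name is mine, not from the portal codebase):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Polls the JVM's own heap stats via the platform MemoryMXBean.
// Useful as a lightweight check when a GUI profiler isn't available.
public class HeapWatch {
    public static void main(String[] args) {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = mem.getHeapMemoryUsage();
        long usedMb = heap.getUsed() / (1024 * 1024);
        long maxMb  = heap.getMax()  / (1024 * 1024);
        System.out.println("heap " + usedMb + " / " + maxMb + " MB");
    }
}
```

Running this inside (or attached to) each node at intervals would give the same "used / max" figures quoted in the comments above.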
Adding info requested by Lucas:

JAVA_OPTS: -server -XX:+UseCompressedOops -verbose:gc -Xloggc:"/home/tkyjovsk/workspace/portal6/installs/jboss-portal-6.2/standalone/log/gc.log" -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=3M -XX:-TraceClassUnloading -Xms1303m -Xmx1303m -XX:MaxPermSize=256m -Djava.net.preferIPv4Stack=true -Djboss.modules.system.pkgs=org.jboss.byteman -Djava.awt.headless=true
(same for both nodes)

The index is shared:
gatein.jcr.index.changefilterclass=org.exoplatform.services.jcr.impl.core.query.ispn.ISPNIndexChangesFilter

Import policy: "merge", but I got a very similar result for "overwrite" (only 1 site gets overwritten; the other 200 are newly created, same as with "merge" mode).

JDK: OpenJDK 1.6 (but I just retested with Oracle JDK 1.7 with the same result)

I've performed a test increasing the heap by 512 MB using:
JAVA_OPTS="-Xms1815m -Xmx1815m -XX:MaxPermSize=256m -Xss512k -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv6Addresses=false"
and the test passed.

Created attachment 912190 [details]
nodes with increased memory
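For an EAP-based portal install, the increased-heap test above corresponds to a change like the following in the server launch configuration (the file path is an assumption typical of JBoss installs; adjust for your layout):

```shell
# bin/standalone.conf -- path assumed for a JBoss EAP-based portal install.
# Raise the heap from the default 1303m to 1815m (+512m), matching the test above.
JAVA_OPTS="-Xms1815m -Xmx1815m -XX:MaxPermSize=256m -Xss512k \
  -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv6Addresses=false"
```

This only delays the point at which the slave node's cache exhausts the heap; it is not a fix for the underlying growth.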
I retested with increased memory (+512m / 1815m):
- node1: During the import operation the heap usage on node1 _decreased_ from 950 to 700 MB (index eviction?). This didn't happen with the default heap size settings.
- node2: Heap usage was just under the limit. The GC overhead limit wasn't exceeded; however, GC frequency was very high (and increasing) at the end of the import operation (see the attachment).
Created attachment 913300 [details]
Nodes after applying workaround of cleaning portal-system caches.
Pointed out by Nicolas: "One possible workaround for this special operation could be to clear the JCR cache of portal-system before and after the import."

This operation can be performed via JMX under the MBean:
exo:portal=portal,repository=repository,workspace=portal-system,service=Cache

Be careful of two points:
- The operation only has to be performed on one node; it will be propagated to the rest of the nodes.
- It can take several seconds, so don't start a second clean operation before the first one has finished.

Deferred for the current version as there is an operational workaround for this scenario. An eXo JIRA has been filed in the eXo JCR component for future versions:
https://jira.exoplatform.org/browse/JCR-2316

I think this should be included in the Release Notes as a known issue.

I hope the needinfo sent to me was solved by others.
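Applied from code, the workaround amounts to a single MBean invocation. The sketch below registers a dummy stand-in MBean so it runs without a live portal node; the operation name "clean" and the stand-in class are assumptions of mine, so verify the MBean's actual operation list in JConsole before invoking it against a real cluster:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class CacheCleanSketch {
    // Dummy stand-in standard MBean so this sketch runs without a portal node.
    // In a real deployment the portal itself registers the Cache MBean.
    public interface CacheMBean { void clean(); }
    public static class Cache implements CacheMBean {
        public volatile boolean cleaned = false;
        public void clean() { cleaned = true; }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // Exact ObjectName quoted in the bug report:
        ObjectName name = new ObjectName(
            "exo:portal=portal,repository=repository,workspace=portal-system,service=Cache");
        Cache cache = new Cache();
        server.registerMBean(cache, name);
        // Invoke the (assumed) "clean" operation. Run on ONE node only:
        // replication propagates the clean to the rest of the cluster.
        server.invoke(name, "clean", new Object[0], new String[0]);
        System.out.println("cleaned=" + cache.cleaned);
    }
}
```

Against a remote node you would obtain the MBeanServerConnection via JMXConnectorFactory and a JMXServiceURL instead of the local platform server; the invoke call is the same.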