Bug 724554 (BRMS-502) - Possible performance degradation issues with ModeShape
Summary: Possible performance degradation issues with ModeShape
Keywords:
Status: ASSIGNED
Alias: BRMS-502
Product: JBoss Enterprise BRMS Platform 5
Classification: JBoss
Component: Modeshape
Version: 5.1.0.CR1
Hardware: Unspecified
OS: Unspecified
medium
urgent
Target Milestone: ---
: BRMS 5.2.0.GA
Assignee: Van Halbert
QA Contact: Lukáš Petrovický
URL: http://jira.jboss.org/jira/browse/BRM...
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2010-12-07 14:30 UTC by Petr Široký
Modified: 2021-04-09 12:49 UTC (History)
7 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Oracle11g
Last Closed: 2011-05-31 19:52:39 UTC
Type: Bug


Attachments (Terms of Use)
modeshape_performance_issue.png (151.42 KB, image/png)
2010-12-07 14:31 UTC, Petr Široký
no flags Details
jboss-brms-ds.xml (1.01 KB, text/xml)
2010-12-07 21:55 UTC, Petr Široký
no flags Details
modeshape-config.xml (15.11 KB, text/xml)
2010-12-07 21:55 UTC, Petr Široký
no flags Details
mortgage-sample-repository.xml (362.22 KB, text/xml)
2010-12-07 21:55 UTC, Petr Široký
no flags Details
modeshape-config-in-memory.xml (15.49 KB, text/xml)
2010-12-08 00:26 UTC, Randall Hauch
no flags Details
modeshape-config-no-indexes.xml (15.14 KB, text/xml)
2010-12-08 00:26 UTC, Randall Hauch
no flags Details
modeshape-config-in-memory-no-indexes.xml (5.90 KB, text/xml)
2010-12-08 00:26 UTC, Randall Hauch
no flags Details
modeshape-config-lightweight.xml (6.06 KB, text/xml)
2010-12-08 00:26 UTC, Randall Hauch
no flags Details
leak-test-modeshape-config-in-memory.png (69.57 KB, image/png)
2010-12-08 14:52 UTC, Petr Široký
no flags Details
leak-test-modeshape-config-in-memory-no-indexes.png (43.50 KB, image/png)
2010-12-08 14:52 UTC, Petr Široký
no flags Details
leak-test-modeshape-config-no-indexes-oracle11g.png (55.65 KB, image/png)
2010-12-08 14:52 UTC, Petr Široký
no flags Details
leak-test-modeshape-config-no-indexes-mysql51.png (43.95 KB, image/png)
2010-12-08 15:07 UTC, Petr Široký
no flags Details
leak-test-jackrabbit-oracle11g.png (35.13 KB, image/png)
2010-12-08 20:14 UTC, Petr Široký
no flags Details
leak-test-jackrabbit-mysql51.png (37.60 KB, image/png)
2010-12-08 20:14 UTC, Petr Široký
no flags Details
leak-test-modeshape-config-no-indexes-oracle11g-operations.png (34.95 KB, image/png)
2010-12-09 13:50 UTC, Petr Široký
no flags Details
leak-test-modeshape-config-no-indexes-mysql51-localdb-operations.png (36.84 KB, image/png)
2010-12-09 17:13 UTC, Petr Široký
no flags Details
leak-test-modeshape(2.2.x, r2625)-default-config-oracle11g.png (112.05 KB, image/png)
2010-12-22 15:07 UTC, Petr Široký
no flags Details
leak-test-modeshape(2.2.x, r2625)-default-config-mysql51.png (113.73 KB, image/png)
2010-12-22 15:07 UTC, Petr Široký
no flags Details
brms-52dev3-hsql.png (113.02 KB, image/png)
2011-07-21 10:47 UTC, Petr Široký
no flags Details
BRMS 5.2 ER1 + Jackrabbit (Leak test chart) (184.80 KB, image/png)
2011-08-01 14:27 UTC, Lukáš Petrovický
no flags Details
BRMS 5.2 ER1 + Modeshape (Leak test chart) (110.57 KB, image/png)
2011-08-01 14:30 UTC, Lukáš Petrovický
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 724652 1 None None None 2021-01-20 06:05:38 UTC
Red Hat Bugzilla 780100 1 None None None 2021-01-20 06:05:38 UTC
Red Hat Issue Tracker BRMS-502 0 Critical Closed Possible performance degradation issues with ModeShape 2019-03-12 00:11:45 UTC
Red Hat Issue Tracker SOA-2470 0 Major Closed Potential perfromance issues with ModeShape 2019-03-12 00:11:45 UTC

Internal Links: 724652 780100

Description Petr Široký 2010-12-07 14:30:40 UTC
securitylevel_name: Public

When running test (description below) with ModeShape the test takes way too much time in comparison with JackRabbit and also the time is rasing. The graph is attached.

The test is doing approximately this: 
1) Import sample repository (mortgage).
2) Log in, change one drl file, commit changes.
3) Build package.
4) Repeat steps 2 and 3 several times.

Source code of the test: https://svn.devel.redhat.com/repos/jboss-qa/guvnor/branches/BRMS-5.1.x/guvnor-leak-test/src/com/jboss/guvnor/leak/LeakTest.java

Database could be allocated here: http://dballocator.mw.lab.eng.bos.redhat.com:8080/Allocator/AllocatorServlet?operation=report

See: 
http://hudson.qa.jboss.com/hudson/view/BRMS/job/brms-leak-single/48/console -- modeshape
http://hudson.qa.jboss.com/hudson/view/BRMS/job/brms-leak-single/47/console -- jackrabbit


It could be related to SOA-2470.

Comment 1 Petr Široký 2010-12-07 14:31:10 UTC
Attachment: Added: modeshape_performance_issue.png


Comment 2 Petr Široký 2010-12-07 14:32:55 UTC
Link: Added: This issue is related to SOA-2470


Comment 3 Anne-Louise Tangring 2010-12-07 14:39:11 UTC
ModeShape is not the default repo. We will not look at this for BRMS 5.1.0


Comment 4 Lukáš Petrovický 2010-12-07 14:51:35 UTC
In that case, the documentation should mention that when Modeshape is used, the GUI will gradually become less and less responsive, to a point where it is not usable.

Comment 5 Anne-Louise Tangring 2010-12-07 15:04:50 UTC
Ok,, we'll look into it.


Comment 7 Randall Hauch 2010-12-07 21:10:25 UTC
Please attach the rule files that were imported, and preferably an System View export of the repository after Step 4 above. Also, please attach the ModeShape configuration file (specifically, I'm interested in which data source & DBMS is being used, whether the database is remote or local, etc.).

Comment 8 Randall Hauch 2010-12-07 21:20:14 UTC
Also, can you please give a brief explanation of what these tests are doing? Any specific code references would be very desirable.

Comment 9 Petr Široký 2010-12-07 21:55:29 UTC
Modeshape config, data source configuration and sample repository files attached (sorry for the inaccuracy, whole sample repository is imported, not rules files). Oracle11g is used as DBMS and it is remote (see updated description for allocating).

Comment 10 Petr Široký 2010-12-07 21:55:29 UTC
Attachment: Added: jboss-brms-ds.xml
Attachment: Added: modeshape-config.xml
Attachment: Added: mortgage-sample-repository.xml


Comment 11 Petr Široký 2010-12-07 22:06:39 UTC
Link to the source code of test added. Also I will ask Tomáš Schlosser if he can add some more information, because he is the maintainer of this test.

Comment 12 Randall Hauch 2010-12-08 00:26:02 UTC
I attached four new configuration files that set up ModeShape is several different ways that, hopefully, will help isolate the linear performance degradation.

There are three configurations that keep all functionality, but store various parts of the managed information in-memory:
- modeshape-config-in-memory.xml will store the repository content in-memory, but otherwise does everything as usual
- modeshape-config-no-indexes.xml will keep the Lucene indexes in-memory, but otherwise does everything as usual
- modeshape-config-in-memory-no-indexes.xml will store the repository content and Lucene indexes in-memory

This fourth configuration is a very lightweight process.
- modeshape-config-lightweight.xml will store the content in-memory but turns off all query functionality (I'm not sure Guvnor will work with this mode)

Any chance you could run the LeakTest against each of these four configurations? Thanks!

Comment 13 Randall Hauch 2010-12-08 00:26:02 UTC
Attachment: Added: modeshape-config-in-memory.xml
Attachment: Added: modeshape-config-no-indexes.xml
Attachment: Added: modeshape-config-in-memory-no-indexes.xml
Attachment: Added: modeshape-config-lightweight.xml


Comment 14 Lukáš Petrovický 2010-12-08 12:27:02 UTC
We're working on it.

Comment 15 Petr Široký 2010-12-08 14:52:55 UTC
Attached charts (thanks to Tomáš Schlosser) with test results for first three modeshape configurations. As Randall expected the fourth configuration (lightweight) did not work with Guvnor.

The 'in-memory' and 'in-memory-no-indexes' configurations are working fine. The time is not raising. But with 'no-indexes' configuration (with oracle11g) the time for every loop is raising (loop1=cca 35s and loop39=cca 105s).

Comment 16 Petr Široký 2010-12-08 14:52:55 UTC
Attachment: Added: leak-test-modeshape-config-in-memory.png
Attachment: Added: leak-test-modeshape-config-in-memory-no-indexes.png
Attachment: Added: leak-test-modeshape-config-no-indexes-oracle11g.png


Comment 17 Petr Široký 2010-12-08 15:07:57 UTC
Added one more chart with 'no-indexes' configuration and MySQL 5.1. The time is also raising (loop1=cca 22s loop50=cca 40s), but not so dramatically as with Oracle11g.

Comment 18 Petr Široký 2010-12-08 15:07:57 UTC
Attachment: Added: leak-test-modeshape-config-no-indexes-mysql51.png


Comment 19 Randall Hauch 2010-12-08 16:37:17 UTC
Are these the same Oracle 11g and MySQL 5.1 databases that are being used with the Jackrabbit runs?

Comment 20 Randall Hauch 2010-12-08 16:56:37 UTC
Link: Added: This issue incorporates MODE-1047


Comment 21 Petr Široký 2010-12-08 17:16:49 UTC
I can't be sure, because we have several databases and the one picked is one that is not currently used by other tests. But the test with JackRabbit was run several times (with Oracle 11g), so if there were any problems we would probably encountered them.

I'll try to run the test using JackRabbit with same databases as with modeshape.

Comment 22 Randall Hauch 2010-12-08 17:51:01 UTC
Is there a way to instrument the LeakTest (e.g., https://svn.devel.redhat.com/repos/jboss-qa/guvnor/branches/BRMS-5.1.x/guvnor-leak-test/src/com/jboss/guvnor/leak/LeakTest.java ?) to record how long Guvnor operation is taking? For example, I'd be very interested to know the times (and trends) on each of the lines marked with a '*' below:

				DrlRule r = b.openDrlRule("Dummy rule");
				r.define("when\nthen System.out.println(" + i + ");");
			*	r.validate();
			*	r.saveChanges("Some commit");
			*	Package p = guvnor.openPackage("mortgages");
			*	p.viewSource();
			*	p.saveAndValidate();
			*	p.build();
			*	p.createSnapshot("TEST", null, "My TEST Snapshot");

IIUC, the 'p.saveAndValidate()' method results in a number of reads, and the 'p.build()' call results in a LOT of reads.

BTW, I've create some ModeShape integration JUnit tests that attempt to replicate some of the JCR activity similar to what most of the calls above are doing (I've traced many of the above calls down to the JCR activities, and luckly most are pretty straightforward). But I've not really tried the 'saveAndValidate()' or 'build()' calls, as they appear to be very complicated and involve a lot of reads. So without calls to represent these more intensive activities, I'm able to show with ModeShape using an in-memory HSQLDB that repeatedly performing these activities is showing actually improved performance with more iterations. This is why I'm very interested in the time measurements of the various activities I called out above.

Comment 23 Petr Široký 2010-12-08 20:00:45 UTC
The recording of how long the operations are taking shouldn't be big problem. I'll modify the the test and run it again.

I also tried run the test with in-memory HSQL and default modeshape configuration and the time of each loop wasn't rising. This is just happening when using 'real' DBMS like Oracle or MySQL.

Comment 24 Petr Široký 2010-12-08 20:14:17 UTC
I ran the test with jackrabbit and with exactly same databases as with modeshape. Charts attached.

Comment 25 Petr Široký 2010-12-08 20:14:17 UTC
Attachment: Added: leak-test-jackrabbit-oracle11g.png
Attachment: Added: leak-test-jackrabbit-mysql51.png


Comment 26 Randall Hauch 2010-12-08 20:21:00 UTC
Thanks, Petr. I look forward to seeing the times of the various calls within the loop. BTW, I'd be happy with raw numbers per iteration, or even some log file, if that's easiest.

Comment 27 Randall Hauch 2010-12-08 20:23:21 UTC
Oh, that result of using the in-memory HSQL database not causing the performance degradation is very interesting. That helps narrow it down even more. Good work!

Comment 28 Randall Hauch 2010-12-08 22:58:01 UTC
I'm running my tests (that try to replicate some of what Guvnor is doing during the LeakTest) locally against the in-memory store (e.g., hash map) and three JPA/Hibernate stores (HSQLDB in-memory, a local PostgreSQL 8.4.5-1 instance, and a remote QA Oracle10g instance). 

- The tests using HSQLDB (in-memory or file-based) take approximately 1.5X as long as the in-memory store.
- The tests using a real local PostgreSQL instance take approximately 3.2X as long as the in-memory store. 
- The tests using the remote Oracle instance take approximately 530X as long as the in-memory store!

It's simply not practical for me to run local code against the remote Oracle instance.

Unfortunately, my hacked tests do not replicate the performance degradation: everything I've seen shows steady, level performance (after something of a short warming period with higher-than average times).

Comment 29 Petr Široký 2010-12-09 13:50:51 UTC
Added chart that shows how much time are the Guvnor operations taking.

Comment 30 Petr Široký 2010-12-09 13:50:51 UTC
Attachment: Added: leak-test-modeshape-config-no-indexes-oracle11g-operations.png


Comment 31 Lukáš Petrovický 2010-12-09 14:01:57 UTC
So, to recap.:
- When using in-memory storage, the problem goes away. (See charts.)
- When using SQL database running locally, the problem also goes away. (See Randall's comment.)

From this, it seems to me that the problem may be somehow caused by the slow network connection inbetween Guvnor and database. 
Perhaps some inefficient query is getting too much data from/to the database, resulting in this poor performance?

Comment 32 Randall Hauch 2010-12-09 14:08:29 UTC
I'm not entirely sure we can isolate the slow network as the cause. I think one thing we can say is that using a local in-memory database (HSQLDB) also makes the problem go away, and I suspect that using HSQLDB with file-based storage will also make it go away. Really the only way to isolate the network as the problem would be to run the LeakTest on a machine that has a local MySQL or Oracle database. Is there a way to do that? Does PostgreSQL exhibit the same problem?

I'll dive into the new chart.

Comment 33 Randall Hauch 2010-12-09 15:02:15 UTC
Petr, thanks for uploading that chart. Here's my analysis of the chart (w/o raw numbers):

For iterations 1-5:
- about 44% of the time is spent in createSnapshot()
- about 14% of the time is spent in saveAndValidate()
- about 9% of the time is spent in saveChanges()
- these three methods consume about 67% of the total iteration time

By iteration 40:
- about 57% of the time is spent in createSnapshot(), and the this iteration is taking roughly 5X longer than the first 5
- about 17% of the time is spent in saveAndValidate(), and the this iteration is taking roughly 5X longer than the first 5
- about 8% of the time is spent in saveChanges(), and the this iteration is taking roughly 4-5X longer than the first 5
- these three methods consume about 82% of the total iteration time

Throughout, the times of the other operations are small enough or are stable enough that they don't appear to contribute much to the increase:
- validate()
- openPackage()
- viewSource()
- build()

If I'm tracing the code correctly, the createSnapshot() method is called with an existing snapshot name, no new snapshot name, and a comment. This call is forwarded to Guvnor's ServiceImplementation.createPackageSnapshot(...), tho I don't know how the three parameters map to this method's four parameters. This createPackageSnapshot(...) method does these operations:
1) optionally removes the existing snapshot (when the 3rd parameter is 'true'), via RulesRepository.removePackageSnapshot(...)
2) create the package snapshot, via RulesRepository.createPackageSnapshot(...)
3) load the package snapshot, via RulesRepository.loadPackageSnapshot(...)
4) update the checkin comment, via PackageItem.updateCheckinComment(...)
5) call repository.save()

Time to run some integration tests ...



Comment 34 Petr Široký 2010-12-09 17:13:42 UTC
Added one more chart with local MySQL 5.1 database and modeshape-no-indexes config. The degradation is still there, but the time is not rising as fast as when the remote database is used.

Comment 35 Petr Široký 2010-12-09 17:13:42 UTC
Attachment: Added: leak-test-modeshape-config-no-indexes-mysql51-localdb-operations.png


Comment 36 Len DiMaggio 2010-12-09 17:39:12 UTC
So - the faster the non-in-memory DB connection/access, the slower the problem develops - but it still gets worse over time?

Comment 37 Petr Široký 2010-12-10 10:20:12 UTC
Yes, It seems to me exactly like Len wrote. With local DB, the network latency is much lower and thus the impact on performance is not that big.

Comment 38 Randall Hauch 2010-12-13 16:55:22 UTC
Okay, after several days trying to replicate the problem, I was able to do this by focusing on the createSnapshot activity. When I temporarily (and locally) removed the logic in JcrWorkspace.copy(...) that deals with initializing the version history for all of the 'mix:versionable' nodes that resulted from the copy, I found that performance of my tests was very nearly linear. After a little more investigation, this logic is doing quite a few operations against the source. In fact, the number of operations is linear with the number of 'mix:versionable' nodes, and the time required to perform these operations is also a function (not necessarily linear) with the number of version histories already in the repository.

IMO, this seems to explain quite well all the behavior we're seeing. Because the JPA source is being used with a database that has a lot of (network) latency, the effect was magnified significantly. However, I was able to replicate this progressively worsening performance when using an in-memory source. (The performance degradation is nowhere near as impactful or noticeable with the lower-latency source, but it is still there.)

I've updated MODE-1047 to reflect the actual problem, and will be logging all work and notes against that issue. I have a couple of small fixes that will make some decent improvements (e.g., a halving of the time required to perform these operations, and a reduction in the rate at which the performance degrades). However, some degradation still remains, and so additional work needs to be done. I've tried several additional changes, but these caused numerous failures in the TCK.

In short, more work needs to be done, and some additional performance tuning needs to be done with managing the version histories.  Again, I'll be tracking all this work on MODE-1047.

Given the performance impact is lower with a lower latency DB, I'll leave it up to the BRMS team to decide whether MODE-1047 remains a blocker for the release of BRMS. Note that MODE-1047 is definitely marked as a blocker for SOA-P, even though SOA-P EDS does not use nearly as much versionable nodes as BRMS.

Comment 40 Randall Hauch 2010-12-16 21:32:51 UTC
Several improvements were made under the MODE-1047 work, and were committed today to trunk and the '2.2.x' branch. The tests are showing an improvement of somewhere around 50% reduction in times by my metrics (using the new JcrRepositoryPerformanceTest class), but there still is showing a degradation over time. I will continue to look into this, but at a lower priority (I'll likely resolve MODE-1047 and open a new issue for general performance improvements). Thoughts?

Comment 41 Petr Široký 2010-12-22 15:07:30 UTC
I ran the leak test with the new version of modeshape (2.2.x, r2625) and the performance degradation is certainly smaller than before (charts attached), but still significant:

Default modeshape configuration + oracle 11g: loop1=35s, loop50=75s

Default modeshape configuration + mysql 5.1: loop1=26s , loop50=43s



Comment 42 Petr Široký 2010-12-22 15:07:30 UTC
Attachment: Added: leak-test-modeshape(2.2.x, r2625)-default-config-oracle11g.png
Attachment: Added: leak-test-modeshape(2.2.x, r2625)-default-config-mysql51.png


Comment 43 Tihomir Surdilovic 2011-01-21 18:24:47 UTC
Should this Jira be assigned to me and not Randall? 

Comment 44 Lukáš Petrovický 2011-02-28 14:01:01 UTC
Link: Added: This issue is related to BRMS-557


Comment 45 Jiri Locker 2011-03-01 12:05:32 UTC
Link: Added: This issue is related to SOA-2470


Comment 46 Anne-Louise Tangring 2011-05-31 19:06:01 UTC
Len, please resolve this. Re-test needed for a 5.2 baseline.

Comment 47 Len DiMaggio 2011-05-31 19:52:39 UTC
This will be tested in the first 5.2 build.

Comment 48 Petr Široký 2011-07-21 10:47:30 UTC
Attached chart from test run with BRMS 5.2.0.dev3 and embedded HSQL database. There is a big improvement in performance of ModeShape. Currently only the 'createSnapshot' operation is taking more time with increasing number of loops. 

I will also run this test with Oracle DB when the BRMS-638 is fixed.

Comment 49 Petr Široký 2011-07-21 10:47:30 UTC
Attachment: Added: brms-52dev3-hsql.png


Comment 50 Lukáš Petrovický 2011-08-01 14:25:22 UTC
(In reply to comment #46)
> Len, please resolve this. Re-test needed for a 5.2 baseline.

Re-tested with BRMS 5.2 ER1 and re-opening. Performance much improved, although still not comparable to Jackrabbit. Attachment coming.

Comment 51 Lukáš Petrovický 2011-08-01 14:27:55 UTC
Created attachment 516151 [details]
BRMS 5.2 ER1 + Jackrabbit (Leak test chart)

With JackRabbit, the execution time remains stable and the memory use keeps rising over time.

Comment 52 Lukáš Petrovický 2011-08-01 14:30:17 UTC
Created attachment 516154 [details]
BRMS 5.2 ER1 + Modeshape (Leak test chart)

With Modeshape, the memory use remains more-or-less constant, while the execution time rises dramatically. 

Please note:
- the execution time in all instances higher than with Jackrabbit.
- the interesting (logarithmic?) shape of the curve.

Comment 53 Randall Hauch 2011-08-16 15:21:08 UTC
I would agree that "createSnapshot" does perform significantly slower than JR, but the other performance numbers are not significantly higher than with JR. Obviously we're still trying to improve performance, though it's not likely we can get much improvement by the release.


Note You need to log in before you can comment on or make changes to this bug.