Hide Forgot
securitylevel_name: Public When running test (description below) with ModeShape the test takes way too much time in comparison with JackRabbit and also the time is rasing. The graph is attached. The test is doing approximately this: 1) Import sample repository (mortgage). 2) Log in, change one drl file, commit changes. 3) Build package. 4) Repeat steps 2 and 3 several times. Source code of the test: https://svn.devel.redhat.com/repos/jboss-qa/guvnor/branches/BRMS-5.1.x/guvnor-leak-test/src/com/jboss/guvnor/leak/LeakTest.java Database could be allocated here: http://dballocator.mw.lab.eng.bos.redhat.com:8080/Allocator/AllocatorServlet?operation=report See: http://hudson.qa.jboss.com/hudson/view/BRMS/job/brms-leak-single/48/console -- modeshape http://hudson.qa.jboss.com/hudson/view/BRMS/job/brms-leak-single/47/console -- jackrabbit It could be related to SOA-2470.
Attachment: Added: modeshape_performance_issue.png
Link: Added: This issue is related to SOA-2470
ModeShape is not the default repo. We will not look at this for BRMS 5.1.0
In that case, the documentation should mention that when Modeshape is used, the GUI will gradually become less and less responsive, to a point where it is not usable.
Ok,, we'll look into it.
Please attach the rule files that were imported, and preferably an System View export of the repository after Step 4 above. Also, please attach the ModeShape configuration file (specifically, I'm interested in which data source & DBMS is being used, whether the database is remote or local, etc.).
Also, can you please give a brief explanation of what these tests are doing? Any specific code references would be very desirable.
Modeshape config, data source configuration and sample repository files attached (sorry for the inaccuracy, whole sample repository is imported, not rules files). Oracle11g is used as DBMS and it is remote (see updated description for allocating).
Attachment: Added: jboss-brms-ds.xml Attachment: Added: modeshape-config.xml Attachment: Added: mortgage-sample-repository.xml
Link to the source code of test added. Also I will ask Tomáš Schlosser if he can add some more information, because he is the maintainer of this test.
I attached four new configuration files that set up ModeShape is several different ways that, hopefully, will help isolate the linear performance degradation. There are three configurations that keep all functionality, but store various parts of the managed information in-memory: - modeshape-config-in-memory.xml will store the repository content in-memory, but otherwise does everything as usual - modeshape-config-no-indexes.xml will keep the Lucene indexes in-memory, but otherwise does everything as usual - modeshape-config-in-memory-no-indexes.xml will store the repository content and Lucene indexes in-memory This fourth configuration is a very lightweight process. - modeshape-config-lightweight.xml will store the content in-memory but turns off all query functionality (I'm not sure Guvnor will work with this mode) Any chance you could run the LeakTest against each of these four configurations? Thanks!
Attachment: Added: modeshape-config-in-memory.xml Attachment: Added: modeshape-config-no-indexes.xml Attachment: Added: modeshape-config-in-memory-no-indexes.xml Attachment: Added: modeshape-config-lightweight.xml
We're working on it.
Attached charts (thanks to Tomáš Schlosser) with test results for first three modeshape configurations. As Randall expected the fourth configuration (lightweight) did not work with Guvnor. The 'in-memory' and 'in-memory-no-indexes' configurations are working fine. The time is not raising. But with 'no-indexes' configuration (with oracle11g) the time for every loop is raising (loop1=cca 35s and loop39=cca 105s).
Attachment: Added: leak-test-modeshape-config-in-memory.png Attachment: Added: leak-test-modeshape-config-in-memory-no-indexes.png Attachment: Added: leak-test-modeshape-config-no-indexes-oracle11g.png
Added one more chart with 'no-indexes' configuration and MySQL 5.1. The time is also raising (loop1=cca 22s loop50=cca 40s), but not so dramatically as with Oracle11g.
Attachment: Added: leak-test-modeshape-config-no-indexes-mysql51.png
Are these the same Oracle 11g and MySQL 5.1 databases that are being used with the Jackrabbit runs?
Link: Added: This issue incorporates MODE-1047
I can't be sure, because we have several databases and the one picked is one that is not currently used by other tests. But the test with JackRabbit was run several times (with Oracle 11g), so if there were any problems we would probably encountered them. I'll try to run the test using JackRabbit with same databases as with modeshape.
Is there a way to instrument the LeakTest (e.g., https://svn.devel.redhat.com/repos/jboss-qa/guvnor/branches/BRMS-5.1.x/guvnor-leak-test/src/com/jboss/guvnor/leak/LeakTest.java ?) to record how long Guvnor operation is taking? For example, I'd be very interested to know the times (and trends) on each of the lines marked with a '*' below: DrlRule r = b.openDrlRule("Dummy rule"); r.define("when\nthen System.out.println(" + i + ");"); * r.validate(); * r.saveChanges("Some commit"); * Package p = guvnor.openPackage("mortgages"); * p.viewSource(); * p.saveAndValidate(); * p.build(); * p.createSnapshot("TEST", null, "My TEST Snapshot"); IIUC, the 'p.saveAndValidate()' method results in a number of reads, and the 'p.build()' call results in a LOT of reads. BTW, I've create some ModeShape integration JUnit tests that attempt to replicate some of the JCR activity similar to what most of the calls above are doing (I've traced many of the above calls down to the JCR activities, and luckly most are pretty straightforward). But I've not really tried the 'saveAndValidate()' or 'build()' calls, as they appear to be very complicated and involve a lot of reads. So without calls to represent these more intensive activities, I'm able to show with ModeShape using an in-memory HSQLDB that repeatedly performing these activities is showing actually improved performance with more iterations. This is why I'm very interested in the time measurements of the various activities I called out above.
The recording of how long the operations are taking shouldn't be big problem. I'll modify the the test and run it again. I also tried run the test with in-memory HSQL and default modeshape configuration and the time of each loop wasn't rising. This is just happening when using 'real' DBMS like Oracle or MySQL.
I ran the test with jackrabbit and with exactly same databases as with modeshape. Charts attached.
Attachment: Added: leak-test-jackrabbit-oracle11g.png Attachment: Added: leak-test-jackrabbit-mysql51.png
Thanks, Petr. I look forward to seeing the times of the various calls within the loop. BTW, I'd be happy with raw numbers per iteration, or even some log file, if that's easiest.
Oh, that result of using the in-memory HSQL database not causing the performance degradation is very interesting. That helps narrow it down even more. Good work!
I'm running my tests (that try to replicate some of what Guvnor is doing during the LeakTest) locally against the in-memory store (e.g., hash map) and three JPA/Hibernate stores (HSQLDB in-memory, a local PostgreSQL 8.4.5-1 instance, and a remote QA Oracle10g instance). - The tests using HSQLDB (in-memory or file-based) take approximately 1.5X as long as the in-memory store. - The tests using a real local PostgreSQL instance take approximately 3.2X as long as the in-memory store. - The tests using the remote Oracle instance take approximately 530X as long as the in-memory store! It's simply not practical for me to run local code against the remote Oracle instance. Unfortunately, my hacked tests do not replicate the performance degradation: everything I've seen shows steady, level performance (after something of a short warming period with higher-than average times).
Added chart that shows how much time are the Guvnor operations taking.
Attachment: Added: leak-test-modeshape-config-no-indexes-oracle11g-operations.png
So, to recap.: - When using in-memory storage, the problem goes away. (See charts.) - When using SQL database running locally, the problem also goes away. (See Randall's comment.) From this, it seems to me that the problem may be somehow caused by the slow network connection inbetween Guvnor and database. Perhaps some inefficient query is getting too much data from/to the database, resulting in this poor performance?
I'm not entirely sure we can isolate the slow network as the cause. I think one thing we can say is that using a local in-memory database (HSQLDB) also makes the problem go away, and I suspect that using HSQLDB with file-based storage will also make it go away. Really the only way to isolate the network as the problem would be to run the LeakTest on a machine that has a local MySQL or Oracle database. Is there a way to do that? Does PostgreSQL exhibit the same problem? I'll dive into the new chart.
Petr, thanks for uploading that chart. Here's my analysis of the chart (w/o raw numbers): For iterations 1-5: - about 44% of the time is spent in createSnapshot() - about 14% of the time is spent in saveAndValidate() - about 9% of the time is spent in saveChanges() - these three methods consume about 67% of the total iteration time By iteration 40: - about 57% of the time is spent in createSnapshot(), and the this iteration is taking roughly 5X longer than the first 5 - about 17% of the time is spent in saveAndValidate(), and the this iteration is taking roughly 5X longer than the first 5 - about 8% of the time is spent in saveChanges(), and the this iteration is taking roughly 4-5X longer than the first 5 - these three methods consume about 82% of the total iteration time Throughout, the times of the other operations are small enough or are stable enough that they don't appear to contribute much to the increase: - validate() - openPackage() - viewSource() - build() If I'm tracing the code correctly, the createSnapshot() method is called with an existing snapshot name, no new snapshot name, and a comment. This call is forwarded to Guvnor's ServiceImplementation.createPackageSnapshot(...), tho I don't know how the three parameters map to this method's four parameters. This createPackageSnapshot(...) method does these operations: 1) optionally removes the existing snapshot (when the 3rd parameter is 'true'), via RulesRepository.removePackageSnapshot(...) 2) create the package snapshot, via RulesRepository.createPackageSnapshot(...) 3) load the package snapshot, via RulesRepository.loadPackageSnapshot(...) 4) update the checkin comment, via PackageItem.updateCheckinComment(...) 5) call repository.save() Time to run some integration tests ...
Added one more chart with local MySQL 5.1 database and modeshape-no-indexes config. The degradation is still there, but the time is not rising as fast as when the remote database is used.
Attachment: Added: leak-test-modeshape-config-no-indexes-mysql51-localdb-operations.png
So - the faster the non-in-memory DB connection/access, the slower the problem develops - but it still gets worse over time?
Yes, It seems to me exactly like Len wrote. With local DB, the network latency is much lower and thus the impact on performance is not that big.
Okay, after several days trying to replicate the problem, I was able to do this by focusing on the createSnapshot activity. When I temporarily (and locally) removed the logic in JcrWorkspace.copy(...) that deals with initializing the version history for all of the 'mix:versionable' nodes that resulted from the copy, I found that performance of my tests was very nearly linear. After a little more investigation, this logic is doing quite a few operations against the source. In fact, the number of operations is linear with the number of 'mix:versionable' nodes, and the time required to perform these operations is also a function (not necessarily linear) with the number of version histories already in the repository. IMO, this seems to explain quite well all the behavior we're seeing. Because the JPA source is being used with a database that has a lot of (network) latency, the effect was magnified significantly. However, I was able to replicate this progressively worsening performance when using an in-memory source. (The performance degradation is nowhere near as impactful or noticeable with the lower-latency source, but it is still there.) I've updated MODE-1047 to reflect the actual problem, and will be logging all work and notes against that issue. I have a couple of small fixes that will make some decent improvements (e.g., a halving of the time required to perform these operations, and a reduction in the rate at which the performance degrades). However, some degradation still remains, and so additional work needs to be done. I've tried several additional changes, but these caused numerous failures in the TCK. In short, more work needs to be done, and some additional performance tuning needs to be done with managing the version histories. Again, I'll be tracking all this work on MODE-1047. Given the performance impact is lower with a lower latency DB, I'll leave it up to the BRMS team to decide whether MODE-1047 remains a blocker for the release of BRMS. Note that MODE-1047 is definitely marked as a blocker for SOA-P, even though SOA-P EDS does not use nearly as much versionable nodes as BRMS.
Several improvements were made under the MODE-1047 work, and were committed today to trunk and the '2.2.x' branch. The tests are showing an improvement of somewhere around 50% reduction in times by my metrics (using the new JcrRepositoryPerformanceTest class), but there still is showing a degradation over time. I will continue to look into this, but at a lower priority (I'll likely resolve MODE-1047 and open a new issue for general performance improvements). Thoughts?
I ran the leak test with the new version of modeshape (2.2.x, r2625) and the performance degradation is certainly smaller than before (charts attached), but still significant: Default modeshape configuration + oracle 11g: loop1=35s, loop50=75s Default modeshape configuration + mysql 5.1: loop1=26s , loop50=43s
Attachment: Added: leak-test-modeshape(2.2.x, r2625)-default-config-oracle11g.png Attachment: Added: leak-test-modeshape(2.2.x, r2625)-default-config-mysql51.png
Should this Jira be assigned to me and not Randall?
Link: Added: This issue is related to BRMS-557
Len, please resolve this. Re-test needed for a 5.2 baseline.
This will be tested in the first 5.2 build.
Attached chart from test run with BRMS 5.2.0.dev3 and embedded HSQL database. There is a big improvement in performance of ModeShape. Currently only the 'createSnapshot' operation is taking more time with increasing number of loops. I will also run this test with Oracle DB when the BRMS-638 is fixed.
Attachment: Added: brms-52dev3-hsql.png
(In reply to comment #46) > Len, please resolve this. Re-test needed for a 5.2 baseline. Re-tested with BRMS 5.2 ER1 and re-opening. Performance much improved, although still not comparable to Jackrabbit. Attachment coming.
Created attachment 516151 [details] BRMS 5.2 ER1 + Jackrabbit (Leak test chart) With JackRabbit, the execution time remains stable and the memory use keeps rising over time.
Created attachment 516154 [details] BRMS 5.2 ER1 + Modeshape (Leak test chart) With Modeshape, the memory use remains more-or-less constant, while the execution time rises dramatically. Please note: - the execution time in all instances higher than with Jackrabbit. - the interesting (logarithmic?) shape of the curve.
I would agree that "createSnapshot" does perform significantly slower than JR, but the other performance numbers are not significantly higher than with JR. Obviously we're still trying to improve performance, though it's not likely we can get much improvement by the release.