Description of problem:
CMS Authoring is bottlenecked by slow performance in workflow Java code. This page consumes 100% of front-end CPU with around 5 concurrent users. I'll attach results from profiling the Java code with OptimizeIt. About 50% of the CPU time is spent in workflow code.

Version-Release number of selected component (if applicable):
rickshaw

How reproducible:

Steps to Reproduce:
1. Go to /content/admin/
2. Create a new Article
3. Give the article a title and name, then hit "Create." Processing the forms on this page once you hit "Create" is the bottleneck.

Actual results:
Slow performance, high front-end CPU.

Expected results:
Instantaneous response with no CPU usage at all. ;)

Additional info:
Created attachment 97496 [details] OptimizeIt Profiling of slow Java code
Created attachment 97599 [details]
an Emacs-friendly version of attachment 97496 [details]

Obtained by running

  $ links -dump cms-authoring-bottleneck.html > out.txt
  $ perl -i.bak -npe 's,ms -.+\.([^(]+\(),ms - \1,;' out.txt

followed by some minor manual cleanup in Emacs.
2004-02-11 19:31:21,306 [000-0] DEBUG developersupport.Counter - thread: 127.0.0.1-tcpConnection-9000-0 counter
  started: 19:31:12.164; ended: 19:31:21.306; duration: 9,142 ms
  internalService@6112812: started: 19:31:12.164; ended: 19:31:21.306; duration: 1,304 ms: execute query: 6
  internalService@2765503: started: 19:31:13.465; ended: 19:31:21.303; duration: 4,955 ms: execute query: 48
  PageCreate.process@6925218: started: 19:31:17.988; ended: 19:31:20.871; duration: 2,883 ms: execute query: 349

Time spent inside PageCreate#process is about 31% of the total time spent in BaseServlet#internalService. This is consistent with the profile generated by OptimizeIt.

A total of 403 queries (6+48+349) are executed inside BaseServlet#internalService. Of these, 349 (or 87%) are executed inside PageCreate#process.

The bottleneck is DB-bound, not Java-bound. As far as Java code goes, the OptimizeIt profile in attachment 97599 [details] shows the same hot spots that are identified in bugs 113696 and 115042. Shall we mark this ticket as a duplicate of bug 113696?
Created attachment 97600 [details] instrumentation used for obtaining the trace in comment #3
Fixed on the trunk in change 40336. See bug 113696, comment #22.
I retested CMS authoring/publishing, and this page is still a big bottleneck (the publishing page is no longer a bottleneck, though). I will attach two profiling outputs: one for the IBM 1.3.1 JVM and one for the Sun 1.4.2 JVM. I run the server using the IBM JVM, but the profiler works a bit better with the Sun JVM, so I'm attaching both (similar) results. c.r.persistence.metadata.Model.getQualifiedName doesn't show up as a hot spot anymore.
Created attachment 97709 [details] profiling results at changelist 40375 with IBM 1.3.1 JVM
Created attachment 97710 [details] profiling results at changelist 40375 with Sun 1.4.2 JVM
I took a quick look at this before getQualifiedName was fixed. WorkflowTemplate calls this.clone() to create a new template. This in turn calls Workflow.cloneTasks, a general-purpose method for cloning workflow tasks that have arbitrary state, dependencies & listeners. The case of instantiating a new workflow is simpler, as the tasks have no additional listeners and are in the initial state. It seems* like a simpler overriding cloneTasks method in WorkflowTemplate would be a good idea. I resurrected the code I'd played around with & re-ran it today. It appears to be about twice as fast, averaging 347 ms vs 837 ms to instantiate a workflow template. The difference is that it only clones the tasks & sets their dependencies. All of the other code in cloneTasks is stripped out.

* Whether this is a good solution is questionable, due to the somewhat fragile nature of the Workflow code. It's possible that something unexpected could break; I only spent about 10 minutes changing the code & testing just item creation.
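To make the idea concrete, here is a minimal sketch of what a trimmed-down cloneTasks could look like. All class and method names below are illustrative stand-ins, not the actual Workflow API: it copies only the task graph (tasks plus their dependency links) and skips the listener/state copying that the general-purpose method does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a simplified cloneTasks for the template case,
// where every task is in its initial state and carries no extra listeners,
// so only the graph structure needs copying.
public class CloneSketch {

    static class Task {
        final String name;
        final List<Task> dependencies = new ArrayList<>();
        Task(String name) { this.name = name; }
    }

    static Map<String, Task> cloneTasks(List<Task> originals) {
        // First pass: shallow-copy every task in its initial state.
        Map<String, Task> copies = new HashMap<>();
        for (Task t : originals) {
            copies.put(t.name, new Task(t.name));
        }
        // Second pass: re-link dependencies so copies point at copies,
        // never back at the original tasks.
        for (Task t : originals) {
            Task copy = copies.get(t.name);
            for (Task dep : t.dependencies) {
                copy.dependencies.add(copies.get(dep.name));
            }
        }
        return copies;
    }

    public static void main(String[] args) {
        Task author = new Task("author");
        Task review = new Task("review");
        review.dependencies.add(author);
        Map<String, Task> copies = cloneTasks(List.of(author, review));
        // The clone of "review" depends on the clone of "author", not the original.
        System.out.println(copies.get("review").dependencies.get(0) == copies.get("author"));
    }
}
```

The two-pass shape is the important part: copying nodes before re-linking edges means dependency cycles and forward references are handled without recursion.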
To override cloneTasks may be a good thing to do. It doesn't seem to go to the heart of the issue, though. If you trim down the call tree reported by OptimizeIt in attachment 97710 [details], it looks like so:

|99.55% - 145052 ms - run() (TcpConnection.java:139)
|  70.41% - 102590 ms - handleConnection() (HttpRequest.java:164)
|    68.99% - 100528 ms - handleRequest() (HttpRequest.java:246)
|      68.97% - 100490 ms - service() (Invocation.java:315)
|        68.97% - 100490 ms - doFilter() (FilterChainServlet.java:96)
|          68.97% - 100490 ms - service() (HttpServlet.java:853)
|            49.98% - 72820 ms - service() (HttpServlet.java:760)
|              49.39% - 71973 ms - doPost() (BaseServlet.java:291)
|                49.31% - 71845 ms - internalService() (BaseServlet.java:147)
|                  [workflow application]
|
|            18.99% - 27670 ms - service() (HttpServlet.java:740)
|              18.99% - 27670 ms - doGet() (BaseServlet.java:272)
|                18.83% - 27445 ms - internalService() (BaseServlet.java:147)
|                  [XML generation]
|
|  29.14% - 42462 ms - handleConnection() (HttpRequest.java:163)
|    29.13% - 42442 ms - handleRequest() (HttpRequest.java:246)
|      29.13% - 42442 ms - service() (Invocation.java:315)
|        [XML generation]

About half the time is spent creating and applying a workflow to the newly created content item. As much as 40% of the time is spent in XML generation. The "hot spots" portion of the OptimizeIt report doesn't show any particularly hot call sites in our code, aside from calls to the JDBC driver.
If you filter out third-party classes and aggregate multiple occurrences of things like "Path.get" into a single entry, you'll get this:

Total: 36.76% 60108 ms
------------------------------------------------------------------------------
Path.get                            2.46%  3641 ms
Adapter.getSessionKey               1.30%  1919 ms
ObjectMap.getKeyProperties          0.97%  1434 ms
ASCII_CharStream.<init>             0.93%  1399 ms
ParameterMap.makeQueryString        0.82%  1242 ms
SQLParserTokenManager.jjMoveNfa_0   0.80%  1375 ms
SQLWriter.write                     0.72%  1210 ms
OID.hashCode                        0.66%   978 ms
ObjectType.getProperty              0.57%   875 ms
ObjectType.getBasetype              0.56%   857 ms
Path.add                            0.52%   772 ms
Root.getObjectMap                   0.51%   763 ms
Mist.get                            0.50%   743 ms
... [549 more lines omitted for brevity's sake] ...

So, about 36% of the time is spent in our code, the rest of it in third-party code (Resin and Oracle's JDBC driver). Looking at hot spots in our code, nothing really stands out.

We'll have to ask Bryan to confirm or deny the following conjecture. The way he finds bottlenecks is by running E-Tester against a server instance and noting pages that take too long to load. This particular page may be slow because it does two CPU-intensive things, especially on the very first load:

(1) Create a content item, create a workflow, apply the workflow. This takes upwards of 350 queries, which gotta be pretty slow any way you slice it.

(2) Once the workflow is created and applied, a redirect is issued from the content section page /ccm/content/admin/index.jsp to the authoring tab of the content item page /ccm/content/admin/item.jsp. Once item.jsp is loaded for the first time, we perform a boatload of computations.

I think the hot spots table provides a clue suggesting that Bryan profiled the very first load of the offending page. Note that there are calls to SQLParserTokenManager.jjMoveNfa_0. I think these would not show up on the second and subsequent loads of the same page, because the SQL would be cached by that time. (I have to double-check, though.)
So, I'm fairly pessimistic about optimizing away the current slowness at this point. I'll look into this some more.

P.S. The redirect is performed by CreationSelector.editItem, as shown by the following stack trace:

2004-02-18 10:14:17,971 [000-0] DEBUG item.jsp - /ccm/content/admin/item.jsp?item_id=22003&set_tab=1
java.lang.Throwable
        at com.arsdigita.cms.ui.ContentItemPage.getItemURL(ContentItemPage.java:400)
        at com.arsdigita.cms.ui.authoring.CreationSelector.editItem(CreationSelector.java:251)
        at com.arsdigita.cms.ui.authoring.PageCreate.process(PageCreate.java:158)
        ...
        at com.arsdigita.cms.dispatcher.CMSPage.dispatch(CMSPage.java:296)
        at _packages._content_22dsection._www._admin._index__jsp._jspService(_index__jsp.java:54)
The part about SQLParserTokenManager.jjMoveNfa_0 supposedly being called only on the first page load turns out to be untrue. It is called on every page load. I also forgot to mention that the script for filtering third-party classes out of the OptimizeIt hot-spot list and aggregating the resulting entries can be found in //users/vadim/code/tcl/hotspots.tcl
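For readers without access to that depot path, here is a rough sketch in Java of what such a filter-and-aggregate pass does. The input line format and the package filter below are assumptions on my part, not the actual logic of hotspots.tcl: drop third-party packages, then sum the times of duplicate entries (e.g. several "Path.get" call sites) into one row.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a hot-spot aggregator: filters out third-party
// packages and merges duplicate method entries by summing their times.
public class HotspotAggregator {

    static Map<String, Integer> aggregate(String[] lines) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (String line : lines) {
            // Assumed line shape: "<package.Class.method> <pct>% <ms> ms"
            String[] f = line.trim().split("\\s+");
            if (f.length < 4) continue;
            String method = f[0];
            if (method.startsWith("com.caucho.") || method.startsWith("oracle.")) {
                continue; // skip third-party code (Resin, Oracle JDBC driver)
            }
            int ms = Integer.parseInt(f[2].replace(",", ""));
            totals.merge(method, ms, Integer::sum); // sum duplicate call sites
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up sample rows; the two Path.get call sites collapse into one
        // 3641 ms entry and the JDBC row is dropped.
        String[] sample = {
            "com.redhat.persistence.Path.get 1.20% 1800 ms",
            "oracle.jdbc.OCIDBAccess.fetch 6.37% 9337 ms",
            "com.redhat.persistence.Path.get 1.26% 1841 ms",
        };
        System.out.println(aggregate(sample));
    }
}
```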
Before I forget, let me write down what I found out about the SQLParserTokenManager.jjMoveNfa_0 business. Here's one possible microoptimization of persistence. The classes listed below call SQLParser.sql() repeatedly, instantiating SQLParser with identical arguments and getting the same SQL in return. We could cache the returned parsed SQL for the duration of the transaction. So, for example, the first time we instantiate a SQLParser with the string "(className = :className0) and (appName = :appName1)" as the constructor parameter, we save the result of SQLParser.getSQL(). The next 23 times the same string is passed, we return the previously computed SQL.

The only thing that prevents us from doing this is the fact that identical arguments are not entirely identical. There are trivial naming variations: you see things like ":className0", ":className23", ":className30208", etc. One (and possibly the only) source of this trivial variability is the bindName method in FilterImpl. It increments the following static field on each call to bindName:

    private static int s_counter = 0;

and appends this counter to bind names, like so:

    result.append(s_counter++ & MASK);

The counter seems to be devoid of any semantic meaning, insofar as replacing the above line with

    result.append(0);

doesn't seem to break anything.

The following classes make calls to SQLParser.sql():

    com.arsdigita.persistence.DataQueryImpl.unalias(DataQueryImpl.java:509)
    com.redhat.persistence.engine.rdbms.QGen$2.onPassthrough(QGen.java:700)
    com.redhat.persistence.engine.rdbms.SQLWriter.write(SQLWriter.java:325)

Line numbers are as of //core-platform/dev/src/...@40519.
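The caching idea above could be sketched roughly as follows. This is not a patch against the real SQLParser; the class below is a stand-in, and the normalization regex is my assumption about what "trivial naming variations" means. The key point is that ":className0" and ":className23" normalize to the same cache key, so the expensive parse runs once. (Note the caveat from the later discussion: within a single statement, distinct suffixes like ":foo1"/":foo2" can be semantically meaningful, so a real cache would have to normalize more carefully.)

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch of a per-transaction cache of parsed SQL, keyed by the
// SQL text with numeric suffixes stripped from bind names.
public class ParsedSqlCache {

    // Matches a bind name like ":className23"; group 1 is ":className".
    private static final Pattern BIND_SUFFIX = Pattern.compile("(:[A-Za-z_]+)\\d+");

    private final Map<String, String> cache = new HashMap<>();
    int parses = 0; // counts how often the expensive parser actually runs

    // Normalize ":className23" -> ":className" to collapse trivial variants.
    static String normalize(String sql) {
        return BIND_SUFFIX.matcher(sql).replaceAll("$1");
    }

    String getParsed(String sql) {
        return cache.computeIfAbsent(normalize(sql), key -> {
            parses++;
            return expensiveParse(key); // stand-in for SQLParser.sql()/getSQL()
        });
    }

    private String expensiveParse(String sql) {
        return "PARSED[" + sql + "]";
    }

    public static void main(String[] args) {
        ParsedSqlCache cache = new ParsedSqlCache();
        cache.getParsed("(className = :className0) and (appName = :appName1)");
        cache.getParsed("(className = :className23) and (appName = :appName30208)");
        System.out.println(cache.parses); // both calls are served by one parse
    }
}
```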
There is a huge difference between the first time you create an Article and subsequent tries. On the first try, it takes 8.459 sec to serve the requests that create a new Article:

2004-02-18 15:40:02,309 [000-1] INFO web.BaseServlet: Servicing request: /ccm/content/admin/index.jsp
2004-02-18 15:40:02,322 [000-1] INFO web.BaseServlet: Servicing request: /__ccm__/servlet/content-section/admin/index.jsp
2004-02-18 15:40:05,675 [000-1] INFO web.BaseServlet: Done servicing request: /__ccm__/servlet/content-section/admin/index.jsp
2004-02-18 15:40:05,675 [000-1] INFO web.BaseServlet: Done servicing request: /ccm/content/admin/index.jsp
2004-02-18 15:40:05,693 [000-0] INFO web.BaseServlet: Servicing request: /ccm/content/admin/item.jsp
2004-02-18 15:40:05,703 [000-0] INFO web.BaseServlet: Servicing request: /__ccm__/servlet/content-section/admin/item.jsp
2004-02-18 15:40:10,767 [000-0] INFO web.BaseServlet: Done servicing request: /__ccm__/servlet/content-section/admin/item.jsp
2004-02-18 15:40:10,768 [000-0] INFO web.BaseServlet: Done servicing request: /ccm/content/admin/item.jsp

On the second try, it takes only 3.203 sec to create another Article:

2004-02-18 15:41:05,833 [000-1] INFO web.BaseServlet: Servicing request: /ccm/content/admin/index.jsp
2004-02-18 15:41:05,848 [000-1] INFO web.BaseServlet: Servicing request: /__ccm__/servlet/content-section/admin/index.jsp
2004-02-18 15:41:08,274 [000-1] INFO web.BaseServlet: Done servicing request: /__ccm__/servlet/content-section/admin/index.jsp
2004-02-18 15:41:08,275 [000-1] INFO web.BaseServlet: Done servicing request: /ccm/content/admin/index.jsp
2004-02-18 15:41:08,289 [000-0] INFO web.BaseServlet: Servicing request: /ccm/content/admin/item.jsp
2004-02-18 15:41:08,297 [000-0] INFO web.BaseServlet: Servicing request: /__ccm__/servlet/content-section/admin/item.jsp
2004-02-18 15:41:09,036 [000-0] INFO web.BaseServlet: Done servicing request: /__ccm__/servlet/content-section/admin/item.jsp
2004-02-18 15:41:09,036 [000-0] INFO web.BaseServlet: Done servicing request: /ccm/content/admin/item.jsp

The second try takes 62% less time than the first. To make sure we're optimizing the most common case, it would be good to know whether Bryan profiled the first try, which involves the initialization of the item.jsp page, or one of the subsequent attempts, in which the item.jsp page is already initialized.
To answer Vadim's questions: during profiling, I tested the same transaction repeatedly, for a total of about 150 transactions, so the profiling data is averaged over a bunch of tests. I also ran the load testing for several iterations before starting the profiler, so the data in the profiling results covers only the common case.

For profiling, I didn't go through my entire create-article-publish load test. I ran a small test that went through two pages repeatedly: it went to the page where you fill in the title and name for creating an article, and then it hit submit to go to the next page. If you look at the profiling data, you can see the results for generating the first page, generating the second page, and processing the form data:

- Everything under the node "29.14% - 42462 ms - com.caucho.server.http.HttpRequest.handleConnection() (HttpRequest.java:163)" is the first page, where you fill out the information.
- Everything under the node "49.98% - 72820 ms - javax.servlet.http.HttpServlet.service() (HttpServlet.java:760)" is for processing the form.
- Everything under the node "18.99% - 27670 ms - javax.servlet.http.HttpServlet.service() (HttpServlet.java:740)" is for displaying the second page.

These results are easier to see and understand natively inside OptimizeIt, so perhaps I should show them to you on my computer sometime.

The form processing part seems to be the worst offender, and the vast majority of form processing time is spent applying workflows. Like Vadim said, there doesn't seem to be a simple fix to this bottleneck--we'd probably need to significantly change the high-level process of applying workflows in order to really improve performance.
Re: comment #12. I talked with Rafi about the s_counter thingy this morning. He says stuff will mostly work if we don't use this (pseudo-)unique integer suffix in bind names, except for the cases where stuff would really break. (As in "foo=:foo1 or foo=:foo2".) He also says this code will change once he and Archit land their OQL stuff.
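To spell out why the suffix can matter in that breaking case: a filter like "foo=:foo1 or foo=:foo2" binds two different values to the same column, so the names must stay distinct. A toy illustration of mine (not persistence code) using a plain Map of bind values:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: with the counter replaced by a constant, both
// bind names become "foo0" and the second put() silently overwrites the first,
// losing one of the two bound values.
public class BindNameCollision {
    public static void main(String[] args) {
        Map<String, Object> unique = new HashMap<>();
        unique.put("foo1", "draft");      // distinct suffixes: both values survive
        unique.put("foo2", "published");

        Map<String, Object> collided = new HashMap<>();
        collided.put("foo0", "draft");    // constant suffix: same key twice
        collided.put("foo0", "published");

        System.out.println(unique.size());   // 2
        System.out.println(collided.size()); // 1 -- one bind value was lost
    }
}
```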
I propose we mark this ticket as WONTFIX, for the following reasons.

1. This is not the most frequently used page.

2. The slowness is due to the large number of DB queries, and there isn't really any single Java bottleneck that stands out. Our code accounts for about 37% of the total execution time, as shown in comment #10. Oracle's JDBC driver accounts for 31% of the total execution time:

|Total: 31.20% 46963 ms
|--------------------------------------------------------------
|OCIDBAccess.parseExecuteDescribe      9.82% 14369 ms
|OCIDBAccess.executeFetch              7.05% 10314 ms
|OCIDBAccess.fetch                     6.37%  9337 ms
|OracleSql.handleODBC                  0.80%  1210 ms
|OCIDBAccess.open                      0.76%  1121 ms
|OCIDBAccess.parseExecuteFetch         0.57%   852 ms
|OCIDBAccess.close                     0.43%   650 ms
|OCIDBAccess.copy_back_to_next_stream  0.42%   642 ms
|OCIDBAccess.commit                    0.39%   574 ms
[107 more lines omitted for brevity]

Standard Java libraries:

|Total: 15.56% 24467 ms
|--------------------------------------------------------------
|StreamEncoder$ConverterSE.<init>      2.53%  3708 ms
|SocketInputStream.read                1.63%  2388 ms
|IOException.<init>                    0.95%  1395 ms
|SocketOutputStream.socketWrite        0.78%  1146 ms
|ResourceBundle.getLoader              0.76%  1114 ms
|URLEncoder.encode                     0.54%   864 ms
|ZipFile.getEntry                      0.54%   812 ms
|AbstractList$Itr.hasNext              0.50%   734 ms
[136 more lines omitted for brevity]

Saxon:

|Total: 3.41% 6077 ms
|-----------------------------------------------------------
|DOMDriver.walkNode                    0.23%   430 ms
|NamePool.allocate                     0.17%   290 ms
|StyleElement.processChildren          0.17%   283 ms
|HTMLEmitter.escapeURL                 0.14%   217 ms
|NamePool.allocateNamespaceCode        0.10%   170 ms

Various Jakarta libraries:

|Total: 2.13% 3563 ms
|----------------------------------------------------------
|Category.isDebugEnabled               0.45%   670 ms
|ParentNode.internalInsertBefore       0.29%   472 ms
|AttributeMap.setNamedItem             0.13%   201 ms
[44 more lines]

Resin:

|Total: 1.91% 3214 ms
|--------------------------------------------------------------
|TcpServer.accept                      0.44%   645 ms
|UTF8Writer.write                      0.24%   393 ms
|WriteStream$StreamWriter.write        0.16%   258 ms
|WriteStream.print                     0.13%   217 ms
|FilePath.canRead                      0.10%   149 ms
[47 more lines]

So, 68% of the total execution time is DB-driven, split almost evenly between persistence and the JDBC driver.

Overall, persistence takes up quite a lot of resources. Unfortunately, this slowness is diffused more or less evenly across a number of classes. The two worst offenders are Path.get and Session.getSessionKey. I don't see any way to reduce the number of calls to Path.get. As for Session.getSessionKey, I don't see a good way to avoid redundant calls. (In bug 113696 comment #16, I posted a proof-of-concept patch showing that the frequency of calls to getSessionKey _can_ be reduced, but not in a sound way.)

So, one of the takeaways from this ticket is that persistence is not a thin layer on top of JDBC. It roughly doubles the time of our interactions with the database. It may be worth tuning persistence a little further. That would have to happen after Archit and Rafi's OQL/sqlgen branch lands.