Bug 115041 - cms authoring bottlenecked by slow workflow Java code
Summary: cms authoring bottlenecked by slow workflow Java code
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise CMS
Classification: Retired
Component: other
Version: nightly
Hardware: All
OS: Linux
medium
medium
Target Milestone: ---
Assignee: Vadim Nasardinov
QA Contact: Jon Orris
URL:
Whiteboard:
Depends On:
Blocks: 113496
TreeView+ depends on / blocked
 
Reported: 2004-02-05 19:48 UTC by Bryan Che
Modified: 2007-04-18 17:02 UTC (History)
1 user (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-02-20 15:41:10 UTC
Embargoed:


Attachments (Terms of Use)
OptimizeIt Profiling of slow Java code (37.90 KB, application/octet-stream)
2004-02-05 19:49 UTC, Bryan Che
no flags Details
an Emacs-friendly version of attachment 97496 (22.25 KB, text/plain)
2004-02-11 22:34 UTC, Vadim Nasardinov
no flags Details
instrumentation used for obtaining the trace in comment #3 (4.96 KB, patch)
2004-02-12 00:40 UTC, Vadim Nasardinov
no flags Details | Diff
profiling results at changelist 40375 with IBM 1.3.1 JVM (43.88 KB, application/octet-stream)
2004-02-16 19:20 UTC, Bryan Che
no flags Details
profiling results at changelist 40375 with Sun 1.4.2 JVM (45.05 KB, application/octet-stream)
2004-02-16 19:21 UTC, Bryan Che
no flags Details

Description Bryan Che 2004-02-05 19:48:04 UTC
Description of problem:

CMS Authoring is bottlenecked by slow performance in workflow Java
code.  This page consumes 100% of front-end CPU within around 5
concurrent users.  I'll attach results from profiling the Javacode
with OptimizeIt.  About 50% of the CPU time is spent in workflow code.


Version-Release number of selected component (if applicable):

rickshaw


How reproducible:


Steps to Reproduce:
1. go to /content/admin/
2. create new Article
3. give the article a title and name.  Then hit "Create."  Processing
the forms on this page once you hit "Create" is the bottleneck.
  
Actual results:

slow performance, high front-end CPU


Expected results:

instantaneous response with no CPU usage at all. ;)


Additional info:

Comment 1 Bryan Che 2004-02-05 19:49:14 UTC
Created attachment 97496 [details]
OptimizeIt Profiling of slow Java code

Comment 2 Vadim Nasardinov 2004-02-11 22:34:15 UTC
Created attachment 97599 [details]
an Emacs-friendly version of attachment 97496 [details]

Obtained by running
$ links -dump cms-authoring-bottleneck.html > out.txt
$ perl -i.bak -npe 's,ms -.+\.([^(]+\(),ms - \1,;' out.txt

followed by some minor manual cleanup in Emacs

Comment 3 Vadim Nasardinov 2004-02-12 00:37:50 UTC
2004-02-11 19:31:21,306 [000-0] DEBUG developersupport.Counter -
thread: 127.0.0.1-tcpConnection-9000-0
counter started: 19:31:12.164; ended: 19:31:21.306; duration: 9,142 ms
internalService@6112812: started: 19:31:12.164; ended: 19:31:21.306; duration: 1,304 ms:
  execute query: 6
  internalService@2765503: started: 19:31:13.465; ended: 19:31:21.303; duration: 4,955 ms:
    execute query: 48
    PageCreate.process@6925218: started: 19:31:17.988; ended: 19:31:20.871; duration: 2,883 ms:
      execute query: 349


Time spent inside PageCreate#process is about 31% of the total time
spent in BaseServlet#internalService.  This is consistent with the
profile generated by OptimizeIt.

A total of 403 queries (6+48+349) are executed inside
BaseServlet#internalService.  Out of these, 349 (or 87%) queries are
executed inside PageCreate#process.  The bottleneck is DB-bound, not
Java-bound.

As far as Java code goes, the OptimizeIt profile shown in attachment
97599 [details] shows the same hot spots that are also identified in bug 113696
and 115042.

Shall we mark this ticket as duplicate of bug 113696?


Comment 4 Vadim Nasardinov 2004-02-12 00:40:09 UTC
Created attachment 97600 [details]
instrumentation used for obtaining the trace in comment #3

Comment 5 Vadim Nasardinov 2004-02-12 21:29:50 UTC
Fixed on the trunk in change 40336. See bug 113696, comment #22.

Comment 6 Bryan Che 2004-02-16 19:20:02 UTC
I retested cms authoring/publishing, and this page is still a big
bottleneck (the publishing page is no longer a bottleneck, though).  I
will attach two profiling outputs: one for the IBM 1.3.1 JVM and one
for the Sun 1.4.2 JVM.  I run the server using the IBM JVM, but the
profiler works a bit better with the Sun JVM, so I'm attaching both
(similar) results.

c.r.persistence.metadata.Model.getQualifiedName doesn't show up as a
hotspot anymore.

Comment 7 Bryan Che 2004-02-16 19:20:49 UTC
Created attachment 97709 [details]
profiling results at changelist 40375 with IBM 1.3.1 JVM

Comment 8 Bryan Che 2004-02-16 19:21:28 UTC
Created attachment 97710 [details]
profiling results at changelist 40375 with Sun 1.4.2 JVM

Comment 9 Jon Orris 2004-02-17 23:14:22 UTC
I took a quick look at this before getQualifedName was fixed.
WorkflowTemplate calls this.clone() to create a new template. This in
turn calls Workflow.cloneTasks. This is a general purpose method for
cloning workflow tasks that have arbitrary state, dependencies &
listeners. 

The case of instantiating a new workflow is simpler, as the tasks have
no additional listeners, and are in the initial state. It seems* like
a simpler overriding cloneTasks method in WorkflowTemplate would be a
good idea. I resurrected the code I'd played around with & re-ran it
today. It appears to be about twice as fast, averaging 347 ms vs 837
ms to instantiate a workflow template. The difference is that it only
clones the tasks & sets their dependencies. All of the other code in
cloneTasks is stripped out.

* Whether this is a good solution is questionable, due to the somewhat
fragile nature of the Workflow code. It's possible that something
unexpected could break; I only spend about 10 minutes changing the
code & testing just item creation.





Comment 10 Vadim Nasardinov 2004-02-18 17:49:41 UTC
To override cloneTasks may be a good thing to do.

It doesn't seem to go to the heart of the issue though.  If you trim
down the call tree reported by OptimizeIt in attachment 97710 [details], it looks
like so:

|99.55% - 145052 ms - run() (TcpConnection.java:139)
| 70.41% - 102590 ms - handleConnection() (HttpRequest.java:164)
|  68.99% - 100528 ms - handleRequest() (HttpRequest.java:246)
|   68.97% - 100490 ms - service() (Invocation.java:315)
|    68.97% - 100490 ms - doFilter() (FilterChainServlet.java:96)
|     68.97% - 100490 ms - service() (HttpServlet.java:853)
|      49.98% - 72820 ms - service() (HttpServlet.java:760)
|       49.39% - 71973 ms - doPost() (BaseServlet.java:291)
|        49.31% - 71845 ms - internalService() (BaseServlet.java:147)
|          [workflow application]
|
|      18.99% - 27670 ms - service() (HttpServlet.java:740)
|       18.99% - 27670 ms - doGet() (BaseServlet.java:272)
|        18.83% - 27445 ms - internalService() (BaseServlet.java:147)
|           [XML generation]
|
| 29.14% - 42462 ms - handleConnection() (HttpRequest.java:163)
|  29.13% - 42442 ms - handleRequest() (HttpRequest.java:246)
|   29.13% - 42442 ms - service() (Invocation.java:315)
|     [XML generation]

About half the time is spent creating and applying a workflow to the
newly created content item.  As much as 40% of the time is spent in
XML generation.

The "hot spots" portion of the OptimizeIt report doesn't show any
particularly hot call sites in our code, aside from calls to the JDBC
driver.  If you filter out third-party classes and aggregate multiple
occurrences of things like "Path.get" into a single entry, you'll get
this:

Total:                                                       36.76%   60108 ms
------------------------------------------------------------------------------
Path.get                                                     2.46%    3641 ms
Adapter.getSessionKey                                        1.30%    1919 ms
ObjectMap.getKeyProperties                                   0.97%    1434 ms
ASCII_CharStream.<init>                                      0.93%    1399 ms
ParameterMap.makeQueryString                                 0.82%    1242 ms
SQLParserTokenManager.jjMoveNfa_0                            0.80%    1375 ms
SQLWriter.write                                              0.72%    1210 ms
OID.hashCode                                                 0.66%     978 ms
ObjectType.getProperty                                       0.57%     875 ms
ObjectType.getBasetype                                       0.56%     857 ms
Path.add                                                     0.52%     772 ms
Root.getObjectMap                                            0.51%     763 ms
Mist.get                                                     0.50%     743 ms
... [549 more lines omitted for brevity's sake] ...

So, about 36% of the time is spent in our code, the rest of it in
third-party code (Resin and Oracle's JDBC driver).  Looking at hot
spots in our code, nothing really stands out.

We'll have to ask Bryan to confirm or deny the following conjecture.

The way he finds bottlenecks is by running E-Tester against a server
instance and noting pages that take too long to load.  This particular
page may be slow because it does two CPU-intensive things, especially
on the very first load:

  (1) Create a content item, create a workflow, apply the workflow.
      This takes upwards of 350 queries, which gotta be pretty slow
      any way you slice it.

  (2) Once the the workflow is created and applied, a redirect is
      issued from the content section page
      /ccm/content/admin/index.jsp to the authoring tab of the content
      item page /ccm/content/admin/item.jsp.

      One item.jsp is loaded for the first time, we perform a boatload
      of computations.

I think the hot spots table provides a clue suggesting that Bryan
profiled the very first load of the offending page.  Note that there
are calls to SQLParserTokenManager.jjMoveNfa_0.  I think this would
not show up on the second and subsequent loads of the same page,
because the SQL would be cached by that time.  (I have to double-check
though.)

So, I'm fairly pessimistic about optimizing away the current slowness
at this point.  I'll look into this some more.
    


P.S. The redirect is performed by CreationSelector.editItem, as shown
by the following stack trace:

2004-02-18 10:14:17,971 [000-0] DEBUG item.jsp -
/ccm/content/admin/item.jsp?item_id=22003&set_tab=1
java.lang.Throwable
        at com.arsdigita.cms.ui.ContentItemPage.getItemURL(ContentItemPage.java:400)
        at com.arsdigita.cms.ui.authoring.CreationSelector.editItem(CreationSelector.java:251)
        at com.arsdigita.cms.ui.authoring.PageCreate.process(PageCreate.java:158)
        ...
        at com.arsdigita.cms.dispatcher.CMSPage.dispatch(CMSPage.java:296)
        at _packages._content_22dsection._www._admin._index__jsp._jspService(_index__jsp.java:54)


Comment 11 Vadim Nasardinov 2004-02-18 18:51:19 UTC
The part about SQLParserTokenManager.jjMoveNfa_0 supposedly being
called only on the first page load turns out to be untrue.  It is
called on every page load.

I also forgot to mention that the script for filtering out third-party
classes out of the OptimizeIt hot-spot list and aggregating the
resulting entries can be found in //users/vadim/code/tcl/hotspots.tcl


Comment 12 Vadim Nasardinov 2004-02-18 20:23:59 UTC
Before I forget, let me write down what I found out about the
SQLParserTokenManager.jjMoveNfa_0 business.

Here's one possible microoptimization of persistence.

The following classes call SQLParser.sql() repeatedly instantiating
SQLParser with identical argument and getting the same SQL in return.
We could cache the returned parsed SQL for the duration of the
transaction.  So, for example, the first time we instantiate a
SQLParser with the string
   "(className = :className0) and (appName = :appName1)"
as the constructor parameter, we save the result of SQLParser.getSQL().
The next 23 times the same string is passed, we return the previously
computed SQL.

The only thing that prevents us from doing this is the fact that
identical arguments are not entirely identical.  There are trivial
naming variations.  You see things like ":className0", :className23",
":className30208", etc.  One (and possibly the only) source of this
trivial variability is the bindName method in FilterImpl.  It
increments the following static field on each call to bindName:

    private static int s_counter = 0;

and appends this counter to bind names, like so:

        result.append(s_counter++ & MASK);

The counter seems to be devoid of any semantic meaning, insofar as
replacing the above line with

        result.append(0);

doesn't seem to break anything.

The following classes make calls to SQLParser.sql():
  
com.arsdigita.persistence.DataQueryImpl.unalias(DataQueryImpl.java:509)
com.redhat.persistence.engine.rdbms.QGen$2.onPassthrough(QGen.java:700)
com.redhat.persistence.engine.rdbms.SQLWriter.write(SQLWriter.java:325)

Line numbers are as of //core-platform/dev/src/...@40519.


Comment 13 Vadim Nasardinov 2004-02-18 20:44:40 UTC
There is a huge difference between the first time you create an
Article and subsequent tries.

On the first try, it takes 8.459 sec to serve the request that creates
a new Article:

  2004-02-18 15:40:02,309 [000-1] INFO  web.BaseServlet:
    Servicing request: /ccm/content/admin/index.jsp
  2004-02-18 15:40:02,322 [000-1] INFO  web.BaseServlet:
    Servicing request: /__ccm__/servlet/content-section/admin/index.jsp
  2004-02-18 15:40:05,675 [000-1] INFO  web.BaseServlet:
    Done servicing request: /__ccm__/servlet/content-section/admin/index.jsp
  2004-02-18 15:40:05,675 [000-1] INFO  web.BaseServlet:
    Done servicing request: /ccm/content/admin/index.jsp
  2004-02-18 15:40:05,693 [000-0] INFO  web.BaseServlet:
    Servicing request: /ccm/content/admin/item.jsp
  2004-02-18 15:40:05,703 [000-0] INFO  web.BaseServlet:
    Servicing request: /__ccm__/servlet/content-section/admin/item.jsp
  2004-02-18 15:40:10,767 [000-0] INFO  web.BaseServlet:
    Done servicing request: /__ccm__/servlet/content-section/admin/item.jsp
  2004-02-18 15:40:10,768 [000-0] INFO  web.BaseServlet:
    Done servicing request: /ccm/content/admin/item.jsp


On the second try, it takes only 3.203 sec to create
another Article:

  2004-02-18 15:41:05,833 [000-1] INFO  web.BaseServlet:
    Servicing request: /ccm/content/admin/index.jsp
  2004-02-18 15:41:05,848 [000-1] INFO  web.BaseServlet:
    Servicing request: /__ccm__/servlet/content-section/admin/index.jsp
  2004-02-18 15:41:08,274 [000-1] INFO  web.BaseServlet:
    Done servicing request: /__ccm__/servlet/content-section/admin/index.jsp
  2004-02-18 15:41:08,275 [000-1] INFO  web.BaseServlet:
    Done servicing request: /ccm/content/admin/index.jsp
  2004-02-18 15:41:08,289 [000-0] INFO  web.BaseServlet:
    Servicing request: /ccm/content/admin/item.jsp
  2004-02-18 15:41:08,297 [000-0] INFO  web.BaseServlet:
    Servicing request: /__ccm__/servlet/content-section/admin/item.jsp
  2004-02-18 15:41:09,036 [000-0] INFO  web.BaseServlet:
    Done servicing request: /__ccm__/servlet/content-section/admin/item.jsp
  2004-02-18 15:41:09,036 [000-0] INFO  web.BaseServlet:
    Done servicing request: /ccm/content/admin/item.jsp

The second try takes 62% less time than the first time.

To make sure we're optimizing the most common case, it would be good
to know whether Bryan's profiled the first try, which involves the
initialization of the item.jsp page, or one of the subsequent attempts
in which the item.jsp page is already initialized.


Comment 14 Bryan Che 2004-02-18 22:56:45 UTC
To answer Vadim's questions: during profiling, I tested the same
transaction repeatedly for a total of about 150 transactions.  So, the
profiling data is averaged out over a bunch of tests.  I also did run
the load testing for several iterations before starting the profiler,
so the data in the profiling results is only for the common case.

For profiling, I didn't go through my entire create-article-publish
load test.  I ran a small test that went through two pages repeatedly:
It went to the page where you fill in the title and name for creating
an article, and then it hit submit to go to the next page.

If you look at the profiling data, you can see the results for
generating the first page, generating the second page, and also
processing the form data.  Everything under the node "29.14% - 42462
ms - com.caucho.server.http.HttpRequest.handleConnection()
(HttpRequest.java:163)" is the first page where you fill out the
information.  Everything under the node "49.98% - 72820 ms -
javax.servlet.http.HttpServlet.service() (HttpServlet.java:760)" is
for processing the form.  Everything under the node "18.99% - 27670 ms
- javax.servlet.http.HttpServlet.service() (HttpServlet.java:740)" is
for displaying the second page.  These results are easier to see and
understand natively inside OptimizeIt, so perhaps I should show them
to you on my computer sometime.

The form processing part seems to be the worst offender, and the vast
majority of form processing is involved with applying workflows.  Like
Vadim said, there doesn't seem to be a simple fix to this
bottleneck--we'd probably need to change significantly the process of
applying workflows at a high level in order to really improve performance.



Comment 15 Vadim Nasardinov 2004-02-19 17:13:39 UTC
Re: comment #12.

I talked with Rafi about the s_counter thingy this morning.  He
says stuff will mostly work if we didn't use this (pseudo-)unique
integer suffix in bound names, except for the cases where stuff
would really break.  (As in "foo=:foo1 or foo=:foo2".)

He also says this code will change once he and Archit lands their
OQL stuff.




Comment 16 Vadim Nasardinov 2004-02-19 22:51:01 UTC
I propose we mark this ticket as WONTFIX for the following reasons.

 1. This is not the most frequently used page.

 2. The slowness is due to the large number of db queries and there is
    not really any single Java bottleneck that stands out.

Our code accounts for about 37% of the total execution time, as shown
in comment #10.

Oracle's JDBC driver accounts for 31% of the total execution time:
    |Total:                                       31.20%   46963 ms
    |--------------------------------------------------------------
    |OCIDBAccess.parseExecuteDescribe             9.82%   14369 ms
    |OCIDBAccess.executeFetch                     7.05%   10314 ms
    |OCIDBAccess.fetch                            6.37%    9337 ms
    |OracleSql.handleODBC                         0.80%    1210 ms
    |OCIDBAccess.open                             0.76%    1121 ms
    |OCIDBAccess.parseExecuteFetch                0.57%     852 ms
    |OCIDBAccess.close                            0.43%     650 ms
    |OCIDBAccess.copy_back_to_next_stream         0.42%     642 ms
    |OCIDBAccess.commit                           0.39%     574 ms
      [107 more lines omitted for brevity]

Standard Java libraries:
    |Total:                                       15.56%   24467 ms
    |--------------------------------------------------------------
    |StreamEncoder$ConverterSE.<init>             2.53%    3708 ms
    |SocketInputStream.read                       1.63%    2388 ms
    |IOException.<init>                           0.95%    1395 ms
    |SocketOutputStream.socketWrite               0.78%    1146 ms
    |ResourceBundle.getLoader                     0.76%    1114 ms
    |URLEncoder.encode                            0.54%     864 ms
    |ZipFile.getEntry                             0.54%     812 ms
    |AbstractList$Itr.hasNext                     0.50%     734 ms
     [136 more lines omitted for brevity]

Saxon:
    |Total:                                     3.41%    6077 ms
    |-----------------------------------------------------------
    |DOMDriver.walkNode                         0.23%     430 ms
    |NamePool.allocate                          0.17%     290 ms
    |StyleElement.processChildren               0.17%     283 ms
    |HTMLEmitter.escapeURL                      0.14%     217 ms
    |NamePool.allocateNamespaceCode             0.10%     170 ms

Various Jakarta libraries:
    |Total:                                    2.13%    3563 ms
    |----------------------------------------------------------
    |Category.isDebugEnabled                   0.45%     670 ms
    |ParentNode.internalInsertBefore           0.29%     472 ms
    |AttributeMap.setNamedItem                 0.13%     201 ms
    |  [44 more lines]

Resin:
    |Total:                                        1.91%    3214 ms
    |--------------------------------------------------------------
    |TcpServer.accept                              0.44%     645 ms
    |UTF8Writer.write                              0.24%     393 ms
    |WriteStream$StreamWriter.write                0.16%     258 ms
    |WriteStream.print                             0.13%     217 ms
    |FilePath.canRead                              0.10%     149 ms
     [47 more lines]


So, 68% of the total execution time is DB-driven.  It's split almost
evenly between persistence and the JDBC driver.  Overall, persistence
takes up quite a lot of resources.  Unfortunately, this slowness is
diffused more or less evenly across a number of classes.

The two worst offenders are Path.get and Session.getSessionKey.

I don't see any way to reduce the number of calls to Path.get.

As far as Session.getSessionKey goes, I don't see a good way to avoid
redundant calls.  (In bug 113696 comment #16, I posted a
proof-of-concept patch that shows that the frequency of calls to
getSessionKey _can_ be reduced, but not in a sound way.)

So, one of the takeways from this ticket is that persistence is not a
thin layer on top of JDBC.  It roughly doubles the time of our
interactions with the database.  It may be worth it to tune
persistence a little further.  This would have to happen after the
land of Archit and Rafi's OQL/sqlgen branch.



Note You need to log in before you can comment on or make changes to this bug.