Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 884646

Summary: Performance of scalability configuration and related failures
Product: [JBoss] JBoss Enterprise Portal Platform 6
Reporter: Michal Vanco <mvanco>
Component: Performance
Assignee: mposolda
Status: CLOSED CURRENTRELEASE
QA Contact: Michal Vanco <mvanco>
Severity: unspecified
Priority: unspecified
Version: 6.0.0
CC: epp-bugs, mvecera
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2013-04-16 08:54:40 UTC
Attachments:
err1 (flags: none)
err2 (flags: none)
err3 (flags: none)
err4 (flags: none)
err5 (flags: none)
err6 (flags: none)
err7 (flags: none)

Description Michal Vanco 2012-12-06 13:41:50 UTC
I'm creating this issue for tracking purposes. Scalability testing is making unstable progress and the logs include many failures.

I'm seeing various failures in the server log during scalability tests (the log is more than 2 GB for a single run; that could be a performance issue as well, and it's hard to even open it :/)

Example log can be downloaded from
https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/EPP/view/6.0/view/Performance/job/epp6_portal_scalability_loggedUsers_2nodes/22/console-perf15
(that's the log of second node)

For clustering, I just switch to standalone-ha.xml (and update the datasources to MySQL). I'm using a local index which is created on first startup, I'm adding instance-id and jvmRoute to standalone-ha.xml, and I'm running with a -u parameter for multicast.
I've also tried turning off session replication (changing the WEB cache to a local cache), but that didn't help either.
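For reference, the standalone-ha.xml change mentioned above can be sketched roughly like this (the attribute values, such as the instance-id expression, are illustrative assumptions and not the actual test configuration):

```xml
<!-- Sketch: on AS7, instance-id on the web subsystem supplies the jvmRoute
     value appended to session IDs for sticky routing (value is illustrative) -->
<subsystem xmlns="urn:jboss:domain:web:1.1" default-virtual-server="default-host"
           instance-id="${jboss.node.name}" native="false">
    <connector name="ajp" protocol="AJP/1.3" scheme="http" socket-binding="ajp"/>
    <virtual-server name="default-host"/>
</subsystem>
```

The server would then be started with something like `./standalone.sh -c standalone-ha.xml -u 230.0.0.4`, where -u overrides the default multicast address (the address shown is illustrative).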

Full logs are attached to this issue.

--------------------------------
[JBossINF] 18:15:19,379 ERROR [org.infinispan.transaction.TransactionCoordinator] (ajp-perf15/10.16.88.193:8009-775) Error while processing prepare: org.infinispan.util.concurrent.TimeoutException: Unable to acquire lock after [10 seconds] on key [/NODE_MAIN_ROOT_API/idm_realm_portal/ATTRIBUTES] for requestor [GlobalTransaction:<perf15-5667>:2413377:remote]! Lock held by [GlobalTransaction:<perf11-3175>:2506401:local]

[JBossINF] 18:15:19,590 WARN  [com.arjuna.ats.arjuna] (ajp-perf15/10.16.88.193:8009-1594) ARJUNA012125: TwoPhaseCoordinator.beforeCompletion - failed for SynchronizationImple< 0:ffff0a1058c1:703dd361:50bfc771:5063, SynchronizationAdapter{localTransaction=LocalTransaction{remoteLockedNodes=null, isMarkedForRollback=false, transaction=TransactionImple < ac, BasicAction: 0:ffff0a1058c1:703dd361:50bfc771:5062 status: ActionStatus.ABORT_ONLY >, lockedKeys=null, backupKeyLocks=[/NODE_MAIN_ROOT_API/idm_realm_portal/ATTRIBUTES/jbpid_group_id_._._.platform_._._users, /NODE_MAIN_ROOT_API/idm_realm_portal/ATTRIBUTES], viewId=0} org.infinispan.transaction.synchronization.SyncLocalTransaction@24d343} org.infinispan.transaction.synchronization.SynchronizationAdapter@24d362 >: org.infinispan.CacheException: Could not prepare.

[JBossINF] 18:15:31,476 ERROR [org.infinispan.interceptors.InvocationContextInterceptor] (ajp-perf15/10.16.88.193:8009-1997) ISPN000136: Execution error: org.infinispan.CacheException: org.jgroups.TimeoutException: timeout sending message to perf11-3175

[JBossINF] 21:30:22,961 INFO  [org.exoplatform.services.organization.idm.GroupDAOImpl] Identity operation error: : org.infinispan.CacheException: Unable to end batch
...
[JBossINF] Caused by: javax.transaction.RollbackException: ARJUNA016053: Could not commit transaction.
...
[JBossINF] Caused by: org.infinispan.CacheException: Could not prepare. 
...
[JBossINF] Caused by: javax.transaction.xa.XAException


And another point is:
[JBossINF] 20:26:02,945 INFO  [org.apache.tomcat.util.http.Cookies] (ajp-perf15/10.16.88.193:8009-111) Cookies: Unknown Special Cookie
This message is logged for each request (and there are many thousands of requests every second).
I don't know how to get rid of it or work around it.

Comment 1 Michal Vanco 2012-12-06 13:42:21 UTC
Created attachment 658738 [details]
err1

Comment 2 Michal Vanco 2012-12-06 13:42:41 UTC
Created attachment 658739 [details]
err2

Comment 3 Michal Vanco 2012-12-06 13:42:59 UTC
Created attachment 658740 [details]
err3

Comment 4 Michal Vanco 2012-12-06 13:43:18 UTC
Created attachment 658741 [details]
err4

Comment 5 Michal Vanco 2012-12-06 13:43:39 UTC
Created attachment 658742 [details]
err5

Comment 6 Michal Vanco 2012-12-06 13:43:57 UTC
Created attachment 658743 [details]
err6

Comment 7 Michal Vanco 2012-12-06 13:44:20 UTC
Created attachment 658744 [details]
err7

Comment 8 mposolda 2012-12-19 22:53:59 UTC
Actually the cookie issue is caused by commons-httpclient, which is used by the performance test. By default, commons-httpclient uses the RFC 2109 cookie policy, which means it adds information like $Version and $Path into the HTTP Cookie header. Parsing of those cookie attributes is not handled 100% correctly on the AS7 side.

So to avoid this issue, the performance test needs to be configured so that HttpClient uses the browser-compatible cookie policy and doesn't send any cookie attributes (like Version and Path) to JPP. It can be done either by:
- Using something like:
httpClient.getParams().setCookiePolicy(CookiePolicy.BROWSER_COMPATIBILITY);
in the class HttpRequestProcessorFactoryImpl (or making it configurable via some parameter passed to this processor factory)

- Or by setting the system property "-Dapache.commons.httpclient.cookiespec=COMPATIBILITY", which ensures that commons-httpclient uses the browser-compatible policy by default.
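The system-property variant above can be sketched as a minimal, self-contained example (the CookieSpecSetup class name is hypothetical; the property name and value are the ones quoted in this comment):

```java
// Minimal sketch: set the commons-httpclient cookie-spec system property
// programmatically, before any HttpClient instance is created. This is
// equivalent to passing -Dapache.commons.httpclient.cookiespec=COMPATIBILITY
// on the command line.
public class CookieSpecSetup {
    public static void main(String[] args) {
        System.setProperty("apache.commons.httpclient.cookiespec", "COMPATIBILITY");
        // Confirm the value is visible to any code that reads it later
        System.out.println(System.getProperty("apache.commons.httpclient.cookiespec"));
    }
}
```

Setting the property before client creation matters because commons-httpclient reads its default cookie policy when the client is initialized.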

This should help with the "Unknown Special Cookie" message and hopefully with scalability as well.

Comment 9 Michal Vanco 2012-12-20 09:24:14 UTC
Marek, thanks for this excellent finding - I'm going to verify it now!
commons-httpclient version 3.1 is used by the load driver; do you think changing this version could also help?
I'll update here after my verification.

Comment 10 Michal Vanco 2012-12-20 17:44:12 UTC
The cookie issue is gone, but scalability is still not what we expect.
You can have a look at logs and progress here:
https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/EPP/view/6.0/view/Performance/job/epp6_portal_scalability_loggedUsers_2nodes/29

Comment 11 mposolda 2012-12-21 20:12:02 UTC
Thanks Michal! The good thing is that now we can see the other exception messages better :)

Can you please try running the scalability build with the cluster disabled for PicketLink IDM? This will let us see whether the bottleneck is coming from there (it seems so, judging by most of the exception messages).

The easiest way to do it is to comment out the "apiCacheConfig" and "storeCacheConfig" entries for the cluster profile in the file gatein/gatein.ear/portal.war/WEB-INF/conf/organization/idm-configuration.xml. It can look like this:

      <value-param>
        <name>apiCacheConfig</name>
        <value>war:/conf/organization/picketlink-idm/infinispan.xml</value>
      </value-param>
<!--
      <value-param profiles="cluster">
        <name>apiCacheConfig</name>
        <value>war:/conf/organization/picketlink-idm/infinispan-cluster.xml</value>
      </value-param>
-->
      <value-param>
        <name>storeCacheConfig</name>
        <value>war:/conf/organization/picketlink-idm/infinispan.xml</value>
      </value-param>
<!--
      <value-param profiles="cluster">
        <name>storeCacheConfig</name>
        <value>war:/conf/organization/picketlink-idm/infinispan-cluster.xml</value>
      </value-param>
-->

Comment 12 Michal Vanco 2013-01-03 14:34:02 UTC
Good hint Marek, I've tried with the clustered apiCacheConfig & storeCacheConfig disabled, and the exceptions during scalability are gone.
Now we have to find a way to configure PicketLink clustering.
You can have a look at job here:
https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/EPP/view/6.0/view/Performance/job/epp6_portal_scalability_loggedUsers_2nodes/30

Comment 13 mposolda 2013-01-04 12:09:40 UTC
Thanks Michal,
so for the clustering configuration, is it possible to try the following:

- Leave idm-configuration.xml as it is by default (without any commented-out sections), so it will look like this:
      <value-param>
        <name>apiCacheConfig</name>
        <value>war:/conf/organization/picketlink-idm/infinispan.xml</value>
      </value-param>

      <value-param profiles="cluster">
        <name>apiCacheConfig</name>
        <value>war:/conf/organization/picketlink-idm/infinispan-cluster.xml</value>
      </value-param>

      <value-param>
        <name>storeCacheConfig</name>
        <value>war:/conf/organization/picketlink-idm/infinispan.xml</value>
      </value-param>

      <value-param profiles="cluster">
        <name>storeCacheConfig</name>
        <value>war:/conf/organization/picketlink-idm/infinispan-cluster.xml</value>
      </value-param>

- In the file gatein/gatein.ear/portal.war/WEB-INF/conf/organization/picketlink-idm/infinispan-cluster.xml, add the attribute syncCommitPhase="false" to the transaction element:
<transaction transactionMode="TRANSACTIONAL" lockingMode="OPTIMISTIC" autoCommit="true" syncCommitPhase="false" />

Comment 14 Michal Vanco 2013-01-04 14:10:21 UTC
Marek, unfortunately this didn't help.
The run with the same errors is here:
https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/EPP/view/6.0/view/Performance/job/epp6_portal_scalability_loggedUsers_2nodes/31

Comment 15 Michal Vanco 2013-01-07 08:16:08 UTC
Hi Marek,
it seems that async replication brought the expected performance improvement for PicketLink clustering, but there are still some failures. Details are in builds 32 & 34.
Do you plan to do some other updates I can verify? Thanks!
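As background for the async replication mentioned above, a hedged sketch of what asynchronous replication looks like in an Infinispan 5.x cache configuration (the cache name is hypothetical; this is not the actual contents of infinispan-cluster.xml):

```xml
<infinispan xmlns="urn:infinispan:config:5.1">
   <namedCache name="example-idm-cache">
      <!-- With <async/>, writes are replicated to the other cluster nodes
           without blocking the caller, trading strict consistency guarantees
           for throughput -->
      <clustering mode="replication">
         <async/>
      </clustering>
   </namedCache>
</infinispan>
```

This trade-off fits read-mostly identity data, where stale reads for a short window are usually acceptable.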

Comment 16 JBoss JIRA Server 2013-01-29 12:48:04 UTC
Marek Posolda <mposolda> updated the status of jira GTNPORTAL-2792 to Resolved

Comment 17 JBoss JIRA Server 2013-01-29 12:48:04 UTC
Marek Posolda <mposolda> made a comment on jira GTNPORTAL-2792

Fixed in GateIn master in commit https://github.com/gatein/gatein-portal/commit/528e769a0b11caa1a6fc92d47d547f28ea231c3b

Comment 18 Michal Vanco 2013-02-01 10:47:33 UTC
Scalability for logged users was fixed by the above change.

Statistics were updated at https://docs.google.com/a/rhcollab.com/spreadsheet/ccc?key=0At752QrNfufDdG1CcnFfTzVNWnh6aUJQcmZjbi1SRWc#gid=11 

Results for 1 and 2 nodes are now comparable with the results from EPP 5.2.