Bug 1017749

Summary: Hang after tcp-async part of clustering testsuite is finished on Windows
Product: [JBoss] JBoss Enterprise Application Platform 6
Reporter: Richard Janík <rjanik>
Component: Clustering
Assignee: Paul Ferraro <paul.ferraro>
Status: CLOSED CURRENTRELEASE
QA Contact: Jitka Kozana <jkudrnac>
Severity: low
Priority: low
Version: 6.2.1, 6.2.2
CC: rhusar, rjanik
Target Milestone: ---   
Target Release: EAP 6.4.0   
Hardware: Unspecified   
OS: Windows   
Whiteboard: Clustering testsuite
Doc Type: Bug Fix
Type: Bug

Description Richard Janík 2013-10-10 12:42:09 UTC
Description of problem:

Seen with EAP 6.2.0.ER5 and ER3.
Not specific to a particular JDK.
The clustering testsuite hangs on Windows after the tcp-async part of the testsuite has finished. As far as I can tell, the last test to run is always org.jboss.as.test.clustering.cluster.web.ReplicationWebFailoverTestCase. A sample log looks like this:

09:40:10 [INFO] --- maven-surefire-plugin:2.11:test (ts.surefire.clust.multinode-manual-tcp-async) @ jboss-as-ts-integ-clust ---
09:40:10 [INFO] Surefire report directory: W:\workspace\eap-6x-jgroups-tcpgossip-win-matrix\jdk\java16_default\label\Win2k8_x86\jboss-eap-6.2-src\testsuite\integration\clust\target\surefire-reports
09:40:10 
09:40:10 -------------------------------------------------------
09:40:10  T E S T S
09:40:10 -------------------------------------------------------
09:40:11 Running org.jboss.as.test.clustering.cluster.ejb2.invalidation.CacheInvalidationTestCase
09:40:53 Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 41.781 sec
09:40:53 Running org.jboss.as.test.clustering.cluster.ejb3.deployment.ClusteredBeanDeploymentTestCase
09:41:20 Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 27.386 sec
09:41:20 Running org.jboss.as.test.clustering.cluster.ejb3.descriptor.disable.DisableClusteredTestCase
09:41:36 Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 15.52 sec
09:41:36 Running org.jboss.as.test.clustering.cluster.ejb3.security.FailoverWithSecurityTestCase
09:42:08 Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 32.337 sec
09:42:08 Running org.jboss.as.test.clustering.cluster.ejb3.stateful.passivation.ClusterPassivationTestCase
09:42:41 Tests run: 7, Failures: 0, Errors: 0, Skipped: 2, Time elapsed: 33.246 sec
09:42:41 Running org.jboss.as.test.clustering.cluster.ejb3.stateful.remote.failover.dd.RemoteEJBClientDDBasedSFSBFailoverTestCase
09:42:57 Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 15.627 sec
09:42:57 Running org.jboss.as.test.clustering.cluster.ejb3.stateful.remote.failover.LocalEJBClientFailoverTestCase
09:42:57 Tests run: 1, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0.001 sec
09:42:57 Running org.jboss.as.test.clustering.cluster.ejb3.stateful.remote.failover.RemoteEJBClientStatefulBeanFailoverTestCase
09:42:57 Tests run: 1, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0.001 sec
09:42:57 Running org.jboss.as.test.clustering.cluster.ejb3.stateful.remote.failover.SlowUndeploymentRemoteFailoverTestCase
09:42:57 Tests run: 3, Failures: 0, Errors: 0, Skipped: 3, Time elapsed: 0.017 sec
09:42:57 Running org.jboss.as.test.clustering.cluster.ejb3.stateful.StatefulFailoverTestCase
09:42:57 Tests run: 1, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0 sec
09:42:57 Running org.jboss.as.test.clustering.cluster.ejb3.StatefulTimeoutTestCase
09:43:29 Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 31.927 sec
09:43:29 Running org.jboss.as.test.clustering.cluster.ejb3.stateless.RemoteStatelessFailoverTestCase
09:44:48 Tests run: 4, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 79.525 sec <<< FAILURE!
09:44:48 Running org.jboss.as.test.clustering.cluster.ejb3.xpc.StatefulWithXPCFailoverTestCase
09:45:21 Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 32.943 sec
09:45:21 Running org.jboss.as.test.clustering.cluster.jsf.JSFFailoverTestCase
09:45:43 Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 22.244 sec
09:45:43 Running org.jboss.as.test.clustering.cluster.management.CacheTestCase
09:46:00 Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 16.322 sec
09:46:00 Running org.jboss.as.test.clustering.cluster.singleton.SingletonTestCase
09:46:41 Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 41.621 sec
09:46:41 Running org.jboss.as.test.clustering.cluster.sso.ClusteredSingleSignOnTestCase
09:46:41 Tests run: 1, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0 sec
09:46:41 Running org.jboss.as.test.clustering.cluster.web.ClusteredWebSimpleTestCase
09:47:56 Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 74.915 sec
09:47:56 Running org.jboss.as.test.clustering.cluster.web.DistributionWebFailoverTestCase
09:48:34 Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 37.632 sec
09:48:34 Running org.jboss.as.test.clustering.cluster.web.GranularWebFailoverTestCase
09:48:51 Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 17.433 sec
09:48:51 Running org.jboss.as.test.clustering.cluster.web.NonHaWebSessionPersistenceTestCase
09:49:10 Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 18.992 sec
09:49:10 Running org.jboss.as.test.clustering.cluster.web.passivation.AttributeBasedSessionPassivationTestCase
09:49:29 Tests run: 4, Failures: 0, Errors: 0, Skipped: 2, Time elapsed: 18.949 sec
09:49:29 Running org.jboss.as.test.clustering.cluster.web.passivation.SessionBasedSessionPassivationTestCase
09:49:38 Tests run: 4, Failures: 0, Errors: 0, Skipped: 2, Time elapsed: 8.687 sec
09:49:38 Running org.jboss.as.test.clustering.cluster.web.ReplicationForNegotiationAuthenticatorTestCase
09:49:38 Tests run: 1, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0 sec
09:49:38 Running org.jboss.as.test.clustering.cluster.web.ReplicationWebFailoverTestCase
09:50:06 Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 27.91 sec
12:13:46 Build timed out (after 240 minutes). Marking the build as failed.
12:13:46 Archiving artifacts
12:13:48 Recording test results
12:13:51 Notifying upstream projects of job completion
12:13:51 Finished: FAILURE

It may be intermittent, though the only runs I've seen "pass" for us are the ones that hit BZ 995013 (tests fail due to servers still running from previous tests): in that case the testsuite stops with errors after the tcp-sync part and never goes on to execute tcp-async.
(-DextendedTests is on)
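
To check the "last test to run" observation across several console logs, something like the following could be used. It is only a rough sketch, not part of the testsuite: the function name is made up, and it assumes every relevant line starts with an HH:MM:SS timestamp as in the sample above and that the log does not cross midnight. It reports the last test class that Surefire started and the longest silence between two consecutive timestamped lines.

```python
import re
from datetime import datetime, timedelta

# Lines such as "09:49:38 Running org.jboss...ReplicationWebFailoverTestCase"
RUNNING = re.compile(r"^\d{2}:\d{2}:\d{2} Running (\S+)$")
# Any line that starts with an HH:MM:SS timestamp
TIMESTAMP = re.compile(r"^(\d{2}:\d{2}:\d{2})")


def last_test_and_gap(lines):
    """Return (last test class started, longest gap between timestamped lines).

    Hypothetical helper: assumes all timestamps belong to the same day.
    """
    last_test = None
    longest = timedelta(0)
    previous = None
    for raw in lines:
        line = raw.rstrip()
        stamp = TIMESTAMP.match(line)
        if not stamp:
            continue  # skip lines without a leading timestamp
        now = datetime.strptime(stamp.group(1), "%H:%M:%S")
        if previous is not None and now - previous > longest:
            longest = now - previous
        previous = now
        started = RUNNING.match(line)
        if started:
            last_test = started.group(1)
    return last_test, longest


if __name__ == "__main__":
    sample = [
        "09:49:38 Running org.jboss.as.test.clustering.cluster.web.ReplicationWebFailoverTestCase",
        "09:50:06 Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 27.91 sec",
        "12:13:46 Build timed out (after 240 minutes). Marking the build as failed.",
    ]
    test, gap = last_test_and_gap(sample)
    print(test, gap)
```

For the sample log above, the silence between the last test result (09:50:06) and the build timeout (12:13:46) is the longest gap in the run.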

Links to logs (all with ER5):
https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-jgroups-tcpgossip-win-matrix/jdk=java16_default,label=Win2k8r2_x86_64/64/console
https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-jgroups-tcpgossip-win-matrix/jdk=java16_default,label=Win2k8_x86/63/console
https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-jgroups-tcpgossip-win-matrix/jdk=java16_default,label=Win2k8_x86_64/63/console
https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-jgroups-tcpgossip-win-matrix/jdk=java17_default,label=Win2k8r2_x86_64/63/console

Comment 1 Radoslav Husar 2014-03-25 16:48:47 UTC
The test setup is a bit worrying, as Jenkins reports a locking problem on the same bean that is failing those tests (which stinks):

05:25:23 ERROR: Cannot delete workspace: java.io.IOException: Unable to delete w:\workspace\eap-6x-jgroups-tcpgossip-win-matrix\jdk\java16_default\label\w2k12r2\jboss-eap-6.3-src\testsuite\integration\clust\target\jbossas-clustering-ASYNC-tcp-0\standalone\data\infinispan\ejb\org.jboss.as.test.clustering.cluster.ejb3.deployment.ClusteredBean - files in dir: [w:\workspace\eap-6x-jgroups-tcpgossip-win-matrix\jdk\java16_default\label\w2k12r2\jboss-eap-6.3-src\testsuite\integration\clust\target\jbossas-clustering-ASYNC-tcp-0\standalone\data\infinispan\ejb\org.jboss.as.test.clustering.cluster.ejb3.deployment.ClusteredBean\256777216]

There seems to be some sort of stale FS locking going on. I have two guesses: either the processes are not getting cleaned up, or this w:/ drive is not local.
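
One way to narrow this down on the slave itself is to try the deletion by hand and see exactly which entries refuse to go away. The snippet below is a minimal sketch under that assumption, not what Jenkins does internally; the function name is hypothetical. It is destructive, so it should only be pointed at a disposable workspace directory. On Windows, a file still held open by a leftover server process would typically show up in the returned list with a sharing or permission error.

```python
import os


def undeletable_entries(root):
    """Delete everything under root (bottom-up) and return what could not be removed.

    Hypothetical diagnostic helper. Returns a list of (path, error message)
    tuples; root itself is left in place.
    """
    failed = []
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                os.remove(path)
            except OSError as exc:
                # e.g. the file is still locked by another process
                failed.append((path, str(exc)))
        for name in dirnames:
            path = os.path.join(dirpath, name)
            try:
                os.rmdir(path)
            except OSError as exc:
                # e.g. the directory still contains a locked file
                failed.append((path, str(exc)))
    return failed
```

An empty result means the tree could be cleared; any remaining paths point at the files worth checking for stale locks.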

Comment 5 Radoslav Husar 2014-10-30 09:55:39 UTC
Has the setup been verified as per comment #1?

Comment 6 Richard Janík 2014-11-03 07:42:40 UTC
The setup has changed since comment #1. Processes were not configured to be cleaned up back then, and w:/ was most certainly not local. I haven't seen this since 6.3.0.ER1, so I don't think this is an issue anymore.