Created attachment 709984 [details]
TCPGOSSIP conf

Description of problem:
org.jboss.as.test.clustering.cluster.ejb3.stateless.RemoteStatelessFailoverTestCase(ASYNC-tcp).testFailoverOnUndeploy
https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/EAP6/view/EAP6-JGroups/job/eap-6x-jgroups-tcpgossip-solaris-matrix/jdk=java16_default,label=sol10_sparc64/11/testReport/org.jboss.as.test.clustering.cluster.ejb3.stateless/RemoteStatelessFailoverTestCase%28ASYNC-tcp%29/testFailoverOnUndeploy/

There is an upstream JIRA saying this should be fixed in 6.1, but I'm still seeing it:
https://issues.jboss.org/browse/AS7-5211

This configuration uses TCPGOSSIP (see attachment for exact JGroups stack configuration).

EAP 6.1.0.ER2 (AS 7.2.0.Final-redhat-2)
Oops, the TCPGOSSIP conf was incorrect. Nevertheless, the issue is not specific to this setting and also occurs (with the fixed configuration) here:
https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/EAP6/view/EAP6-JGroups/job/eap-6x-jgroups-tcpgossip-rhel-matrix/36/jdk=ibm17,label=RHEL5_x86/testReport/org.jboss.as.test.clustering.cluster.ejb3.stateless/RemoteStatelessFailoverTestCase%28ASYNC-udp%29/testFailoverOnStop/

(To obtain the fixed configuration from the attachment, comment out the MPING protocol in the tcp stack.)
Seen in ER4 as well. In the same test case, testFailoverOnStop also fails randomly:
https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/EAP6/view/EAP6-JGroups/job/eap-6x-jgroups-tcpgossip-rhel-matrix/43/jdk=openjdk-1.7.0-local,label=RHEL6_x86/testReport/org.jboss.as.test.clustering.cluster.ejb3.stateless/RemoteStatelessFailoverTestCase%28SYNC-tcp%29/testFailoverOnStop/
Now seen in 6.1.1.ER3. Since this had no confirmation on the 6.1.0 flag and it's too late for 6.1.1, I'm setting a flag for 6.2.0. testFailoverOnStop and testFailoverOnUndeploy are among the failing tests. I'll put a sample stacktrace into attachments as well.
Created attachment 776869 [details] stacktrace
I've changed the owner to rjanik who's really dealing with this issue.
Actually, I'm not dealing with this, I've just run into this issue in ER3 again and so I've put up some more information about it. Thus, I've assigned the hot potato back to pferraro (default for Clustering). Or is there something I'm missing? Why do you think I'm dealing with this?
So it's not related to TCPGOSSIP, but looking at the attachment it doesn't seem to be a clustering issue either. Looking at the logs, it's a ShrinkWrap/JDK/OS problem: one of the deployments fails to deploy, so the frequency of one node is indeed equal to 0 and the test fails at that point.

05:32:41,099 WARNING [org.jboss.shrinkwrap.impl.base.exporter.zip.JdkZipExporterDelegate] (pool-45-thread-1) Exception encountered during export of archive: org.jboss.shrinkwrap.api.exporter.ArchiveExportException: Failed to write asset to output: /org/jboss/as/test/clustering/NodeNameGetter.class
	at org.jboss.shrinkwrap.impl.base.exporter.StreamExporterDelegateBase$3.handle(StreamExporterDelegateBase.java:272)
	at org.jboss.shrinkwrap.impl.base.io.IOUtil.closeOnComplete(IOUtil.java:219)
	at org.jboss.shrinkwrap.impl.base.exporter.StreamExporterDelegateBase.processNode(StreamExporterDelegateBase.java:233)
	at org.jboss.shrinkwrap.impl.base.exporter.AbstractExporterDelegate.processNode(AbstractExporterDelegate.java:105)
	at org.jboss.shrinkwrap.impl.base.exporter.AbstractExporterDelegate.processNode(AbstractExporterDelegate.java:109)
	at org.jboss.shrinkwrap.impl.base.exporter.AbstractExporterDelegate.processNode(AbstractExporterDelegate.java:109)
	at org.jboss.shrinkwrap.impl.base.exporter.AbstractExporterDelegate.processNode(AbstractExporterDelegate.java:109)
	at org.jboss.shrinkwrap.impl.base.exporter.AbstractExporterDelegate.processNode(AbstractExporterDelegate.java:109)
	at org.jboss.shrinkwrap.impl.base.exporter.AbstractExporterDelegate.processNode(AbstractExporterDelegate.java:109)
	at org.jboss.shrinkwrap.impl.base.exporter.AbstractExporterDelegate.doExport(AbstractExporterDelegate.java:95)
	at org.jboss.shrinkwrap.impl.base.exporter.StreamExporterDelegateBase.access$001(StreamExporterDelegateBase.java:50)
	at org.jboss.shrinkwrap.impl.base.exporter.StreamExporterDelegateBase$1.call(StreamExporterDelegateBase.java:121)
	at org.jboss.shrinkwrap.impl.base.exporter.StreamExporterDelegateBase$1.call(StreamExporterDelegateBase.java:116)
	at org.jboss.shrinkwrap.impl.base.exporter.zip.JdkZipExporterDelegate$1.call(JdkZipExporterDelegate.java:124)
	at org.jboss.shrinkwrap.impl.base.exporter.zip.JdkZipExporterDelegate$1.call(JdkZipExporterDelegate.java:118)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
	at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Pipe closed
	at java.io.PipedInputStream.checkStateForReceive(PipedInputStream.java:244)
	at java.io.PipedInputStream.receive(PipedInputStream.java:185)
	at java.io.PipedOutputStream.write(PipedOutputStream.java:105)
	at java.util.zip.ZipOutputStream.writeInt(ZipOutputStream.java:445)
	at java.util.zip.ZipOutputStream.writeEXT(ZipOutputStream.java:362)
	at java.util.zip.ZipOutputStream.closeEntry(ZipOutputStream.java:220)
	at org.jboss.shrinkwrap.impl.base.exporter.zip.JdkZipExporterDelegate.closeEntry(JdkZipExporterDelegate.java:84)
	at org.jboss.shrinkwrap.impl.base.exporter.zip.JdkZipExporterDelegate.closeEntry(JdkZipExporterDelegate.java:40)
	at org.jboss.shrinkwrap.impl.base.exporter.StreamExporterDelegateBase$2.execute(StreamExporterDelegateBase.java:265)
	at org.jboss.shrinkwrap.impl.base.exporter.StreamExporterDelegateBase$2.execute(StreamExporterDelegateBase.java:233)
	at org.jboss.shrinkwrap.impl.base.io.IOUtil.closeOnComplete(IOUtil.java:217)
	... 18 more

05:32:41,178 WARNING [org.jboss.shrinkwrap.impl.base.exporter.zip.JdkZipExporterDelegate] (pool-45-thread-1) [SHRINKWRAP-120] Possible deadlock scenario: Got exception on closing the ZIP out stream: Pipe closed: java.io.IOException: Pipe closed
	at java.io.PipedInputStream.checkStateForReceive(PipedInputStream.java:244)
	at java.io.PipedInputStream.receive(PipedInputStream.java:185)
	at java.io.PipedOutputStream.write(PipedOutputStream.java:105)
	at java.util.zip.ZipOutputStream.writeInt(ZipOutputStream.java:445)
	at java.util.zip.ZipOutputStream.writeEXT(ZipOutputStream.java:362)
	at java.util.zip.ZipOutputStream.closeEntry(ZipOutputStream.java:220)
	at java.util.zip.ZipOutputStream.finish(ZipOutputStream.java:301)
	at java.util.zip.DeflaterOutputStream.close(DeflaterOutputStream.java:140)
	at java.util.zip.ZipOutputStream.close(ZipOutputStream.java:321)
	at org.jboss.shrinkwrap.impl.base.exporter.zip.JdkZipExporterDelegate$1.call(JdkZipExporterDelegate.java:148)
	at org.jboss.shrinkwrap.impl.base.exporter.zip.JdkZipExporterDelegate$1.call(JdkZipExporterDelegate.java:118)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
	at java.lang.Thread.run(Thread.java:662)
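
For anyone trying to reproduce the root cause outside the testsuite: the innermost frames in the trace are plain JDK piped streams, and "Pipe closed" is what PipedOutputStream.write reports once the reading side has been closed. The following is only a minimal sketch of that JDK behaviour. It does not use ShrinkWrap, the class and method names (PipeClosedDemo, writeAfterReaderClosed) are made up for illustration, and it makes no claim about why the reader goes away early in the real test run.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

// Hypothetical demo class, not part of ShrinkWrap or the AS testsuite.
public class PipeClosedDemo {

    // Connects a pipe, closes the reading end first, then writes to it.
    // Returns the message of the resulting IOException, or "no error".
    static String writeAfterReaderClosed() throws IOException {
        PipedInputStream in = new PipedInputStream();
        PipedOutputStream out = new PipedOutputStream(in);

        // Simulate the consumer giving up before the producer has finished.
        in.close();

        try {
            // The producer side still tries to push bytes into the pipe.
            out.write(42);
            return "no error";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeAfterReaderClosed());
    }
}
```

If the exporter thread in the log is in the same position, i.e. still writing ZIP entries after whatever consumes the stream has closed it, both warnings above would be consistent with that ordering.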
Rado, thanks for the update. Is there some workaround we could possibly use here?
Can you try with 6.1.1.ER4? There have been minor testsuite changes that could affect Solaris (and mainly Windows). Is this reliably reproducible on a certain OS/JDK? This report is a mixture of TCPGOSSIP, different JGroups stacks and different OSes, so it becomes difficult to keep track.
I've run the testsuite with ER4 and this is still there. It fails on all OS options and I don't see any connection with any specific JVM or 32/64 bit option. Unfortunately, I don't think it is reproducible 100% of the time on any configuration. From the 2 runs for ER4 and 1 run for ER3, I've caught this 3 times out of 3 only with: jdk=ibm17,label=RHEL5_x86_64.
Part of it looks like a race condition being solved here https://bugzilla.redhat.com/show_bug.cgi?id=956805
Update from EAP 6.1.1.ER7 testing: the test testFailoverOnStop fails. See the log here: https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-jgroups-tcpgossip-hpux-matrix/BITS=-d64,jdk=jdk17_hpux,label=hpux11v3/lastCompletedBuild/testReport/org.jboss.as.test.clustering.cluster.ejb3.stateless/RemoteStatelessFailoverTestCase(ASYNC-tcp)/testFailoverOnStop/
Update: present in 6.2.0.ER1 as well.
The addition of the GlobalComponentRegistryService should improve the reliability of this test. Please retest against EAP 6.2.0.ER2.
Retested against EAP 6.2.0.ER3, it's still there. https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-jgroups-tcpgossip-rhel-matrix/65/jdk=ibm16,label=RHEL6_x86_64/testReport/org.jboss.as.test.clustering.cluster.ejb3.stateless/RemoteStatelessFailoverTestCase%28SYNC-udp%29/testFailoverOnStop/
This is no longer addressable in the time remaining for release.
*** Bug 979935 has been marked as a duplicate of this bug. ***