921532 – RemoteStatelessFailoverTestCase(ASYNC-tcp).testFailoverOnUndeploy and testFailoverOnStop intermittently fails

Keywords:
Status:	CLOSED EOL
Alias:	None
Product:	JBoss Enterprise Application Platform 6
Classification:	JBoss
Component:	Clustering
Sub Component:
Version:	6.1.0,6.2.0,6.1.1,6.3.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	EAP 6.4.0
Assignee:	jboss-set
QA Contact:	Richard Janík
Docs Contact:
URL:
Whiteboard:	Clustering testsuite
Duplicates (1):	979935
Depends On:
Blocks:	996500

Reported:	2013-03-14 11:47 UTC by Richard Janík
Modified:	2019-08-19 12:47 UTC
CC List:	6 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2019-08-19 12:47:41 UTC
Type:	Bug
Embargoed:

Attachments
TCPGOSSIP conf (1.86 KB, text/plain) 2013-03-14 11:47 UTC, Richard Janík	no flags
stacktrace (40.53 KB, text/plain) 2013-07-22 11:12 UTC, Richard Janík	no flags

Description Richard Janík 2013-03-14 11:47:00 UTC

Created attachment 709984 [details]
TCPGOSSIP conf

Description of problem:

org.jboss.as.test.clustering.cluster.ejb3.stateless.RemoteStatelessFailoverTestCase(ASYNC-tcp).testFailoverOnUndeploy
https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/EAP6/view/EAP6-JGroups/job/eap-6x-jgroups-tcpgossip-solaris-matrix/jdk=java16_default,label=sol10_sparc64/11/testReport/org.jboss.as.test.clustering.cluster.ejb3.stateless/RemoteStatelessFailoverTestCase%28ASYNC-tcp%29/testFailoverOnUndeploy/

There is an upstream JIRA saying this should be fixed in 6.1, but I'm still seeing it: https://issues.jboss.org/browse/AS7-5211

This configuration uses TCPGOSSIP (see attachment for exact JGroups stack configuration).

EAP 6.1.0.ER2 (AS 7.2.0.Final-redhat-2)

Comment 1 Richard Janík 2013-03-18 08:30:11 UTC

Oops, the TCPGOSSIP conf was incorrect. Nevertheless, the issue is not specific to this setting and also occurs (with fixed configuration) here:

https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/EAP6/view/EAP6-JGroups/job/eap-6x-jgroups-tcpgossip-rhel-matrix/36/jdk=ibm17,label=RHEL5_x86/testReport/org.jboss.as.test.clustering.cluster.ejb3.stateless/RemoteStatelessFailoverTestCase%28ASYNC-udp%29/testFailoverOnStop/

(To obtain fixed configuration from the attachment, comment out the MPING protocol in tcp stack)

Comment 2 Richard Janík 2013-04-08 08:27:58 UTC

In ER4 as well.

In the same test case, testFailoverOnStop fails randomly as well.

https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/EAP6/view/EAP6-JGroups/job/eap-6x-jgroups-tcpgossip-rhel-matrix/43/jdk=openjdk-1.7.0-local,label=RHEL6_x86/testReport/org.jboss.as.test.clustering.cluster.ejb3.stateless/RemoteStatelessFailoverTestCase%28SYNC-tcp%29/testFailoverOnStop/

Comment 3 Richard Janík 2013-07-22 11:11:53 UTC

Now seen in 6.1.1.ER3.

Since this had no confirmation on the 6.1.0 flag and it's too late for 6.1.1, I'm setting a flag for 6.2.0.

testFailoverOnStop and testFailoverOnUndeploy are among the failing tests.

I'll put a sample stacktrace into attachments as well.

Comment 4 Richard Janík 2013-07-22 11:12:55 UTC

Created attachment 776869 [details]
stacktrace

Comment 5 Dimitris Andreadis 2013-07-31 09:09:52 UTC

I've changed the owner to rjanik who's really dealing with this issue.

Comment 6 Richard Janík 2013-07-31 10:54:52 UTC

Actually, I'm not dealing with this, I've just run into this issue in ER3 again and so I've put up some more information about it. Thus, I've assigned the hot potato back to pferraro (default for Clustering).

Or is there something I'm missing? Why do you think I'm dealing with this?

Comment 7 Radoslav Husar 2013-07-31 11:51:46 UTC

So it's not related to TCPGOSSIP, but looking at the attachment, it doesn't seem to be a clustering issue either. Judging by the logs, it's a ShrinkWrap/JDK/OS problem: one of the deployments fails to deploy, so the frequency of one node is indeed 0 and the test fails at that point.
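To illustrate the failure mode described here, the following is a minimal sketch of a per-node frequency check. It is not the actual code of RemoteStatelessFailoverTestCase; the class name, method names, and node names ("node-0", "node-1") are hypothetical. It only shows why a node whose deployment never succeeded ends up with a frequency of 0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the real test: counts how often each expected
// node answered an invocation and reports the nodes that never answered.
public class NodeFrequencyCheck {

    // Start every expected node at 0, then count the node name returned
    // by each invocation.
    static Map<String, Integer> countFrequencies(List<String> expectedNodes, List<String> responses) {
        Map<String, Integer> frequency = new LinkedHashMap<>();
        for (String node : expectedNodes) {
            frequency.put(node, 0);
        }
        for (String node : responses) {
            frequency.merge(node, 1, Integer::sum);
        }
        return frequency;
    }

    // A node with frequency 0 never served a request, for example because
    // its deployment failed before the invocations started.
    static List<String> silentNodes(Map<String, Integer> frequency) {
        List<String> silent = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : frequency.entrySet()) {
            if (entry.getValue() == 0) {
                silent.add(entry.getKey());
            }
        }
        return silent;
    }

    public static void main(String[] args) {
        List<String> expected = Arrays.asList("node-0", "node-1");
        // Assumed scenario: the deployment on node-1 failed, so every
        // invocation was answered by node-0.
        List<String> responses = Arrays.asList("node-0", "node-0", "node-0");
        System.out.println("Nodes with frequency 0: " + silentNodes(countFrequencies(expected, responses)));
    }
}
```

Under this assumption, a check that requires every node to have a non-zero frequency fails even though the cluster itself may be behaving correctly.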

05:32:41,099 WARNING [org.jboss.shrinkwrap.impl.base.exporter.zip.JdkZipExporterDelegate] (pool-45-thread-1) Exception encountered during export of archive: org.jboss.shrinkwrap.api.exporter.ArchiveExportException: Failed to write asset to output: /org/jboss/as/test/clustering/NodeNameGetter.class
	at org.jboss.shrinkwrap.impl.base.exporter.StreamExporterDelegateBase$3.handle(StreamExporterDelegateBase.java:272)
	at org.jboss.shrinkwrap.impl.base.io.IOUtil.closeOnComplete(IOUtil.java:219)
	at org.jboss.shrinkwrap.impl.base.exporter.StreamExporterDelegateBase.processNode(StreamExporterDelegateBase.java:233)
	at org.jboss.shrinkwrap.impl.base.exporter.AbstractExporterDelegate.processNode(AbstractExporterDelegate.java:105)
	at org.jboss.shrinkwrap.impl.base.exporter.AbstractExporterDelegate.processNode(AbstractExporterDelegate.java:109)
	at org.jboss.shrinkwrap.impl.base.exporter.AbstractExporterDelegate.processNode(AbstractExporterDelegate.java:109)
	at org.jboss.shrinkwrap.impl.base.exporter.AbstractExporterDelegate.processNode(AbstractExporterDelegate.java:109)
	at org.jboss.shrinkwrap.impl.base.exporter.AbstractExporterDelegate.processNode(AbstractExporterDelegate.java:109)
	at org.jboss.shrinkwrap.impl.base.exporter.AbstractExporterDelegate.processNode(AbstractExporterDelegate.java:109)
	at org.jboss.shrinkwrap.impl.base.exporter.AbstractExporterDelegate.doExport(AbstractExporterDelegate.java:95)
	at org.jboss.shrinkwrap.impl.base.exporter.StreamExporterDelegateBase.access$001(StreamExporterDelegateBase.java:50)
	at org.jboss.shrinkwrap.impl.base.exporter.StreamExporterDelegateBase$1.call(StreamExporterDelegateBase.java:121)
	at org.jboss.shrinkwrap.impl.base.exporter.StreamExporterDelegateBase$1.call(StreamExporterDelegateBase.java:116)
	at org.jboss.shrinkwrap.impl.base.exporter.zip.JdkZipExporterDelegate$1.call(JdkZipExporterDelegate.java:124)
	at org.jboss.shrinkwrap.impl.base.exporter.zip.JdkZipExporterDelegate$1.call(JdkZipExporterDelegate.java:118)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
	at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Pipe closed
	at java.io.PipedInputStream.checkStateForReceive(PipedInputStream.java:244)
	at java.io.PipedInputStream.receive(PipedInputStream.java:185)
	at java.io.PipedOutputStream.write(PipedOutputStream.java:105)
	at java.util.zip.ZipOutputStream.writeInt(ZipOutputStream.java:445)
	at java.util.zip.ZipOutputStream.writeEXT(ZipOutputStream.java:362)
	at java.util.zip.ZipOutputStream.closeEntry(ZipOutputStream.java:220)
	at org.jboss.shrinkwrap.impl.base.exporter.zip.JdkZipExporterDelegate.closeEntry(JdkZipExporterDelegate.java:84)
	at org.jboss.shrinkwrap.impl.base.exporter.zip.JdkZipExporterDelegate.closeEntry(JdkZipExporterDelegate.java:40)
	at org.jboss.shrinkwrap.impl.base.exporter.StreamExporterDelegateBase$2.execute(StreamExporterDelegateBase.java:265)
	at org.jboss.shrinkwrap.impl.base.exporter.StreamExporterDelegateBase$2.execute(StreamExporterDelegateBase.java:233)
	at org.jboss.shrinkwrap.impl.base.io.IOUtil.closeOnComplete(IOUtil.java:217)
	... 18 more

05:32:41,178 WARNING [org.jboss.shrinkwrap.impl.base.exporter.zip.JdkZipExporterDelegate] (pool-45-thread-1) [SHRINKWRAP-120] Possible deadlock scenario: Got exception on closing the ZIP out stream: Pipe closed: java.io.IOException: Pipe closed
	at java.io.PipedInputStream.checkStateForReceive(PipedInputStream.java:244)
	at java.io.PipedInputStream.receive(PipedInputStream.java:185)
	at java.io.PipedOutputStream.write(PipedOutputStream.java:105)
	at java.util.zip.ZipOutputStream.writeInt(ZipOutputStream.java:445)
	at java.util.zip.ZipOutputStream.writeEXT(ZipOutputStream.java:362)
	at java.util.zip.ZipOutputStream.closeEntry(ZipOutputStream.java:220)
	at java.util.zip.ZipOutputStream.finish(ZipOutputStream.java:301)
	at java.util.zip.DeflaterOutputStream.close(DeflaterOutputStream.java:140)
	at java.util.zip.ZipOutputStream.close(ZipOutputStream.java:321)
	at org.jboss.shrinkwrap.impl.base.exporter.zip.JdkZipExporterDelegate$1.call(JdkZipExporterDelegate.java:148)
	at org.jboss.shrinkwrap.impl.base.exporter.zip.JdkZipExporterDelegate$1.call(JdkZipExporterDelegate.java:118)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
	at java.lang.Thread.run(Thread.java:662)
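
The "Pipe closed" IOException in both traces is what the JDK throws when a PipedOutputStream is written to after the connected PipedInputStream has been closed. The sketch below reproduces that condition in isolation using only JDK classes. It is an assumption that the reading side closed early in the failing runs; the class name and the entry name are made up for illustration, and this is not ShrinkWrap's exporter code.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Standalone sketch: a ZIP stream writes into a pipe whose reading end is
// closed before the entry is finished.
public class PipeClosedDemo {

    // Returns the message of the IOException raised by closeEntry(),
    // or null if no exception was raised.
    static String closeEntryAfterReaderClosed() throws IOException {
        PipedInputStream reader = new PipedInputStream();
        PipedOutputStream writer = new PipedOutputStream(reader);
        ZipOutputStream zip = new ZipOutputStream(writer);

        // The entry header and a few bytes fit into the pipe's buffer,
        // so these calls do not block even though nobody is reading.
        zip.putNextEntry(new ZipEntry("demo/Example.class"));
        zip.write(new byte[] {1, 2, 3});

        // Simulate the consumer going away before the export is complete.
        reader.close();

        try {
            // Finishing the entry flushes the remaining data to the pipe,
            // which now rejects the write.
            zip.closeEntry();
            return null;
        } catch (IOException e) {
            // The ZIP stream is deliberately not closed here: closing it
            // would attempt another write to the dead pipe.
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(closeEntryAfterReaderClosed());
    }
}
```

If the consumer of the exported archive stops reading for any reason, the writer thread sees this exception regardless of the JGroups stack in use, which fits the observation that the failure is not specific to TCPGOSSIP.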

Comment 8 Jitka Kozana 2013-07-31 12:06:46 UTC

Rado, thanks for the update. Is there some workaround we could possibly use here?

Comment 9 Radoslav Husar 2013-07-31 13:36:25 UTC

Can you try with 6.1.1.ER4? There have been minor testsuite changes that could affect Solaris (and mainly Windows). Is this reliably reproducible on a certain OS/JDK? This report is a mixture of TCPGOSSIP, different JGroups stacks, and OSes, so it is becoming difficult to keep track.

Comment 10 Richard Janík 2013-08-08 07:10:19 UTC

I've run the testsuite with ER4 and this is still there.

It fails on all OS options and I don't see any connection with any specific JVM or 32/64 bit option. Unfortunately, I don't think it is reproducible 100% of the time on any configuration. From the 2 runs for ER4 and 1 run for ER3, I've caught this 3 times out of 3 only with: jdk=ibm17,label=RHEL5_x86_64 .

Comment 11 Radoslav Husar 2013-08-20 11:41:21 UTC

Part of it looks like a race condition that is being solved here: https://bugzilla.redhat.com/show_bug.cgi?id=956805

Comment 12 Jitka Kozana 2013-08-28 08:22:06 UTC

Update from EAP 6.1.1.ER7 testing: the test testFailoverOnStop fails. 

See the log here:
https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-jgroups-tcpgossip-hpux-matrix/BITS=-d64,jdk=jdk17_hpux,label=hpux11v3/lastCompletedBuild/testReport/org.jboss.as.test.clustering.cluster.ejb3.stateless/RemoteStatelessFailoverTestCase(ASYNC-tcp)/testFailoverOnStop/

Comment 13 Richard Janík 2013-09-19 08:57:50 UTC

Update: present in 6.2.0.ER1 as well.

Comment 14 Paul Ferraro 2013-09-19 17:11:19 UTC

The addition of the GlobalComponentRegistryService should improve the reliability of this test.  Please retest against EAP 6.2.0.ER2.

Comment 15 Richard Janík 2013-10-01 06:05:30 UTC

Retested against EAP 6.2.0.ER3, it's still there.

https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-jgroups-tcpgossip-rhel-matrix/65/jdk=ibm16,label=RHEL6_x86_64/testReport/org.jboss.as.test.clustering.cluster.ejb3.stateless/RemoteStatelessFailoverTestCase%28SYNC-udp%29/testFailoverOnStop/

Comment 19 Paul Ferraro 2013-10-16 12:53:39 UTC

This is no longer addressable in the time remaining for release.

Comment 24 Radoslav Husar 2014-10-30 09:52:03 UTC

*** Bug 979935 has been marked as a duplicate of this bug. ***
