Description of problem:
WSDiscoveryTestCase failure, see
http://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-jbossws-testsuite-hpux/jdk=jdk16_hpux,label_exp=hpux11v3/lastCompletedBuild/testReport/org.jboss.test.ws.jaxws.samples.wsdd/WSDiscoveryTestCase/testProbeAndResolve/

Error Message
expected:<3> but was:<4>

Stacktrace
junit.framework.AssertionFailedError: expected:<3> but was:<4>
at junit.framework.Assert.fail(Assert.java:50)
at org.jboss.test.ws.jaxws.samples.wsdd.WSDiscoveryTestCase.testProbeAndResolve(WSDiscoveryTestCase.java:69)

Standard Error
WSDiscoveryTestCase ProbeMatchType address http://localhost:8080/jaxws-samples-wsdd/WSDDService
WSDiscoveryTestCase ProbeMatchType address http://localhost:8080/jaxws-samples-wsdd2/AnotherWSDDService
WSDiscoveryTestCase ProbeMatchType address http://localhost:8080/jaxws-samples-wsdd2/WSDDService
WSDiscoveryTestCase ProbeMatchType address http://localhost:8080/jaxws-samples-wsdd/WSDDService

Version-Release number of selected component (if applicable):
6.2.0.ER1 6.2.0.ER2 6.2.0.ER3

How reproducible:
intermittent

Additional info:
Test coverage of the implementation in the upstream project should be inspected. The WS Discovery feature was not approved for EAP-6.2.0; it was imported from upstream.
I did a quick evaluation of the 5 test runs containing WSDiscoveryTestCase test failures. Report data was taken from https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/EAP6/view/EAP6-WS/job/eap-62x-patched-jbossws-testsuite-matrix
Runs #16, #15, #11 (i.e. "EAP 6.2.2.CP.CR3") and #12, #10 (i.e. "EAP 6.3.0.DR4") were evaluated.

3 FAILURE PATTERNS

1. Of the 5 test reports evaluated, the "x86_64" platform configurations failed most often. These platforms failed in 4 of the 5 test runs evaluated.
9 platforms X 5 test runs = 45 tests
9 platforms X 4 test run failures = 36 failed tests
36/45 = 80% failure rate

jdk=ibm16, label=RHEL5 && x86_64
jdk=ibm17, label=RHEL5 && x86_64
jdk=ibm17, label=RHEL6 && x86_64
jdk=java16_default, label=RHEL6 && x86
jdk=java16_default, label=solaris10 && sparc
jdk=java16_default, label=solaris11 && x86_64
jdk=java17_default, label=RHEL6 && x86_64
jdk=openjdk-1.6.0-local,label=RHEL6 && x86_64
jdk=openjdk-1.7.0-local,label=RHEL6 && x86_64

- The "6.2.2.CP.CR3 -fn" tests showed a 47% failure rate
65 platforms X 3 test runs = 195 total tests run
195 total tests run - 91 test failures = 104 passing tests
91/195 = 47% failure rate

- The "EAP 6.3.0.DR4" tests showed a 25% failure rate
65 platforms X 2 test runs = 130 total tests run
130 total tests run - 33 test failures = 97 passing tests
33/130 = 25% failure rate

2. The most common failure cause was too many endpoint services with the same "targetname" found on the network.
For test run "Mar 24 (#16) 6.2.2.CP.CR3 -fn" a total of 18 failures were due to this. Here are examples of the JUnit failure statement.
expected:<1> but was:<4>
expected:<1> but was:<3>
expected:<1> but was:<2>
expected:<1> but was:<0>

For test run "Feb 20 (#10) EAP 6.3.0.DR4" a total of 17 failures were due to this. There appears to have been a code change to the test between "EAP 6.3.0.DR4" and "6.2.2.CP.CR3 -fn", but the behavior is still the same: too many endpoint services found.
expected:<3> but was:<7>
expected:<3> but was:<4>
expected:<3> but was:<6>
expected:<3> but was:<5>

3. "Mar 24 (#16) 6.2.2.CP.CR3 -fn" is suffering from a second intermittent error. The following message was generated for 15 tests. This appears to have been introduced into the code starting with build (#16).
Could not resolve (timeout = 2000 ms) reference: <?xml version="1.0" encoding="UTF-8" standalone="yes"?> <EndpointReference xmlns="http://www.w3.org/2005/08/addressing"> <Address>urn:uuid:73645339-8571-4e3b-b3d3-8b77e4125d84</Address> <ReferenceParameters/></EndpointReference>
The failures appear to occur because multiple test suites are being run in parallel on the same network. It's possible this is being run as a "matrix job" on Jenkins, which would explain why the behavior is not reproducible outside Jenkins. I have not found any way to retrieve unique identifying information about the returned W3CEndpointReference objects. The only solution I see is to change the test to check that the size of the list of matching endpoints is greater than zero.
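The relaxed check proposed above could look roughly like the following. This is only a sketch: the class and method names (LenientMatchCheck, strictCheck, relaxedCheck) are hypothetical, and the matches are modelled as plain strings rather than the real W3CEndpointReference objects.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not taken from WSDiscoveryTestCase.
final class LenientMatchCheck {

    /** Strict form: passes only when exactly `expected` endpoints answered. */
    static boolean strictCheck(List<String> matches, int expected) {
        return matches.size() == expected;
    }

    /** Relaxed form: passes as soon as at least one endpoint answered. */
    static boolean relaxedCheck(List<String> matches) {
        return !matches.isEmpty();
    }

    public static void main(String[] args) {
        // Four answers where three were expected, as in the original report.
        List<String> matches = Arrays.asList("svc-1", "svc-2", "svc-3", "svc-1");
        System.out.println("strict:  " + strictCheck(matches, 3));
        System.out.println("relaxed: " + relaxedCheck(matches));
    }
}
```

The trade-off is that the relaxed form can no longer detect a service that is announced more often than it should be.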
Are you sure there is no fundamental mistake in the implementation? We bind EAP in our tests to the loopback interface only (127.0.0.1). Thus any announced WS is meaningful only on the host where it is running, not for other hosts on the network (as the address is 127.0.0.1 and they cannot access it).
org.apache.cxf.ws.discovery.WSDiscoveryClient
397 disp.getRequestContext().put("udp.multi.response.timeout", timeout);

The code above, which the test uses, makes a UDP multicast call.
...
"UDP is different than the other CXF transports in that it allows multiple responses to be received for a single request. For example, if you send out a request via a multicast or broadcast, several servers could respond to that request. ..."
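To illustrate why one request can yield several answers, here is a minimal sketch using plain java.net sockets. It is not the CXF implementation: the responders are simulated on the loopback interface with unicast datagrams instead of real multicast, and the names (UdpMultiResponseSketch, collectResponses, simulate) are hypothetical. The receive loop simply keeps reading until nothing arrives within the timeout, which must be greater than zero.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; assumes loopback UDP is available on the machine.
final class UdpMultiResponseSketch {

    /** Reads datagrams until no further reply arrives within timeoutMs (> 0). */
    static List<String> collectResponses(DatagramSocket socket, int timeoutMs) {
        List<String> responses = new ArrayList<>();
        byte[] buffer = new byte[1024];
        try {
            socket.setSoTimeout(timeoutMs);
            while (true) {
                DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
                try {
                    socket.receive(packet);
                } catch (SocketTimeoutException quiet) {
                    return responses; // silence for timeoutMs: stop waiting
                }
                responses.add(new String(packet.getData(), packet.getOffset(),
                        packet.getLength(), StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Simulates `responders` independent servers answering the same client. */
    static List<String> simulate(int responders, int timeoutMs) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket client = new DatagramSocket(0, loopback)) {
            for (int i = 0; i < responders; i++) {
                try (DatagramSocket server = new DatagramSocket(0, loopback)) {
                    byte[] reply = ("reply from server " + i).getBytes(StandardCharsets.UTF_8);
                    server.send(new DatagramPacket(reply, reply.length,
                            loopback, client.getLocalPort()));
                }
            }
            return collectResponses(client, timeoutMs);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(simulate(3, 500));
    }
}
```

With real multicast the client has no control over how many hosts answer, which is consistent with the varying match counts seen in the Jenkins runs.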
Committed revision 18542 [BZ-1012454] check that the number of matches is GT 0.
The above bug fix does not address this error.
Could not resolve (timeout = 2000 ms) reference: <?xml version="1.0" encoding="UTF-8" standalone="yes"?> <EndpointReference xmlns="http://www.w3.org/2005/08/addressing"> <Address>urn:uuid:73645339-8571-4e3b-b3d3-8b77e4125d84</Address> <ReferenceParameters/></EndpointReference>

Every endpoint is assigned a unique uuid by
org.apache.cxf.ws.discovery.WSDiscoveryClient
438 builder.address(ContextUtils.generateUUID());
However, the value is a private property of javax.xml.ws.wsaddressing.W3CEndpointReference and is not accessible.
I found a way to print out the uuid for each endpoint. I've checked in this temporary code to help debug Jenkins runs. It will be removed in the near future.
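The comment does not say how the uuid was printed. One possible approach, sketched here under assumptions, is to serialize the endpoint reference to XML (for example through EndpointReference.toString(), which as far as I know returns the EPR infoset) and read the Address element with the JDK's DOM parser. The class and method names (EprAddressSketch, extractAddress) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper; works on the XML text of an endpoint reference.
final class EprAddressSketch {

    static final String WSA_NS = "http://www.w3.org/2005/08/addressing";

    /** Returns the text of the first wsa:Address element, or null if absent. */
    static String extractAddress(String eprXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for the namespace lookup below
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(eprXml.getBytes(StandardCharsets.UTF_8)));
            NodeList addresses = doc.getElementsByTagNameNS(WSA_NS, "Address");
            return addresses.getLength() == 0
                    ? null
                    : addresses.item(0).getTextContent().trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a parsable EndpointReference", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>"
                + "<EndpointReference xmlns=\"http://www.w3.org/2005/08/addressing\">"
                + "<Address>urn:uuid:73645339-8571-4e3b-b3d3-8b77e4125d84</Address>"
                + "<ReferenceParameters/></EndpointReference>";
        System.out.println(extractAddress(xml));
    }
}
```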
http://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/EAP6/view/EAP6-WS/job/eap-6x-jbossws-testsuite-rhel/44/testReport
http://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/EAP6/view/EAP6-WS/job/eap-6x-jbossws-testsuite-rhel/45/testReport/

The evaluation of WSDiscoveryTestCase failures for runs 44 and 45 of eap-6x-jbossws-testsuite-rhel shows that multiple machines are responding to the UDP multicast call made by this code. Between the 2 runs there is a small overlap in the test platforms that show the failure, but it is not consistent enough to call this a platform-specific issue. The bug report notes that this failure occurs only on Jenkins and not when the test is run by individuals. Is this test suite being run in parallel on the same network on Jenkins?
Yes, as you can see in the linked Jenkins jobs, these are matrix jobs that could be executing many test suites in parallel. And yes, those Jenkins nodes are in the same network. Is that a problem for testing WSDiscovery? Can we do something about it? Is it possible to rewrite the test to handle such a situation?
I hit the failure in a single run with 6.3.1.CP.CR2 when no other WS test suites were running concurrently. The odd thing here is that three services with the same uuid were discovered.
https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-jbossws-testsuite-prepare/84/testReport/junit/org.jboss.test.ws.jaxws.samples.wsdd/WSDiscoveryTestCase/testProbeAndResolve/

Error Message
http://localhost:8080/jaxws-samples-wsdd/WSDDService urn:uuid:9ecf3c4d-eea6-4ec9-a067-4898edf8e0b8
http://localhost:8080/jaxws-samples-wsdd/WSDDService urn:uuid:9ecf3c4d-eea6-4ec9-a067-4898edf8e0b8
http://localhost:8080/jaxws-samples-wsdd/WSDDService urn:uuid:9ecf3c4d-eea6-4ec9-a067-4898edf8e0b8
expected:<1> but was:<3>

Stacktrace
junit.framework.AssertionFailedError:
http://localhost:8080/jaxws-samples-wsdd/WSDDService urn:uuid:9ecf3c4d-eea6-4ec9-a067-4898edf8e0b8
http://localhost:8080/jaxws-samples-wsdd/WSDDService urn:uuid:9ecf3c4d-eea6-4ec9-a067-4898edf8e0b8
http://localhost:8080/jaxws-samples-wsdd/WSDDService urn:uuid:9ecf3c4d-eea6-4ec9-a067-4898edf8e0b8
expected:<1> but was:<3>
at junit.framework.Assert.fail(Assert.java:50)
at junit.framework.Assert.failNotEquals(Assert.java:287)
at junit.framework.Assert.assertEquals(Assert.java:67)
at junit.framework.Assert.assertEquals(Assert.java:199)
at org.jboss.test.ws.jaxws.samples.wsdd.WSDiscoveryTestCase.checkResolveMatches(WSDiscoveryTestCase.java:156)
at org.jboss.test.ws.jaxws.samples.wsdd.WSDiscoveryTestCase.testProbeAndResolve(WSDiscoveryTestCase.java:80)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at junit.framework.TestCase.runTest(TestCase.java:168)
at junit.framework.TestCase.runBare(TestCase.java:134)
at junit.framework.TestResult$1.protect(TestResult.java:110)
at junit.framework.TestResult.runProtected(TestResult.java:128)
at junit.framework.TestResult.run(TestResult.java:113)
at junit.framework.TestCase.run(TestCase.java:124)
at junit.framework.TestSuite.runTest(TestSuite.java:243)
at junit.framework.TestSuite.run(TestSuite.java:238)
at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
at org.jboss.wsf.test.JBossWSTestSetup$1.protect(JBossWSTestSetup.java:142)
at junit.framework.TestResult.runProtected(TestResult.java:128)
at org.jboss.wsf.test.JBossWSTestSetup.run(JBossWSTestSetup.java:149)
at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:234)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:133)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:114)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:188)
at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:166)
at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:86)
at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:101)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:74)
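When debugging a result like the one above, it can help to tally the matches per uuid, which separates "the same endpoint answered several times" from "several different endpoints answered". The following is a small diagnostic sketch with hypothetical names (UuidTallySketch, tally); whether de-duplicating by uuid would be an acceptable fix for the test is not settled in this thread.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical diagnostic helper, not part of the test suite.
final class UuidTallySketch {

    /** Counts how often each uuid occurs, keeping first-seen order. */
    static Map<String, Integer> tally(List<String> uuids) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String uuid : uuids) {
            Integer seen = counts.get(uuid);
            counts.put(uuid, seen == null ? 1 : seen + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        String uuid = "urn:uuid:9ecf3c4d-eea6-4ec9-a067-4898edf8e0b8";
        Map<String, Integer> counts = tally(Arrays.asList(uuid, uuid, uuid));
        System.out.println(counts.size() + " distinct endpoint(s): " + counts);
    }
}
```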
There are actually more issues here:

1) In a pure IPv6 environment no WS is found during the PROBE phase, which is why org.jboss.test.ws.jaxws.samples.wsdd.WSDiscoveryTestCase.testProbeAndResolve always fails with:
Error Message
expected:<1> but was:<0>
This might be related to
https://issues.jboss.org/browse/JBWS-3721
https://issues.jboss.org/browse/JBWS-3778
Note: the interesting thing here is that this applies only to RHEL; on Windows pure IPv6 environments this test never failed, from what I can see in Jenkins:
https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/EAP6/view/EAP6-WS/job/eap-6x-jbossws-testsuite-rhel-ipv6-pure/
https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/EAP6/view/EAP6-WS/job/eap-6x-jbossws-testsuite-windows-ipv6-pure/

2) The failures caused by concurrent execution of the test suite on the same network:
a) Although each server is bound to localhost only, when a WS-Discovery enabled service is deployed, it starts listening on UDP port 3702 even for remote (non-loopback interface) requests.
b) As a result, web services that are not hosted locally are also discovered in the PROBE phase of #testProbeAndResolve and #testInvocation.
c) All discovered WS are filtered by #filterProbeMatchesForHost, but since all these web services are deployed on localhost, the filter passes them all on for further processing.
d) The resolving phase is then where the test most probably fails on a timeout, because in the previous step it discovered a web service hosted elsewhere that has since been undeployed.
e1) In #testInvocation there is a hidden issue: because we get the port by address from getXAddrs (which is always localhost), we actually invoke the web service hosted on the current machine (even when the remote web service is still active).
e2) #testProbeAndResolve checks that each web service is discovered only once, which isn't always true because of steps b) and c).

3) A special case of #testProbeAndResolve failing because it discovers three services with the same uuid (even when no other concurrent executions are running). I managed to isolate the issue to a specific (beaker) machine (there might be more of them); it fails every time on it. I haven't found any other clues yet.
https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/EAP6/view/EAP6-WS/job/eap-6x-jbossws-testsuite-smoke/178/testReport/junit/org.jboss.test.ws.jaxws.samples.wsdd/WSDiscoveryTestCase/testProbeAndResolve/
https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/EAP6/view/EAP6-WS/job/eap-6x-jbossws-testsuite-smoke/179/testReport/junit/org.jboss.test.ws.jaxws.samples.wsdd/WSDiscoveryTestCase/testProbeAndResolve/
https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/EAP6/view/EAP6-WS/job/eap-6x-jbossws-testsuite-smoke/180/testReport/junit/org.jboss.test.ws.jaxws.samples.wsdd/WSDiscoveryTestCase/testProbeAndResolve/

My questions are:
1) Is responding to UDP multicast on the public interface correct even when the server is accessible only on the loopback interface? Or is this just a special case that we should deal with by configuring a firewall?
2) Why is an address with "localhost" sent back in ProbeMatch and ResolveMatch for a remotely hosted service when it is obvious we cannot access it?

XAddr specification:
Transport address(es) that MAY be used to communicate with the Target Service (or Discovery Proxy). Contained URIs MUST NOT contain whitespaces. If a Target Service (or Discovery Proxy) has transport addresses (see Section 2.1 Endpoint References) at least one transport address MUST be included. If omitted or empty, no implied value.
http://docs.oasis-open.org/ws-dd/discovery/1.1/os/wsdd-discovery-1.1-spec-os.html

If both of the above are expected, then the problematic part for testing is the filtering in 2c). A different approach might be needed. Or maybe this can be solved by binding the server and the test suite to the real IP address of the current machine (but I don't know whether that would break anything else in the TS). Another option might be to use dynamic names (with some pseudorandom element) for the web services in each deployment.
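The filtering problem in 2c) and the "dynamic names" idea can be sketched side by side. The code below is an illustration only: filterByHost is an assumed stand-in for what #filterProbeMatchesForHost does (its real implementation is not shown in this thread), and filterByRunId together with the run identifier in the context path is a hypothetical naming scheme.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; neither method is taken from the real test.
final class ProbeMatchFilterSketch {

    /** Keeps the XAddrs whose host equals `host` (assumed filter behaviour). */
    static List<String> filterByHost(List<String> xAddrs, String host) {
        List<String> kept = new ArrayList<>();
        for (String xAddr : xAddrs) {
            if (host.equalsIgnoreCase(URI.create(xAddr).getHost())) {
                kept.add(xAddr);
            }
        }
        return kept;
    }

    /** Keeps the XAddrs whose path carries this run's pseudorandom id. */
    static List<String> filterByRunId(List<String> xAddrs, String runId) {
        List<String> kept = new ArrayList<>();
        for (String xAddr : xAddrs) {
            String path = URI.create(xAddr).getPath();
            if (path != null && path.contains("-" + runId + "/")) {
                kept.add(xAddr);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // One local and one remote answer; both advertise "localhost".
        List<String> answers = Arrays.asList(
                "http://localhost:8080/jaxws-samples-wsdd-run1a2b/WSDDService",
                "http://localhost:8080/jaxws-samples-wsdd-run9f8e/WSDDService");
        System.out.println("by host:   " + filterByHost(answers, "localhost").size());
        System.out.println("by run id: " + filterByRunId(answers, "run1a2b").size());
    }
}
```

Because every responder advertises localhost, the host filter cannot tell local and remote answers apart, whereas a per-run identifier in the deployment name could.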
Created attachment 985684 [details] patch files
Based on the email discussion "Re: bz-1012454 next steps" (12/22/2014), I have modified our modules/addons/transports/udp/src/main/java/org/jboss/wsf/stack/cxf/addons/transports/udp/* files to use the same implementation as CXF. Here is the file list.
u modules/addons/transports/udp/pom.xml
u modules/addons/transports/udp/src/main/java/org/jboss/wsf/stack/cxf/addons/transports/udp/UDPConduit.java
u modules/addons/transports/udp/src/main/java/org/jboss/wsf/stack/cxf/addons/transports/udp/UDPDestination.java
a modules/addons/transports/udp/src/main/java/org/jboss/wsf/stack/cxf/addons/transports/udp/IoSessionInputStream.java
a modules/addons/transports/udp/src/main/java/org/jboss/wsf/stack/cxf/addons/transports/udp/IoSessionOutputStream.java

This code change required adding the archive <groupId>org.apache.mina</groupId> <artifactId>mina-core</artifactId> to the pom.xml. This archive must also be added as a module to the JBoss server, and a reference to the package must be added to the module modules/system/layers/base/org/jboss/ws/cxf/jbossws-cxf-transports-udp/main/

I tested these code changes in a single-machine environment with JBossWS CXF stack (4.3.2.Final), the version used by EAP6.4.0, and JBossWS CXF stack (5.0.0-SNAPSHOT). I have attached patch and zip files for both versions (bz1012454.zip).
Created attachment 986852 [details] Patch for wFly900-cxf_500-SNAPSHOT
Created attachment 986853 [details] Patch for eap640 cxf_432-final.patch
Regarding comment #12, the first issue (discovery not working in an IPv6 network) was fixed in https://issues.apache.org/jira/browse/CXF-6172. The other two issues are still relevant.