Description of problem:
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-upi-4.3/139

[sig-network] Networking Granular Checks: Services should function for endpoint-Service: udp [Suite:openshift/conformance/parallel] [Suite:k8s] [Skipped:Network/OVNKubernetes]

fail [k8s.io/kubernetes/test/e2e/framework/networking_utils.go:250]: Oct 21 21:49:55.346: Failed to find expected endpoints:
Tries 66
Command curl -g -q -s 'http://10.129.0.253:8080/dial?request=hostName&protocol=udp&host=10.0.59.21&port=32196&tries=1'
retrieved map[]
expected map[netserver-0:{} netserver-1:{} netserver-2:{} netserver-3:{} netserver-4:{} netserver-5:{}]
Related job https://testgrid.k8s.io/redhat-openshift-release-4.3-informing-ocp#release-openshift-ocp-installer-e2e-aws-upi-4.3
Yup, this is a known bug. We've already filed a fix skipping this test, and we have a known issue. It may not be fixed for 4.3, but we'll try.
*** Bug 1763941 has been marked as a duplicate of this bug. ***
I added a link to the skipping PR, but it missed failures like the ones we're seeing in 4.3's aws-upi job today [1,2]:

$ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-upi-4.3/373/build-log.txt | grep '^failed: '
failed: (5m38s) 2019-11-21T23:24:14 "[sig-network] Networking Granular Checks: Services should function for node-Service: udp [Suite:openshift/conformance/parallel] [Suite:k8s]"
failed: (4m43s) 2019-11-21T23:24:31 "[sig-network] Networking Granular Checks: Services should function for node-Service: http [Suite:openshift/conformance/parallel] [Suite:k8s]"
failed: (9m15s) 2019-11-21T23:38:14 "[sig-network] Networking Granular Checks: Services should function for pod-Service: udp [Suite:openshift/conformance/parallel] [Suite:k8s]"
failed: (9m21s) 2019-11-21T23:42:57 "[sig-network] Networking Granular Checks: Services should function for endpoint-Service: udp [Suite:openshift/conformance/parallel] [Suite:k8s] [Skipped:Network/OVNKubernetes]"

The skips cover `\[sig-network\] Networking Granular Checks: Services should function for endpoint-Service`, but are lacking the 'node-Service: udp', 'node-Service: http', and 'pod-Service: udp' variants. Failure counts by job for each of those over the past 24h:

$ curl -s 'https://search.svc.ci.openshift.org/search?maxAge=24h&name=.&search=failed:+.*Networking+Granular+Checks:+Services+should+function+for+node-Service:+udp' | jq -r '. | keys[]' | sed 's|.*/\([^/]*\)/[0-9]*|\1|' | sort | uniq -c | sort -n
1 pull-ci-openshift-origin-master-e2e-aws-fips
1 rehearse-5308-pull-ci-openshift-installer-master-e2e-aws-proxy
1 release-openshift-ocp-installer-e2e-gcp-4.3
1 release-openshift-ocp-installer-e2e-openstack-4.3
1 release-openshift-ocp-installer-e2e-openstack-4.4
3 release-openshift-ocp-installer-e2e-aws-proxy-4.3
4 release-openshift-ocp-installer-e2e-aws-upi-4.3
4 release-openshift-ocp-installer-e2e-aws-upi-4.4

$ curl -s 'https://search.svc.ci.openshift.org/search?maxAge=24h&name=.&search=failed:+.*Networking+Granular+Checks:+Services+should+function+for+node-Service:+http' | jq -r '. | keys[]' | sed 's|.*/\([^/]*\)/[0-9]*|\1|' | sort | uniq -c | sort -n
1 endurance-e2e-aws-4.3
1 pull-ci-openshift-installer-master-e2e-aws-fips
1 pull-ci-openshift-machine-config-operator-master-e2e-aws
1 release-openshift-ocp-installer-e2e-aws-upi-4.3
1 release-openshift-ocp-installer-e2e-openstack-4.4
1 release-openshift-origin-installer-e2e-aws-sdn-multitenant-4.3
1 release-openshift-origin-installer-e2e-gcp-compact-4.3

$ curl -s 'https://search.svc.ci.openshift.org/search?maxAge=24h&name=.&search=failed:+.*Networking+Granular+Checks:+Services+should+function+for+pod-Service:+udp' | jq -r '. | keys[]' | sed 's|.*/\([^/]*\)/[0-9]*|\1|' | sort | uniq -c | sort -n
1 rehearse-5308-pull-ci-openshift-installer-master-e2e-aws-proxy
3 release-openshift-ocp-installer-e2e-aws-proxy-4.3
4 release-openshift-ocp-installer-e2e-aws-upi-4.4
5 release-openshift-ocp-installer-e2e-aws-upi-4.3

[1]: https://search.svc.ci.openshift.org/chart?name=release-openshift-ocp-installer-e2e-aws-upi-4.3&search=failed:%20.*Networking%20Granular%20Checks:%20Services%20should%20function%20for%20.*-Service
[2]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-upi-4.3/373
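The per-job tally in this comment repeats the same pipeline three times. As a rough sketch only, the counting step could be factored into a helper that reads run URLs on stdin. The function name `count_by_job` is hypothetical, and the helper assumes each input line ends in `/<job-name>/<run-number>`, as the prow links in this bug do.

```shell
#!/bin/sh
# Sketch: tally runs per CI job from a list of run URLs on stdin.
# Assumption: each line looks like ".../logs/<job-name>/<run-number>".
count_by_job() {
    # Reduce each URL to its job name (dropping the trailing run number),
    # then count how often each job name occurs, smallest count first.
    sed 's|.*/\([^/]*\)/[0-9]*$|\1|' |
        awk '{ count[$0]++ } END { for (job in count) print count[job], job }' |
        LC_ALL=C sort -k1,1n -k2
}

# Possible usage with the search service quoted in this comment
# (needs network access and jq; the service may no longer be reachable):
#   curl -s 'https://search.svc.ci.openshift.org/search?maxAge=24h&name=.&search=failed:+.*Networking+Granular+Checks:+Services+should+function+for+node-Service:+udp' |
#       jq -r 'keys[]' | count_by_job
```

Counting in awk rather than `uniq -c` keeps the output free of platform-dependent padding, which makes it easier to compare results between runs.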
*** Bug 1763940 has been marked as a duplicate of this bug. ***
*** Bug 1767946 has been marked as a duplicate of this bug. ***
Ricardo was looking into this over in [1].
[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1767946#c2
This is currently causing ~10% of all ^release-.*4.3$ errors over the past 12h [1,2,3,4,5,6,7,8]. Bumping the severity/priority.
[1]: https://search.svc.ci.openshift.org/chart?maxAge=12h&name=^release-.*4.3$&search=failed:.*Networking%20Granular%20Checks:%20Services%20should%20function%20for
[2]: https://search.svc.ci.openshift.org/?name=^release.*4.3$&search=failed%3A.*Networking+Granular+Checks%3A+Services+should+function+for&maxAge=12h&context=0&type=all
[3]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-compact-4.3/32
[4]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-upi-4.3/381
[5]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-compact-4.3/40
[6]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-upi-4.3/382
[7]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-proxy-4.3/226
[8]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-upi-4.3/383
I'm afraid I'm unable to repro this. I ran the endpoint-Service udp test 5-6 times in a row, and every run passed:

<snip>
STEP: Performing setup for networking test in namespace e2e-nettest-6329
STEP: creating a selector
STEP: Creating the service pods in kubernetes
Dec 10 18:49:56.700: INFO: Waiting up to 10m0s for all (but 100) nodes to be schedulable
STEP: Creating test pods
STEP: Getting node addresses
Dec 10 18:50:36.925: INFO: Waiting up to 10m0s for all (but 100) nodes to be schedulable
STEP: Creating the service on top of the pods in kubernetes
Dec 10 18:50:38.272: INFO: Service node-port-service in namespace e2e-nettest-6329 found.
Dec 10 18:50:38.762: INFO: Service session-affinity-service in namespace e2e-nettest-6329 found.
STEP: dialing(udp) netserver-0 (endpoint) --> 172.30.228.56:90 (config.clusterIP)
Dec 10 18:50:39.062: INFO: ExecWithOptions {Command:[/bin/sh -c curl -g -q -s 'http://10.128.2.207:8080/dial?request=hostName&protocol=udp&host=172.30.228.56&port=90&tries=1'] Namespace:e2e-nettest-6329 PodName:host-test-container-pod ContainerName:agnhost Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:false}
Dec 10 18:50:40.522: INFO: Waiting for endpoints: map[netserver-0:{} netserver-2:{}]
Dec 10 18:50:42.673: INFO: ExecWithOptions {Command:[/bin/sh -c curl -g -q -s 'http://10.128.2.207:8080/dial?request=hostName&protocol=udp&host=172.30.228.56&port=90&tries=1'] Namespace:e2e-nettest-6329 PodName:host-test-container-pod ContainerName:agnhost Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:false}
Dec 10 18:50:44.938: INFO: Waiting for endpoints: map[netserver-0:{} netserver-2:{}]
Dec 10 18:50:47.159: INFO: ExecWithOptions {Command:[/bin/sh -c curl -g -q -s 'http://10.128.2.207:8080/dial?request=hostName&protocol=udp&host=172.30.228.56&port=90&tries=1'] Namespace:e2e-nettest-6329 PodName:host-test-container-pod ContainerName:agnhost Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:false}
Dec 10 18:50:48.345: INFO: Waiting for endpoints: map[netserver-0:{}]
Dec 10 18:50:50.538: INFO: ExecWithOptions {Command:[/bin/sh -c curl -g -q -s 'http://10.128.2.207:8080/dial?request=hostName&protocol=udp&host=172.30.228.56&port=90&tries=1'] Namespace:e2e-nettest-6329 PodName:host-test-container-pod ContainerName:agnhost Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:false}
Dec 10 18:50:51.973: INFO: Waiting for endpoints: map[netserver-0:{}]
Dec 10 18:50:54.300: INFO: ExecWithOptions {Command:[/bin/sh -c curl -g -q -s 'http://10.128.2.207:8080/dial?request=hostName&protocol=udp&host=172.30.228.56&port=90&tries=1'] Namespace:e2e-nettest-6329 PodName:host-test-container-pod ContainerName:agnhost Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:false}
Dec 10 18:50:55.697: INFO: Waiting for endpoints: map[netserver-0:{}]
Dec 10 18:50:57.848: INFO: ExecWithOptions {Command:[/bin/sh -c curl -g -q -s 'http://10.128.2.207:8080/dial?request=hostName&protocol=udp&host=172.30.228.56&port=90&tries=1'] Namespace:e2e-nettest-6329 PodName:host-test-container-pod ContainerName:agnhost Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:false}
Dec 10 18:50:59.122: INFO: Waiting for endpoints: map[]
STEP: dialing(udp) netserver-0 (endpoint) --> 10.0.130.166:32012 (nodeIP)
Dec 10 18:50:59.273: INFO: ExecWithOptions {Command:[/bin/sh -c curl -g -q -s 'http://10.128.2.207:8080/dial?request=hostName&protocol=udp&host=10.0.130.166&port=32012&tries=1'] Namespace:e2e-nettest-6329 PodName:host-test-container-pod ContainerName:agnhost Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:false}
Dec 10 18:51:00.567: INFO: Waiting for endpoints: map[netserver-1:{} netserver-2:{}]
Dec 10 18:51:02.826: INFO: ExecWithOptions {Command:[/bin/sh -c curl -g -q -s 'http://10.128.2.207:8080/dial?request=hostName&protocol=udp&host=10.0.130.166&port=32012&tries=1'] Namespace:e2e-nettest-6329 PodName:host-test-container-pod ContainerName:agnhost Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:false}
Dec 10 18:51:04.130: INFO: Waiting for endpoints: map[netserver-2:{}]
Dec 10 18:51:06.281: INFO: ExecWithOptions {Command:[/bin/sh -c curl -g -q -s 'http://10.128.2.207:8080/dial?request=hostName&protocol=udp&host=10.0.130.166&port=32012&tries=1'] Namespace:e2e-nettest-6329 PodName:host-test-container-pod ContainerName:agnhost Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:false}
Dec 10 18:51:07.532: INFO: Waiting for endpoints: map[]
[AfterEach] [sig-network] Networking
/home/ricky/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:152
Dec 10 18:51:07.532: INFO: Waiting up to 7m0s for all (but 100) nodes to be ready
STEP: Destroying namespace "e2e-nettest-6329" for this suite.
Dec 10 18:51:08.146: INFO: Running AfterSuite actions on all nodes
Dec 10 18:51:08.146: INFO: Running AfterSuite actions on node 1
passed: (1m22s) 2019-12-10T17:51:08 "[sig-network] Networking Granular Checks: Services should function for endpoint-Service: udp [Suite:openshift/conformance/parallel] [Suite:k8s] [Skipped:Network/OVNKubernetes]"
1 pass, 0 skip (1m22s)
</snip>
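In the log above, the "Waiting for endpoints" map shrinks after each dial until it is empty, and the original failure report shows the opposite case: 66 tries that retrieved `map[]` while all six netservers were still expected. The following is only an illustration of that set-subtraction idea, not the e2e framework's actual code. The function name `remaining_endpoints` is hypothetical, and it assumes the observed hostnames arrive one per line on stdin.

```shell
#!/bin/sh
# Sketch: print the expected endpoint names that have not been observed yet.
# Usage: remaining_endpoints "name1 name2 ..." < observed_names (one per line)
remaining_endpoints() {
    expected=$1
    observed=$(cat)
    remaining=""
    for name in $expected; do
        # Keep the name only if no observed line matches it exactly.
        if ! printf '%s\n' "$observed" | grep -qxF -- "$name"; then
            remaining="${remaining:+$remaining }$name"
        fi
    done
    echo "$remaining"
}
```

An empty result corresponds to the `map[]` line that ends each dialing step in a passing run. A result that never shrinks corresponds to the "Failed to find expected endpoints" failure.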
I'm unable to reproduce this, so it must be an environmental issue on CI. I ran the test 100 times on a 4.4 cluster:

[ricky@ricky-laptop origin]$ for i in {1..100}; do _output/local/bin/linux/amd64/openshift-tests run-test "[sig-network] Networking Granular Checks: Services should function for endpoint-Service: udp [Suite:openshift/conformance/parallel] [Suite:k8s] [Skipped:Network/OVNKubernetes]" ; echo $? >> /tmp/test.out ; done
[ricky@ricky-laptop origin]$ grep -c 0 /tmp/test.out
100

It succeeds 100% of the time.
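One caveat about the tally in this comment: `grep -c 0` counts every line that contains a `0`, so a non-zero status such as 10 or 130 would also be counted. That does not change the reading of a 100/100 result if every run really exited 0, but a stricter count is easy to sketch. The helper name `repeat_and_count` is hypothetical. It runs a command a given number of times and counts by exit status directly.

```shell
#!/bin/sh
# Sketch: run a command N times and report exact pass/fail counts.
# Usage: repeat_and_count N command [args...]
repeat_and_count() {
    runs=$1
    shift
    passed=0
    failed=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Decide pass or fail from the exit status itself instead of
        # grepping a log file, so 10 or 130 is never mistaken for 0.
        if "$@" >/dev/null 2>&1; then
            passed=$((passed + 1))
        else
            failed=$((failed + 1))
        fi
        i=$((i + 1))
    done
    echo "passed=$passed failed=$failed"
}
```

For the run described here, the command would be the same `openshift-tests run-test "..."` invocation with a count of 100. With an existing status file, `grep -cx 0 /tmp/test.out` gives the same strictness by matching whole lines only.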
I checked the same build, 4.3.0-0.nightly-2019-12-23-235118: it works well on AWS [1], but it still fails on OpenStack [2].
[1]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.3/554
[2]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-openstack-4.3/658
I also checked GCP and vSphere, and both are working well, so I guess this is an OpenStack issue.
Verified this bug according to comments 17 and 18.
The fix for this bug was originally to disable the tests; however, those tests were identifying valid problems. Moving this over to the Installer component, where we'll track fixes to the AWS IPI, AWS UPI, and GCP UPI CI jobs that should resolve these.
Additional changes necessary.
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-upi-4.3/762
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-proxy-4.3/436
Met the same issue in https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-ovirt-4.4/297
Let's split a separate bug out for that. We probably should have filed one bug per provider from the start, and oVirt comes from another team. I'll clone and assign. Also, since oVirt is new to 4.4, there's no need to backport beyond that.
Moving this bug to VERIFIED, since another bug, 1798176, is tracking the issue from comment 28.
Hit the same issue in the gcp-compact CI. https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-compact-4.3/115
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0581