Bug 1763936 - [sig-network] Networking Granular Checks: Services should function for endpoint-Service: udp [Suite:openshift/conformance/parallel] [Suite:k8s] [Skipped:Network/OVNKubernetes]
Summary: [sig-network] Networking Granular Checks: Services should function for endpoi...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.4.0
Assignee: Jeremiah Stuever
QA Contact: zhaozhanqi
URL:
Whiteboard:
Duplicates: 1763940 1763941 1767946
Depends On:
Blocks: 1779469 1784594 1798176
Reported: 2019-10-22 02:17 UTC by Wei Duan
Modified: 2020-05-04 11:15 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: AWS IPI was missing security group rules allowing traffic from control plane hosts to workers on TCP and UDP ports 30000-32767. Similarly, the AWS UPI and GCP UPI jobs did not have the proper network ACLs applied in all situations. This was limited in scope to the CI jobs, so there is no need to include it in the release notes; it is mentioned only to explain why there are three PRs attached to this bug.
Consequence: Newly introduced OVN networking components would not work properly in clusters lacking these security group rules.
Fix: For existing clusters, add security group rules allowing traffic from the control plane to workers on TCP and UDP ports 30000-32767.
Result: OVN networking components work properly.
Clone Of:
Clones: 1779469 1784594 1798176
Environment:
Last Closed: 2020-05-04 11:14:34 UTC
Target Upstream Version:
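The Fix step in the Doc Text above can be sketched with the AWS CLI. This is a minimal, unofficial sketch, not the installer's code: the security group IDs are placeholders you must look up for your own cluster, and it has not been compared against the exact rules the installer PRs create.

```shell
#!/usr/bin/env bash
# Minimal sketch (not the installer's own code): allow traffic from the
# control-plane security group to the worker security group on the
# NodePort range, for both TCP and UDP, as described in the Fix step.

NODEPORT_MIN=30000
NODEPORT_MAX=32767

# Print one --ip-permissions shorthand entry per protocol, each allowing
# traffic whose source is the given (control-plane) security group.
nodeport_permissions() {
  local source_sg="$1" proto
  for proto in tcp udp; do
    printf 'IpProtocol=%s,FromPort=%d,ToPort=%d,UserIdGroupPairs=[{GroupId=%s}]\n' \
      "$proto" "$NODEPORT_MIN" "$NODEPORT_MAX" "$source_sg"
  done
}

# open_nodeports WORKER_SG CONTROL_PLANE_SG
# Both arguments are placeholders; look the IDs up for your own cluster.
open_nodeports() {
  local worker_sg="$1" control_plane_sg="$2"
  local -a perms
  mapfile -t perms < <(nodeport_permissions "$control_plane_sg")
  aws ec2 authorize-security-group-ingress \
    --group-id "$worker_sg" \
    --ip-permissions "${perms[@]}"
}
```

Usage would look like `open_nodeports sg-WORKER sg-CONTROLPLANE` (hypothetical IDs). Keeping the permission strings in their own function lets you print and review them with `nodeport_permissions sg-CONTROLPLANE` before touching AWS.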




Links
System | ID | Priority | Status | Summary | Last Updated
Github | openshift installer pull 2983 | None | closed | Bug 1763936: aws: enable node ports between control plane and compute | 2020-09-15 02:11:46 UTC
Github | openshift installer pull 2984 | None | closed | Bug 1763936: aws upi: enable udp ports 9000-9999 and 30000-32767 | 2020-09-15 02:11:46 UTC
Github | openshift installer pull 2985 | None | closed | Bug 1763936: gcp: enable node ports between control plane and compute | 2020-09-15 02:11:46 UTC
Github | openshift installer pull 3027 | None | closed | Bug 1763936: OpenStack: enable node ports between control plane and compute | 2020-09-15 02:11:45 UTC
Github | openshift origin pull 23896 | None | closed | OVN-Kubernetes: some more test tweaks | 2020-09-15 02:11:45 UTC
Github | openshift origin pull 24185 | None | closed | Bug 1763936: Disable node-Service and pod-Service granular checks | 2020-09-15 02:11:46 UTC
Red Hat Product Errata | RHBA-2020:0581 | None | None | None | 2020-05-04 11:15:17 UTC

Internal Links: 1846875

Description Wei Duan 2019-10-22 02:17:38 UTC
Description of problem:

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-upi-4.3/139


[sig-network] Networking Granular Checks: Services should function for endpoint-Service: udp [Suite:openshift/conformance/parallel] [Suite:k8s] [Skipped:Network/OVNKubernetes] 

fail [k8s.io/kubernetes/test/e2e/framework/networking_utils.go:250]: Oct 21 21:49:55.346: Failed to find expected endpoints:
Tries 66
Command curl -g -q -s 'http://10.129.0.253:8080/dial?request=hostName&protocol=udp&host=10.0.59.21&port=32196&tries=1'
retrieved map[]
expected map[netserver-0:{} netserver-1:{} netserver-2:{} netserver-3:{} netserver-4:{} netserver-5:{}]
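The failure above comes from a retry loop: the test repeatedly asks the test pod's `/dial` endpoint to contact the target, collects the hostnames that answer, and fails if some expected `netserver-N` never replies (here none did, hence `retrieved map[]`). The sketch below imitates that loop as it appears in these logs; it is not the e2e framework's code. Because the response format of `/dial` is not shown in this report, the probe is a hypothetical caller-supplied command that prints one hostname, or nothing when no reply came back.

```shell
#!/usr/bin/env bash
# Sketch of the "Waiting for endpoints" retry loop seen in the logs; not the
# e2e framework's implementation. Requires bash 4+ (associative arrays).
#
# wait_for_endpoints MAX_TRIES PROBE_CMD EXPECTED_HOST...
# PROBE_CMD is a hypothetical command that performs one dial and prints the
# hostname that answered, or prints nothing if no reply came back.
wait_for_endpoints() {
  local max_tries="$1" probe="$2"
  shift 2
  local -A pending=()
  local name host try
  for name in "$@"; do
    pending["$name"]=1
  done
  for ((try = 1; try <= max_tries; try++)); do
    host=$("$probe" || true)
    # Each distinct hostname that answers is crossed off the pending set.
    if [[ -n "$host" ]]; then
      unset 'pending[$host]'
    fi
    if ((${#pending[@]} == 0)); then
      echo "all endpoints seen after $try tries"
      return 0
    fi
    echo "Waiting for endpoints: ${!pending[*]}" >&2
  done
  echo "missing after $max_tries tries: ${!pending[*]}" >&2
  return 1
}
```

A real probe might wrap the `curl ... /dial?...&tries=1` command from the log and extract the hostname from its output; that parsing is left out here because the response body is not part of this report. With a UDP path that is blocked end to end, every probe prints nothing and the loop ends with all expected hosts still pending, which matches the empty `retrieved map[]` above.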

Comment 2 Casey Callendrello 2019-10-22 12:17:27 UTC
Yup, this is a known bug. We've already filed a fix skipping this test, and we have a known issue. It may not be fixed for 4.3, but we'll try.

Comment 3 Casey Callendrello 2019-10-22 12:22:34 UTC
*** Bug 1763941 has been marked as a duplicate of this bug. ***

Comment 6 W. Trevor King 2019-11-22 00:30:42 UTC
I added a link to the skipping PR, but it missed variants like the ones we're seeing in 4.3's aws-upi job today [1,2]:

$ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-upi-4.3/373/build-log.txt | grep '^failed: '
failed: (5m38s) 2019-11-21T23:24:14 "[sig-network] Networking Granular Checks: Services should function for node-Service: udp [Suite:openshift/conformance/parallel] [Suite:k8s]"
failed: (4m43s) 2019-11-21T23:24:31 "[sig-network] Networking Granular Checks: Services should function for node-Service: http [Suite:openshift/conformance/parallel] [Suite:k8s]"
failed: (9m15s) 2019-11-21T23:38:14 "[sig-network] Networking Granular Checks: Services should function for pod-Service: udp [Suite:openshift/conformance/parallel] [Suite:k8s]"
failed: (9m21s) 2019-11-21T23:42:57 "[sig-network] Networking Granular Checks: Services should function for endpoint-Service: udp [Suite:openshift/conformance/parallel] [Suite:k8s] [Skipped:Network/OVNKubernetes]"

The existing skips have:

  \[sig-network\] Networking Granular Checks: Services should function for endpoint-Service`,

but lack the 'node-Service: udp', 'node-Service: http', and 'pod-Service: udp' variants. Failure counts by job for each of those over the past 24h:

$ curl -s 'https://search.svc.ci.openshift.org/search?maxAge=24h&name=.&search=failed:+.*Networking+Granular+Checks:+Services+should+function+for+node-Service:+udp' | jq -r '. | keys[]' | sed 's|.*/\([^/]*\)/[0-9]*|\1|' | sort | uniq -c | sort -n
      1 pull-ci-openshift-origin-master-e2e-aws-fips
      1 rehearse-5308-pull-ci-openshift-installer-master-e2e-aws-proxy
      1 release-openshift-ocp-installer-e2e-gcp-4.3
      1 release-openshift-ocp-installer-e2e-openstack-4.3
      1 release-openshift-ocp-installer-e2e-openstack-4.4
      3 release-openshift-ocp-installer-e2e-aws-proxy-4.3
      4 release-openshift-ocp-installer-e2e-aws-upi-4.3
      4 release-openshift-ocp-installer-e2e-aws-upi-4.4

$ curl -s 'https://search.svc.ci.openshift.org/search?maxAge=24h&name=.&search=failed:+.*Networking+Granular+Checks:+Services+should+function+for+node-Service:+http' | jq -r '. | keys[]' | sed 's|.*/\([^/]*\)/[0-9]*|\1|' | sort | uniq -c | sort -n
      1 endurance-e2e-aws-4.3
      1 pull-ci-openshift-installer-master-e2e-aws-fips
      1 pull-ci-openshift-machine-config-operator-master-e2e-aws
      1 release-openshift-ocp-installer-e2e-aws-upi-4.3
      1 release-openshift-ocp-installer-e2e-openstack-4.4
      1 release-openshift-origin-installer-e2e-aws-sdn-multitenant-4.3
      1 release-openshift-origin-installer-e2e-gcp-compact-4.3

$ curl -s 'https://search.svc.ci.openshift.org/search?maxAge=24h&name=.&search=failed:+.*Networking+Granular+Checks:+Services+should+function+for+pod-Service:+udp' | jq -r '. | keys[]' | sed 's|.*/\([^/]*\)/[0-9]*|\1|' | sort | uniq -c | sort -n
      1 rehearse-5308-pull-ci-openshift-installer-master-e2e-aws-proxy
      3 release-openshift-ocp-installer-e2e-aws-proxy-4.3
      4 release-openshift-ocp-installer-e2e-aws-upi-4.4
      5 release-openshift-ocp-installer-e2e-aws-upi-4.3

[1]: https://search.svc.ci.openshift.org/chart?name=release-openshift-ocp-installer-e2e-aws-upi-4.3&search=failed:%20.*Networking%20Granular%20Checks:%20Services%20should%20function%20for%20.*-Service
[2]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-upi-4.3/373
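The tail of each pipeline above turns the matching job URLs into per-job counts. The sketch below packages the same `sed | sort | uniq -c | sort -n` stage as a function so it can be tried offline on any list of URLs. It assumes, as the `sed` expression implies, that each search-result key ends in `/<job-name>/<build-number>`; `SEARCH_URL` is a placeholder, and the `search.svc.ci.openshift.org` service may no longer be reachable.

```shell
#!/usr/bin/env bash
# Reduce Prow job URLs (one per line on stdin) to "count job-name" lines,
# lowest count first. The sed keeps the path segment just before the
# trailing build number, e.g. .../logs/<job-name>/373 -> <job-name>.
count_failures_by_job() {
  sed 's|.*/\([^/]*\)/[0-9]*|\1|' | sort | uniq -c | sort -n
}

# Live form of the query (needs network access, curl and jq):
#   curl -s "$SEARCH_URL" | jq -r '. | keys[]' | count_failures_by_job
```

For example, feeding it the two aws-upi-4.3 build URLs linked in this report (builds 139 and 373) produces a single line with a count of 2 for `release-openshift-ocp-installer-e2e-aws-upi-4.3`.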

Comment 7 W. Trevor King 2019-11-23 05:44:17 UTC
*** Bug 1763940 has been marked as a duplicate of this bug. ***

Comment 8 W. Trevor King 2019-11-23 05:45:46 UTC
*** Bug 1767946 has been marked as a duplicate of this bug. ***

Comment 9 W. Trevor King 2019-11-23 05:46:59 UTC
Ricardo was looking into this over in [1].

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1767946#c2

Comment 11 Ricardo Carrillo Cruz 2019-12-10 17:53:48 UTC
I'm afraid I'm unable to reproduce this.

I ran the endpoint-Service udp test 5-6 times in a row, and all runs passed:

<snip>
STEP: Performing setup for networking test in namespace e2e-nettest-6329
STEP: creating a selector
STEP: Creating the service pods in kubernetes
Dec 10 18:49:56.700: INFO: Waiting up to 10m0s for all (but 100) nodes to be schedulable
STEP: Creating test pods
STEP: Getting node addresses
Dec 10 18:50:36.925: INFO: Waiting up to 10m0s for all (but 100) nodes to be schedulable
STEP: Creating the service on top of the pods in kubernetes
Dec 10 18:50:38.272: INFO: Service node-port-service in namespace e2e-nettest-6329 found.
Dec 10 18:50:38.762: INFO: Service session-affinity-service in namespace e2e-nettest-6329 found.
STEP: dialing(udp) netserver-0 (endpoint) --> 172.30.228.56:90 (config.clusterIP)
Dec 10 18:50:39.062: INFO: ExecWithOptions {Command:[/bin/sh -c curl -g -q -s 'http://10.128.2.207:8080/dial?request=hostName&protocol=udp&host=172.30.228.56&port=90&tries=1'] Namespace:e2e-nettest-6329 PodName:host-test-container-pod ContainerName:agnhost Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:false}
Dec 10 18:50:40.522: INFO: Waiting for endpoints: map[netserver-0:{} netserver-2:{}]
Dec 10 18:50:42.673: INFO: ExecWithOptions {Command:[/bin/sh -c curl -g -q -s 'http://10.128.2.207:8080/dial?request=hostName&protocol=udp&host=172.30.228.56&port=90&tries=1'] Namespace:e2e-nettest-6329 PodName:host-test-container-pod ContainerName:agnhost Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:false}
Dec 10 18:50:44.938: INFO: Waiting for endpoints: map[netserver-0:{} netserver-2:{}]
Dec 10 18:50:47.159: INFO: ExecWithOptions {Command:[/bin/sh -c curl -g -q -s 'http://10.128.2.207:8080/dial?request=hostName&protocol=udp&host=172.30.228.56&port=90&tries=1'] Namespace:e2e-nettest-6329 PodName:host-test-container-pod ContainerName:agnhost Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:false}
Dec 10 18:50:48.345: INFO: Waiting for endpoints: map[netserver-0:{}]
Dec 10 18:50:50.538: INFO: ExecWithOptions {Command:[/bin/sh -c curl -g -q -s 'http://10.128.2.207:8080/dial?request=hostName&protocol=udp&host=172.30.228.56&port=90&tries=1'] Namespace:e2e-nettest-6329 PodName:host-test-container-pod ContainerName:agnhost Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:false}
Dec 10 18:50:51.973: INFO: Waiting for endpoints: map[netserver-0:{}]
Dec 10 18:50:54.300: INFO: ExecWithOptions {Command:[/bin/sh -c curl -g -q -s 'http://10.128.2.207:8080/dial?request=hostName&protocol=udp&host=172.30.228.56&port=90&tries=1'] Namespace:e2e-nettest-6329 PodName:host-test-container-pod ContainerName:agnhost Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:false}
Dec 10 18:50:55.697: INFO: Waiting for endpoints: map[netserver-0:{}]
Dec 10 18:50:57.848: INFO: ExecWithOptions {Command:[/bin/sh -c curl -g -q -s 'http://10.128.2.207:8080/dial?request=hostName&protocol=udp&host=172.30.228.56&port=90&tries=1'] Namespace:e2e-nettest-6329 PodName:host-test-container-pod ContainerName:agnhost Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:false}
Dec 10 18:50:59.122: INFO: Waiting for endpoints: map[]
STEP: dialing(udp) netserver-0 (endpoint) --> 10.0.130.166:32012 (nodeIP)
Dec 10 18:50:59.273: INFO: ExecWithOptions {Command:[/bin/sh -c curl -g -q -s 'http://10.128.2.207:8080/dial?request=hostName&protocol=udp&host=10.0.130.166&port=32012&tries=1'] Namespace:e2e-nettest-6329 PodName:host-test-container-pod ContainerName:agnhost Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:false}
Dec 10 18:51:00.567: INFO: Waiting for endpoints: map[netserver-1:{} netserver-2:{}]
Dec 10 18:51:02.826: INFO: ExecWithOptions {Command:[/bin/sh -c curl -g -q -s 'http://10.128.2.207:8080/dial?request=hostName&protocol=udp&host=10.0.130.166&port=32012&tries=1'] Namespace:e2e-nettest-6329 PodName:host-test-container-pod ContainerName:agnhost Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:false}
Dec 10 18:51:04.130: INFO: Waiting for endpoints: map[netserver-2:{}]
Dec 10 18:51:06.281: INFO: ExecWithOptions {Command:[/bin/sh -c curl -g -q -s 'http://10.128.2.207:8080/dial?request=hostName&protocol=udp&host=10.0.130.166&port=32012&tries=1'] Namespace:e2e-nettest-6329 PodName:host-test-container-pod ContainerName:agnhost Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:false}
Dec 10 18:51:07.532: INFO: Waiting for endpoints: map[]
[AfterEach] [sig-network] Networking
  /home/ricky/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:152
Dec 10 18:51:07.532: INFO: Waiting up to 7m0s for all (but 100) nodes to be ready
STEP: Destroying namespace "e2e-nettest-6329" for this suite.
Dec 10 18:51:08.146: INFO: Running AfterSuite actions on all nodes
Dec 10 18:51:08.146: INFO: Running AfterSuite actions on node 1

passed: (1m22s) 2019-12-10T17:51:08 "[sig-network] Networking Granular Checks: Services should function for endpoint-Service: udp [Suite:openshift/conformance/parallel] [Suite:k8s] [Skipped:Network/OVNKubernetes]"

1 pass, 0 skip (1m22s)
</snip>

Comment 16 Ricardo Carrillo Cruz 2019-12-20 09:38:56 UTC
I'm unable to reproduce this; it must be an environmental issue in CI.
I ran the test 100 times on a 4.4 cluster:

[ricky@ricky-laptop origin]$ for i in {1..100}; do _output/local/bin/linux/amd64/openshift-tests run-test "[sig-network] Networking Granular Checks: Services should function for endpoint-Service: udp [Suite:openshift/conformance/parallel] [Suite:k8s] [Skipped:Network/OVNKubernetes]" ; echo $? >> /tmp/test.out ; done

[ricky@ricky-laptop origin]$ grep -c 0 /tmp/test.out
100

It succeeded in all 100 runs.
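Counting passes with `grep -c 0` is fine when the only failure status is 1, but it counts every line that contains a `0`, so a status such as 10 or 100 would also be counted as a pass. A slightly stricter tally, sketched below under the same one-exit-status-per-line layout as `/tmp/test.out`, matches whole lines with `grep -x`. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# tally_exit_codes FILE
# FILE holds one exit status per line, as written by `echo $? >> FILE`.
# grep -x only counts lines that are exactly "0", unlike plain `grep -c 0`,
# which would also count statuses such as 10 or 100.
tally_exit_codes() {
  local file="$1" total passed
  total=$(($(wc -l <"$file")))
  # grep -c still prints 0 when nothing matches but exits 1, hence || true.
  passed=$(grep -cx 0 "$file" || true)
  echo "passed ${passed}/${total}"
}
```

Running `tally_exit_codes /tmp/test.out` prints `passed 100/100` when all 100 recorded statuses are exactly 0.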

Comment 17 zhaozhanqi 2019-12-24 03:18:55 UTC
I checked the same build, 4.3.0-0.nightly-2019-12-23-235118.

AWS [1] works well, but OpenStack [2] still fails.

[1]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.3/554
[2]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-openstack-4.3/658

Comment 18 zhaozhanqi 2019-12-24 03:25:39 UTC
I also checked GCP and vSphere, and both are working well, so I guess this is an OpenStack issue.

Comment 19 zhaozhanqi 2020-01-06 09:21:49 UTC
Verified this bug based on comments 17 and 18.

Comment 20 Scott Dodson 2020-01-27 20:26:54 UTC
The fix for this bug was originally to disable tests; however, those tests were identifying valid problems. Moving this over to the Installer component, where we'll track the fixes to the AWS IPI, AWS UPI, and GCP UPI CI jobs that should resolve these.

Comment 22 Scott Dodson 2020-01-30 17:22:25 UTC
Additional changes necessary.

Comment 29 Scott Dodson 2020-02-04 18:12:46 UTC
Let's split a separate bug out for that. We probably should have done one bug per provider from the start, but oVirt comes from another team. I'll clone and assign it. Also, since oVirt is new in 4.4, there's no need to backport beyond that.

Comment 30 zhaozhanqi 2020-02-05 07:16:20 UTC
Moving this bug to VERIFIED, since the issue from comment 28 is being tracked in bug 1798176.

Comment 33 errata-xmlrpc 2020-05-04 11:14:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581

