Bug 1798176

Summary: [sig-network] Networking Granular Checks: failing for several platforms
Product: OpenShift Container Platform Reporter: Scott Dodson <sdodson>
Component: Networking Assignee: Ben Bennett <bbennett>
Networking sub component: openshift-sdn QA Contact: zhaozhanqi <zzhao>
Status: CLOSED DUPLICATE Docs Contact:
Severity: high    
Priority: high CC: gpei, hekumar, hongkliu, rgolan, wduan, wking, zzhao
Version: 4.3.0   
Target Milestone: ---   
Target Release: 4.4.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1763936 Environment:
Last Closed: 2020-03-03 21:00:29 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1763936, 1794714, 1808856, 1822714, 1822766, 1834278    
Bug Blocks: 1784594    

Comment 1 Scott Dodson 2020-02-27 00:47:54 UTC
Roy,

This is most likely a case where the required network ACLs aren't in place. We fixed this in GCP UPI here https://github.com/openshift/installer/pull/2984

UDP ports 9000-9999 and 30000-32767 need to be open between the control plane and the nodes in both directions.
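
As a rough illustration of the port requirement above, the following Python sketch checks whether a list of allowed UDP port ranges covers 9000-9999 and 30000-32767. The `covers` helper and the way rules are represented as (start, end) tuples are assumptions made for this example; they are not taken from the installer or from any cloud provider API.

```python
# Required UDP ranges quoted in the comment above (inclusive bounds).
REQUIRED_UDP_RANGES = [(9000, 9999), (30000, 32767)]


def covers(allowed, required=REQUIRED_UDP_RANGES):
    """Return True if every required range lies inside the union of allowed ranges.

    `allowed` is a hypothetical list of (start, end) inclusive port ranges,
    for example extracted by hand from firewall or security-group rules.
    """
    # Merge overlapping or adjacent allowed ranges first, so that a required
    # range split across several rules is still recognised as covered.
    merged = []
    for start, end in sorted(allowed):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return all(
        any(a_start <= r_start and r_end <= a_end for a_start, a_end in merged)
        for r_start, r_end in required
    )
```

Because the ports must be open in both directions, a check like this would need to be run once for the control-plane-to-node rules and once for the node-to-control-plane rules.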

Comment 2 Roy Golan 2020-02-27 08:33:44 UTC
We don't block any of these at all, we have layer 2 networking and no libvirt network filters set by default.

Gal, please share your findings so far on the bug.

Comment 3 Gal Zaidman 2020-03-01 16:47:13 UTC
I tried to debug the issue and found that it is an SDN bug, so I opened:
https://bugzilla.redhat.com/show_bug.cgi?id=1808856

I also found that the problem doesn't occur when we use Network Type: OVNKubernetes.
After talking with a network engineer, I understood that we are moving towards OVNKubernetes as the default network type in the future.
So I have created this PR to switch our e2e jobs to OVNKubernetes, which will fix the NodePort test failures:
https://github.com/openshift/release/pull/7392
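
The contents of the linked PR are not reproduced here. As a hedged sketch only, assuming a cluster is installed from an install-config.yaml that has already been parsed into a Python dict, the network plugin is selected through the `networking.networkType` field. The `with_network_type` helper below is hypothetical and only illustrates that one field.

```python
import copy


def with_network_type(install_config, network_type="OVNKubernetes"):
    """Return a copy of the parsed install config with networking.networkType set.

    `install_config` is assumed to be a dict loaded from install-config.yaml;
    the input is left unmodified.
    """
    updated = copy.deepcopy(install_config)
    # Create the "networking" section if it is missing, then set the plugin.
    updated.setdefault("networking", {})["networkType"] = network_type
    return updated


# Hypothetical minimal config that currently selects openshift-sdn.
example = {"networking": {"networkType": "OpenShiftSDN"}}
print(with_network_type(example)["networking"]["networkType"])
```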

Comment 4 Roy Golan 2020-03-03 08:41:10 UTC
Scott, have a look at this search. Many other platforms are consistently failing on this and other related network tests: https://search.svc.ci.openshift.org/?search=failed%3A+.*+%22%5C%5Bsig-network%5C%5D+.*Networking+Granular+Checks%3A+&maxAge=168h&context=0&type=all

Comment 5 Gal Zaidman 2020-03-03 09:25:33 UTC
Scott, please note that there is an open bug on the problem:
https://bugzilla.redhat.com/show_bug.cgi?id=1794714

Comment 6 Scott Dodson 2020-03-03 21:00:29 UTC
Closing as a dupe given your confirmation that there are no network ACLs restricting connectivity between masters and workers.

*** This bug has been marked as a duplicate of bug 1794714 ***