Bug 1993845

Summary: Enabling internalTrafficPolicy=Local found two issues in test cases
Product: OpenShift Container Platform
Component: Networking
Networking sub component: openshift-sdn
Reporter: Martin Kennelly <mkennell>
Assignee: Martin Kennelly <mkennell>
QA Contact: zhaozhanqi <zzhao>
CC: danw
Status: CLOSED UPSTREAM
Severity: high
Priority: medium
Version: 4.9
Hardware: All
OS: Linux
Type: Bug
Last Closed: 2021-08-31 08:14:22 UTC

Description Martin Kennelly 2021-08-16 09:19:15 UTC
Description of problem:
When re-enabling the following tests after the Kubernetes rebase to 1.22.0 (https://bugzilla.redhat.com/show_bug.cgi?id=1986307):
[sig-network] Services should respect internalTrafficPolicy=Local Pod to Pod (hostNetwork: true) [Feature:ServiceInternalTrafficPolicy] [Suite:openshift/conformance/parallel] [Suite:k8s] 
[sig-network] Services should respect internalTrafficPolicy=Local Pod (hostNetwork: true) to Pod (hostNetwork: true) [Feature:ServiceInternalTrafficPolicy] [Suite:openshift/conformance/parallel] [Suite:k8s] 

I encountered two problems.

1) Pod has insufficient privileges to bind to hostport 80.
~ $ /agnhost netexec --http-port 80
2021/08/13 15:19:55 Started HTTP server on port 80
2021/08/13 15:19:55 Started UDP server on port  8081
2021/08/13 15:19:55 listen tcp :80: bind: permission denied



2) Comparison of FQDN and hostname fails
See test/e2e/network/service.go, lines 2259 and 2341. The test calls execHostnameTest with node0.Name (an FQDN on OCP) and compares it with the value returned by the agnhost /hostname endpoint (the short hostname, see https://pkg.go.dev/k8s.io/kubernetes@v1.18.0-alpha.0/test/images/agnhost?readme=expanded#readme-serve-hostname). The two strings differ, so the comparison fails on OCP.
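
To illustrate the mismatch in issue 2, here is a minimal sketch of a comparison that tolerates an FQDN on one side and a short hostname on the other. The helper names shortHostname and sameHost are hypothetical and are not taken from the Kubernetes test code or from the fix branch; the sketch also assumes node names are DNS names rather than IP addresses.

```go
package main

import "strings"

// shortHostname returns the first DNS label of name, for example "node0"
// for "node0.example.com". A name without dots is returned unchanged.
// Hypothetical helper: it assumes name is a DNS name, not an IP address.
func shortHostname(name string) string {
	if i := strings.IndexByte(name, '.'); i >= 0 {
		return name[:i]
	}
	return name
}

// sameHost reports whether a node name (possibly an FQDN) and a hostname
// reported by a pod refer to the same host, comparing only the first label
// and ignoring case.
func sameHost(nodeName, reported string) bool {
	return strings.EqualFold(shortHostname(nodeName), shortHostname(reported))
}
```

With this helper, sameHost("node0.example.com", "node0") is true, whereas a plain string comparison of the same two values is false, which is the failure seen on OCP.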


Version-Release number of selected component (if applicable):
Kubernetes 1.22.0


How reproducible:
Build openshift-tests with the Kubernetes 1.22 test cases (see the origin PR rebase-1.22.0-rc.0).
Test against a nightly build of OCP 4.9.



I have produced two fixes that enable the test cases to pass for upstream k8:

Issue 1: https://github.com/martinkennelly/kubernetes/tree/fix_local_test_bind_denied

Issue 2: https://github.com/martinkennelly/kubernetes/tree/fix_fqdn_hostname_mismatch

Comment 1 Martin Kennelly 2021-08-16 09:27:42 UTC
For issue 1, either we increase the pod's privileges or we move the port out of the privileged range (to 1024 or above). I went for the latter.
I will disable the two test cases until this is resolved upstream.
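
As a rough sketch of why the second option works: on a default Linux configuration, ports below 1024 need CAP_NET_BIND_SERVICE, while ports from 1024 upward can be bound by an unprivileged process. The function names below are hypothetical and are not part of agnhost or the e2e framework.

```go
package main

import (
	"fmt"
	"net"
)

// firstUnprivilegedPort is the lowest TCP port a non-root process can bind on
// a default Linux configuration. Binding below it needs CAP_NET_BIND_SERVICE.
const firstUnprivilegedPort = 1024

// needsBindPrivilege reports whether binding to port normally requires
// elevated privileges on Linux.
func needsBindPrivilege(port int) bool {
	return port > 0 && port < firstUnprivilegedPort
}

// listen binds a TCP listener on the given port. If the bind fails and the
// port is in the privileged range, the error is wrapped with a hint.
func listen(port int) (net.Listener, error) {
	ln, err := net.Listen("tcp", fmt.Sprintf(":%d", port))
	if err != nil && needsBindPrivilege(port) {
		return nil, fmt.Errorf("port %d is privileged, try a port >= %d: %w",
			port, firstUnprivilegedPort, err)
	}
	return ln, err
}
```

Calling listen(80) as an unprivileged user would be expected to fail with the same "bind: permission denied" error shown in the description, while a port in the unprivileged range avoids the need for extra pod privileges.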

Comment 2 Dan Winship 2021-08-16 12:21:13 UTC
The fixes look good, though I'd add a comment to the code in the second one rather than only explaining it in the commit message.

Can you push those PRs upstream and link to them from here so I'll see them?

Then once they merge upstream you'll need to cherry-pick them into https://github.com/openshift/kubernetes, as explained in the README.openshift.md there.

Comment 3 Dan Winship 2021-08-16 12:22:52 UTC
(Though cherry-picking them is only relevant if we're actually planning to enable the alpha feature gate in 4.9, which I guess we probably aren't, so probably you don't actually have to do that.)

Comment 4 Martin Kennelly 2021-08-17 08:46:38 UTC
Comment added to code.

PRs:

1) Pod has insufficient privileges to bind to hostport 80.
https://github.com/kubernetes/kubernetes/pull/104409


2) Comparison of FQDN and hostname fails
https://github.com/kubernetes/kubernetes/pull/104408

Yes, but I may as well do this when it's merged so we have it done for the future.

Comment 5 Dan Winship 2021-08-17 13:30:35 UTC
(In reply to Martin Kennelly from comment #4)
> Yes, but I may as well do this when it's merged so we have it done for the
> future.

If we don't need the fix until OCP 4.10 then it doesn't have to be cherry-picked, because it will get pulled in as part of the rebase to kube 1.23.

Comment 6 Martin Kennelly 2021-08-26 10:30:54 UTC
Dan, isn't OCP 4.9 based on k8 1.22 and therefore this feature is in beta? https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/
Therefore we need to cherry-pick back the fixes.

Comment 7 Martin Kennelly 2021-08-26 12:04:46 UTC
Missing from this BZ was test case: "[sig-network] Services should respect internalTrafficPolicy=Local Pod (hostNetwork: true) to Pod [Feature:ServiceInternalTrafficPolicy]"

This was also disabled pending the upstream fix here: https://github.com/kubernetes/kubernetes/pull/104409/

Comment 8 Dan Winship 2021-08-26 13:22:49 UTC
ah, kube_features.go claims it's still alpha in the comment:

	// owner: @maplain @andrewsykim
	// kep: http://kep.k8s.io/2086
	// alpha: v1.21
	//
	// Enables node-local routing for Service internal traffic
	ServiceInternalTrafficPolicy featuregate.Feature = "ServiceInternalTrafficPolicy"

but sets it to beta in defaultKubernetesFeatureGates:

	ServiceInternalTrafficPolicy:                   {Default: true, PreRelease: featuregate.Beta},

so it looks like they forgot to update the comment.

So yes, it would be good to cherry-pick the fixes. (And maybe also fix the comment upstream to indicate its status correctly.)
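
For reference, a minimal sketch of why the map entry, not the comment, decides the stage. The types below are local stand-ins that only mimic the shape of the snippet quoted above; they are not the real k8s.io/component-base/featuregate package, and stageOf is a hypothetical helper.

```go
package main

// prerelease is a local stand-in for the feature gate stage type.
type prerelease string

const (
	Alpha prerelease = "ALPHA"
	Beta  prerelease = "BETA"
)

// FeatureSpec mirrors the two fields used in the quoted map entry.
type FeatureSpec struct {
	Default    bool
	PreRelease prerelease
}

// defaultGates plays the role of defaultKubernetesFeatureGates. The doc
// comment on a feature constant has no effect on what is stored here.
var defaultGates = map[string]FeatureSpec{
	"ServiceInternalTrafficPolicy": {Default: true, PreRelease: Beta},
}

// stageOf returns the stage recorded for a feature and whether it is known.
func stageOf(name string) (prerelease, bool) {
	spec, ok := defaultGates[name]
	return spec.PreRelease, ok
}
```

In this sketch stageOf("ServiceInternalTrafficPolicy") returns Beta with the gate on by default, regardless of what a stale "alpha: v1.21" comment says.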