1912916 – Set external traffic policy to cluster for IBM platform

Summary: Set external traffic policy to cluster for IBM platform

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	4.6
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	4.7.0
Assignee:	Miciah Dashiel Butler Masters
QA Contact:	Hongan Li
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-01-05 15:29 UTC by Rudi Braun
Modified:	2022-08-04 22:30 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-02-24 15:50:09 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift cluster-ingress-operator pull 516	0	None	closed	Bug 1912916: Set traffic policy to cluster for IBM platform	2021-01-20 00:16:16 UTC
Red Hat Product Errata	RHSA-2020:5633	0	None	None	None	2021-02-24 15:50:25 UTC

Description Rudi Braun 2021-01-05 15:29:16 UTC

Description of problem:

Looks like we've encountered a regression in 4.6 around the external traffic policy being set to "Local". Whenever we manipulate it, the operator seems to just swap it back to "Local". In previous versions, if this value was set to "Cluster", it would stick without intervention from the operator.

In IBM's IKS, the load balancer implementation runs within the cluster: the LB places a VIP on one of the worker nodes and uses keepalived to maintain the VIP and ensure redundancy. This LB depends on the iptables rules that kube-proxy installs to send traffic from the VIP to the cluster.

With a policy of "Local", traffic is only sent to pods on the node that receives it (the node holding the VIP). Setting the policy back to "Cluster" (specifically for the IBM platform) will allow traffic to flow to all pods in the cluster.
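The difference between the two policies can be modeled in a few lines. The Go sketch below is a simplified illustration, not kube-proxy's actual implementation: the `Endpoint` type, the `eligibleEndpoints` function, and the pod and node names are all hypothetical. It shows why a VIP landing on a node with no router pod drops traffic under "Local" but not under "Cluster".

```go
package main

import "fmt"

// Endpoint is a hypothetical, simplified stand-in for a router pod endpoint:
// just the pod name and the node it runs on.
type Endpoint struct {
	Pod  string
	Node string
}

// eligibleEndpoints models which endpoints may receive traffic that arrives
// on receivingNode (for IKS, the worker node currently holding the VIP).
// "Local" keeps only endpoints on that node; anything else is treated as
// "Cluster" (the Kubernetes default) and keeps every endpoint.
func eligibleEndpoints(policy, receivingNode string, eps []Endpoint) []Endpoint {
	if policy != "Local" {
		return eps
	}
	var local []Endpoint
	for _, ep := range eps {
		if ep.Node == receivingNode {
			local = append(local, ep)
		}
	}
	return local
}

func main() {
	eps := []Endpoint{
		{Pod: "router-a", Node: "worker-1"},
		{Pod: "router-b", Node: "worker-2"},
	}
	// The VIP sits on worker-3, which runs no router pod.
	fmt.Println(len(eligibleEndpoints("Local", "worker-3", eps)))   // 0: traffic is dropped
	fmt.Println(len(eligibleEndpoints("Cluster", "worker-3", eps))) // 2: forwarded to either pod
}
```

Real kube-proxy behavior has more nuance (for example, "Local" also preserves the client source IP), but the endpoint filtering above is the part that matters for an LB that terminates on a single VIP node.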

Version-Release number of selected component (if applicable):
Seems to have been introduced at some point during 4.6.

How reproducible:
easy to reproduce

Steps to Reproduce:
1. Create an OpenShift 4.6 cluster on IBM IKS.
2. After provisioning, check the external traffic policy on the LB service.
3. The traffic policy is set to "Local".
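Step 2 can be scripted. The Go sketch below reads a Service serialized as JSON on stdin and prints its `spec.externalTrafficPolicy`. It assumes the LB in question is the default ingress controller's service, which in OpenShift 4 is typically `router-default` in the `openshift-ingress` namespace; treat that name as an assumption and adjust it for your cluster.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// service declares only the part of a Kubernetes Service object needed here;
// encoding/json ignores every other field in the input.
type service struct {
	Spec struct {
		ExternalTrafficPolicy string `json:"externalTrafficPolicy"`
	} `json:"spec"`
}

// externalTrafficPolicy extracts spec.externalTrafficPolicy from a Service
// serialized as JSON.
func externalTrafficPolicy(data []byte) (string, error) {
	var svc service
	if err := json.Unmarshal(data, &svc); err != nil {
		return "", err
	}
	return svc.Spec.ExternalTrafficPolicy, nil
}

func main() {
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	policy, err := externalTrafficPolicy(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(policy)
}
```

For example, pipe the output of `oc get service router-default -n openshift-ingress -o json` into `go run`. For a one-off check, `oc get ... -o jsonpath='{.spec.externalTrafficPolicy}'` gives the same answer without any code.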

Actual results:
The traffic policy is set to "Local" after cluster provisioning, and any subsequent changes are overwritten by the operator.

Expected results:
On the IBM platform, the traffic policy is set to "Cluster" after cluster provisioning, and any subsequent changes are honored.

Additional info:

Comment 1 Miciah Dashiel Butler Masters 2021-01-05 15:40:30 UTC

(In reply to Rudi Braun from comment #0)
> Looks like we've encountered a regression in 4.6 around the external traffic
> policy being set to "Local". Whenever we manipulate it, the operator seems to
> just swap it back to "Local". In previous versions, if this value was set to
> "Cluster", it would stick without intervention from the operator.

This may have been caused by https://github.com/openshift/cluster-ingress-operator/pull/482, which was reverted in https://github.com/openshift/cluster-ingress-operator/pull/507 to fix bug 1905490.  #482 shipped in 4.6.6, and #507 shipped in 4.6.9.  On what specific version are you seeing the issue?  

I agree though that if IBM Cloud needs "Cluster" external traffic policy, then the operator should set that (as per <https://github.com/openshift/cluster-ingress-operator/pull/516>).  Nothing but the operator should be modifying the service that the operator manages.
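The behavior discussed here (the operator choosing the policy per platform and reverting out-of-band edits to a service it owns) can be sketched as follows. This is not the code from pull 516; the function names are hypothetical, and the platform strings only mirror OpenShift's infrastructure platform type names as an assumption. Returning "Local" for every other platform is likewise an illustrative simplification.

```go
package main

import "fmt"

// Hypothetical platform identifiers (assumed to mirror OpenShift's
// infrastructure platform type names).
const (
	platformAWS      = "AWS"
	platformIBMCloud = "IBMCloud"
)

// desiredExternalTrafficPolicy returns the policy a hypothetical operator
// would want on the LoadBalancer service for a given platform.
func desiredExternalTrafficPolicy(platform string) string {
	if platform == platformIBMCloud {
		// The IKS in-cluster LB forwards from the VIP node via kube-proxy,
		// so traffic must be allowed to reach router pods on any node.
		return "Cluster"
	}
	return "Local"
}

// reconcile returns the policy to write back and whether an update is needed.
// Because the operator owns the service, any manual edit that differs from
// the desired value is overwritten on the next reconcile.
func reconcile(platform, current string) (string, bool) {
	desired := desiredExternalTrafficPolicy(platform)
	return desired, current != desired
}

func main() {
	fmt.Println(reconcile(platformIBMCloud, "Local"))   // Cluster true
	fmt.Println(reconcile(platformAWS, "Cluster"))      // Local true
	fmt.Println(reconcile(platformIBMCloud, "Cluster")) // Cluster false
}
```

Encoding the platform rule in the operator, rather than patching the service by hand, is what makes the setting stick: the reconcile loop converges on the desired value instead of fighting a manual workaround.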

Comment 2 Rudi Braun 2021-01-05 15:48:22 UTC

We were testing against 4.6.6 when we observed the issue; we have not tried a 4.6.9+ build.

Comment 4 Miciah Dashiel Butler Masters 2021-01-05 18:39:41 UTC

We'll try to get https://github.com/openshift/cluster-ingress-operator/pull/516 merged in time for the OCP 4.7.0 release so that the operator sets the "Cluster" external traffic policy on IBM Cloud.  

I gather that you are currently using some workaround to set the external traffic policy.  Do you want https://github.com/openshift/cluster-ingress-operator/pull/516 to be backported to 4.6.z in order to obviate the need for the workaround?  (A backport will require some manual conflict resolution, but I do not mind doing it.)

Comment 6 Rudi Braun 2021-01-06 18:54:49 UTC

We've given 4.6.9 a shot per the comment above about the revert, and it does look like we're seeing the original, pre-4.6.6 behavior. If at some point you'd like to reintroduce that suspected change in 4.6, I think it would make sense to backport; however, I defer to your best judgement. As things stand, we appear to be working OK on 4.6.9.

Comment 7 Cesar Wong 2021-01-15 23:15:58 UTC

Verified with 4.7.0-0.nightly-2021-01-15-194305

Comment 8 Hongan Li 2021-01-18 01:11:07 UTC

No regression on other cloud platforms; moving to VERIFIED.

Comment 11 errata-xmlrpc 2021-02-24 15:50:09 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633
