Bug 2035193

Summary:	inconsistent behavior on AWS/Azure/GCP after updating ingresscontroller LB scope
Product:	OpenShift Container Platform	Reporter:	Hongan Li <hongli>
Component:	Networking	Assignee:	aos-network-edge-staff <aos-network-edge-staff>
Networking sub component:	router	QA Contact:	Hongan Li <hongli>
Status:	CLOSED NOTABUG	Docs Contact:
Severity:	medium
Priority:	medium	CC:	aos-bugs, mmasters, travi, wking
Version:	4.10
Target Milestone:	---
Target Release:	4.10.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2021-12-23 18:06:46 UTC	Type:	Bug

Description Hongan Li 2021-12-23 08:56:27 UTC

Description of problem:
After updating the ingresscontroller's LB scope, we observed different behavior on AWS, Azure, and GCP.
On AWS, the operator asks the admin to delete svc/router-default manually, and only then creates a new service.
On Azure/GCP, the operator updates the service immediately.

OpenShift release version:
4.10.0-0.nightly-2021-12-21-130047

Cluster Platform:
AWS/Azure/GCP

How reproducible:
100%

Steps to Reproduce (in detail):
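1. Update the scope of the default ingresscontroller (spec.endpointPublishingStrategy.loadBalancer.scope), for example from "External" to "Internal":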
$ oc -n openshift-ingress-operator edit ingresscontrollers/default
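
For a reproduction that does not depend on an interactive editor, the same change can probably be made with a merge patch. The sketch below reuses the JSON shape from the revert command in the operator's status message, with the scope value swapped. The helper names `scope_patch` and `set_default_scope` are hypothetical, and the patch assumes spec.endpointPublishingStrategy on the ingresscontroller already has its type set to LoadBalancerService.

```shell
# Build the merge patch that sets the load-balancer scope.
# The JSON shape is copied from the operator's suggested revert command.
scope_patch() {
  printf '{"spec":{"endpointPublishingStrategy":{"loadBalancer":{"scope":"%s"}}}}' "$1"
}

# Hypothetical helper: apply the patch to the default ingresscontroller.
# Requires a logged-in `oc` with permission to edit ingresscontrollers.
set_default_scope() {
  oc -n openshift-ingress-operator patch ingresscontrollers/default \
    --type=merge --patch="$(scope_patch "$1")"
}

# Example (not run here): set_default_scope Internal
```

Running `set_default_scope Internal` and then `oc get co/ingress` should show the platform-specific behavior described under "Actual results".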

Actual results:

on AWS:
$ oc get co/ingress
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
ingress 4.10.0-0.nightly-2021-12-21-130047 True True False 6h7m ingresscontroller "default" is progressing: ScopeChanged: The IngressController scope was changed from "External" to "Internal". To effectuate this change, you must delete the service: `oc -n openshift-ingress delete svc/router-default`; the service load-balancer will then be deprovisioned and a new one created. This will most likely cause the new load-balancer to have a different host name and IP address from the old one's. Alternatively, you can revert the change to the IngressController: `oc -n openshift-ingress-operator patch ingresscontrollers/default --type=merge --patch='{"spec":{"endpointPublishingStrategy":{"loadBalancer":{"scope":"External"}}}}'.

On Azure/GCP: the message above does not appear; the operator updates the service without asking for manual deletion.

Expected results:
It would be better to keep the same behavior across these cloud providers.

Impact of the problem:
user experience

Additional info:


Comment 1 Miciah Dashiel Butler Masters 2021-12-23 18:06:46 UTC

This is expected behavior.  The goal was to give the best user experience we reasonably could for each platform, and unfortunately, the different platforms have different capabilities: 

> Some platforms (such as Azure and GCP) support changing the scope of a service load-balancer between internal and external without deleting and recreating the load balancer, by setting a cloud-provider-specific annotation on the Kubernetes Service object. On these platforms, the operator merely sets the annotation to the desired scope, and Kubernetes's service controller and cloud-provider implementation complete the operation of changing the load balancer's scope.
> 
> Other platforms (such as AWS) require deleting and recreating a load balancer to change its scope. This operation is disruptive: It interrupts ingress traffic and may cause the load balancer's address to change. On these platforms, the operator signals that the user must delete the Kubernetes Service object. Once the user performs this step, the operator recreates the load balancer with the desired scope to complete the operation.

https://github.com/openshift/enhancements/blob/master/enhancements/ingress/mutable-publishing-scope.md#proposal
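
On a platform such as AWS, the manual step described in the quoted proposal could be scripted roughly as follows. This is only a sketch: the helper names are hypothetical, and it assumes the clusteroperator's Progressing condition message contains the "ScopeChanged" text shown in the report above. Deleting the service is disruptive, as the proposal notes, so the helper deletes it only when the operator is actually asking for that.

```shell
# Return success if a status message reports a pending scope change.
scope_change_pending() {
  case "$1" in
    *ScopeChanged*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: complete a pending scope change by deleting the
# service, after which the operator recreates it with the desired scope.
# Requires a logged-in `oc`; interrupts ingress traffic while it runs.
complete_scope_change() {
  msg=$(oc get co/ingress \
    -o jsonpath='{.status.conditions[?(@.type=="Progressing")].message}')
  if scope_change_pending "$msg"; then
    oc -n openshift-ingress delete svc/router-default
  else
    echo "no pending scope change reported; nothing to delete"
  fi
}

# Example (not run here): complete_scope_change
```

On Azure and GCP the check should simply find nothing to do, because the operator changes the scope by annotating the existing service.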

To make the experience consistent across platforms, we would need to degrade the experience on platforms such as Azure and GCP that support changing the scope without deleting and recreating the load balancer.  

I'm going to close this report as NOTABUG (although it is unfortunate that the various cloud platforms have this inconsistency).  Please re-open if I have misunderstood and missed an opportunity for improving the overall user experience.