Bug 2056928 - Ingresscontroller LB scope change behaviour differs for different values of aws-load-balancer-internal annotation
Summary: Ingresscontroller LB scope change behaviour differs for different values of aws-load-balancer-internal annotation
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 4.10.0
Assignee: Miciah Dashiel Butler Masters
QA Contact: Arvind iyengar
URL:
Whiteboard:
Depends On: 2055470
Blocks: 2057518
 
Reported: 2022-02-22 11:39 UTC by Ravi Trivedi
Modified: 2022-08-04 22:35 UTC
CC List: 10 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The AWS cloud-provider implementation checks the "service.beta.kubernetes.io/aws-load-balancer-internal" service annotation to determine whether a service load-balancer (SLB) should be configured to be internal (as opposed to being public). The cloud-provider implementation recognizes both the value "0.0.0.0/0" and the value "true" as indicating that an SLB should be internal. The ingress operator in OpenShift 4.7 and earlier sets the value "0.0.0.0/0", and the ingress operator in OpenShift 4.8 and later sets the value "true" for services that the operator creates for internal SLBs. A service that was created on an older cluster might have the annotation value "0.0.0.0/0", which could cause comparisons that check for the "true" value to return the wrong result.

Consequence: When a cluster had an internal SLB that had been configured using the old annotation value and the cluster was upgraded to OpenShift 4.10, the ingress operator would report the Progressing=True clusteroperator status condition, preventing the upgrade from completing.

Fix: Logic was added to the ingress operator to normalize the service.beta.kubernetes.io/aws-load-balancer-internal service annotation for operator-managed services by replacing the value "0.0.0.0/0" with the value "true" (a sketch of this normalization follows the metadata fields below).

Result: The ingress operator no longer prevents upgrades of clusters with the "service.beta.kubernetes.io/aws-load-balancer-internal=0.0.0.0/0" annotation from completing.
Clone Of: 2055470
Environment:
Last Closed: 2022-03-10 16:44:19 UTC
Target Upstream Version:
Embargoed:
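
For illustration, here is a minimal Go sketch of the annotation normalization described in the Doc Text above. The function name and the standalone main program are hypothetical; the operator's actual implementation lives in ingress/load_balancer_service.go (see the log line in comment 2) and updates the Service object through the Kubernetes API rather than a plain map.

package main

import "fmt"

const awsInternalLBAnnotation = "service.beta.kubernetes.io/aws-load-balancer-internal"

// normalizeInternalLBAnnotation rewrites the legacy "0.0.0.0/0" value, which
// OpenShift 4.7 and earlier set, to the canonical "true" value that 4.8 and
// later expect, so that equality checks against "true" behave consistently.
// It reports whether the annotation was changed.
func normalizeInternalLBAnnotation(annotations map[string]string) bool {
	if annotations[awsInternalLBAnnotation] == "0.0.0.0/0" {
		annotations[awsInternalLBAnnotation] = "true"
		return true
	}
	return false
}

func main() {
	// A service created on an OpenShift 4.7 or earlier cluster might still
	// carry the legacy annotation value.
	svc := map[string]string{awsInternalLBAnnotation: "0.0.0.0/0"}

	// Before normalization, a naive equality check misreports the scope.
	fmt.Println("internal (naive check):", svc[awsInternalLBAnnotation] == "true") // false

	if normalizeInternalLBAnnotation(svc) {
		fmt.Printf("normalized %s: old=%q new=%q\n",
			awsInternalLBAnnotation, "0.0.0.0/0", svc[awsInternalLBAnnotation])
	}

	// After normalization, the check returns the correct result.
	fmt.Println("internal (after normalization):", svc[awsInternalLBAnnotation] == "true") // true
}

The "normalized annotation" log line shown in comment 2 corresponds to the branch that rewrites the value.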


Links
GitHub: openshift/cluster-ingress-operator pull 705 (Merged): [release-4.10] Bug 2056928: Normalize the AWS internal LB annotation value (last updated 2022-02-23 15:25:25 UTC)
Red Hat Product Errata: RHSA-2022:0056 (last updated 2022-03-10 16:44:39 UTC)

Comment 2 Hongan Li 2022-02-23 05:07:13 UTC
Tested with a cluster launched by cluster-bot ("launch openshift/cluster-ingress-operator#705 aws"); the PR works as expected.

After changing the scope to internal and manually changing the annotation value to "0.0.0.0/0", the ingress operator updates the annotation back to "true" immediately.

$ oc -n openshift-ingress annotate svc/router-default service.beta.kubernetes.io/aws-load-balancer-internal="0.0.0.0/0" --overwrite
service/router-default annotated

$ oc -n openshift-ingress get svc/router-default -oyaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-healthy-threshold: "2"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-interval: "5"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-timeout: "4"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-unhealthy-threshold: "2"
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
    service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: '*'
    traffic-policy.network.alpha.openshift.io/local-with-fallback: ""


### logs of ingress-operator

2022-02-23T04:57:26.536Z	INFO	operator.ingress_controller	ingress/load_balancer_service.go:294	normalized annotation	{"namespace": "openshift-ingress", "name": "router-default", "annotation": "service.beta.kubernetes.io/aws-load-balancer-internal", "old": "0.0.0.0/0", "new": "true"}


$ oc get clusterversion
NAME      VERSION                                                   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.ci.test-2022-02-23-041216-ci-ln-bqsc5qk-latest   True        False         20m     Cluster version is 4.10.0-0.ci.test-2022-02-23-041216-ci-ln-bqsc5qk-latest

Comment 4 Mike Fiedler 2022-02-23 19:17:55 UTC
Moving to MODIFIED. No 4.10 nightly includes the fix for this yet.

Comment 8 Arvind iyengar 2022-02-24 08:14:08 UTC
Verified in the "4.10.0-0.nightly-2022-02-24-034852" release version. Testing an upgrade from 4.9.23 to 4.10.0-0.nightly-2022-02-24-034852 shows that the patch works as intended and the upgrade completes successfully:
--------
oc get clusterversion     
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.23    True        False         5m57s   Cluster version is 4.9.23

oc -n openshift-ingress edit service/router-default    
service/router-default edited

oc -n openshift-ingress get service/router-default -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-healthy-threshold: "2"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-interval: "5"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-timeout: "4"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-unhealthy-threshold: "2"
    service.beta.kubernetes.io/aws-load-balancer-internal: 0.0.0.0/0   <-------
    service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: '*'
    traffic-policy.network.alpha.openshift.io/local-with-fallback: ""
  creationTimestamp: "2022-02-24T06:26:49Z"

oc adm upgrade --to-image=registry.ci.openshift.org/ocp/release:4.10.0-0.nightly-2022-02-24-034852 --allow-explicit-upgrade=true --force
Updating to release image registry.ci.openshift.org/ocp/release:4.10.0-0.nightly-2022-02-24-034852

Post upgrade:

oc get clusterversion                                 
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2022-02-24-034852   True        False         9m52s   Cluster version is 4.10.0-0.nightly-2022-02-24-034852

oc get co ingress
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE 
ingress                                    4.10.0-0.nightly-2022-02-24-034852   True        False         False      66m     
  
oc -n openshift-ingress get service/router-default -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-healthy-threshold: "2"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-interval: "5"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-timeout: "4"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-unhealthy-threshold: "2"
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"  <-----------
    service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: '*'
    traffic-policy.network.alpha.openshift.io/local-with-fallback: ""
  creationTimestamp: "2022-02-24T06:26:49Z"
  finalizers:
  - service.kubernetes.io/load-balancer-cleanup
--------

Comment 11 errata-xmlrpc 2022-03-10 16:44:19 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

