Bug 2056928 - Ingresscontroller LB scope change behaviour differs for different values of aws-load-balancer-internal annotation
Summary: Ingresscontroller LB scope change behaviour differs for different values of aws-load-balancer-internal annotation
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 4.10.0
Assignee: Miciah Dashiel Butler Masters
QA Contact: Arvind iyengar
URL:
Whiteboard:
Depends On: 2055470
Blocks: 2057518
 
Reported: 2022-02-22 11:39 UTC by Ravi Trivedi
Modified: 2022-08-04 22:35 UTC
CC List: 10 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The AWS cloud-provider implementation checks the "service.beta.kubernetes.io/aws-load-balancer-internal" service annotation to determine whether a service load-balancer (SLB) should be configured to be internal (as opposed to being public). The cloud-provider implementation recognizes both the value "0.0.0.0/0" and the value "true" as indicating that an SLB should be internal. The ingress operator in OpenShift 4.7 and earlier sets the value "0.0.0.0/0", and the ingress operator in OpenShift 4.8 and later sets the value "true" for services that the operator creates for internal SLBs. A service that was created on an older cluster might have the annotation value "0.0.0.0/0", which could cause comparisons that check for the "true" value to return the wrong result.

Consequence: When a cluster had an internal SLB that had been configured using the old annotation value and the cluster was upgraded to OpenShift 4.10, the ingress operator would report the Progressing=True clusteroperator status condition, preventing the upgrade from completing.

Fix: Logic was added to the ingress operator to normalize the service.beta.kubernetes.io/aws-load-balancer-internal service annotation for operator-managed services by replacing the value "0.0.0.0/0" with the value "true" (a sketch of this normalization follows the metadata fields below).

Result: The ingress operator no longer prevents upgrades of clusters with the "service.beta.kubernetes.io/aws-load-balancer-internal=0.0.0.0/0" annotation from completing.
Clone Of: 2055470
Environment:
Last Closed: 2022-03-10 16:44:19 UTC
Target Upstream Version:
Embargoed:
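
For illustration, here is a minimal Go sketch of the annotation normalization described in the Doc Text above. The function name and the standalone main program are hypothetical; the operator's actual implementation lives in ingress/load_balancer_service.go (see the log line in comment 2) and updates the Service object through the Kubernetes API rather than a plain map.

package main

import "fmt"

const awsInternalLBAnnotation = "service.beta.kubernetes.io/aws-load-balancer-internal"

// normalizeInternalLBAnnotation rewrites the legacy "0.0.0.0/0" value, which
// OpenShift 4.7 and earlier set, to the canonical "true" value that 4.8 and
// later expect, so that equality checks against "true" behave consistently.
// It reports whether the annotation was changed.
func normalizeInternalLBAnnotation(annotations map[string]string) bool {
	if annotations[awsInternalLBAnnotation] == "0.0.0.0/0" {
		annotations[awsInternalLBAnnotation] = "true"
		return true
	}
	return false
}

func main() {
	// A service created on an OpenShift 4.7 or earlier cluster might still
	// carry the legacy annotation value.
	svc := map[string]string{awsInternalLBAnnotation: "0.0.0.0/0"}

	// Before normalization, a naive equality check misreports the scope.
	fmt.Println("internal (naive check):", svc[awsInternalLBAnnotation] == "true") // false

	if normalizeInternalLBAnnotation(svc) {
		fmt.Printf("normalized %s: old=%q new=%q\n",
			awsInternalLBAnnotation, "0.0.0.0/0", svc[awsInternalLBAnnotation])
	}

	// After normalization, the check returns the correct result.
	fmt.Println("internal (after normalization):", svc[awsInternalLBAnnotation] == "true") // true
}

The "normalized annotation" log line shown in comment 2 corresponds to the branch that rewrites the value.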


Links
GitHub: openshift/cluster-ingress-operator pull 705 (Merged): [release-4.10] Bug 2056928: Normalize the AWS internal LB annotation value (last updated 2022-02-23 15:25:25 UTC)
Red Hat Product Errata: RHSA-2022:0056 (last updated 2022-03-10 16:44:39 UTC)

Comment 2 Hongan Li 2022-02-23 05:07:13 UTC
Tested with a cluster launched by cluster-bot ("launch openshift/cluster-ingress-operator#705 aws"); the PR works as expected.

After changing the scope to internal and manually changing the annotation value to "0.0.0.0/0", the ingress operator updates the annotation back to "true" immediately.

$ oc -n openshift-ingress annotate svc/router-default service.beta.kubernetes.io/aws-load-balancer-internal="0.0.0.0/0" --overwrite
service/router-default annotated

$ oc -n openshift-ingress get svc/router-default -oyaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-healthy-threshold: "2"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-interval: "5"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-timeout: "4"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-unhealthy-threshold: "2"
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
    service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: '*'
    traffic-policy.network.alpha.openshift.io/local-with-fallback: ""


### logs of ingress-operator

2022-02-23T04:57:26.536Z	INFO	operator.ingress_controller	ingress/load_balancer_service.go:294	normalized annotation	{"namespace": "openshift-ingress", "name": "router-default", "annotation": "service.beta.kubernetes.io/aws-load-balancer-internal", "old": "0.0.0.0/0", "new": "true"}


$ oc get clusterversion
NAME      VERSION                                                   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.ci.test-2022-02-23-041216-ci-ln-bqsc5qk-latest   True        False         20m     Cluster version is 4.10.0-0.ci.test-2022-02-23-041216-ci-ln-bqsc5qk-latest

Comment 4 Mike Fiedler 2022-02-23 19:17:55 UTC
Moving to MODIFIED. No 4.10 nightly includes the fix for this yet.

Comment 8 Arvind iyengar 2022-02-24 08:14:08 UTC
Verified in the "4.10.0-0.nightly-2022-02-24-034852" release version. Testing an upgrade from 4.9.23 to 4.10.0-0.nightly-2022-02-24-034852 shows that the patch works as intended and the upgrade completes successfully:
--------
oc get clusterversion     
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.23    True        False         5m57s   Cluster version is 4.9.23

oc -n openshift-ingress edit service/router-default    
service/router-default edited

oc -n openshift-ingress get service/router-default -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-healthy-threshold: "2"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-interval: "5"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-timeout: "4"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-unhealthy-threshold: "2"
    service.beta.kubernetes.io/aws-load-balancer-internal: 0.0.0.0/0   <-------
    service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: '*'
    traffic-policy.network.alpha.openshift.io/local-with-fallback: ""
  creationTimestamp: "2022-02-24T06:26:49Z"

oc adm upgrade --to-image=registry.ci.openshift.org/ocp/release:4.10.0-0.nightly-2022-02-24-034852 --allow-explicit-upgrade=true --force
Updating to release image registry.ci.openshift.org/ocp/release:4.10.0-0.nightly-2022-02-24-034852

Post upgrade:

oc get clusterversion                                 
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2022-02-24-034852   True        False         9m52s   Cluster version is 4.10.0-0.nightly-2022-02-24-034852

oc get co ingress
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE 
ingress                                    4.10.0-0.nightly-2022-02-24-034852   True        False         False      66m     
  
oc -n openshift-ingress get service/router-default -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-healthy-threshold: "2"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-interval: "5"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-timeout: "4"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-unhealthy-threshold: "2"
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"  <-----------
    service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: '*'
    traffic-policy.network.alpha.openshift.io/local-with-fallback: ""
  creationTimestamp: "2022-02-24T06:26:49Z"
  finalizers:
  - service.kubernetes.io/load-balancer-cleanup
--------

Comment 11 errata-xmlrpc 2022-03-10 16:44:19 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

