Bug 2097736

Summary: IngressControllersNotUpgradeable: load balancer service has been modified; changes must be reverted before upgrading
Product: OpenShift Container Platform Reporter: OpenShift BugZilla Robot <openshift-bugzilla-robot>
Component: Networking Assignee: Miciah Dashiel Butler Masters <mmasters>
Networking sub component: router QA Contact: Hongan Li <hongli>
Status: CLOSED ERRATA Docs Contact:
Severity: urgent    
Priority: high CC: aos-bugs, mmasters, rpittau, wking
Version: 4.9   
Target Milestone: ---   
Target Release: 4.9.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: When an ingresscontroller is configured to use a LoadBalancer-type service, the ingress operator creates and manages this service. If the operator detects that the user has modified an annotation that the operator manages on this service, the operator sets the ingress clusteroperator's "Upgradeable" status condition to "False" to block upgrades. However, the operator's check of the service's annotations had a logic error that could falsely report that the user had modified the annotations when the service had no annotations.
Consequence: The ingress operator could erroneously set the ingress clusteroperator's "Upgradeable" status condition to "False", blocking upgrades, if the service had no annotations. In particular, this could happen on Alibaba, Azure, and GCP clusters when using OVN with a public (not internal) load balancer.
Fix: The logic that checks the service's annotations was fixed to handle empty annotations correctly.
Result: The ingress operator should no longer erroneously block upgrades.
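The kind of check described in the Doc Text can be sketched roughly as follows. This is an illustration only, not the operator's actual code: the function name managedAnnotationsChanged and the annotation key are hypothetical. The sketch compares only the managed keys, so a service with nil or empty annotations is not reported as modified when the operator expects none of those keys to be set.

```go
package main

import "fmt"

// managedAnnotationsChanged reports whether any managed key differs between
// the current and expected annotation maps. Hypothetical helper for
// illustration; it is not taken from cluster-ingress-operator.
func managedAnnotationsChanged(current, expected map[string]string, managedKeys []string) bool {
	for _, key := range managedKeys {
		// Reading from a nil map is safe in Go and yields ("", false),
		// so nil and empty annotation maps are treated the same way.
		curVal, curOK := current[key]
		expVal, expOK := expected[key]
		if curOK != expOK || curVal != expVal {
			return true
		}
	}
	return false
}

func main() {
	// Hypothetical annotation key, used only for this example.
	managed := []string{"example.com/managed-annotation"}

	// Service with no annotations at all, and nothing expected: unchanged.
	fmt.Println(managedAnnotationsChanged(nil, map[string]string{}, managed))

	// User set a managed annotation that the operator does not expect: changed.
	modified := map[string]string{"example.com/managed-annotation": "true"}
	fmt.Println(managedAnnotationsChanged(modified, nil, managed))
}
```

Run as written, the first call prints false and the second prints true.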
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-07-20 10:52:59 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2097735    
Bug Blocks: 2097737    

Description OpenShift BugZilla Robot 2022-06-16 12:40:27 UTC
+++ This bug was initially created as a clone of Bug #2097735 +++

+++ This bug was initially created as a clone of Bug #2097555 +++

Description of problem:

The cluster-ingress-operator is causing 4.9 -> 4.10 CI upgrade tests to fail:

{  fail [github.com/onsi/ginkgo.0-origin.0+incompatible/internal/leafnodes/runner.go:113]: Jun 15 18:22:34.283: Some cluster operators are not ready: ingress (Upgradeable=False IngressControllersNotUpgradeable: Some ingresscontrollers are not upgradeable: ingresscontroller "default" is not upgradeable: OperandsNotUpgradeable: One or more managed resources are not upgradeable: load balancer service has been modified; changes must be reverted before upgrading: )}


OpenShift release version: 4.9


Cluster Platform: GCP, OVN


How reproducible: Unclear, but most recent nightly GCP OVN upgrade tests are failing due to this bug, even with three retries per nightly run.


Steps to Reproduce (in detail):
1. Run the OCP 4.9 to 4.10 nightly upgrade test.


Actual results: OCP cluster upgrade fails


Expected results: OCP cluster upgrade succeeds


Impact of the problem: This is currently blocking releases in the 4.10.z stream.


Additional info:

Examples of failing runs:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-gcp-ovn-upgrade/1537128423896387584

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-gcp-ovn-upgrade/1536843529286848512

Comment 2 Hongan Li 2022-07-07 02:55:36 UTC
Verified with 4.9.0-0.nightly-2022-07-06-220327; passed.

$ oc get co/ingress 
NAME      VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
ingress   4.9.0-0.nightly-2022-07-06-220327   True        False         False      28m     

$ oc get co/ingress -oyaml
<---snip--->
  - lastTransitionTime: "2022-07-07T02:09:25Z"
    reason: IngressControllersUpgradeable
    status: "True"
    type: Upgradeable


In the previous build, Upgradeable=False is reported:

$ oc get co/ingress
NAME      VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
ingress   4.9.0-0.nightly-2022-06-24-070308   True        False         False      45m     

$ oc get co/ingress -oyaml
<---snip--->
  - lastTransitionTime: "2022-07-07T01:52:46Z"
    message: 'Some ingresscontrollers are not upgradeable: ingresscontroller "default"
      is not upgradeable: OperandsNotUpgradeable: One or more managed resources are
      not upgradeable: load balancer service has been modified; changes must be reverted
      before upgrading: '
    reason: IngressControllersNotUpgradeable
    status: "False"
    type: Upgradeable
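
For anyone scripting the same verification, the comparison of the two outputs above comes down to reading the Upgradeable condition. A minimal sketch, assuming the conditions have already been parsed into a plain struct (the condition type and blocksUpgrade function are hypothetical and do not use the OpenShift API types):

```go
package main

import "fmt"

// condition mirrors the fields visible in the YAML output above.
// Hypothetical type for this example only.
type condition struct {
	Type, Status, Reason, Message string
}

// blocksUpgrade returns the reason and true when an Upgradeable condition
// with status "False" is present; otherwise it returns "" and false.
func blocksUpgrade(conds []condition) (string, bool) {
	for _, c := range conds {
		if c.Type == "Upgradeable" && c.Status == "False" {
			return c.Reason, true
		}
	}
	return "", false
}

func main() {
	fixedBuild := []condition{
		{Type: "Upgradeable", Status: "True", Reason: "IngressControllersUpgradeable"},
	}
	previousBuild := []condition{
		{Type: "Upgradeable", Status: "False", Reason: "IngressControllersNotUpgradeable"},
	}

	for name, conds := range map[string][]condition{"fixed": fixedBuild, "previous": previousBuild} {
		reason, blocked := blocksUpgrade(conds)
		fmt.Printf("%s build: blocked=%t reason=%q\n", name, blocked, reason)
	}
}
```

The two condition lists are copied from the outputs in this comment; the map iteration order in main is not fixed, so the two lines may print in either order.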

Comment 5 errata-xmlrpc 2022-07-20 10:52:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.9.43 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:5561