Bug 2081447

Summary: Ingress operator performs spurious updates in response to API's defaulting of router deployment's router container's ports' protocol field
Product: OpenShift Container Platform Reporter: Miciah Dashiel Butler Masters <mmasters>
Component: Networking Assignee: Miciah Dashiel Butler Masters <mmasters>
Networking sub component: router QA Contact: Melvin Joseph <mjoseph>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: medium CC: aos-bugs, hongli
Version: 4.11   
Target Milestone: ---   
Target Release: 4.11.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-08-10 11:09:59 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Miciah Dashiel Butler Masters 2022-05-03 18:22:36 UTC
Description of problem:

When the ingress operator creates or updates a router deployment, the API sets default values for the protocol field of the router container's ports, which the operator detects as an external update and attempts to revert.  The operator should not update the deployment in response to API defaulting.
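
The comparison problem described above can be illustrated with a minimal Go sketch. This is not the operator's actual code or the fix that was merged. `ContainerPort` is a simplified, hypothetical stand-in for the Kubernetes core/v1 type, `withDefaultProtocol` and `portsEqual` are illustrative names, and the port values are examples. The sketch assumes only that the API fills in "TCP" when a port's protocol is left empty, and it applies the same default to both sides before comparing so that defaulting alone does not register as a change.

```go
package main

import "fmt"

// ContainerPort is a simplified, hypothetical stand-in for the
// Kubernetes core/v1 ContainerPort type (illustration only).
type ContainerPort struct {
	Name          string
	ContainerPort int32
	Protocol      string
}

// withDefaultProtocol returns a copy of ports in which an empty Protocol
// is replaced by "TCP", the value the API sets when the field is omitted.
func withDefaultProtocol(ports []ContainerPort) []ContainerPort {
	out := make([]ContainerPort, len(ports))
	for i, p := range ports {
		if p.Protocol == "" {
			p.Protocol = "TCP"
		}
		out[i] = p
	}
	return out
}

// portsEqual reports whether two port lists match once API defaulting
// of the protocol field has been applied to both.
func portsEqual(current, expected []ContainerPort) bool {
	a, b := withDefaultProtocol(current), withDefaultProtocol(expected)
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	// What the operator asks for (protocol left unset).
	expected := []ContainerPort{{Name: "http", ContainerPort: 80}}
	// What the API returns after defaulting.
	current := []ContainerPort{{Name: "http", ContainerPort: 80, Protocol: "TCP"}}

	// A naive field-by-field comparison would see a difference here;
	// the normalized comparison does not.
	fmt.Println("needs update:", !portsEqual(current, expected))
}
```

An alternative with the same effect would be for the operator to set the protocol explicitly in the deployment it generates, so that the API has nothing left to default.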


OpenShift release version:

The issue was introduced in OpenShift 4.11 by <https://github.com/openshift/cluster-ingress-operator/pull/694/commits/af653f9fa7368cf124e11b7ea4666bc40e601165>.


Cluster Platform:

All platforms are affected.


How reproducible:

100%.


Steps to Reproduce:

1. Launch a new cluster.

2. Check the ingress operator's logs:

    oc -n openshift-ingress-operator logs deploy/ingress-operator -c ingress-operator


Actual results:

The operator's logs have "updated router deployment" repeated over and over.  For example, in this CI run, I see "updated router deployment" 177 times: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-ingress-operator/724/pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-operator/1519044935800590336/artifacts/e2e-aws-operator/gather-extra/artifacts/pods/openshift-ingress-operator_ingress-operator-86dccb55cd-p4529_ingress-operator.log


Expected results:

The operator should ignore updates by the API that only set default values, and the operator should not log "updated router deployment" unless the deployment is updated outside of API defaulting.  For example, in this CI run from the release-4.10 branch, I see "updated router deployment" 2 times: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-ingress-operator/748/pull-ci-openshift-cluster-ingress-operator-release-4.10-e2e-aws-operator/1519357272591962112/artifacts/e2e-aws-operator/gather-extra/artifacts/pods/openshift-ingress-operator_ingress-operator-59b64ff4bb-7cdnw_ingress-operator.log


Impact of the problem:

The spurious reconciliation requests incur excessive CPU and API usage and add noise to logs.

Comment 3 Melvin Joseph 2022-05-05 05:13:53 UTC
melvinjoseph@mjoseph-mac Downloads % oc get clusterversion


NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-05-04-214114   True        False         163m    Cluster version is 4.11.0-0.nightly-2022-05-04-214114

melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress-operator logs deploy/ingress-operator -c ingress-operator
There was no "updated router deployment" message.

Also checked the CI run.
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-ingress-operator/754/pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-operator/1521829918185361408/artifacts/e2e-aws-operator/gather-extra/artifacts/pods/openshift-ingress-operator_ingress-operator-66bdf59d76-6nrc9_ingress-operator.log

There were not many "updated router deployment" log messages; the message appears only 9 times.

Hence verifying the bug.

Comment 5 errata-xmlrpc 2022-08-10 11:09:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069