Bug 2084336

Summary: Ingresscontroller reconciliations failing but not shown in operator logs or status of ingresscontroller.
Product: OpenShift Container Platform Reporter: Miciah Dashiel Butler Masters <mmasters>
Component: Networking Assignee: Miciah Dashiel Butler Masters <mmasters>
Networking sub component: router QA Contact: Arvind iyengar <aiyengar>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: aiyengar, aos-bugs, apizarro, bmcelvee, hdo, hongli, imm, jko, lmohanty, mapandey, misalunk, mmasters, tkondvil, vrutkovs
Version: 4.8   
Target Milestone: ---   
Target Release: 4.9.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: Before OpenShift 4.8, the IngressController API did not have any subfields under the "status.endpointPublishingStrategy.hostNetwork" and "status.endpointPublishingStrategy.nodePort" fields. As a result, these fields could be null even if the "spec.endpointPublishingStrategy.type" field was set to "HostNetwork" or "NodePortService". OpenShift 4.8 added the "status.endpointPublishingStrategy.hostNetwork.protocol" and "status.endpointPublishingStrategy.nodePort.protocol" subfields, and the ingress operator now sets default values for these subfields when it admits or re-admits an IngressController that specifies the "HostNetwork" or "NodePortService" strategy type, respectively. However, a cluster that was upgraded from an earlier version of OpenShift could have an already admitted IngressController with null values for these status fields even when the IngressController specified the "HostNetwork" or "NodePortService" endpoint publishing strategy type. In this case, the operator ignored updates to the corresponding spec fields.
Consequence: Updating "spec.endpointPublishingStrategy.hostNetwork.protocol" or "spec.endpointPublishingStrategy.nodePort.protocol" to "PROXY" to enable PROXY protocol on an existing IngressController had no effect, and it was necessary to delete and recreate the IngressController to enable PROXY protocol.
Fix: The ingress operator was changed so that it correctly updates the status fields when "status.endpointPublishingStrategy.hostNetwork" or "status.endpointPublishingStrategy.nodePort" is null and the IngressController's spec fields specify PROXY protocol with the "HostNetwork" or "NodePortService" endpoint publishing strategy type, respectively.
Result: Setting "spec.endpointPublishingStrategy.hostNetwork.protocol" or "spec.endpointPublishingStrategy.nodePort.protocol" to "PROXY" now takes effect on upgraded clusters.
Story Points: ---
Clone Of: 1997226
: 2084337 Environment:
Last Closed: 2022-07-20 10:52:59 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1997226    
Bug Blocks: 2084337    
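
The pre-fix condition described in the Doc Text (strategy type "HostNetwork" or "NodePortService" with a null status subfield) can be sketched as a small shell check. This is only an illustration, not the operator's code: the function name is hypothetical, and the commented `oc` invocation assumes the IngressController is named "default" in the "openshift-ingress-operator" namespace and that jsonpath prints nothing for a missing field.

```shell
# Hypothetical helper, not part of the ingress operator.
# Returns success (0) when an IngressController matches the pre-fix condition:
#   $1: value of spec.endpointPublishingStrategy.type
#   $2: raw output for the matching status subfield (empty when it is null)
status_subfield_is_null() {
  case "$1" in
    HostNetwork|NodePortService) [ -z "$2" ] ;;
    *) return 1 ;;  # other strategy types are outside the scope of this bug
  esac
}

# Possible use against a live cluster (assumed names, not run here):
#   status=$(oc -n openshift-ingress-operator get ingresscontroller/default \
#     -o jsonpath='{.status.endpointPublishingStrategy.hostNetwork}')
#   status_subfield_is_null HostNetwork "$status" && echo "status subfield is null"
```

On a cluster without the fix, an IngressController for which this check succeeds is one where a spec update to "PROXY" could be ignored.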

Comment 1 Arvind iyengar 2022-06-29 08:47:33 UTC
Verified with the latest "4.9.0-0.ci.test-2022-06-29-053024-ci-ln-9pchkyt-latest" image. With this image, which contains the fix, the "PROXY" protocol option is set correctly:
------
Clusterversion:
oc get clusterversion           
NAME      VERSION                                                  AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.ci.test-2022-06-29-053024-ci-ln-9pchkyt-latest   True        False         31m     Cluster version is 4.9.0-0.ci.test-2022-06-29-053024-ci-ln-9pchkyt-latest

Ingresscontroller state before:
  domain: apps.9pchkyt-b5564.shiftstack.devcluster.openshift.com
  endpointPublishingStrategy:
    hostNetwork:
      protocol: TCP
    type: HostNetwork
  observedGeneration: 1
  selector: ingresscontroller.operator.openshift.io/deployment-ingresscontroller=default
  tlsProfile:


After applying the PROXY protocol option:
  domain: apps.9pchkyt-b5564.shiftstack.devcluster.openshift.com
  endpointPublishingStrategy:
    hostNetwork:
      protocol: PROXY
    type: HostNetwork

oc -n openshift-ingress get pods -o wide
NAME                              READY   STATUS    RESTARTS   AGE     IP           NODE                                 NOMINATED NODE   READINESS GATES
router-default-6bf748475b-98xz6   1/1     Running   0          3m12s   10.0.2.235   9pchkyt-b5564-dvf2p-worker-0-ps7q6   <none>           <none>
router-default-6bf748475b-jg2hb   1/1     Running   0          3m48s   10.0.0.107   9pchkyt-b5564-dvf2p-worker-0-xph6g   <none>           <none>

oc -n openshift-ingress exec router-default-6bf748475b-98xz6 -- env | grep -i ROUTER_USE_PROXY_PROTOCOL
ROUTER_USE_PROXY_PROTOCOL=true
------
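
The transcript above does not show how the PROXY protocol option was applied. One way it might be done is with a merge patch on the spec fields named in the Doc Text. The sketch below is an assumption rather than the exact commands used in this verification: the helper name is hypothetical, and the commented `oc` lines assume the "default" IngressController in the "openshift-ingress-operator" namespace.

```shell
# Hypothetical helper: print a merge patch that sets PROXY protocol for the
# given endpoint publishing strategy type (HostNetwork or NodePortService).
build_proxy_patch() {
  case "$1" in
    HostNetwork)     field=hostNetwork ;;
    NodePortService) field=nodePort ;;
    *) echo "unsupported strategy type: $1" >&2; return 1 ;;
  esac
  printf '{"spec":{"endpointPublishingStrategy":{"type":"%s","%s":{"protocol":"PROXY"}}}}\n' "$1" "$field"
}

# Against a live cluster (assumed names, not run here):
#   oc -n openshift-ingress-operator patch ingresscontroller/default \
#     --type=merge --patch "$(build_proxy_patch HostNetwork)"
#   oc -n openshift-ingress-operator get ingresscontroller/default \
#     -o jsonpath='{.status.endpointPublishingStrategy}'
#   oc -n openshift-ingress exec <router-pod> -- env | grep -i ROUTER_USE_PROXY_PROTOCOL
```

Checking both the status field and the router pod environment mirrors the two observations recorded in the transcript.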

Comment 4 Arvind iyengar 2022-07-14 06:02:23 UTC
This bug has been verified via the pre-merge workflow (reference: C#1). The result is similar when tested with the latest 4.9 promoted image. Hence marking this as "verified".

Comment 6 errata-xmlrpc 2022-07-20 10:52:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.9.43 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:5561