Hi, I am working on reproducing this in my environment. I will get back to you once I find something. Thanks and regards, Miheer
Hi,

I was able to reproduce this issue in my environment. I upgraded from 4.7.23 -> 4.8.4 on vSphere, then added the following section to the spec of the default IngressController:

  endpointPublishingStrategy:
    hostNetwork:
      protocol: PROXY
    type: HostNetwork

[miheer@localhost cluster-ingress-operator]$ cat default-ingress-controller.yaml
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  creationTimestamp: "2021-08-30T07:52:20Z"
  finalizers:
  - ingresscontroller.operator.openshift.io/finalizer-ingresscontroller
  generation: 1
  name: default
  namespace: openshift-ingress-operator
  resourceVersion: "70349"
  uid: 9a306096-627f-4335-bddc-662540940999
spec:
  endpointPublishingStrategy:
    hostNetwork:
      protocol: PROXY
    type: HostNetwork
  replicas: 2
status:
  availableReplicas: 2
  conditions:
  - lastTransitionTime: "2021-08-30T07:55:44Z"
    reason: Valid
    status: "True"
    type: Admitted
  - lastTransitionTime: "2021-08-30T09:28:53Z"
    status: "True"
    type: PodsScheduled
  - lastTransitionTime: "2021-08-30T09:23:30Z"
    message: The deployment has Available status condition set to True
    reason: DeploymentAvailable
    status: "True"
    type: DeploymentAvailable
  - lastTransitionTime: "2021-08-30T09:23:30Z"
    message: Minimum replicas requirement is met
    reason: DeploymentMinimumReplicasMet
    status: "True"
    type: DeploymentReplicasMinAvailable
  - lastTransitionTime: "2021-08-30T09:29:23Z"
    message: All replicas are available
    reason: DeploymentReplicasAvailable
    status: "True"
    type: DeploymentReplicasAllAvailable
  - lastTransitionTime: "2021-08-30T07:55:45Z"
    message: The configured endpoint publishing strategy does not include a managed load balancer
    reason: EndpointPublishingStrategyExcludesManagedLoadBalancer
    status: "False"
    type: LoadBalancerManaged
  - lastTransitionTime: "2021-08-30T07:55:45Z"
    message: No DNS zones are defined in the cluster dns config.
    reason: NoDNSZones
    status: "False"
    type: DNSManaged
  - lastTransitionTime: "2021-08-30T09:23:30Z"
    status: "True"
    type: Available
  - lastTransitionTime: "2021-08-30T08:00:14Z"
    status: "False"
    type: Degraded
  - lastTransitionTime: "2021-08-30T08:01:05Z"
    message: Canary route checks for the default ingress controller are successful
    reason: CanaryChecksSucceeding
    status: "True"
    type: CanaryChecksSucceeding
  domain: apps.mislaunkvsphereipi.qe.devcluster.openshift.com
  endpointPublishingStrategy:
    type: HostNetwork
  observedGeneration: 1
  selector: ingresscontroller.operator.openshift.io/deployment-ingresscontroller=default
  tlsProfile:
    ciphers:
    - TLS_AES_128_GCM_SHA256
    - TLS_AES_256_GCM_SHA384
    - TLS_CHACHA20_POLY1305_SHA256
    - ECDHE-ECDSA-AES128-GCM-SHA256
    - ECDHE-RSA-AES128-GCM-SHA256
    - ECDHE-ECDSA-AES256-GCM-SHA384
    - ECDHE-RSA-AES256-GCM-SHA384
    - ECDHE-ECDSA-CHACHA20-POLY1305
    - ECDHE-RSA-CHACHA20-POLY1305
    - DHE-RSA-AES128-GCM-SHA256
    - DHE-RSA-AES256-GCM-SHA384
    minTLSVersion: VersionTLS12
[miheer@localhost cluster-ingress-operator]$

And it did not get reflected on the router.

Workaround: I had to delete the default IngressController and then add the changes as in [0] to have them reflected in the router.

[0]
  endpointPublishingStrategy:
    hostNetwork:
      protocol: PROXY
    type: HostNetwork

When I directly installed 4.8.4 on vSphere and made the changes in [0], they were correctly reflected on the router. So this issue happens only during upgrade, but I am not sure why. It might be that the API server is storing the object in etcd using the schema from 4.7, but I am not sure.

Things I checked were the yaml contents of the default IngressController.

A) Before deletion of the IngressController, when adding section [0] to the spec did not work, the yaml looked as follows:

[0]
  endpointPublishingStrategy:
    hostNetwork:
      protocol: PROXY
    type: HostNetwork

~~~~~~~~~~~~~
Note: you won't see [0] under the spec in this yaml.
I added [0] later, and it was not correctly reflected in the router.

[miheer@localhost cluster-ingress-operator]$ cat default-ingress-controller.yaml
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  creationTimestamp: "2021-08-30T07:52:20Z"
  finalizers:
  - ingresscontroller.operator.openshift.io/finalizer-ingresscontroller
  generation: 1
  name: default
  namespace: openshift-ingress-operator
  resourceVersion: "70349"
  uid: 9a306096-627f-4335-bddc-662540940999
spec:
  replicas: 2
status:
  availableReplicas: 2
  conditions:
  - lastTransitionTime: "2021-08-30T07:55:44Z"
    reason: Valid
    status: "True"
    type: Admitted
  - lastTransitionTime: "2021-08-30T09:28:53Z"
    status: "True"
    type: PodsScheduled
  - lastTransitionTime: "2021-08-30T09:23:30Z"
    message: The deployment has Available status condition set to True
    reason: DeploymentAvailable
    status: "True"
    type: DeploymentAvailable
  - lastTransitionTime: "2021-08-30T09:23:30Z"
    message: Minimum replicas requirement is met
    reason: DeploymentMinimumReplicasMet
    status: "True"
    type: DeploymentReplicasMinAvailable
  - lastTransitionTime: "2021-08-30T09:29:23Z"
    message: All replicas are available
    reason: DeploymentReplicasAvailable
    status: "True"
    type: DeploymentReplicasAllAvailable
  - lastTransitionTime: "2021-08-30T07:55:45Z"
    message: The configured endpoint publishing strategy does not include a managed load balancer
    reason: EndpointPublishingStrategyExcludesManagedLoadBalancer
    status: "False"
    type: LoadBalancerManaged
  - lastTransitionTime: "2021-08-30T07:55:45Z"
    message: No DNS zones are defined in the cluster dns config.
    reason: NoDNSZones
    status: "False"
    type: DNSManaged
  - lastTransitionTime: "2021-08-30T09:23:30Z"
    status: "True"
    type: Available
  - lastTransitionTime: "2021-08-30T08:00:14Z"
    status: "False"
    type: Degraded
  - lastTransitionTime: "2021-08-30T08:01:05Z"
    message: Canary route checks for the default ingress controller are successful
    reason: CanaryChecksSucceeding
    status: "True"
    type: CanaryChecksSucceeding
  domain: apps.mislaunkvsphereipi.qe.devcluster.openshift.com
  endpointPublishingStrategy:
    type: HostNetwork
  observedGeneration: 1
  selector: ingresscontroller.operator.openshift.io/deployment-ingresscontroller=default
  tlsProfile:
    ciphers:
    - TLS_AES_128_GCM_SHA256
    - TLS_AES_256_GCM_SHA384
    - TLS_CHACHA20_POLY1305_SHA256
    - ECDHE-ECDSA-AES128-GCM-SHA256
    - ECDHE-RSA-AES128-GCM-SHA256
    - ECDHE-ECDSA-AES256-GCM-SHA384
    - ECDHE-RSA-AES256-GCM-SHA384
    - ECDHE-ECDSA-CHACHA20-POLY1305
    - ECDHE-RSA-CHACHA20-POLY1305
    - DHE-RSA-AES128-GCM-SHA256
    - DHE-RSA-AES256-GCM-SHA384
    minTLSVersion: VersionTLS12
[miheer@localhost cluster-ingress-operator]$
~~~~~~~~~~~~~~

B) After deletion, when adding section [0] to the spec did work, the yaml looked as follows:

[0]
  endpointPublishingStrategy:
    hostNetwork:
      protocol: PROXY
    type: HostNetwork

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Note: this is the yaml just created after deletion, so you won't see [0] under the spec. I added it later, and it was correctly reflected in the router.
[miheer@localhost cluster-ingress-operator]$ cat ing-cont-after-deletion
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  creationTimestamp: "2021-08-30T15:44:10Z"
  finalizers:
  - ingresscontroller.operator.openshift.io/finalizer-ingresscontroller
  generation: 1
  name: default
  namespace: openshift-ingress-operator
  resourceVersion: "209953"
  uid: d18380e7-0c39-439b-96c7-2c56a5f7fd7e
spec:
  httpErrorCodePages:
    name: ""
  replicas: 2
  tuningOptions: {}                  <-- This was added after deletion
  unsupportedConfigOverrides: null   <-- This was added after deletion
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2021-08-30T15:44:10Z"
    reason: Valid
    status: "True"
    type: Admitted
  - lastTransitionTime: "2021-08-30T15:45:04Z"
    message: 'Some pods are not scheduled: Pod "router-default-64f9c4985b-wjj99" cannot
      be scheduled: 0/5 nodes are available: 2 node(s) didn''t have free ports for
      the requested pod ports, 3 node(s) had taint {node-role.kubernetes.io/master: },
      that the pod didn''t tolerate. Make sure you have sufficient worker nodes.'
    reason: PodsNotScheduled        <-- Please ignore this condition; it was related to my env and was fixed later
    status: "False"
    type: PodsScheduled
  - lastTransitionTime: "2021-08-30T15:45:49Z"
    message: The deployment has Available status condition set to True
    reason: DeploymentAvailable
    status: "True"
    type: DeploymentAvailable
  - lastTransitionTime: "2021-08-30T15:45:49Z"
    message: Minimum replicas requirement is met
    reason: DeploymentMinimumReplicasMet
    status: "True"
    type: DeploymentReplicasMinAvailable
  - lastTransitionTime: "2021-08-30T15:45:49Z"
    message: 1/2 of replicas are available
    reason: DeploymentReplicasNotAvailable
    status: "False"
    type: DeploymentReplicasAllAvailable
  - lastTransitionTime: "2021-08-30T15:44:11Z"
    message: The configured endpoint publishing strategy does not include a managed load balancer
    reason: EndpointPublishingStrategyExcludesManagedLoadBalancer
    status: "False"
    type: LoadBalancerManaged
  - lastTransitionTime: "2021-08-30T15:44:11Z"
    message: No DNS zones are defined in the cluster dns config.
    reason: NoDNSZones
    status: "False"
    type: DNSManaged
  - lastTransitionTime: "2021-08-30T15:45:49Z"
    status: "True"
    type: Available
  - lastTransitionTime: "2021-08-30T15:45:49Z"
    status: "False"
    type: Degraded
  domain: apps.mislaunkvsphereipi.qe.devcluster.openshift.com
  endpointPublishingStrategy:
    hostNetwork:
      protocol: TCP
    type: HostNetwork
  observedGeneration: 1
  selector: ingresscontroller.operator.openshift.io/deployment-ingresscontroller=default
  tlsProfile:
    ciphers:
    - TLS_AES_128_GCM_SHA256
    - TLS_AES_256_GCM_SHA384
    - TLS_CHACHA20_POLY1305_SHA256
    - ECDHE-ECDSA-AES128-GCM-SHA256
    - ECDHE-RSA-AES128-GCM-SHA256
    - ECDHE-ECDSA-AES256-GCM-SHA384
    - ECDHE-RSA-AES256-GCM-SHA384
    - ECDHE-ECDSA-CHACHA20-POLY1305
    - ECDHE-RSA-CHACHA20-POLY1305
    - DHE-RSA-AES128-GCM-SHA256
    - DHE-RSA-AES256-GCM-SHA384
    minTLSVersion: VersionTLS12
[miheer@localhost cluster-ingress-operator]$
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The only difference I saw is that the fields

  tuningOptions: {}
  unsupportedConfigOverrides: null

were added after deleting the default IngressController.

The infrastructures.config.openshift.io/cluster object also looked fine to me when I compared the 4.8.4 installed cluster with the 4.8.4 upgraded cluster.

I think this might be related to the API server storing the object in etcd using the schema from 4.7, but why it was not working after the upgrade I am not yet sure.

Is the customer not OK with the workaround?
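For reference, the spec-level difference described above can be confirmed locally with jq. This is a minimal sketch, not part of the original debugging session: the two JSON fragments are hand-copied equivalents of the specs shown in the YAML dumps above, and jq is assumed to be available.

```shell
# Spec of the old (pre-deletion) object vs. the recreated one, reduced to
# JSON fragments copied from the YAML dumps above.
before='{"replicas":2}'
after='{"httpErrorCodePages":{"name":""},"replicas":2,"tuningOptions":{},"unsupportedConfigOverrides":null}'
# List the keys present only in the recreated spec.
echo "$after" | jq --argjson b "$before" -r 'keys - ($b | keys) | .[]'
# prints: httpErrorCodePages, tuningOptions, unsupportedConfigOverrides
```

The same `keys` subtraction works against live objects by piping `oc get ingresscontroller default -n openshift-ingress-operator -o json | jq .spec` from each cluster instead of the hard-coded fragments.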
I am assigning this to the team that handles upgrades, as I am not sure why this issue happens only after an upgrade. As mentioned earlier, it might be that the API server stored the object in etcd using the schema from 4.7.
Please test the credentials from the app used by the ingress operator:

- Get the credentials of the app used by ingress:

./oc-4.7.23 get secret cloud-credentials -n openshift-ingress-operator -o jsonpath='{.data}' > ig-creds.json
export azure_client_id=$(jq -r '.azure_client_id' ig-creds.json | base64 -d)
export azure_client_secret=$(jq -r '.azure_client_secret' ig-creds.json | base64 -d)
export azure_region=$(jq -r '.azure_region' ig-creds.json | base64 -d)
export azure_resource_prefix=$(jq -r '.azure_resource_prefix' ig-creds.json | base64 -d)
export azure_resourcegroup=$(jq -r '.azure_resourcegroup' ig-creds.json | base64 -d)
export azure_subscription_id=$(jq -r '.azure_subscription_id' ig-creds.json | base64 -d)
export azure_tenant_id=$(jq -r '.azure_tenant_id' ig-creds.json | base64 -d)

- Log in using these credentials:

az login --service-principal -u $azure_client_id \
  --password $azure_client_secret --tenant $azure_tenant_id

- Check that roleDefinitionName=Contributor for the AppId:

az role assignment list --assignee $azure_client_id -g $azure_resourcegroup
az role assignment list --assignee $azure_client_id -g $azure_resourcegroup | jq .[].roleDefinitionName
"Contributor"
Oh, sorry, please ignore comment 4; it was not meant for this bug.
The CVO applies the manifests specified by the ingress operator. If these manifests are correct and something in the CVO needs to be fixed, we will need clarification on which fields need to be changed. Moving back to Routing.
Any update on this?
*** Bug 2025949 has been marked as a duplicate of this bug. ***
Hello team! In a lab environment where I replicated the issue, the only difference I can see is that the data written in etcd differs for the key /kubernetes.io/operator.openshift.io/ingresscontrollers/openshift-ingress-operator/default.

Upgraded cluster:
---
..
  "domain": "apps.o.rlab.sh",
  "endpointPublishingStrategy": {
    "type": "HostNetwork"
  },
..
---

Fresh 4.8.18 cluster:
---
...
  "domain": "apps.ocp4upi2.rhlabs.local",
  "endpointPublishingStrategy": {
    "hostNetwork": {
      "protocol": "PROXY"
    },
    "type": "HostNetwork"
  },
...
---
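A quick local check makes the difference between the two stored objects explicit. This is a sketch under stated assumptions: the fragments are copied from the etcd data quoted in this comment, and jq is assumed to be available.

```shell
# endpointPublishingStrategy as stored in etcd on each cluster.
upgraded='{"type":"HostNetwork"}'
fresh='{"hostNetwork":{"protocol":"PROXY"},"type":"HostNetwork"}'
# The upgraded cluster's stored object has no hostNetwork block at all,
# so the PROXY protocol setting is never seen.
echo "$upgraded" | jq 'has("hostNetwork")'   # prints false
echo "$fresh"    | jq 'has("hostNetwork")'   # prints true
```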
https://github.com/openshift/cluster-ingress-operator/pull/681 will fix this.
Verified in the "4.10.0-0.nightly-2021-12-21-130047" release version. Changes made to the "hostNetwork" protocol get reflected correctly on the router pods:

--------
oc -n openshift-ingress-operator edit ingresscontroller default
ingresscontroller.operator.openshift.io/default edited

  domain: apps.aiyengar410vsp.qe.devcluster.openshift.com
  endpointPublishingStrategy:
    hostNetwork:
      protocol: PROXY
    type: HostNetwork
  observedGeneration: 4
  selector: ingresscontroller.operator.openshift.io/deployment-ingresscontroller=default

oc -n openshift-ingress get pods -o wide
NAME                              READY   STATUS    RESTARTS   AGE     IP               NODE                                NOMINATED NODE   READINESS GATES
router-default-5865ccbfb6-25pl9   1/1     Running   0          6m42s   172.31.249.221   aiyengar410vsp-ppvps-worker-tm99z   <none>           <none>
router-default-5865ccbfb6-xfb5n   1/1     Running   0          8m      172.31.249.77    aiyengar410vsp-ppvps-worker-87jfw   <none>           <none>

oc -n openshift-ingress rsh router-default-5865ccbfb6-xfb5n
sh-4.4$ env | grep -i ROUTER_USE_PROXY_PROTOCOL
ROUTER_USE_PROXY_PROTOCOL=true
sh-4.4$ grep -ir "accept-proxy" haproxy.config
  bind :80 accept-proxy
  bind :443 accept-proxy
--------
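For anyone reproducing this verification, the interactive `oc edit` step can also be applied non-interactively with a merge patch. This is a hedged sketch, not part of the original verification: the patch document mirrors the spec change shown above, the jq check runs locally, and the commented-out `oc patch` line assumes a logged-in cluster.

```shell
# Merge patch equivalent to the spec edit shown above.
patch='{"spec":{"endpointPublishingStrategy":{"type":"HostNetwork","hostNetwork":{"protocol":"PROXY"}}}}'
# Sanity-check the patch document locally before applying it.
echo "$patch" | jq -r '.spec.endpointPublishingStrategy.hostNetwork.protocol'   # prints PROXY
# Apply on a live cluster (requires oc login):
# oc -n openshift-ingress-operator patch ingresscontroller/default --type=merge -p "$patch"
```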
Hi, if there is anything customers should know about this bug, or if there are any important workarounds that should be outlined in the bug fixes section of the OpenShift Container Platform 4.10 release notes, please update the Doc Type and Doc Text fields. If not, can you please mark it as "no doc update"? Thanks!
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056