Created attachment 1844682
must-gather

Description of problem:
Enabling ExternalCloudProvider on a healthy 4.10 cluster on top of OSP16.1 causes some clusterOperators to become degraded.

Version-Release number of selected component (if applicable):
4.10.0-0.nightly-2021-12-02-033910 on top of OSP16.1 (RHOS-16.1-RHEL-8-20210903.n.0) with manila and SSL encryption enabled.

How reproducible:

1. Install OCP4.10 with IPI successfully:

$ tail ostest/.openshift_install.log
time="2021-12-02T18:53:18Z" level=info msg="To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/home/stack/ostest/auth/kubeconfig'"
time="2021-12-02T18:53:18Z" level=info msg="Access the OpenShift web-console here: https://console-openshift-console.apps.ostest.shiftstack.com"
time="2021-12-02T18:53:18Z" level=info msg="Login to the console with user: \"kubeadmin\", and password: \"SpEFF-zx7QT-iTYvv-I7mhn\""
time="2021-12-02T18:53:18Z" level=debug msg="Time elapsed per stage:"
time="2021-12-02T18:53:18Z" level=debug msg=" : 2m0s"
time="2021-12-02T18:53:18Z" level=debug msg="Bootstrap Complete: 29m34s"
time="2021-12-02T18:53:18Z" level=debug msg=" API: 11m37s"
time="2021-12-02T18:53:18Z" level=debug msg=" Bootstrap Destroy: 42s"
time="2021-12-02T18:53:18Z" level=debug msg=" Cluster Operators: 16m39s"
time="2021-12-02T18:53:18Z" level=info msg="Time elapsed: 56m6s"

$ oc get co
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.10.0-0.nightly-2021-12-02-033910   True        False         False      14h
baremetal                                  4.10.0-0.nightly-2021-12-02-033910   True        False         False      14h
cloud-controller-manager                   4.10.0-0.nightly-2021-12-02-033910   True        False         False      14h
cloud-credential                           4.10.0-0.nightly-2021-12-02-033910   True        False         False      14h
cluster-api                                4.10.0-0.nightly-2021-12-02-033910   True        False         False      14h
cluster-autoscaler                         4.10.0-0.nightly-2021-12-02-033910   True        False         False      14h
config-operator                            4.10.0-0.nightly-2021-12-02-033910   True        False         False      14h
console                                    4.10.0-0.nightly-2021-12-02-033910   True        False         False      14h
csi-snapshot-controller                    4.10.0-0.nightly-2021-12-02-033910   True        False         False      14h
dns                                        4.10.0-0.nightly-2021-12-02-033910   True        False         False      14h
etcd                                       4.10.0-0.nightly-2021-12-02-033910   True        False         False      14h
image-registry                             4.10.0-0.nightly-2021-12-02-033910   True        False         False      14h
ingress                                    4.10.0-0.nightly-2021-12-02-033910   True        False         False      14h
insights                                   4.10.0-0.nightly-2021-12-02-033910   True        False         False      14h
kube-apiserver                             4.10.0-0.nightly-2021-12-02-033910   True        False         False      14h
kube-controller-manager                    4.10.0-0.nightly-2021-12-02-033910   True        False         False      14h
kube-scheduler                             4.10.0-0.nightly-2021-12-02-033910   True        False         False      14h
kube-storage-version-migrator              4.10.0-0.nightly-2021-12-02-033910   True        False         False      14h
machine-api                                4.10.0-0.nightly-2021-12-02-033910   True        False         False      14h
machine-approver                           4.10.0-0.nightly-2021-12-02-033910   True        False         False      14h
machine-config                             4.10.0-0.nightly-2021-12-02-033910   True        False         False      14h
marketplace                                4.10.0-0.nightly-2021-12-02-033910   True        False         False      14h
monitoring                                 4.10.0-0.nightly-2021-12-02-033910   True        False         False      14h
network                                    4.10.0-0.nightly-2021-12-02-033910   True        False         False      14h
node-tuning                                4.10.0-0.nightly-2021-12-02-033910   True        False         False      14h
openshift-apiserver                        4.10.0-0.nightly-2021-12-02-033910   True        False         False      14h
openshift-controller-manager               4.10.0-0.nightly-2021-12-02-033910   True        False         False      14h
openshift-samples                          4.10.0-0.nightly-2021-12-02-033910   True        False         False      14h
operator-lifecycle-manager                 4.10.0-0.nightly-2021-12-02-033910   True        False         False      14h
operator-lifecycle-manager-catalog         4.10.0-0.nightly-2021-12-02-033910   True        False         False      14h
operator-lifecycle-manager-packageserver   4.10.0-0.nightly-2021-12-02-033910   True        False         False      14h
service-ca                                 4.10.0-0.nightly-2021-12-02-033910   True        False         False      14h
storage                                    4.10.0-0.nightly-2021-12-02-033910   True        False         False      14h

2. Enable the external CCM by adding the featureGate:

$ oc get featureGate -o yaml
apiVersion: v1
items:
- apiVersion: config.openshift.io/v1
  kind: FeatureGate
  metadata:
    annotations:
      include.release.openshift.io/ibm-cloud-managed: "true"
      include.release.openshift.io/self-managed-high-availability: "true"
      include.release.openshift.io/single-node-developer: "true"
      release.openshift.io/create-only: "true"
    creationTimestamp: "2021-12-02T18:18:57Z"
    generation: 2
    name: cluster
    ownerReferences:
    - apiVersion: config.openshift.io/v1
      kind: ClusterVersion
      name: version
      uid: 78b3d110-635c-4fdf-b628-b20a1069e8f7
    resourceVersion: "298866"
    uid: 7b9e8597-0186-4c3d-88fa-60856b46629d
  spec:
    customNoUpgrade:
      enabled:
      - ExternalCloudProvider
    featureSet: CustomNoUpgrade
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

3. The external CCM pods are deployed:

$ oc get pods -n openshift-cloud-controller-manager
NAME                                                 READY   STATUS    RESTARTS   AGE
openstack-cloud-controller-manager-c698f7f49-fjw6w   1/1     Running   0          166m
openstack-cloud-controller-manager-c698f7f49-kpqrc   1/1     Running   0          162m

but the cluster became unhealthy and inoperative:

$ oc get co
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.10.0-0.nightly-2021-12-02-033910   False       False         True       166m    OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.ostest.shiftstack.com/healthz": EOF
baremetal                                  4.10.0-0.nightly-2021-12-02-033910   True        False         False      18h
cloud-controller-manager                   4.10.0-0.nightly-2021-12-02-033910   True        False         False      18h
cloud-credential                           4.10.0-0.nightly-2021-12-02-033910   True        False         False      18h
cluster-api                                4.10.0-0.nightly-2021-12-02-033910   True        False         False      18h
cluster-autoscaler                         4.10.0-0.nightly-2021-12-02-033910   True        False         False      18h
config-operator                            4.10.0-0.nightly-2021-12-02-033910   True        False         False      18h
console                                    4.10.0-0.nightly-2021-12-02-033910   False       False         False      166m    RouteHealthAvailable: failed to GET route (https://console-openshift-console.apps.ostest.shiftstack.com): Get "https://console-openshift-console.apps.ostest.shiftstack.com": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
csi-snapshot-controller                    4.10.0-0.nightly-2021-12-02-033910   True        False         False      18h
dns                                        4.10.0-0.nightly-2021-12-02-033910   True        False         False      18h
etcd                                       4.10.0-0.nightly-2021-12-02-033910   True        False         False      17h
image-registry                             4.10.0-0.nightly-2021-12-02-033910   True        False         False      17h
ingress                                    4.10.0-0.nightly-2021-12-02-033910   True        False         False      17h
insights                                   4.10.0-0.nightly-2021-12-02-033910   True        False         False      17h
kube-apiserver                             4.10.0-0.nightly-2021-12-02-033910   True        False         False      17h
kube-controller-manager                    4.10.0-0.nightly-2021-12-02-033910   True        False         False      17h
kube-scheduler                             4.10.0-0.nightly-2021-12-02-033910   True        False         False      17h
kube-storage-version-migrator              4.10.0-0.nightly-2021-12-02-033910   True        False         False      169m
machine-api                                4.10.0-0.nightly-2021-12-02-033910   True        False         False      17h
machine-approver                           4.10.0-0.nightly-2021-12-02-033910   True        False         False      18h
machine-config                             4.10.0-0.nightly-2021-12-02-033910   True        False         False      17h
marketplace                                4.10.0-0.nightly-2021-12-02-033910   True        False         False      18h
monitoring                                 4.10.0-0.nightly-2021-12-02-033910   True        False         False      17h
network                                    4.10.0-0.nightly-2021-12-02-033910   True        False         False      18h
node-tuning                                4.10.0-0.nightly-2021-12-02-033910   True        False         False      163m
openshift-apiserver                        4.10.0-0.nightly-2021-12-02-033910   True        False         False      17h
openshift-controller-manager               4.10.0-0.nightly-2021-12-02-033910   True        False         False      17h
openshift-samples                          4.10.0-0.nightly-2021-12-02-033910   True        False         False      17h
operator-lifecycle-manager                 4.10.0-0.nightly-2021-12-02-033910   True        False         False      18h
operator-lifecycle-manager-catalog         4.10.0-0.nightly-2021-12-02-033910   True        False         False      18h
operator-lifecycle-manager-packageserver   4.10.0-0.nightly-2021-12-02-033910   True        False         False      17h
service-ca                                 4.10.0-0.nightly-2021-12-02-033910   True        False         False      18h
storage                                    4.10.0-0.nightly-2021-12-02-033910   True        False         False      17h

$ oc get co/authentication -o json | jq -r '.status.conditions[] | select (.type=="Degraded")'
{
  "lastTransitionTime": "2021-12-03T09:42:18Z",
  "message": "OAuthServerRouteEndpointAccessibleControllerDegraded: Get \"https://oauth-openshift.apps.ostest.shiftstack.com/healthz\": EOF",
  "reason": "OAuthServerRouteEndpointAccessibleController_SyncError",
  "status": "True",
  "type": "Degraded"
}

$ oc logs -n openshift-authentication-operator -l app=authentication-operator| tail
E1203 11:39:22.050357 1 base_controller.go:272] OAuthServerRouteEndpointAccessibleController reconciliation failed: Get "https://oauth-openshift.apps.ostest.shiftstack.com/healthz": EOF
E1203 11:39:22.327328 1 base_controller.go:272] OAuthServerRouteEndpointAccessibleController reconciliation failed: Get "https://oauth-openshift.apps.ostest.shiftstack.com/healthz": EOF

$ oc rsh -n openshift-authentication-operator $(oc get pod -n openshift-authentication-operator -l app=authentication-operator -o NAME) curl -k https://oauth-openshift.apps.ostest.shiftstack.com/healthz
curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to oauth-openshift.apps.ostest.shiftstack.com:443

The same command works on a cluster with the in-tree cloud manager:

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-12-02-155018   True        False         14h     Cluster version is 4.9.0-0.nightly-2021-12-02-155018

$ oc rsh -n openshift-authentication-operator $(oc get pod -n openshift-authentication-operator -l app=authentication-operator -o NAME) curl -k https://oauth-openshift.apps.ostest.shiftstack.com/healthz
ok

Actual results:
The cluster is inoperative after enabling the TP feature.

Expected results:
The TP feature is enabled successfully.

Additional info:
- Must-gather attached.
- install-config.yaml attached.
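
The operators that turned unhealthy in step 3 can be picked out of the `oc get co` table mechanically. A minimal sketch, assuming the column order shown in the report (NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED); the function name `unhealthy_operators` is hypothetical and not part of `oc`:

```shell
# Hypothetical helper: reads `oc get co` output on stdin and prints the name
# of every operator that is not Available or is Degraded.
# Assumes whitespace-separated columns: NAME VERSION AVAILABLE PROGRESSING DEGRADED ...
unhealthy_operators() {
  # NR > 1 skips the header row; $3 is AVAILABLE, $5 is DEGRADED.
  awk 'NR > 1 && ($3 != "True" || $5 != "False") { print $1 }'
}

# Usage against a live cluster:
#   oc get co | unhealthy_operators
```

Fed the step 3 output, this should list only authentication and console, which matches the two routes that fail the health checks.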
Verified on 4.10.0-0.nightly-2021-12-06-201335 on top of OSP16.1 (RHOS-16.1-RHEL-8-20210903.n.0)

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2021-12-06-201335   True        False         10m     Cluster version is 4.10.0-0.nightly-2021-12-06-201335

$ oc get featureGate cluster -o yaml
apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
  annotations:
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
    release.openshift.io/create-only: "true"
  creationTimestamp: "2021-12-09T10:40:57Z"
  generation: 1
  name: cluster
  resourceVersion: "1420"
  uid: dcfaf925-591b-4628-a9a6-0a104d2afa74
spec:
  customNoUpgrade:
    enabled:
    - ExternalCloudProvider
  featureSet: CustomNoUpgrade
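
One way to arrive at the FeatureGate spec shown in this report is a merge patch on the `cluster` object. This is a sketch only: the report does not say how the gate was actually set, and the helper name `feature_gate_patch` is hypothetical.

```shell
# Hypothetical helper: prints a JSON merge patch that selects the
# CustomNoUpgrade feature set and enables the gate passed as $1.
feature_gate_patch() {
  printf '{"spec":{"featureSet":"CustomNoUpgrade","customNoUpgrade":{"enabled":["%s"]}}}\n' "$1"
}

# Usage against a live cluster (assumes cluster-admin access):
#   oc patch featuregate cluster --type=merge \
#     -p "$(feature_gate_patch ExternalCloudProvider)"
```

Keeping the patch in a function makes it easy to inspect the JSON before applying it, which matters here because the feature set is named CustomNoUpgrade and is intended for test clusters.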
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056