Created attachment 1710654
authentication-operator-log-before-pod-restart.log

Description of problem:
Upgrading from OCP 4.5.2, 4.5.3 or 4.5.4 to a later version fails. `oc get clusterversion` returns:

'Unable to apply 4.5.4: the cluster operator authentication is degraded'

'Cluster operator authentication is reporting a failure: RouterCertsDegraded: secret/v4-0-config-system-router-certs.spec.data[apps.ocf-rollup-55-rolling-upgrade.openshift-aws.rhocf-dev.net] -n openshift-authentication: certificate could not validate route hostname oauth-openshift.apps.ocf-rollup-55-rolling-upgrade.openshift-aws.rhocf-dev.net: x509: certificate signed by unknown authority'

How reproducible:

Steps to Reproduce:
1. Install 4.5.2 or later
2. Try to upgrade via `oc adm upgrade`

Actual results:
The upgrade gets stuck at "Unable to apply 4.5.4: the cluster operator authentication is degraded"

Expected results:
The upgrade goes smoothly.

Additional info:
Attaching the logs from the authentication-operator in the openshift-authentication-operator namespace, captured before I restarted the pod.
Attaching the output of must-gather, captured after restarting the authentication-operator pod (the restart didn't help).
The output of must-gather can be found here: https://drive.google.com/file/d/13ue0MiB1knEZZAP_TVRnqrGBoj_QWneq/view?usp=sharing (please let me know if there's a problem with permissions)
Created attachment 1710683
Install config

We set up a custom certificate for the Ingress by following the docs at https://docs.openshift.com/container-platform/4.5/networking/configuring-a-custom-pki.html and I'm attaching the install-config.
The attached install-config.yaml specifies an "additionalTrustBundle" stanza; however, it does not specify a "proxy" stanza, and therefore the installer does not configure the proxy. Note that the "spec.trustedCA.name" field in the cluster proxy config is blank:

    % cat cluster-scoped-resources/config.openshift.io/proxies/cluster.yaml
    ---
    apiVersion: config.openshift.io/v1
    kind: Proxy
    metadata:
      creationTimestamp: "2020-08-06T10:08:25Z"
      generation: 1
      managedFields:
      - apiVersion: config.openshift.io/v1
        fieldsType: FieldsV1
        fieldsV1:
          f:spec:
            .: {}
            f:trustedCA:
              .: {}
              f:name: {}
          f:status: {}
        manager: cluster-bootstrap
        operation: Update
        time: "2020-08-06T10:08:26Z"
      name: cluster
      resourceVersion: "490"
      selfLink: /apis/config.openshift.io/v1/proxies/cluster
      uid: f7d34432-06fa-4d2c-924a-aa34b837aa81
    spec:
      trustedCA:
        name: ""
    status: {}

The installer did put the signing certificate in the "user-ca-bundle" configmap, and the default ingresscontroller is correctly configured to use a custom certificate that is signed using this signing certificate:

    % yaml2json() {
        python -c 'import json,sys,yaml;json.dump(yaml.safe_load(sys.stdin.read()),sys.stdout)'
    }
    % secret_name=$(cat namespaces/openshift-ingress-operator/operator.openshift.io/ingresscontrollers/default.yaml | yaml2json | jq -r .spec.defaultCertificate.name)
    % cat namespaces/openshift-ingress/core/secrets.yaml | yaml2json | jq -r ".items|.[]|select(.metadata.name==\"$secret_name\").data[\"tls.crt\"]" | base64 -d > tls.crt
    % cat namespaces/openshift-config/core/configmaps.yaml | yaml2json | jq -r '.items|.[]|select(.metadata.name=="user-ca-bundle").data["ca-bundle.crt"]' > user-ca-bundle.crt
    % openssl verify -verbose -CAfile user-ca-bundle.crt tls.crt
    tls.crt: OK

However, the authentication operator does not use the "user-ca-bundle" configmap; rather, it uses the "trusted-ca-bundle" configmap, which is missing the signing certificate:

    % cat namespaces/openshift-config-managed/core/configmaps.yaml | yaml2json | jq -r '.items|.[]|select(.metadata.name=="trusted-ca-bundle").data["ca-bundle.crt"]' > trusted-ca-bundle.crt
    % openssl verify -verbose -CAfile trusted-ca-bundle.crt tls.crt
    tls.crt: C = US, ST = North Carolina, L = Raleigh, O = Red Hat Inc., OU = RHOSS-QE, CN = ocf-rollup-55-rolling-upgrade
    error 20 at 0 depth lookup:unable to get local issuer certificate
    zsh: exit 2     openssl verify -verbose -CAfile trusted-ca-bundle.crt tls.crt

When you install a new cluster, be sure to specify a nonempty "proxy" stanza, which should cause the installer to configure the cluster proxy config with "spec.trustedCA.name" set to "user-ca-bundle". With the proxy so configured, cluster-network-operator should merge the certificate in the "user-ca-bundle" configmap into the "trusted-ca-bundle" configmap, which should resolve the problem.

To fix the cluster after installation, it should suffice to patch the cluster proxy config as follows:

    oc patch proxies.config.openshift.io/cluster --type=merge --patch='{"spec":{"trustedCA":{"name":"user-ca-bundle"}}}'

Does that resolve the issue?
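For anyone hitting the same symptom, the question "is the signing certificate from user-ca-bundle present in trusted-ca-bundle?" can also be approximated without openssl. The following is only a sketch and not what any operator runs: it compares SHA-256 fingerprints of the PEM certificate blocks in two bundles, and the function names are made up for illustration. It does not validate certificate chains; it only reports whether the same certificate bytes appear in both bundles.

```python
import base64
import hashlib
import re
from typing import Set

# Matches one PEM certificate block and captures its base64 body.
_PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----(.+?)-----END CERTIFICATE-----", re.DOTALL
)


def cert_fingerprints(bundle_pem: str) -> Set[str]:
    """Return the SHA-256 fingerprint of every certificate in a PEM bundle."""
    fingerprints = set()
    for body in _PEM_CERT.findall(bundle_pem):
        # Drop line breaks and other whitespace before decoding to DER bytes.
        der = base64.b64decode("".join(body.split()))
        fingerprints.add(hashlib.sha256(der).hexdigest())
    return fingerprints


def missing_from_trusted(user_bundle_pem: str, trusted_bundle_pem: str) -> Set[str]:
    """Fingerprints present in the user bundle but absent from the trusted one."""
    return cert_fingerprints(user_bundle_pem) - cert_fingerprints(trusted_bundle_pem)
```

Run against the contents of user-ca-bundle.crt and trusted-ca-bundle.crt, a non-empty result would suggest that the user CA has not been merged into the trusted bundle.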
Thanks, Miciah! Patching the cluster proxy helped and I was able to perform a cluster upgrade.
However, I don't want to be configuring the cluster proxy so I'm wondering what could be the "nonempty" proxy stanza in my case so that it doesn't break other functionality.
Would it be just something like this?
```
proxy:
  noProxy: example.com
```
(In reply to Martin Gencur from comment #6)
> However, I don't want to be configuring the cluster proxy so I'm wondering
> what could be the "nonempty" proxy stanza in my case so that it doesn't
> break other functionality.
> Would it be just something like this?
> ```
> proxy:
>   noProxy: example.com
> ```

I was going to suggest using "proxy: {}", but the installer does not allow this; when I tried specifying "proxy: {}" in install-config.yaml, the installer reported, "invalid 'install-config.yaml' file: proxy: Required value: must include httpProxy or httpsProxy".

I think the installer should be fixed, either (1) to allow specifying "proxy: {}" or (2) to set spec.trustedCA in proxy.config/cluster if install-config.yaml specifies additionalTrustBundle irrespective of whether install-config.yaml specifies proxy. I'll open a PR to do (2). With this change, the install-config.yaml that you provided would work as is.

We'll try to get this fixed in the upcoming sprint.
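To make option (2) concrete, here is a rough model of the behaviour as described in this thread. It is written in Python purely for illustration (the installer itself is written in Go), and `trusted_ca_name` and its `fixed` flag are hypothetical names, not installer code. What happens when a proxy is given without additionalTrustBundle is not covered in this bug, so the sketch simply assumes an empty name in that case.

```python
from typing import Any, Dict

USER_CA_CONFIGMAP = "user-ca-bundle"


def trusted_ca_name(install_config: Dict[str, Any], fixed: bool = False) -> str:
    """Model the value that ends up in the cluster proxy's spec.trustedCA.name.

    fixed=False: behaviour reported in this bug, where trustedCA is only set
    when a nonempty proxy stanza accompanies additionalTrustBundle.
    fixed=True: proposed option (2), where additionalTrustBundle alone suffices.
    """
    has_bundle = bool(install_config.get("additionalTrustBundle"))
    has_proxy = bool(install_config.get("proxy"))
    if not has_bundle:
        # Assumption: without a trust bundle there is nothing to reference.
        return ""
    if fixed or has_proxy:
        return USER_CA_CONFIGMAP
    return ""
```

With the install-config from this bug (additionalTrustBundle set, no proxy), the model yields an empty name before the change and "user-ca-bundle" after it.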
On further investigation, the authentication operator should not be using the trusted CA bundle to validate the router's default certificate; instead, it should trust the default certificate implicitly (as described in https://github.com/openshift/enhancements/blob/master/enhancements/network/default-ingress-cert-configmap.md). However, the authentication operator does check that the certificate in the "default-ingress-cert" configmap matches the certificate in the "router-certs" secret. The must-gather archive includes the configmap but not the secret; would you be able to share the secret? `oc -n openshift-config-managed get secrets/router-certs`
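For reference, a comparison along these lines can be sketched as follows. This is an assumption about the shape of the check, not the operator's actual code: it takes one base64-encoded value from the "router-certs" secret data (assumed to hold PEM certificate blocks, possibly followed by a private key) and the PEM text from the "default-ingress-cert" configmap, and reports whether the secret's first certificate also appears in the configmap. The helper names are hypothetical.

```python
import base64
import re
from typing import List

# Matches one PEM certificate block and captures its base64 body.
# Private key blocks use a different label and are therefore ignored.
_PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----(.+?)-----END CERTIFICATE-----", re.DOTALL
)


def pem_certs(text: str) -> List[bytes]:
    """Return the DER bytes of each certificate block in `text`, in order."""
    return [
        base64.b64decode("".join(body.split()))
        for body in _PEM_CERT.findall(text)
    ]


def serving_cert_in_configmap(secret_value_b64: str, configmap_pem: str) -> bool:
    """True if the first certificate in the secret value is in the configmap PEM."""
    secret_pem = base64.b64decode(secret_value_b64).decode("ascii")
    secret_certs = pem_certs(secret_pem)
    if not secret_certs:
        return False
    return secret_certs[0] in pem_certs(configmap_pem)
```

The secret value would be the string stored under the apps domain key in the secret's data, passed in still base64-encoded as `oc get -oyaml` prints it.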
I'm not sure this will help you because the original cluster was killed a long time ago. However, if I use the same install-config and start a new cluster the secret looks like this: oc -n openshift-config-managed get secrets/router-certs -oyaml apiVersion: v1 data: apps.ocf-rollup-139-rolling-upgrade.openshift-aws.rhocf-dev.net: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUZQekNDQkNlZ0F3SUJBZ0lRRi9rM2lYRElYWmhQSzlUWllBeUdtVEFOQmdrcWhraUc5dzBCQVFzRkFEQ0IKb2pFTE1Ba0dBMVVFQmhNQ1ZWTXhGekFWQmdOVkJBZ1REazV2Y25Sb0lFTmhjbTlzYVc1aE1SQXdEZ1lEVlFRSApFd2RTWVd4bGFXZG9NUlV3RXdZRFZRUUtFd3hTWldRZ1NHRjBJRWx1WXk0eEVUQVBCZ05WQkFzVENGSklUMU5UCkxWRkZNVDR3UEFZRFZRUURFelZQY0dWdVUyaHBablFnVTJWeWRtVnliR1Z6Y3lBb1VraFBVMU1wSUZGRklFTmwKY25ScFptbGpZWFJsSUVGMWRHaHZjbWwwZVRBZUZ3MHlNREE0TWpReE1URTVNRFJhRncweU1EQTVNRGN4TVRFNQpNRFJhTUlHTE1Rc3dDUVlEVlFRR0V3SlZVekVYTUJVR0ExVUVDQk1PVG05eWRHZ2dRMkZ5YjJ4cGJtRXhFREFPCkJnTlZCQWNUQjFKaGJHVnBaMmd4RlRBVEJnTlZCQW9UREZKbFpDQklZWFFnU1c1akxqRVJNQThHQTFVRUN4TUkKVWtoUFUxTXRVVVV4SnpBbEJnTlZCQU1USG05alppMXliMnhzZFhBdE1UTTVMWEp2Ykd4cGJtY3RkWEJuY21GawpaVENDQVNJd0RRWUpLb1pJaHZjTkFRRUJCUUFEZ2dFUEFEQ0NBUW9DZ2dFQkFPTXh2dzhTSXN5NCs2YXdneEJFCjBHNUdXYldlaWVOVnlPQVpWcVM0QmRhM0hyK3RQc29xNnZBWCtZYkRhaVl4TGcyVGlWejN3cGdVVnhud0hOR0IKSlZ1SjBPQ3ZWTmVQSlRzV3FQcUM2OGJ2UWI0bzBGTkI4aWlaay9tWEFtRytLNXZZSnpWNlBPdUJYeVA0Y1VLcQplTXRsVlY2Sm5Ydzh5YVdId3d6bTQ0VkdXbEdyOWc1UEJzRzZhbWlOOUpoRkZzZlkxYkFZYURQbjdmVjkrRFZkCnJCWDMwZ3FFSkVDckszRVVLZEExS0QrcFA3K0xVT2hkME9aZHk3Y2Z0bWZheFVaTzdBSm1LaUZuQlJaN1lyMVoKdzFDc2N2MmtiVGJHeGlRY2k1a3o1MkIzM0pZS2xMZnBUT0wwRy9yWmdGay9kZ1BPa2w0M1JyVHoxNEFFS1FwSgozT2NDQXdFQUFhT0NBWVF3Z2dHQU1BNEdBMVVkRHdFQi93UUVBd0lGb0RBZEJnTlZIU1VFRmpBVUJnZ3JCZ0VGCkJRY0RBUVlJS3dZQkJRVUhBd0l3SFFZRFZSME9CQllFRkxRZ1lSNExkSWQ2TFJwZXRqa0I3R0JVc1p3OU1COEcKQTFVZEl3UVlNQmFBRkppcTh1cmZlTFZ3cmZwRVN1NTRxWmRzbWE5T01JSUJEUVlEVlIwUkJJSUJCRENDQVFDQwpPbTlqWmkxeWIyeHNkWEF0TVRNNUxYSnZiR3hwYm1jdGRYQm5jbUZrWlM1dmNHVnVjMmhwWm5RdFlYZHpMbkpvCmIyTm1MV1JsZGk1dVpYU0NQbUZ3YVM1dlkyWXRjbTlzYkhWd0xURXpPUzF5YjJ4c2FXNW5M
WFZ3WjNKaFpHVXUKYjNCbGJuTm9hV1owTFdGM2N5NXlhRzlqWmkxa1pYWXVibVYwZ2o5aGNIQnpMbTlqWmkxeWIyeHNkWEF0TVRNNQpMWEp2Ykd4cGJtY3RkWEJuY21Ga1pTNXZjR1Z1YzJocFpuUXRZWGR6TG5Kb2IyTm1MV1JsZGk1dVpYU0NRU291CllYQndjeTV2WTJZdGNtOXNiSFZ3TFRFek9TMXliMnhzYVc1bkxYVndaM0poWkdVdWIzQmxibk5vYVdaMExXRjMKY3k1eWFHOWpaaTFrWlhZdWJtVjBNQTBHQ1NxR1NJYjNEUUVCQ3dVQUE0SUJBUUNPY0xKTmNZY2c1Tkc3YXlJTQpoU21nTlU0Yk1jZFROU2lCUmV5eEMrV3lyeVJQdXFMdXY0NnIzeXpaNTVOR1o3TDZPT3NmdVhRdThqd2NzRktyCmNDNStqQU92Z0VEcThocVJSdkQ1OE1LMTZtaExsZnlIUEdxMklPaFpneTV6aXR1Q1FuaWdHTks4T0NYLy94U3QKY3RZUWlIa3RQcmJ6RU1mSitlWlRFWlkrVmlYSUc4eEhjUzFJZloxNHJiVzJCRUlyNTNvRUFacHdNWWhnTnFCZQp0ZGFBMlZDRXAydmd0OWZ0OG9xd2hNR1I5TmFoR0JleVVTcFdCdHhMOFpQQW5oS21WV1YvMGRkeDMralkvc2ZXCkVSTVI3WFQzMFZzNGZBVndjVndWRjd1UUE2VmpCSVZVQmptcDU1NjA1MzBPS3dFbkliVS8vOG9mUzIreGRCZEoKMzEyWQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCi0tLS0tQkVHSU4gUlNBIFBSSVZBVEUgS0VZLS0tLS0KTUlJRXBBSUJBQUtDQVFFQTR6Ry9EeElpekxqN3ByQ0RFRVRRYmtaWnRaNko0MVhJNEJsV3BMZ0YxcmNldjYwKwp5aXJxOEJmNWhzTnFKakV1RFpPSlhQZkNtQlJYR2ZBYzBZRWxXNG5RNEs5VTE0OGxPeGFvK29Mcnh1OUJ2aWpRClUwSHlLSm1UK1pjQ1liNHJtOWduTlhvODY0RmZJL2h4UXFwNHkyVlZYb21kZkR6SnBZZkRET2JqaFVaYVVhdjIKRGs4R3dicHFhSTMwbUVVV3g5alZzQmhvTStmdDlYMzROVjJzRmZmU0NvUWtRS3NyY1JRcDBEVW9QNmsvdjR0UQo2RjNRNWwzTHR4KzJaOXJGUms3c0FtWXFJV2NGRm50aXZWbkRVS3h5L2FSdE5zYkdKQnlMbVRQbllIZmNsZ3FVCnQrbE00dlFiK3RtQVdUOTJBODZTWGpkR3RQUFhnQVFwQ2tuYzV3SURBUUFCQW9JQkFHN0xXN2tseHdLL1V6bSsKNnF1TVkzampwZXdFSElwWTAxVTJCaUxkK3pyeW9uUW5NRysyN2t1WDVYL3EzR0V6cXBuRVVVQ2RNckNuZXJLVApmZnBOV01LRE92SFhqekJ3Qm1BQ2RQVjEwelY0aUQ4TCtFd2g1TTRYMXlub2txakg3TXhiWlFPWFVRNG9VUlZoCm14by91Qmk0bWlFNFN5ekRHRE01T2MyWTYydWFNQmYvcWFud2E1dWlNMEdlYVJYRzQ2OCsyYWJsa0xRbFV2eXIKRmQ1WnB4SzZxNUc5bHBmeEZyaVI0MlkvenRWR1N0MG00WHl2eE0xQjJUZzNGVmlmVjRTZzVLeGJiVlhHaHVnUgplb0dzR0VBTGt6TVBXT2VUb1JmWXVDVlpaZzlsbnhjMFo3LysrTUxOUTAzbTBKdWdCVXhOeVlRcTdDeTl2L0RvCnFMek9mSUVDZ1lFQSt4cFl0K2JINC93TlVQL09sQ2pPOTMycXVvTWl5TnFVOGcvTFNmcnk5blRMbkpxcXh0d0gKbmltSjJ5dUFtcHoxMUJoRUhPd1F3ZnI2bE96ODFlUC9nM09DdkM3a3hzK1B0dmsrcW01bXRyL1Y1WFJn
NThoOQpGN1VWNVVjcVFpS1I2RVJvNTJWOW5LQXRoUUhpalc3c2lxbERBMzh6clR2L2xRVGh5R2doZ3M4Q2dZRUE1NkFJCnRFNG14Mnp4V252azUrNW5nQkRPNEQxdkw0UDBCaks4MlZlMGpPZFRrVXJkbDJmSUJpOXc5cmx4Q1VUNWhlWVEKQ245RlN1d2FibVVEOGlPeTRySmVxZHRTaUNmM1BseWIvSWRKZWsxRVhEU1g4bkNPOGxTc3JtajNTby9rTEJnUApReXVJeGxxeGhjWVJiUWhPOEdHaUlib2g2Qm40MWdFT2tFUG02bWtDZ1lCMjBrelJHUi9Wdmx2K3pFM1F4azdKCnhtbVh3SjRoTlczdDdaTmcrcU1tQkxhazhIdUhobThFWk51YkhzYklZeVhncTJydjFMVkpWWjVtQW83U0dBVzkKQ2xmKy9LRzlnbEtiWHU1TWI5bWkrTHdhekN0ZkF2eE96NTRBMU9BbVUzMS96Mzlrb0I0RWs3ZDJqU0hMazRYVApSNjB5Wm1ycHVzNkNrY0RWdUpEQytRS0JnUUM3VU9pNUtCcWtYSzR6QnM3djRoVkJ0RllaY3BWZ1Q4NGcxUmQwCmpVRXVVa1Y2MHBpeHdQUTZURk9HdENGOTVaSUZmekNwekpNMUxBdVVDNDFOWFNGbHcrcGFZMHd6WUY3S3lBbysKQndxZEphK0xBZDEvNnhjdlV0cnprVitycFFKWnhudFJUdnVscmVLeTFLTnpFYTBGS1cvODVwSlZLZXZhNWEvcAphNEJyUVFLQmdRRGliV2JNdkJ6MjlxOFQrRjVVUWhPRjEreHl4dkdkT1lhc2ExcGw0OFl0R0IxTXdaMXUvNFdBCmRyWW5jNGEzUnlkbHRtSGsvcnMrditzNlppOG5PbVRlMU1ET0JtalpkQWVIZXU4V3BIdVVGaGZjOVgwa05MdTQKQk5wVElLcGlvYmxXcjFXcjdxdUZqU1dYNExEZWtydysrS1lMYTRjS0FNL0ZMSWRIL0xOTzNnPT0KLS0tLS1FTkQgUlNBIFBSSVZBVEUgS0VZLS0tLS0K kind: Secret metadata: creationTimestamp: "2020-08-24T11:37:45Z" managedFields: - apiVersion: v1 fieldsType: FieldsV1 fieldsV1: f:data: .: {} f:apps.ocf-rollup-139-rolling-upgrade.openshift-aws.rhocf-dev.net: {} f:type: {} manager: ingress-operator operation: Update time: "2020-08-24T11:50:37Z" name: router-certs namespace: openshift-config-managed resourceVersion: "24232" selfLink: /api/v1/namespaces/openshift-config-managed/secrets/router-certs uid: c00c811e-a5d9-4aa2-af81-53cfb535343a type: Opaque
The attached PR is on the authentication operator so moving to that component.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196