Description of problem:
The ingress operator should report Upgradeable=False when non-SAN certificates are in use, to warn users before they upgrade to 4.10. See bug 2037274 for background.

OpenShift release version: 4.9.0-0.nightly-2022-02-23-182709

Cluster Platform:

How reproducible: 100%

Steps to Reproduce (in detail):
1. Create a non-SAN certificate:

mkdir test_customized_cert_no_san
cd test_customized_cert_no_san

### generate root CA
openssl genrsa -out caKey.pem 2048
openssl req -x509 -new -nodes -key caKey.pem -days 100000 -out caCert.pem -subj "/CN=network_edge_test_ca"

### generate server key and certs
openssl genrsa -out serverKey.pem 2048
cat > server_no_san.conf << EOF
[req]
req_extensions = v3_req
distinguished_name = req_distinguished_name
[req_distinguished_name]
[ v3_req ]
basicConstraints = CA:FALSE
keyUsage = nonRepudiation, digitalSignature, keyEncipherment
extendedKeyUsage = clientAuth, serverAuth
EOF

DOMAIN=$(oc get ingresses.config.openshift.io cluster -o jsonpath='{.spec.domain}')
openssl req -new -key serverKey.pem -out serverNoSAN.csr -subj "/CN=*.$DOMAIN" -config server_no_san.conf
openssl x509 -req -in serverNoSAN.csr -CA caCert.pem -CAkey caKey.pem -CAcreateserial -out serverCertNoSAN.pem -days 100000 -extensions v3_req -extfile server_no_san.conf

2. Create a secret for ingress and replace the default ingress certificate:

oc --namespace openshift-ingress create secret tls custom-certs-default --cert=serverCertNoSAN.pem --key=serverKey.pem
oc patch --type=merge --namespace openshift-ingress-operator ingresscontrollers/default \
  --patch '{"spec":{"defaultCertificate":{"name":"custom-certs-default"}}}'

3. Create a configmap and replace trustedCA (custom PKI):

oc create configmap user-ca-bundle --from-file=ca-bundle.crt=caCert.pem -n openshift-config
oc patch proxy/cluster --patch '{"spec":{"trustedCA":{"name":"user-ca-bundle"}}}' --type=merge

4. Wait until all nodes and operators have been updated with the trustedCA.

Actual results:
No operator reports Upgradeable=False, and there is no message to warn the user before upgrading to 4.10.

Expected results:
The ingress operator should report Upgradeable=False when non-SAN certificates are in use, so that users are warned before they upgrade to 4.10.

Impact of the problem:
The cluster will be broken after upgrading to 4.10. Repeating the test steps above on a 4.10 cluster produced the following errors from the authentication and console operators:

authentication   4.10.0-0.nightly-2022-02-22-093600   False   False   True    45m   OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.hongli-410.qe.devcluster.openshift.com/healthz": x509: certificate relies on legacy Common Name field, use SANs instead
console          4.10.0-0.nightly-2022-02-22-093600   False   True    False   45m   DeploymentAvailable: 0 replicas available for console deployment
RouteHealthAvailable: failed to GET route (https://console-openshift-console.apps.hongli-410.qe.devcluster.openshift.com): Get "https://console-openshift-console.apps.hongli-410.qe.devcluster.openshift.com": x509: certificate relies on legacy Common Name field, use SANs instead

Additional info:
More info in the componentRoute bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=2055494
https://bugzilla.redhat.com/show_bug.cgi?id=2052467
See https://bugzilla.redhat.com/show_bug.cgi?id=2037274#c0 for background.
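As a quick sanity check (not part of the original report), the following commands, assuming the file and secret names used in the steps above, confirm that the generated certificate carries no SAN and show the ingress clusteroperator's Upgradeable condition:

### confirm the server cert has no Subject Alternative Name (expect "no SAN" to be printed)
openssl x509 -noout -text -in serverCertNoSAN.pem | grep -A1 'Subject Alternative Name' || echo "no SAN"

### check the ingress clusteroperator's Upgradeable condition
oc get co ingress -o jsonpath='{.status.conditions[?(@.type=="Upgradeable")]}'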
Adding a kindly reminder note: when verifying this bug, especially in 4.9.z, we should cover two checkpoints. The first is the "Expected results" above. The second is that, when the user replaces the certificate with one that has a SAN, the ingress operator should go back to Upgradeable=True so the user is no longer blocked from upgrading (see the sketch after this note for one way to do that).
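For the second checkpoint, a minimal sketch of generating a replacement certificate that does carry a SAN, assuming the caKey.pem/caCert.pem test CA from the reproduction steps (the actual verification below uses pre-existing QE test files instead; serverKeySAN.pem, server_san.conf, and the other file names here are illustrative):

### generate a server cert WITH a SAN, signed by the same test CA (assumed file names)
DOMAIN=$(oc get ingresses.config.openshift.io cluster -o jsonpath='{.spec.domain}')
openssl genrsa -out serverKeySAN.pem 2048
cat > server_san.conf << EOF
[req]
req_extensions = v3_req
distinguished_name = req_distinguished_name
[req_distinguished_name]
[ v3_req ]
basicConstraints = CA:FALSE
keyUsage = nonRepudiation, digitalSignature, keyEncipherment
extendedKeyUsage = clientAuth, serverAuth
subjectAltName = DNS:*.$DOMAIN
EOF
openssl req -new -key serverKeySAN.pem -out serverSAN.csr -subj "/CN=*.$DOMAIN" -config server_san.conf
openssl x509 -req -in serverSAN.csr -CA caCert.pem -CAkey caKey.pem -CAcreateserial -out serverCertSAN.pem -days 100000 -extensions v3_req -extfile server_san.conf

### swap the default certificate and expect Upgradeable to go back to True
oc -n openshift-ingress create secret tls custom-certs-default-san --cert=serverCertSAN.pem --key=serverKeySAN.pem
oc patch --type=merge -n openshift-ingress-operator ingresscontrollers/default \
  --patch '{"spec":{"defaultCertificate":{"name":"custom-certs-default-san"}}}'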
melvinjoseph@mjoseph-mac test_customized_cert_no_san % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-02-27-122819   True        False         50m     Error while reconciling 4.11.0-0.nightly-2022-02-27-122819: some cluster operators have not yet rolled out
melvinjoseph@mjoseph-mac test_customized_cert_no_san % mkdir test_customized_cert_no_san
cd test_customized_cert_no_san
melvinjoseph@mjoseph-mac test_customized_cert_no_san % export KUBECONFIG=../kubeconfig
melvinjoseph@mjoseph-mac test_customized_cert_no_san % openssl genrsa -out caKey.pem 2048
Generating RSA private key, 2048 bit long modulus
......+++
............+++
e is 65537 (0x10001)
melvinjoseph@mjoseph-mac test_customized_cert_no_san % openssl req -x509 -new -nodes -key caKey.pem -days 100000 -out caCert.pem -subj "/CN=network_edge_test_ca"
melvinjoseph@mjoseph-mac test_customized_cert_no_san % openssl genrsa -out serverKey.pem 2048
Generating RSA private key, 2048 bit long modulus
....................+++
.................+++
e is 65537 (0x10001)
melvinjoseph@mjoseph-mac test_customized_cert_no_san % cat > server_no_san.conf << EOF
[req]
req_extensions = v3_req
distinguished_name = req_distinguished_name
[req_distinguished_name]
[ v3_req ]
basicConstraints = CA:FALSE
keyUsage = nonRepudiation, digitalSignature, keyEncipherment
extendedKeyUsage = clientAuth, serverAuth
EOF
melvinjoseph@mjoseph-mac test_customized_cert_no_san % DOMAIN=$(oc get ingresses.config.openshift.io cluster -o jsonpath='{.spec.domain}')
melvinjoseph@mjoseph-mac test_customized_cert_no_san % openssl req -new -key serverKey.pem -out serverNoSAN.csr -subj "/CN=*.$DOMAIN" -config server_no_san.conf
melvinjoseph@mjoseph-mac test_customized_cert_no_san % openssl x509 -req -in serverNoSAN.csr -CA caCert.pem -CAkey caKey.pem -CAcreateserial -out serverCertNoSAN.pem -days 100000 -extensions v3_req -extfile server_no_san.conf
Signature ok
subject=/CN=*.apps.mjoseph-07621.qe.devcluster.openshift.com
Getting CA Private Key
melvinjoseph@mjoseph-mac test_customized_cert_no_san %
melvinjoseph@mjoseph-mac test_customized_cert_no_san % oc --namespace openshift-ingress create secret tls custom-certs-default --cert=serverCertNoSAN.pem --key=serverKey.pem
secret/custom-certs-default created
melvinjoseph@mjoseph-mac test_customized_cert_no_san %
melvinjoseph@mjoseph-mac test_customized_cert_no_san % oc patch --type=merge --namespace openshift-ingress-operator ingresscontrollers/default \
  --patch '{"spec":{"defaultCertificate":{"name":"custom-certs-default"}}}'
ingresscontroller.operator.openshift.io/default patched
melvinjoseph@mjoseph-mac test_customized_cert_no_san % oc create configmap user-ca-bundle --from-file=ca-bundle.crt=caCert.pem -n openshift-config
configmap/user-ca-bundle created
melvinjoseph@mjoseph-mac test_customized_cert_no_san % oc patch proxy/cluster --patch '{"spec":{"trustedCA":{"name":"user-ca-bundle"}}}' --type=merge
proxy.config.openshift.io/cluster patched
melvinjoseph@mjoseph-mac test_customized_cert_no_san % oc get co
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.11.0-0.nightly-2022-02-27-122819   False       False         True       20m     OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.mjoseph-07621.qe.devcluster.openshift.com/healthz": x509: certificate relies on legacy Common Name field, use SANs instead
baremetal                                  4.11.0-0.nightly-2022-02-27-122819   True        False         False      69m
cloud-controller-manager                   4.11.0-0.nightly-2022-02-27-122819   True        False         False      71m
cloud-credential                           4.11.0-0.nightly-2022-02-27-122819   True        False         False      71m
cluster-autoscaler                         4.11.0-0.nightly-2022-02-27-122819   True        False         False      69m
config-operator                            4.11.0-0.nightly-2022-02-27-122819   True        False         False      70m
console                                    4.11.0-0.nightly-2022-02-27-122819   False       True          False      20m     DeploymentAvailable: 0 replicas available for console deployment...
csi-snapshot-controller                    4.11.0-0.nightly-2022-02-27-122819   True        False         False      70m
dns                                        4.11.0-0.nightly-2022-02-27-122819   True        False         False      69m
etcd                                       4.11.0-0.nightly-2022-02-27-122819   True        False         False      68m
image-registry                             4.11.0-0.nightly-2022-02-27-122819   True        False         False      62m
ingress                                    4.11.0-0.nightly-2022-02-27-122819   True        False         False      20m
insights                                   4.11.0-0.nightly-2022-02-27-122819   True        False         False      63m
kube-apiserver                             4.11.0-0.nightly-2022-02-27-122819   True        False         False      57m
kube-controller-manager                    4.11.0-0.nightly-2022-02-27-122819   True        False         False      66m
kube-scheduler                             4.11.0-0.nightly-2022-02-27-122819   True        False         False      65m
kube-storage-version-migrator              4.11.0-0.nightly-2022-02-27-122819   True        False         False      70m
machine-api                                4.11.0-0.nightly-2022-02-27-122819   True        False         False      66m
machine-approver                           4.11.0-0.nightly-2022-02-27-122819   True        False         False      70m
machine-config                             4.11.0-0.nightly-2022-02-27-122819   True        False         False      69m
marketplace                                4.11.0-0.nightly-2022-02-27-122819   True        False         False      69m
monitoring                                 4.11.0-0.nightly-2022-02-27-122819   True        False         False      49m
network                                    4.11.0-0.nightly-2022-02-27-122819   True        False         False      70m
node-tuning                                4.11.0-0.nightly-2022-02-27-122819   True        False         False      69m
openshift-apiserver                        4.11.0-0.nightly-2022-02-27-122819   True        False         False      58m
openshift-controller-manager               4.11.0-0.nightly-2022-02-27-122819   True        False         False      66m
openshift-samples                          4.11.0-0.nightly-2022-02-27-122819   True        False         False      62m
operator-lifecycle-manager                 4.11.0-0.nightly-2022-02-27-122819   True        False         False      69m
operator-lifecycle-manager-catalog         4.11.0-0.nightly-2022-02-27-122819   True        False         False      70m
operator-lifecycle-manager-packageserver   4.11.0-0.nightly-2022-02-27-122819   True        False         False      63m
service-ca                                 4.11.0-0.nightly-2022-02-27-122819   True        False         False      70m
storage                                    4.11.0-0.nightly-2022-02-27-122819   True        False         False      70m
melvinjoseph@mjoseph-mac test_customized_cert_no_san % oc get co ingress -o json | jq .status.conditions
[
  {
    "lastTransitionTime": "2022-02-28T12:26:27Z",
    "message": "The \"default\" ingress controller reports Available=True.",
    "reason": "IngressAvailable",
    "status": "True",
    "type": "Available"
  },
  {
    "lastTransitionTime": "2022-02-28T12:26:27Z",
    "message": "desired and current number of IngressControllers are equal",
    "reason": "AsExpected",
    "status": "False",
    "type": "Progressing"
  },
  {
    "lastTransitionTime": "2022-02-28T11:43:57Z",
    "message": "The \"default\" ingress controller reports Degraded=False.",
    "reason": "IngressNotDegraded",
    "status": "False",
    "type": "Degraded"
  },
  {
    "lastTransitionTime": "2022-02-28T12:25:53Z",
    "message": "Some ingresscontrollers are not upgradeable: ingresscontroller \"default\" is not upgradeable: OperandsNotUpgradeable: One or more managed resources are not upgradeable: certificate in secret openshift-ingress/custom-certs-default has legacy Common Name (CN) but has no Subject Alternative Name (SAN) for domain: *.apps.mjoseph-07621.qe.devcluster.openshift.com",
    "reason": "IngressControllersNotUpgradeable",
    "status": "False",
    "type": "Upgradeable"
  }
]

First Part verified.
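As a convenience, instead of dumping the whole conditions array, a narrower jq filter (assuming jq, as used in the session above) shows only the blocking condition:

### show only the Upgradeable condition
oc get co ingress -o json | jq '.status.conditions[] | select(.type=="Upgradeable")'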
melvinjoseph@mjoseph-mac Downloads % mkdir tmp_dir
melvinjoseph@mjoseph-mac Downloads % cd tmp_dir
melvinjoseph@mjoseph-mac tmp_dir % curl -O -sS https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/routing/ca.key
curl -O -sS https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/routing/ca.pem
curl -O -sS https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/routing/openssl.conf
melvinjoseph@mjoseph-mac tmp_dir % ls
ca.key        ca.pem        openssl.conf
melvinjoseph@mjoseph-mac tmp_dir % sed -i.bak "s/example.com/${domain}/g" openssl.conf
melvinjoseph@mjoseph-mac tmp_dir % vi openssl.conf
melvinjoseph@mjoseph-mac tmp_dir % openssl genrsa -out apps.key 2048
Generating RSA private key, 2048 bit long modulus
..............................+++
.....+++
e is 65537 (0x10001)
melvinjoseph@mjoseph-mac tmp_dir % openssl req -new -config openssl.conf -key apps.key -out apps.csr
melvinjoseph@mjoseph-mac tmp_dir % openssl x509 -req -CA ca.pem -CAkey ca.key -CAcreateserial -extfile openssl.conf -extensions v3_req -in apps.csr -out apps.crt -days 3650
Signature ok
subject=/C=US/ST=VA/L=Somewhere/O=RedHat/OU=OpenShift QE/CN=apps
Getting CA Private Key
melvinjoseph@mjoseph-mac tmp_dir % openssl x509 -text -noout -in apps.crt | grep "Alternative Name" -A 1
            X509v3 Subject Alternative Name:
                DNS:*.apps.mjoseph-07621.qe.devcluster.openshift.com
melvinjoseph@mjoseph-mac Downloads % oc --namespace openshift-ingress create secret tls custom-certs-default-san --cert=tmp_dir/apps.crt --key=tmp_dir/apps.key
secret/custom-certs-default-san created
melvinjoseph@mjoseph-mac Downloads % oc patch --type=merge --namespace openshift-ingress-operator ingresscontrollers/default \
  --patch '{"spec":{"defaultCertificate":{"name":"custom-certs-default-san"}}}'
ingresscontroller.operator.openshift.io/default patched
melvinjoseph@mjoseph-mac Downloads % oc create configmap user-ca-bundle1 --from-file=ca-bundle.crt=tmp_dir/ca.pem -n openshift-config
configmap/user-ca-bundle1 created
melvinjoseph@mjoseph-mac Downloads % oc patch proxy/cluster --patch '{"spec":{"trustedCA":{"name":"user-ca-bundle1"}}}' --type=merge
proxy.config.openshift.io/cluster patched
melvinjoseph@mjoseph-mac Downloads % oc patch proxy/cluster --patch '{"spec":{"trustedCA":{"name":"user-ca-bundle"}}}' --type=merge
melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress get secret
NAME                           TYPE                                  DATA   AGE
builder-dockercfg-k6pzz        kubernetes.io/dockercfg               1      81m
builder-token-vsbqd            kubernetes.io/service-account-token   4      81m
builder-token-ww9q6            kubernetes.io/service-account-token   4      81m
custom-certs-default           kubernetes.io/tls                     2      35m
custom-certs-default-san       kubernetes.io/tls                     2      3m18s
default-dockercfg-dch74        kubernetes.io/dockercfg               1      81m
default-token-hdvpb            kubernetes.io/service-account-token   4      84m
default-token-lsdjj            kubernetes.io/service-account-token   4      81m
deployer-dockercfg-dr58p       kubernetes.io/dockercfg               1      81m
deployer-token-64jpv           kubernetes.io/service-account-token   4      81m
deployer-token-qkrn9           kubernetes.io/service-account-token   4      81m
router-dockercfg-gmg5k         kubernetes.io/dockercfg               1      81m
router-metrics-certs-default   kubernetes.io/tls                     2      84m
router-stats-default           Opaque                                2      84m
router-token-ncsxr             kubernetes.io/service-account-token   4      84m
router-token-x42b2             kubernetes.io/service-account-token   4      81m
melvinjoseph@mjoseph-mac Downloads % oc get co
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.11.0-0.nightly-2022-02-27-122819   True        False         False      29m
baremetal                                  4.11.0-0.nightly-2022-02-27-122819   True        False         False      111m
cloud-controller-manager                   4.11.0-0.nightly-2022-02-27-122819   True        False         False      113m
cloud-credential                           4.11.0-0.nightly-2022-02-27-122819   True        False         False      113m
cluster-autoscaler                         4.11.0-0.nightly-2022-02-27-122819   True        False         False      110m
config-operator                            4.11.0-0.nightly-2022-02-27-122819   True        False         False      112m
console                                    4.11.0-0.nightly-2022-02-27-122819   True        False         False      28m
csi-snapshot-controller                    4.11.0-0.nightly-2022-02-27-122819   True        False         False      111m
dns                                        4.11.0-0.nightly-2022-02-27-122819   True        False         False      110m
etcd                                       4.11.0-0.nightly-2022-02-27-122819   True        False         False      110m
image-registry                             4.11.0-0.nightly-2022-02-27-122819   True        False         False      104m
ingress                                    4.11.0-0.nightly-2022-02-27-122819   True        False         False      61m
insights                                   4.11.0-0.nightly-2022-02-27-122819   True        False         False      104m
kube-apiserver                             4.11.0-0.nightly-2022-02-27-122819   True        False         False      99m
kube-controller-manager                    4.11.0-0.nightly-2022-02-27-122819   True        False         False      108m
kube-scheduler                             4.11.0-0.nightly-2022-02-27-122819   True        False         False      106m
kube-storage-version-migrator              4.11.0-0.nightly-2022-02-27-122819   True        False         False      22m
machine-api                                4.11.0-0.nightly-2022-02-27-122819   True        False         False      108m
machine-approver                           4.11.0-0.nightly-2022-02-27-122819   True        False         False      111m
machine-config                             4.11.0-0.nightly-2022-02-27-122819   True        False         False      110m
marketplace                                4.11.0-0.nightly-2022-02-27-122819   True        False         False      111m
monitoring                                 4.11.0-0.nightly-2022-02-27-122819   True        False         False      90m
network                                    4.11.0-0.nightly-2022-02-27-122819   True        False         False      112m
node-tuning                                4.11.0-0.nightly-2022-02-27-122819   True        False         False      111m
openshift-apiserver                        4.11.0-0.nightly-2022-02-27-122819   True        False         False      99m
openshift-controller-manager               4.11.0-0.nightly-2022-02-27-122819   True        False         False      108m
openshift-samples                          4.11.0-0.nightly-2022-02-27-122819   True        False         False      104m
operator-lifecycle-manager                 4.11.0-0.nightly-2022-02-27-122819   True        False         False      111m
operator-lifecycle-manager-catalog         4.11.0-0.nightly-2022-02-27-122819   True        False         False      111m
operator-lifecycle-manager-packageserver   4.11.0-0.nightly-2022-02-27-122819   True        False         False      104m
service-ca                                 4.11.0-0.nightly-2022-02-27-122819   True        False         False      112m
storage                                    4.11.0-0.nightly-2022-02-27-122819   True        False         False      111m
melvinjoseph@mjoseph-mac Downloads % oc get co ingress -o json | jq .status.conditions
[
  {
    "lastTransitionTime": "2022-02-28T12:26:27Z",
    "message": "The \"default\" ingress controller reports Available=True.",
    "reason": "IngressAvailable",
    "status": "True",
    "type": "Available"
  },
  {
    "lastTransitionTime": "2022-02-28T12:26:27Z",
    "message": "desired and current number of IngressControllers are equal",
    "reason": "AsExpected",
    "status": "False",
    "type": "Progressing"
  },
  {
    "lastTransitionTime": "2022-02-28T11:43:57Z",
    "message": "The \"default\" ingress controller reports Degraded=False.",
    "reason": "IngressNotDegraded",
    "status": "False",
    "type": "Degraded"
  },
  {
    "lastTransitionTime": "2022-02-28T12:58:25Z",
    "reason": "IngressControllersUpgradeable",
    "status": "True",
    "type": "Upgradeable"
  }
]

It is back for upgrade, hence verified.
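For scripted checks, one could also wait for the condition to flip back rather than repeatedly running oc get co; a small sketch (the 300s timeout is an arbitrary choice):

### block until the ingress clusteroperator reports Upgradeable=True, or fail after the timeout
oc wait clusteroperator/ingress --for=condition=Upgradeable=True --timeout=300s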
I'm adding UpgradeBlocker, and per [1], asking the following questions to evaluate whether or not this bug warrants blocking an upgrade edge from either the previous X.Y or X.Y.Z. The ultimate goal is to avoid delivering an update which introduces new risk or reduces cluster functionality in any way. Sample answers are provided to give more context, and the ImpactStatementRequested label has been added to this bug. When responding, please remove ImpactStatementRequested and set the ImpactStatementProposed label. The expectation is that the assignee answers these questions.

Who is impacted? If we have to block upgrade edges based on this issue, which edges would need blocking?
* example: Customers upgrading from 4.y.z to 4.y+1.z running on GCP with thousands of namespaces, approximately 5% of the subscribed fleet
* example: All customers upgrading from 4.y.z to 4.y+1.z fail approximately 10% of the time

What is the impact? Is it serious enough to warrant blocking edges?
* example: Up to 2 minute disruption in edge routing
* example: Up to 90 seconds of API downtime
* example: etcd loses quorum and you have to restore from backup

How involved is remediation (even moderately serious impacts might be acceptable if they are easy to mitigate)?
* example: Issue resolves itself after five minutes
* example: Admin uses oc to fix things
* example: Admin must SSH to hosts, restore from backups, or other non-standard admin activities

Is this a regression (if all previous versions were also vulnerable, updating to the new, vulnerable version does not increase exposure)?
* example: No, it has always been like this, we just never noticed
* example: Yes, from 4.y.z to 4.y+1.z, or from 4.y.z to 4.y.z+1

[1]: https://github.com/openshift/enhancements/blob/2911c46bf7d2f22eb1ab81739b4f9c2603fd0c07/enhancements/update/update-blocker-lifecycle/README.md
(In reply to W. Trevor King from comment #10)

> Who is impacted? If we have to block upgrade edges based on this issue,
> which edges would need blocking?

The best approximation is that this issue affects an unknown percentage of customers upgrading from 4.9.z to 4.10. However, the issue doesn't strictly correlate to OpenShift versions. Rather, the issue affects all clients (OpenShift or otherwise) that use Go 1.17's crypto/tls package to connect to routes that use the ingress default certificate if this certificate does not have a SAN. Affected clients include OpenShift 4.10 operators, such as the authentication and console operators, that perform health checks against their respective routes; OpenShift 4.10 is built using Go 1.17, so we can anticipate that this issue will affect upgrades from 4.9.z to 4.10 for clusters that have problematic default certificates. However, affected clients could also potentially include user workloads, and we have no way of detecting or guarding against users' using Go 1.17-based clients to connect to routes with OpenShift 4.9 or earlier.

> What is the impact? Is it serious enough to warrant blocking edges?

Go 1.17-based clients will fail to connect to routes that use the default certificate if it has no SAN.

> How involved is remediation (even moderately serious impacts might be
> acceptable if they are easy to mitigate)?

The cluster admin must replace the default certificate with one that specifies a SAN.

> Is this a regression (if all previous versions were also vulnerable,
> updating to the new, vulnerable version does not increase exposure)?

This pertains to a deprecation in Go's crypto/tls package. The issue could be considered a regression in some OpenShift 4.10 components built using Go 1.17. This BZ does not fix the regression/deprecation but rather warns the cluster admin if the default certificate would cause problems for affected components or clients.
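As a side note on remediation: before upgrading, an admin can check whether the current default certificate would trip this condition. A minimal sketch, assuming a custom certificate is referenced by spec.defaultCertificate on the default ingresscontroller (the flow here is illustrative, not a documented procedure):

### name of the custom default-certificate secret, if any
secret=$(oc get ingresscontroller/default -n openshift-ingress-operator -o jsonpath='{.spec.defaultCertificate.name}')
### print the SAN block of that certificate; "no SAN found" means the cert is CN-only and must be replaced
oc get secret "$secret" -n openshift-ingress -o jsonpath='{.data.tls\.crt}' | base64 --decode \
  | openssl x509 -noout -text | grep -A1 'Subject Alternative Name' || echo "no SAN found"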
Asking around internally, folks are ok with calling this out in docs like [1] and not blocking updates, delaying 4.10's GA, publishing a KCS Solution, or pursuing any of the other possible mitigations. So I'm removing the UpgradeBlocker tag to accept this as "not a blocker".

[1]: https://github.com/openshift/openshift-docs/pull/41872
I am setting "No Doc Update" on this BZ because the only reason for adding the change in 4.11 was so that we could backport it to 4.9. I am adding appropriate doc text to the 4.9 BZ, bug 2060111.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069