Bug 2057762
| Summary: | ingress operator should report Upgradeable False to remind user before upgrade to 4.10 when Non-SAN certs are used | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Hongan Li <hongli> |
| Component: | Networking | Assignee: | Miciah Dashiel Butler Masters <mmasters> |
| Networking sub component: | router | QA Contact: | Melvin Joseph <mjoseph> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | urgent | ||
| Priority: | high | CC: | hongli, mfisher, mmasters, wking, xxia |
| Version: | 4.9 | Keywords: | Upgrades |
| Target Milestone: | --- | ||
| Target Release: | 4.11.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-08-10 10:50:55 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| Bug Depends On: | |||
| Bug Blocks: | 2059210 | ||
Description
Hongan Li
2022-02-24 03:22:27 UTC
Adding a kind reminder: when verifying this bug, especially on 4.9.z, we should cover two checkpoints. The first is the "Expected results" stated above. The second is that when the user updates the default certificate to one that has a SAN, the Upgradeable condition should return to normal so that the user is no longer blocked from upgrading.
melvinjoseph@mjoseph-mac test_customized_cert_no_san % oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.11.0-0.nightly-2022-02-27-122819 True False 50m Error while reconciling 4.11.0-0.nightly-2022-02-27-122819: some cluster operators have not yet rolled out
melvinjoseph@mjoseph-mac test_customized_cert_no_san % mkdir test_customized_cert_no_san
cd test_customized_cert_no_san
melvinjoseph@mjoseph-mac test_customized_cert_no_san % export KUBECONFIG=../kubeconfig
melvinjoseph@mjoseph-mac test_customized_cert_no_san % openssl genrsa -out caKey.pem 2048
Generating RSA private key, 2048 bit long modulus
......+++
............+++
e is 65537 (0x10001)
melvinjoseph@mjoseph-mac test_customized_cert_no_san % openssl req -x509 -new -nodes -key caKey.pem -days 100000 -out caCert.pem -subj "/CN=network_edge_test_ca"
melvinjoseph@mjoseph-mac test_customized_cert_no_san % openssl genrsa -out serverKey.pem 2048
Generating RSA private key, 2048 bit long modulus
....................+++
.................+++
e is 65537 (0x10001)
melvinjoseph@mjoseph-mac test_customized_cert_no_san % cat > server_no_san.conf << EOF
[req]
req_extensions = v3_req
distinguished_name = req_distinguished_name
[req_distinguished_name]
[ v3_req ]
basicConstraints = CA:FALSE
keyUsage = nonRepudiation, digitalSignature, keyEncipherment
extendedKeyUsage = clientAuth, serverAuth
EOF
melvinjoseph@mjoseph-mac test_customized_cert_no_san % DOMAIN=$(oc get ingresses.config.openshift.io cluster -o jsonpath='{.spec.domain}')
melvinjoseph@mjoseph-mac test_customized_cert_no_san % openssl req -new -key serverKey.pem -out serverNoSAN.csr -subj "/CN=*.$DOMAIN" -config server_no_san.conf
melvinjoseph@mjoseph-mac test_customized_cert_no_san % openssl x509 -req -in serverNoSAN.csr -CA caCert.pem -CAkey caKey.pem -CAcreateserial -out serverCertNoSAN.pem -days 100000 -extensions v3_req -extfile server_no_san.conf
Signature ok
subject=/CN=*.apps.mjoseph-07621.qe.devcluster.openshift.com
Getting CA Private Key
melvinjoseph@mjoseph-mac test_customized_cert_no_san %
melvinjoseph@mjoseph-mac test_customized_cert_no_san % oc --namespace openshift-ingress create secret tls custom-certs-default --cert=serverCertNoSAN.pem --key=serverKey.pem
secret/custom-certs-default created
melvinjoseph@mjoseph-mac test_customized_cert_no_san %
melvinjoseph@mjoseph-mac test_customized_cert_no_san % oc patch --type=merge --namespace openshift-ingress-operator ingresscontrollers/default \
--patch '{"spec":{"defaultCertificate":{"name":"custom-certs-default"}}}'
ingresscontroller.operator.openshift.io/default patched
melvinjoseph@mjoseph-mac test_customized_cert_no_san % oc create configmap user-ca-bundle --from-file=ca-bundle.crt=caCert.pem -n openshift-config
configmap/user-ca-bundle created
melvinjoseph@mjoseph-mac test_customized_cert_no_san % oc patch proxy/cluster --patch '{"spec":{"trustedCA":{"name":"user-ca-bundle"}}}' --type=merge
proxy.config.openshift.io/cluster patched
melvinjoseph@mjoseph-mac test_customized_cert_no_san % oc get co
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
authentication 4.11.0-0.nightly-2022-02-27-122819 False False True 20m OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.mjoseph-07621.qe.devcluster.openshift.com/healthz": x509: certificate relies on legacy Common Name field, use SANs instead
baremetal 4.11.0-0.nightly-2022-02-27-122819 True False False 69m
cloud-controller-manager 4.11.0-0.nightly-2022-02-27-122819 True False False 71m
cloud-credential 4.11.0-0.nightly-2022-02-27-122819 True False False 71m
cluster-autoscaler 4.11.0-0.nightly-2022-02-27-122819 True False False 69m
config-operator 4.11.0-0.nightly-2022-02-27-122819 True False False 70m
console 4.11.0-0.nightly-2022-02-27-122819 False True False 20m DeploymentAvailable: 0 replicas available for console deployment...
csi-snapshot-controller 4.11.0-0.nightly-2022-02-27-122819 True False False 70m
dns 4.11.0-0.nightly-2022-02-27-122819 True False False 69m
etcd 4.11.0-0.nightly-2022-02-27-122819 True False False 68m
image-registry 4.11.0-0.nightly-2022-02-27-122819 True False False 62m
ingress 4.11.0-0.nightly-2022-02-27-122819 True False False 20m
insights 4.11.0-0.nightly-2022-02-27-122819 True False False 63m
kube-apiserver 4.11.0-0.nightly-2022-02-27-122819 True False False 57m
kube-controller-manager 4.11.0-0.nightly-2022-02-27-122819 True False False 66m
kube-scheduler 4.11.0-0.nightly-2022-02-27-122819 True False False 65m
kube-storage-version-migrator 4.11.0-0.nightly-2022-02-27-122819 True False False 70m
machine-api 4.11.0-0.nightly-2022-02-27-122819 True False False 66m
machine-approver 4.11.0-0.nightly-2022-02-27-122819 True False False 70m
machine-config 4.11.0-0.nightly-2022-02-27-122819 True False False 69m
marketplace 4.11.0-0.nightly-2022-02-27-122819 True False False 69m
monitoring 4.11.0-0.nightly-2022-02-27-122819 True False False 49m
network 4.11.0-0.nightly-2022-02-27-122819 True False False 70m
node-tuning 4.11.0-0.nightly-2022-02-27-122819 True False False 69m
openshift-apiserver 4.11.0-0.nightly-2022-02-27-122819 True False False 58m
openshift-controller-manager 4.11.0-0.nightly-2022-02-27-122819 True False False 66m
openshift-samples 4.11.0-0.nightly-2022-02-27-122819 True False False 62m
operator-lifecycle-manager 4.11.0-0.nightly-2022-02-27-122819 True False False 69m
operator-lifecycle-manager-catalog 4.11.0-0.nightly-2022-02-27-122819 True False False 70m
operator-lifecycle-manager-packageserver 4.11.0-0.nightly-2022-02-27-122819 True False False 63m
service-ca 4.11.0-0.nightly-2022-02-27-122819 True False False 70m
storage 4.11.0-0.nightly-2022-02-27-122819 True False False 70m
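The authentication operator above reports `x509: certificate relies on legacy Common Name field, use SANs instead`. The following minimal Go sketch reproduces that error using only the standard library and assumes Go 1.17 or later, where the legacy CN fallback was removed. It builds throwaway self-signed certificates whose CN is the wildcard domain, with and without a matching SAN, and calls `(*x509.Certificate).VerifyHostname` for a route host. The helper `selfSigned`, the ECDSA key type, and the `apps.example.com` names are hypothetical stand-ins for the cluster's real values.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// selfSigned (hypothetical helper) returns a self-signed certificate whose
// subject Common Name is cn. When dnsNames is empty, Go emits no Subject
// Alternative Name extension, similar to serverCertNoSAN.pem above.
func selfSigned(cn string, dnsNames []string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: cn},
		DNSNames:     dnsNames,
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(24 * time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	const wildcard = "*.apps.example.com"          // hypothetical ingress domain
	const host = "oauth-openshift.apps.example.com" // hypothetical route host

	noSAN, err := selfSigned(wildcard, nil)
	if err != nil {
		panic(err)
	}
	withSAN, err := selfSigned(wildcard, []string{wildcard})
	if err != nil {
		panic(err)
	}

	// Prints: without SAN: x509: certificate relies on legacy Common Name field, use SANs instead
	fmt.Printf("without SAN: %v\n", noSAN.VerifyHostname(host))
	// Prints: with SAN: <nil>
	fmt.Printf("with SAN: %v\n", withSAN.VerifyHostname(host))
}
```

The CN still matches the host, but Go 1.17 ignores it and reports that fact in the error text. This explains why the operators' health checks fail until the default certificate gains a SAN.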
melvinjoseph@mjoseph-mac test_customized_cert_no_san % oc get co ingress -o json | jq .status.conditions
[
{
"lastTransitionTime": "2022-02-28T12:26:27Z",
"message": "The \"default\" ingress controller reports Available=True.",
"reason": "IngressAvailable",
"status": "True",
"type": "Available"
},
{
"lastTransitionTime": "2022-02-28T12:26:27Z",
"message": "desired and current number of IngressControllers are equal",
"reason": "AsExpected",
"status": "False",
"type": "Progressing"
},
{
"lastTransitionTime": "2022-02-28T11:43:57Z",
"message": "The \"default\" ingress controller reports Degraded=False.",
"reason": "IngressNotDegraded",
"status": "False",
"type": "Degraded"
},
{
"lastTransitionTime": "2022-02-28T12:25:53Z",
"message": "Some ingresscontrollers are not upgradeable: ingresscontroller \"default\" is not upgradeable: OperandsNotUpgradeable: One or more managed resources are not upgradeable: certificate in secret openshift-ingress/custom-certs-default has legacy Common Name (CN) but has no Subject Alternative Name (SAN) for domain: *.apps.mjoseph-07621.qe.devcluster.openshift.com",
"reason": "IngressControllersNotUpgradeable",
"status": "False",
"type": "Upgradeable"
}
]
First part verified.
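The Upgradeable=False message says that the certificate "has legacy Common Name (CN) but has no Subject Alternative Name (SAN)" for the wildcard domain. The Go sketch below shows one way to write such a check. It is a hypothetical illustration modeled on that message and is not the ingress operator's actual implementation. The function `checkLegacyCN`, the demo helper `selfSignedPEM`, and the domain `*.apps.example.com` are assumptions.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"fmt"
	"math/big"
	"time"
)

// checkLegacyCN (hypothetical) scans every CERTIFICATE block in certPEM and
// returns an error when some certificate carries wildcardDomain as its
// Common Name but no certificate lists it as a DNS SAN.
func checkLegacyCN(certPEM []byte, wildcardDomain string) error {
	foundCN, foundSAN := false, false
	rest := certPEM
	for {
		var block *pem.Block
		block, rest = pem.Decode(rest)
		if block == nil {
			break // no more PEM data
		}
		if block.Type != "CERTIFICATE" {
			continue // skip private keys and other PEM types
		}
		cert, err := x509.ParseCertificate(block.Bytes)
		if err != nil {
			return fmt.Errorf("failed to parse certificate: %w", err)
		}
		if cert.Subject.CommonName == wildcardDomain {
			foundCN = true
		}
		for _, name := range cert.DNSNames {
			if name == wildcardDomain {
				foundSAN = true
			}
		}
	}
	if foundCN && !foundSAN {
		return fmt.Errorf("certificate has legacy Common Name (CN) but has no Subject Alternative Name (SAN) for domain: %s", wildcardDomain)
	}
	return nil
}

// selfSignedPEM (hypothetical demo helper) returns a PEM-encoded self-signed
// certificate with Common Name cn and the given DNS SANs (none if empty).
func selfSignedPEM(cn string, dnsNames []string) []byte {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: cn},
		DNSNames:     dnsNames,
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(24 * time.Hour),
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		panic(err)
	}
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der})
}

func main() {
	const domain = "*.apps.example.com" // hypothetical ingress wildcard domain
	// CN-only certificate: reported, like custom-certs-default above.
	fmt.Println(checkLegacyCN(selfSignedPEM(domain, nil), domain))
	// Certificate with a matching SAN: prints <nil>.
	fmt.Println(checkLegacyCN(selfSignedPEM(domain, []string{domain}), domain))
}
```

To check a real file such as serverCertNoSAN.pem, read it with `os.ReadFile` and pass the bytes together with `*.` plus the cluster's ingress domain.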
melvinjoseph@mjoseph-mac Downloads % mkdir tmp_dir
melvinjoseph@mjoseph-mac Downloads % cd tmp_dir
melvinjoseph@mjoseph-mac tmp_dir % curl -O -sS https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/routing/ca.key
curl -O -sS https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/routing/ca.pem
curl -O -sS https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/routing/openssl.conf
melvinjoseph@mjoseph-mac tmp_dir % ls
ca.key ca.pem openssl.conf
melvinjoseph@mjoseph-mac tmp_dir % sed -i.bak "s/example.com/${domain}/g" openssl.conf
melvinjoseph@mjoseph-mac tmp_dir % vi openssl.conf
melvinjoseph@mjoseph-mac tmp_dir % openssl genrsa -out apps.key 2048
Generating RSA private key, 2048 bit long modulus
..............................+++
.....+++
e is 65537 (0x10001)
melvinjoseph@mjoseph-mac tmp_dir % openssl req -new -config openssl.conf -key apps.key -out apps.csr
melvinjoseph@mjoseph-mac tmp_dir % openssl x509 -req -CA ca.pem -CAkey ca.key -CAcreateserial -extfile openssl.conf -extensions v3_req -in apps.csr -out apps.crt -days 3650
Signature ok
subject=/C=US/ST=VA/L=Somewhere/O=RedHat/OU=OpenShift QE/CN=apps
Getting CA Private Key
melvinjoseph@mjoseph-mac tmp_dir % openssl x509 -text -noout -in apps.crt | grep "Alternative Name" -A 1
X509v3 Subject Alternative Name:
DNS:*.apps.mjoseph-07621.qe.devcluster.openshift.com
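The replacement certificate is signed by ca.pem and carries the wildcard domain as a DNS SAN. The same CA is then added to the proxy's trusted CA bundle. The Go sketch below approximates what a client that trusts that CA checks, and assumes Go 1.17 or later. It issues a throwaway CA and a leaf, then verifies the leaf with `x509.VerifyOptions` using `DNSName` and `Roots`. Only the leaf that has a SAN verifies for a route host. The helpers `issue` and `verifyWithCA`, the CA subject, and the host names are hypothetical.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// issue (hypothetical helper) creates a certificate from tmpl signed by
// parent/parentKey; when parent is nil, the certificate is self-signed.
func issue(tmpl, parent *x509.Certificate, parentKey *ecdsa.PrivateKey) (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	if parent == nil {
		parent, parentKey = tmpl, key // self-signed: sign with the new key
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, parent, &key.PublicKey, parentKey)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// verifyWithCA (hypothetical) issues a CA and a leaf with the given Common Name
// and DNS SANs, then verifies the leaf for host with only that CA trusted.
func verifyWithCA(cn string, sans []string, host string) error {
	now := time.Now()
	ca, caKey, err := issue(&x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "hypothetical-test-ca"},
		NotBefore:             now.Add(-time.Hour),
		NotAfter:              now.Add(24 * time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}, nil, nil)
	if err != nil {
		return err
	}
	leaf, _, err := issue(&x509.Certificate{
		SerialNumber: big.NewInt(2),
		Subject:      pkix.Name{CommonName: cn},
		DNSNames:     sans,
		NotBefore:    now.Add(-time.Hour),
		NotAfter:     now.Add(24 * time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}, ca, caKey)
	if err != nil {
		return err
	}
	roots := x509.NewCertPool()
	roots.AddCert(ca) // plays the role of the trusted CA bundle
	_, err = leaf.Verify(x509.VerifyOptions{DNSName: host, Roots: roots})
	return err
}

func main() {
	const wildcard = "*.apps.example.com"                    // hypothetical ingress domain
	const host = "console-openshift-console.apps.example.com" // hypothetical route host
	// Expected to fail with the legacy Common Name hostname error.
	fmt.Printf("CN only:  %v\n", verifyWithCA(wildcard, nil, host))
	// Expected to succeed (nil error).
	fmt.Printf("with SAN: %v\n", verifyWithCA(wildcard, []string{wildcard}, host))
}
```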
melvinjoseph@mjoseph-mac Downloads % oc --namespace openshift-ingress create secret tls custom-certs-default-san --cert=tmp_dir/apps.crt --key=tmp_dir/apps.key
secret/custom-certs-default-san created
melvinjoseph@mjoseph-mac Downloads % oc patch --type=merge --namespace openshift-ingress-operator ingresscontrollers/default \
--patch '{"spec":{"defaultCertificate":{"name":"custom-certs-default-san"}}}'
ingresscontroller.operator.openshift.io/default patched
melvinjoseph@mjoseph-mac Downloads % oc create configmap user-ca-bundle1 --from-file=ca-bundle.crt=tmp_dir/ca.pem -n openshift-config
configmap/user-ca-bundle1 created
melvinjoseph@mjoseph-mac Downloads % oc patch proxy/cluster --patch '{"spec":{"trustedCA":{"name":"user-ca-bundle1"}}}' --type=merge
proxy.config.openshift.io/cluster patched
melvinjoseph@mjoseph-mac Downloads % oc patch proxy/cluster --patch '{"spec":{"trustedCA":{"name":"user-ca-bundle"}}}' --type=merge
melvinjoseph@mjoseph-mac Downloads % oc -n openshift-ingress get secret
NAME TYPE DATA AGE
builder-dockercfg-k6pzz kubernetes.io/dockercfg 1 81m
builder-token-vsbqd kubernetes.io/service-account-token 4 81m
builder-token-ww9q6 kubernetes.io/service-account-token 4 81m
custom-certs-default kubernetes.io/tls 2 35m
custom-certs-default-san kubernetes.io/tls 2 3m18s
default-dockercfg-dch74 kubernetes.io/dockercfg 1 81m
default-token-hdvpb kubernetes.io/service-account-token 4 84m
default-token-lsdjj kubernetes.io/service-account-token 4 81m
deployer-dockercfg-dr58p kubernetes.io/dockercfg 1 81m
deployer-token-64jpv kubernetes.io/service-account-token 4 81m
deployer-token-qkrn9 kubernetes.io/service-account-token 4 81m
router-dockercfg-gmg5k kubernetes.io/dockercfg 1 81m
router-metrics-certs-default kubernetes.io/tls 2 84m
router-stats-default Opaque 2 84m
router-token-ncsxr kubernetes.io/service-account-token 4 84m
router-token-x42b2 kubernetes.io/service-account-token 4 81m
melvinjoseph@mjoseph-mac Downloads % oc get co
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
authentication 4.11.0-0.nightly-2022-02-27-122819 True False False 29m
baremetal 4.11.0-0.nightly-2022-02-27-122819 True False False 111m
cloud-controller-manager 4.11.0-0.nightly-2022-02-27-122819 True False False 113m
cloud-credential 4.11.0-0.nightly-2022-02-27-122819 True False False 113m
cluster-autoscaler 4.11.0-0.nightly-2022-02-27-122819 True False False 110m
config-operator 4.11.0-0.nightly-2022-02-27-122819 True False False 112m
console 4.11.0-0.nightly-2022-02-27-122819 True False False 28m
csi-snapshot-controller 4.11.0-0.nightly-2022-02-27-122819 True False False 111m
dns 4.11.0-0.nightly-2022-02-27-122819 True False False 110m
etcd 4.11.0-0.nightly-2022-02-27-122819 True False False 110m
image-registry 4.11.0-0.nightly-2022-02-27-122819 True False False 104m
ingress 4.11.0-0.nightly-2022-02-27-122819 True False False 61m
insights 4.11.0-0.nightly-2022-02-27-122819 True False False 104m
kube-apiserver 4.11.0-0.nightly-2022-02-27-122819 True False False 99m
kube-controller-manager 4.11.0-0.nightly-2022-02-27-122819 True False False 108m
kube-scheduler 4.11.0-0.nightly-2022-02-27-122819 True False False 106m
kube-storage-version-migrator 4.11.0-0.nightly-2022-02-27-122819 True False False 22m
machine-api 4.11.0-0.nightly-2022-02-27-122819 True False False 108m
machine-approver 4.11.0-0.nightly-2022-02-27-122819 True False False 111m
machine-config 4.11.0-0.nightly-2022-02-27-122819 True False False 110m
marketplace 4.11.0-0.nightly-2022-02-27-122819 True False False 111m
monitoring 4.11.0-0.nightly-2022-02-27-122819 True False False 90m
network 4.11.0-0.nightly-2022-02-27-122819 True False False 112m
node-tuning 4.11.0-0.nightly-2022-02-27-122819 True False False 111m
openshift-apiserver 4.11.0-0.nightly-2022-02-27-122819 True False False 99m
openshift-controller-manager 4.11.0-0.nightly-2022-02-27-122819 True False False 108m
openshift-samples 4.11.0-0.nightly-2022-02-27-122819 True False False 104m
operator-lifecycle-manager 4.11.0-0.nightly-2022-02-27-122819 True False False 111m
operator-lifecycle-manager-catalog 4.11.0-0.nightly-2022-02-27-122819 True False False 111m
operator-lifecycle-manager-packageserver 4.11.0-0.nightly-2022-02-27-122819 True False False 104m
service-ca 4.11.0-0.nightly-2022-02-27-122819 True False False 112m
storage 4.11.0-0.nightly-2022-02-27-122819 True False False 111m
melvinjoseph@mjoseph-mac Downloads % oc get co ingress -o json | jq .status.conditions
[
{
"lastTransitionTime": "2022-02-28T12:26:27Z",
"message": "The \"default\" ingress controller reports Available=True.",
"reason": "IngressAvailable",
"status": "True",
"type": "Available"
},
{
"lastTransitionTime": "2022-02-28T12:26:27Z",
"message": "desired and current number of IngressControllers are equal",
"reason": "AsExpected",
"status": "False",
"type": "Progressing"
},
{
"lastTransitionTime": "2022-02-28T11:43:57Z",
"message": "The \"default\" ingress controller reports Degraded=False.",
"reason": "IngressNotDegraded",
"status": "False",
"type": "Degraded"
},
{
"lastTransitionTime": "2022-02-28T12:58:25Z",
"reason": "IngressControllersUpgradeable",
"status": "True",
"type": "Upgradeable"
}
]
Upgradeable is back to True, hence verified.
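Both checkpoints come down to reading the Upgradeable entry from `oc get co ingress -o json | jq .status.conditions`. The following Go sketch shows how a script or test could check it programmatically. It decodes that JSON array and picks out one condition by type. The type `condition` and the function `findCondition` are hypothetical names, and the sample input is a trimmed copy of the output above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// condition holds the fields printed by
// `oc get co ingress -o json | jq .status.conditions`; other fields are ignored.
type condition struct {
	Type    string `json:"type"`
	Status  string `json:"status"`
	Reason  string `json:"reason"`
	Message string `json:"message"`
}

// findCondition (hypothetical) decodes a JSON array of conditions and returns
// the one whose type is condType, or nil if it is absent.
func findCondition(conditionsJSON []byte, condType string) (*condition, error) {
	var conds []condition
	if err := json.Unmarshal(conditionsJSON, &conds); err != nil {
		return nil, err
	}
	for i := range conds {
		if conds[i].Type == condType {
			return &conds[i], nil
		}
	}
	return nil, nil
}

func main() {
	// Trimmed copy of the second conditions listing above.
	sample := []byte(`[
		{"type": "Degraded", "status": "False", "reason": "IngressNotDegraded"},
		{"type": "Upgradeable", "status": "True", "reason": "IngressControllersUpgradeable"}
	]`)
	c, err := findCondition(sample, "Upgradeable")
	if err != nil {
		panic(err)
	}
	if c == nil {
		fmt.Println("no Upgradeable condition reported")
		return
	}
	// Prints: Upgradeable=True (IngressControllersUpgradeable)
	fmt.Printf("Upgradeable=%s (%s)\n", c.Status, c.Reason)
}
```

For a quick manual check, `oc get co ingress -o jsonpath='{.status.conditions[?(@.type=="Upgradeable")].status}'` should print the same status using oc's JSONPath filter support.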
I'm adding UpgradeBlocker, and per [1], asking the following questions to evaluate whether or not this bug warrants blocking an upgrade edge from either the previous X.Y or X.Y.Z. The ultimate goal is to avoid delivering an update which introduces new risk or reduces cluster functionality in any way. Sample answers are provided to give more context, and the ImpactStatementRequested label has been added to this bug. When responding, please remove ImpactStatementRequested and set the ImpactStatementProposed label. The expectation is that the assignee answers these questions.

Who is impacted? If we have to block upgrade edges based on this issue, which edges would need blocking?
* example: Customers upgrading from 4.y.z to 4.y+1.z running on GCP with thousands of namespaces, approximately 5% of the subscribed fleet
* example: All customers upgrading from 4.y.z to 4.y+1.z fail approximately 10% of the time

What is the impact? Is it serious enough to warrant blocking edges?
* example: Up to 2 minute disruption in edge routing
* example: Up to 90 seconds of API downtime
* example: etcd loses quorum and you have to restore from backup

How involved is remediation (even moderately serious impacts might be acceptable if they are easy to mitigate)?
* example: Issue resolves itself after five minutes
* example: Admin uses oc to fix things
* example: Admin must SSH to hosts, restore from backups, or perform other non-standard admin activities

Is this a regression (if all previous versions were also vulnerable, updating to the new, vulnerable version does not increase exposure)?
* example: No, it has always been like this; we just never noticed
* example: Yes, from 4.y.z to 4.y+1.z or 4.y.z to 4.y.z+1

[1]: https://github.com/openshift/enhancements/blob/2911c46bf7d2f22eb1ab81739b4f9c2603fd0c07/enhancements/update/update-blocker-lifecycle/README.md

(In reply to W. Trevor King from comment #10)
> Who is impacted?
> If we have to block upgrade edges based on this issue,
> which edges would need blocking?

The best approximation is that this issue affects an unknown percentage of customers upgrading from 4.9.z to 4.10. However, the issue doesn't strictly correlate with OpenShift versions. Rather, the issue affects all clients (OpenShift or otherwise) that use Go 1.17's crypto/tls package to connect to routes that use the ingress default certificate if this certificate does not have a SAN.

Affected clients include OpenShift 4.10 operators, such as the authentication and console operators, that perform health checks against their respective routes. OpenShift 4.10 is built using Go 1.17, so we can anticipate that this issue will affect upgrades from 4.9.z to 4.10 for clusters that have problematic default certificates. However, affected clients could also include user workloads, and we have no way of detecting or guarding against users' using Go 1.17-based clients to connect to routes on OpenShift 4.9 or earlier.

> What is the impact? Is it serious enough to warrant blocking edges?

Go 1.17-based clients will fail to connect to routes that use the default certificate if it has no SAN.

> How involved is remediation (even moderately serious impacts might be
> acceptable if they are easy to mitigate)?

The cluster admin must replace the default certificate with one that specifies a SAN.

> Is this a regression (if all previous versions were also vulnerable,
> updating to the new, vulnerable version does not increase exposure)?

This pertains to a deprecation in Go's crypto/tls package. The issue could be considered a regression in some OpenShift 4.10 components built using Go 1.17. This BZ does not fix the regression/deprecation but rather warns the cluster admin if the default certificate would cause problems for affected components or clients.
After discussing this internally, folks are OK with calling this out in docs like [1] rather than blocking updates, delaying 4.10's GA, calling this out in a KCS Solution, or pursuing any of the other possible mitigations. So I'm removing the UpgradeBlocker keyword to accept this as "not a blocker".

[1]: https://github.com/openshift/openshift-docs/pull/41872

I am setting "No Doc Update" on this BZ because the only reason for adding the change in 4.11 was so that we could backport it to 4.9. I am adding appropriate doc text to the 4.9 BZ, bug 2060111.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069