Bug 2040521

Summary: RouterCertsDegraded certificate could not validate route hostname v4-0-config-system-custom-router-certs.apps
Product: OpenShift Container Platform
Reporter: Andrew Collins <ancollin>
Component: apiserver-auth
Assignee: Krzysztof Ostrowski <kostrows>
Status: CLOSED ERRATA
QA Contact: Xingxing Xia <xxia>
Severity: medium
Docs Contact:
Priority: high
Version: 4.9
CC: aos-bugs, chrzhang, hongli, kgordeev, kostrows, ksathe, mfojtik, mjoseph, nachahua, surbania, wlewis, xxia, ytripath
Target Milestone: ---
Flags: ancollin: needinfo-
Target Release: 4.10.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 2077483 (view as bug list) Environment:
Last Closed: 2022-03-10 16:39:13 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 2077483

Description Andrew Collins 2022-01-13 22:29:41 UTC
Description of problem:
While upgrading a cluster from 4.8.24 to 4.9.12, the upgrade stalls when the authentication clusteroperator goes to Degraded.
The error is:
```
RouterCertsDegraded:
        secret/v4-0-config-system-router-certs.spec.data[apps.dev-alln-01.cae.cisco.com]
        -n openshift-authentication: certificate could not validate route
        hostname
        v4-0-config-system-custom-router-certs.apps.dev-alln-01.cae.cisco.com:
```

Version-Release number of selected component (if applicable):
4.9.12

How reproducible:
Reproducible

Steps to Reproduce:
1. Configure a custom ingress certificate without a wildcard on a 4.8.24 cluster.
2. Upgrade to 4.9.12.

Actual results:
The authentication clusteroperator goes to the Degraded state.


Expected results:
Upgrade completes successfully.


Additional info:
The PR that introduced these changes is: https://github.com/openshift/cluster-authentication-operator/pull/430/files#diff-0d623dfd885adb20f991bda4c2453aebd732ca6dbb4d1d4be6e79805c3b48de6R311

See RH Support Case for must-gather: 03124512

Comment 1 Krzysztof Ostrowski 2022-01-14 17:33:39 UTC
Ok, I might have found it:

https://github.com/openshift/cluster-authentication-operator/commit/7c29d664bd571ce5f8e99456a206584651d200a7#diff-0d623dfd885ad[…]e79805c3b48de6R311

"v4-0-config-system-custom-router-certs" is passed as the last argument in pkg/operator/starter.go:

https://github.com/openshift/cluster-authentication-operator/commit/7c29d664bd571ce5f8e99456a206584651d200a7#diff-efa5ab900a24c[…]ffcf43ad19a67f0R59

but the constructor expects it as the second-to-last argument.

As a result, we set the routeName to the custom secret name and vice versa:

		"v4-0-config-system-router-certs",
		"oauth-openshift",
		"v4-0-config-system-custom-router-certs",
should be:
		"v4-0-config-system-router-certs",
		"v4-0-config-system-custom-router-certs",
		"oauth-openshift",
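
To make the mix-up above concrete, here is a minimal Go sketch. It is not the operator's code: `routerCertsConfig`, `newRouterCertsConfig`, and `routeHost` are hypothetical names, and the sketch assumes the validated hostname is built as route name plus ingress domain, which matches the error message but is not confirmed by this report. `apps.example.com` is a placeholder domain.

```go
package main

import "fmt"

// routerCertsConfig is a hypothetical stand-in for the controller's settings.
type routerCertsConfig struct {
	secretName       string
	customSecretName string
	routeName        string
}

// newRouterCertsConfig mimics a constructor that takes three positional
// strings. Because all three share one type, the compiler cannot tell
// when two of them are passed in the wrong order.
func newRouterCertsConfig(secretName, customSecretName, routeName string) routerCertsConfig {
	return routerCertsConfig{
		secretName:       secretName,
		customSecretName: customSecretName,
		routeName:        routeName,
	}
}

// routeHost builds the hostname the certificate would be checked against
// (assumption: route name + "." + ingress domain).
func routeHost(cfg routerCertsConfig, ingressDomain string) string {
	return cfg.routeName + "." + ingressDomain
}

func main() {
	// Argument order as quoted in the comment above (swapped).
	swapped := newRouterCertsConfig(
		"v4-0-config-system-router-certs",
		"oauth-openshift",
		"v4-0-config-system-custom-router-certs",
	)
	// Argument order the constructor expects.
	fixed := newRouterCertsConfig(
		"v4-0-config-system-router-certs",
		"v4-0-config-system-custom-router-certs",
		"oauth-openshift",
	)

	fmt.Println(routeHost(swapped, "apps.example.com"))
	// prints "v4-0-config-system-custom-router-certs.apps.example.com"
	fmt.Println(routeHost(fixed, "apps.example.com"))
	// prints "oauth-openshift.apps.example.com"
}
```

With the swapped order, the secret name ends up in the hostname, which is the shape of the hostname reported in the RouterCertsDegraded message.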

Comment 2 Andrew Collins 2022-01-14 17:49:07 UTC
Yes, I was thinking the same.

Comment 4 Krzysztof Ostrowski 2022-01-17 10:09:56 UTC
@ancollin mentioned that he has a workaround, so I am setting the severity to medium.

I created a pull request with the fix, but I am still looking for a good way to prevent this from happening again.

https://github.com/openshift/cluster-authentication-operator/pull/533
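
One possible way to guard against this class of mistake, sketched here in Go, is to replace same-typed positional strings with a keyed options struct and distinct named types. This is only an illustration and is not taken from the pull request above; `SecretName`, `RouteName`, `RouterCertsOptions`, and `Validate` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// Distinct named types for the two kinds of identifiers.
type SecretName string
type RouteName string

// Typed constants: a RouteName constant is not assignable to a SecretName
// field, so swapping them is a compile error. Untyped string literals
// would still convert silently, which is why the constants carry a type.
const (
	defaultRouterCerts SecretName = "v4-0-config-system-router-certs"
	customRouterCerts  SecretName = "v4-0-config-system-custom-router-certs"
	oauthRoute         RouteName  = "oauth-openshift"
)

// RouterCertsOptions names every value at the call site.
type RouterCertsOptions struct {
	SecretName       SecretName
	CustomSecretName SecretName
	RouteName        RouteName
}

// Validate rejects options with any field left unset.
func (o RouterCertsOptions) Validate() error {
	if o.SecretName == "" || o.CustomSecretName == "" || o.RouteName == "" {
		return errors.New("all router certs options must be set")
	}
	return nil
}

func main() {
	opts := RouterCertsOptions{
		SecretName:       defaultRouterCerts,
		CustomSecretName: customRouterCerts,
		RouteName:        oauthRoute,
		// CustomSecretName: oauthRoute would not compile.
	}
	if err := opts.Validate(); err != nil {
		fmt.Println("invalid options:", err)
		return
	}
	fmt.Printf("route=%s custom secret=%s\n", opts.RouteName, opts.CustomSecretName)
}
```

The types do not catch a swap between the two secret names, since both are `SecretName`; the keyed fields only make such a swap easier to spot in review.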

Comment 9 Yash Tripathi 2022-02-04 02:29:30 UTC
Verified on upgrade from ocp-release:4.8.24-x86_64 to 4.9.0-0.nightly-2022-02-02-193336
1. Created and applied a custom certificate with Issuer: C = US, ST = NY, O = Local Developement, L = Local Developement, CN = oauth-openshift.apps.<cluster-name>.openshift.com, subjectAltName = DNS:oauth-openshift.apps.<cluster-name>.openshift.com, OU = Local Developement
2. Upgraded cluster from ocp-release:4.8.24-x86_64 to 4.9.0-0.nightly-2022-02-02-193336

Actual Results:
Upgrade completes successfully

$ oc get co
authentication                             4.9.0-0.nightly-2022-02-02-193336   True        False         False      6h44m

Expected Results:
Upgrade completes successfully

Comment 11 Andrew Collins 2022-02-21 21:18:06 UTC
Just for the sake of documentation: the "workaround" I had here was to add the "v4-0.*" route as a Subject Alternative Name (SAN) on the certificate we were using for the ingress router. In this case, we already had all of the platform routes added as SANs, since we route application routes through different IngressControllers.
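
The effect of that workaround can be sketched with Go's standard `crypto/x509` package. This is an assumption-laden illustration, not the operator's validation code: `selfSignedCert` is a hypothetical helper that creates throwaway certificates, and `apps.example.com` is a placeholder domain. It shows that a non-wildcard certificate fails hostname verification for the mistaken "v4-0-..." hostname unless that hostname is listed as a SAN.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"errors"
	"fmt"
	"math/big"
	"time"
)

// selfSignedCert creates a throwaway self-signed certificate whose
// DNS SANs are exactly the given names (hypothetical helper).
func selfSignedCert(dnsNames ...string) (*x509.Certificate, error) {
	if len(dnsNames) == 0 {
		return nil, errors.New("at least one DNS name is required")
	}
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: dnsNames[0]},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
		DNSNames:     dnsNames,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	const domain = "apps.example.com" // placeholder ingress domain
	oauthHost := "oauth-openshift." + domain
	mistakenHost := "v4-0-config-system-custom-router-certs." + domain

	// Certificate without a wildcard, covering only the oauth route.
	bare, err := selfSignedCert(oauthHost)
	if err != nil {
		fmt.Println("creating certificate:", err)
		return
	}
	// Same certificate with the mistaken hostname added as an extra SAN.
	withSAN, err := selfSignedCert(oauthHost, mistakenHost)
	if err != nil {
		fmt.Println("creating certificate:", err)
		return
	}

	fmt.Println("bare cert, oauth host:      ", bare.VerifyHostname(oauthHost) == nil)
	fmt.Println("bare cert, mistaken host:   ", bare.VerifyHostname(mistakenHost) == nil)
	fmt.Println("extra SAN, mistaken host:   ", withSAN.VerifyHostname(mistakenHost) == nil)
}
```

A wildcard SAN for the ingress domain would pass the same check, which fits the observation that only non-wildcard certificates hit this bug.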

Comment 15 Neyder Achahuanco Apaza 2022-02-22 17:22:42 UTC
Hi, this might be helpful.

Installation on 4.8 doesn't have a problem with a bare certificate, but on 4.9 it does.

This fixed it: https://access.redhat.com/solutions/4542531

Comment 21 errata-xmlrpc 2022-03-10 16:39:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

Comment 25 Red Hat Bugzilla 2023-09-18 04:30:12 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days