Bug 2052467 - Customized component route with cert of no SAN does not mark Upgradeable as False to remind user before upgrade to 4.10
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: apiserver-auth
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: 4.9.z
Assignee: Pierre Prinetti
QA Contact: Xingxing Xia
URL:
Whiteboard:
Depends On: 2031839
Blocks:
Reported: 2022-02-09 10:41 UTC by Xingxing Xia
Modified: 2022-08-04 13:02 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-07-05 22:04:15 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-authentication-operator pull 545 | 0 | None | open | WIP: Bug 2052467: HTTPS certificate validation to check for SAN | 2022-02-23 08:15:11 UTC
Red Hat Product Errata | RHBA-2022:5434 | 0 | None | None | None | 2022-07-05 22:04:31 UTC

Description Xingxing Xia 2022-02-09 10:41:05 UTC
Description of problem:
A customized component route whose cert has no SAN does not cause any operator to report Upgradeable as False, so the user is not reminded before upgrading to 4.10. See bug 2037274 for background.

Background: When testing bug 2037274, we need to cover the scenario in which a 4.9 env has a customized component route with a cert that has no SAN. I tried the oauth-openshift route https://docs.openshift.com/container-platform/4.9/authentication/configuring-internal-oauth.html#customizing-the-oauth-server-url_configuring-internal-oauth with a no-SAN cert. The customization was verified via web console login, but no operator is marked with Upgradeable as False to remind the user. The user should be reminded, as in other scenarios such as bug 2037274#c10 . I confirmed with Dev https://coreos.slack.com/archives/CS05TR7BK/p1644401923172849?thread_ts=1644337510.355279&cid=CS05TR7BK that a separate bug is needed, so I am opening this tracker.

OpenShift release version:
4.9.0-0.nightly-2022-02-09-030305

How reproducible:
Always

Steps to Reproduce (in detail):
1. Prepare cert without SAN:
mkdir test_customized_oauth_cert_no_san
cd test_customized_oauth_cert_no_san
openssl genrsa -out caKey.pem 2048
openssl req -x509 -new -nodes -key caKey.pem -days 100000 -out caCert.pem -subj "/CN=xxia_test_ca"
openssl genrsa -out serverKey.pem 2048
cat > server_no_san.conf << EOF
[req]
req_extensions = v3_req
distinguished_name = req_distinguished_name
[req_distinguished_name]
[ v3_req ]
basicConstraints = CA:FALSE
keyUsage = nonRepudiation, digitalSignature, keyEncipherment
extendedKeyUsage = clientAuth, serverAuth
EOF

CUSTOM_DOMAIN=qe1.SNIPPED.com
openssl req -new -key serverKey.pem -out serverNoSAN.csr -subj "/CN=*.$CUSTOM_DOMAIN" -config server_no_san.conf
openssl x509 -req -in serverNoSAN.csr -CA caCert.pem -CAkey caKey.pem -CAcreateserial -out serverCertNoSAN.pem -days 100000 -extensions v3_req -extfile server_no_san.conf
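Before using the cert, it may help to confirm that it really carries no SAN. The following is a minimal sketch, assuming `openssl` and `grep` are on PATH; the function name `cert_has_san` is made up for illustration and is not part of any tool mentioned in this bug.

```shell
# Sketch: succeed (exit 0) if the PEM certificate file given as $1 carries a
# subjectAltName extension, fail otherwise.
# Assumption: the human-readable dump from `openssl x509 -text` labels the
# extension "X509v3 Subject Alternative Name".
cert_has_san() {
  openssl x509 -in "$1" -noout -text 2>/dev/null \
    | grep -q 'X509v3 Subject Alternative Name'
}

# Usage (file name taken from step 1 above):
#   if cert_has_san serverCertNoSAN.pem; then echo "has SAN"; else echo "no SAN"; fi
```

For the cert produced in step 1 the expectation is the "no SAN" branch, because server_no_san.conf defines no subjectAltName in its v3_req section.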

2. Make the cert used in oauth route server:
NOTE: the customized route `auth-openshift-custom.CUSTOM_DOMAIN` needs to be resolvable. We can add an A record in Route 53:
1) Open https://console.aws.amazon.com/route53/home?region=us-east-2
2) In Hosted Zones, click the CUSTOM_DOMAIN entry; it already exists, created for team use.
3) Click 'Create Record Set'
- Name: your customized hostname, eg: auth-openshift-custom.CUSTOM_DOMAIN
  Type: A IPv4 Address
  Value: the IP address where our route can be resolved, you can get from `nslookup <default_oauth_route_hostname>`
For example:
$ nslookup oauth-openshift.apps.YOUR_ENV_SUFFIX
...
Non-authoritative answer:
Name:    oauth-openshift.apps.YOUR_ENV_SUFFIX
Address: 18.189....
...
4) Save your changes

oc --namespace openshift-config create secret tls custom-auth-component --cert=serverCertNoSAN.pem --key=serverKey.pem
oc edit ingress.config cluster
...
spec:
  componentRoutes:
  - name: oauth-openshift
    namespace: openshift-authentication
    hostname: auth-openshift-custom.CUSTOM_DOMAIN # replace with above CUSTOM_DOMAIN value
    servingCertKeyPairSecret:
      name: custom-auth-component
...

This causes the oauth pods and KAS pods to be recreated. Wait a moment for the rollout to finish.

3. Log in to the console; it redirects to auth-openshift-custom.CUSTOM_DOMAIN. Enter the user name and password; the login succeeds, which verifies that the oauth-openshift cert and route work.
However, oc get co -o yaml does not show any operator with Upgradeable as False; all are still True.
The oauth-openshift route is one that customers like to customize. We should fix this so that Upgradeable becomes False when the route has an invalid no-SAN cert. I guess the other scenarios, https://docs.openshift.com/container-platform/4.9/web_console/customizing-the-web-console.html#customizing-the-console-route_customizing-web-console and https://docs.openshift.com/container-platform/4.9/security/certificates/replacing-default-ingress-certificate.html , have the same issue, but I have not tried them yet.
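Scanning the full `oc get co -o yaml` output by eye is error-prone. The check can be sketched as below, assuming a logged-in `oc` and `awk` on PATH. The function names `upgradeable_status` and `blocking_operators` are hypothetical helpers, and treating a missing Upgradeable condition as non-blocking is an assumption of this sketch.

```shell
# Sketch: list cluster operators whose Upgradeable condition is explicitly False.

# Emit one "<operator><TAB><Upgradeable status>" line per cluster operator.
# The status field is empty when an operator reports no Upgradeable condition.
upgradeable_status() {
  oc get clusteroperators \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Upgradeable")].status}{"\n"}{end}'
}

# Read those lines on stdin and print only the operators reporting False.
blocking_operators() {
  awk -F '\t' '$2 == "False" { print $1 }'
}

# Usage:
#   upgradeable_status | blocking_operators
```

Empty output from the pipeline corresponds to the situation reported here: no operator is flagging the no-SAN cert.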

Actual results:
3. No operator shows Upgradeable as False

Expected results:
3. At least one operator should be marked with Upgradeable as False to remind the user before upgrading to 4.10.

Impact of the problem:
See https://bugzilla.redhat.com/show_bug.cgi?id=2037274#c0 for background

Additional info:

Comment 1 Miciah Dashiel Butler Masters 2022-02-09 23:38:08 UTC
I'm setting blocker- as this issue doesn't need to block the next z-stream release, but we may need to fix it in some 4.9.z release before 4.10.0 GA.  

I don't understand why the SANless certificate works with OpenShift 4.9; neither cluster-authentication-operator nor oauth-server is setting the GODEBUG environment variable as far as I can tell using git-grep or ripgrep on their respective source repositories.  Can you confirm that the same certificate works with OpenShift 4.9 and fails with OpenShift 4.10?

Comment 2 Xingxing Xia 2022-02-10 04:08:35 UTC
(In reply to Miciah Dashiel Butler Masters from comment #1)
> but we may need to fix it in some 4.9.z release before 4.10.0 GA.

Agree

> Can you confirm that the same certificate works with OpenShift 4.9 and fails with OpenShift 4.10?

Yesterday's 4.9 test showed all COs healthy. Today I tested 4.10 with the same steps and got unhealthy COs:
$ oc get co
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.10.0-0.nightly-2022-02-09-225148   False       False         True       10m     OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://auth-openshift-custom.qe1.HIDDEN/healthz": x509: certificate relies on legacy Common Name field, use SANs instead
...
console                                    4.10.0-0.nightly-2022-02-09-225148   True        True          False      36m     SyncLoopRefreshProgressing: Working toward version 4.10.0-0.nightly-2022-02-09-225148, 1 replicas available
...

$ oc get po -n openshift-console
NAME                         READY   STATUS    RESTARTS      AGE
console-5886c6845d-xzbtp     1/1     Running   0             38m
console-6fc6b8884f-k5hsp     0/1     Running   3 (59s ago)   11m
console-6fc6b8884f-vvh4b     0/1     Running   3 (42s ago)   11m
...
$ oc logs -n openshift-console console-6fc6b8884f-k5hsp
... repeated same log lines ...
E0210 03:56:13.518829       1 auth.go:232] error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://auth-openshift-custom.qe1.HIDDEN/oauth/token failed: Head "https://auth-openshift-custom.qe1.HIDDEN": x509: certificate relies on legacy Common Name field, use SANs instead

So it fails with 4.10 as expected, which further shows that this 4.9 bug needs to be fixed.

Comment 3 Xingxing Xia 2022-02-10 09:43:28 UTC
One more thing: the tested 4.10 env also shows the following:
$ oc get ingress.config cluster -o yaml
...
status:
  componentRoutes:
  - conditions:
    - lastTransitionTime: "2022-02-10T03:44:57Z"
      message: 'unexpected error at auth-openshift-custom.HIDDEN:
        Get "https://auth-openshift-custom.qe1.HIDDEN/healthz":
        x509: certificate relies on legacy Common Name field, use SANs instead'
      reason: ErrorReachingOutToService
      status: "True"
      type: Progressing

Comment 9 Xingxing Xia 2022-06-29 12:05:18 UTC
I'm working on verification.

Comment 10 Xingxing Xia 2022-06-29 14:05:32 UTC
Verified in 4.9.0-0.nightly-2022-06-28-211928 with original steps:
After applying the no-SAN cert, the user is reminded by:
$ oc get co authentication
NAME             VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication   4.9.0-0.nightly-2022-06-28-211928   True        False         True       11h     CustomRouteControllerDegraded: custom route configuration failed verification: [error validating secret openshift-config/custom-auth-component: [certificate relies on legacy Common Name field, use SANs instead:...

$ oc describe co authentication
...
Status:
  Conditions:
    Last Transition Time:        2022-06-29T13:53:19Z
    Message:                     CustomRouteControllerDegraded: custom route configuration failed verification: [error validating secret openshift-config/custom-auth-component: [certificate relies on legacy Common Name field, use SANs instead:
CustomRouteControllerDegraded:   sn=17889069629480321911;
CustomRouteControllerDegraded:   iss=CN=xxia_test_ca]]
OAuthClientsControllerDegraded: no ingress for host auth-openshift-custom.qe1.SNIPPED.com in route oauth-openshift in namespace openshift-authentication
    Reason:                CustomRouteController_SyncError::OAuthClientsController_SyncError
    Status:                True
    Type:                  Degraded

    ...
 
    Last Transition Time:  2022-06-29T02:36:34Z
    Message:               All is well
    Reason:                AsExpected
    Status:                True
    Type:                  Upgradeable
...

Based on the message above, moving to VERIFIED. After reverting the oc edit ingress.config cluster setting, oc get co authentication is back to normal.
But from PR comment "prevent the upgrade", "Upgradeable" condition isn't "False", is this expected, Pierre?

Comment 13 errata-xmlrpc 2022-07-05 22:04:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.9.41 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:5434

Comment 14 Pierre Prinetti 2022-08-04 13:02:46 UTC
> But from PR comment "prevent the upgrade", "Upgradeable" condition isn't "False", is this expected, Pierre?

Newly added certificates are validated by the respective operators and are outside the scope of this change IIUC. This very change should only catch already added certificates (that were added in a previous OCP version, where they were validated OK) that are detected as invalid upon upgrade, and set "NoUpgrade" in that case.
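Since the change targets certificates that were already configured before the upgrade, an admin might want to audit them by hand first. The sketch below is not the operator's validation logic. It assumes a logged-in `oc` with read access to openshift-config, plus `openssl` and a `base64` that accepts `-d`; the function names are hypothetical. It only looks at secrets referenced from spec.componentRoutes, as in the reproduction steps.

```shell
# Sketch: report custom component-route serving certs that carry no SAN.

# Print the name of every secret referenced by spec.componentRoutes.
component_route_secrets() {
  oc get ingress.config cluster \
    -o jsonpath='{range .spec.componentRoutes[*]}{.servingCertKeyPairSecret.name}{"\n"}{end}'
}

# Succeed if the first PEM certificate on stdin has a subjectAltName extension.
pem_has_san() {
  openssl x509 -noout -text 2>/dev/null \
    | grep -q 'X509v3 Subject Alternative Name'
}

# Print "<secret> ok" or "<secret> NO-SAN" for each referenced secret.
audit_component_route_certs() {
  component_route_secrets | while read -r secret; do
    [ -n "$secret" ] || continue
    # </dev/null keeps oc from consuming the secret list the loop is reading.
    if oc -n openshift-config get secret "$secret" \
         -o jsonpath='{.data.tls\.crt}' </dev/null \
       | base64 -d | pem_has_san; then
      echo "$secret ok"
    else
      echo "$secret NO-SAN"
    fi
  done
}

# Usage:
#   audit_component_route_certs
```

A secret that cannot be read or decoded is also reported as NO-SAN, so a NO-SAN line is a prompt to inspect the cert rather than proof that the SAN is missing.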

