Bug 1889204 - api server certs are not updated to include a SAN when upgrading from 4.5
Summary: api server certs are not updated to include a SAN when upgrading from 4.5
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Over the Air Updates
QA Contact: Johnny Liu
URL:
Whiteboard:
Duplicates: 1895297 (view as bug list)
Depends On:
Blocks: 1997337
 
Reported: 2020-10-19 02:18 UTC by zhou ying
Modified: 2023-01-11 18:22 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1997337 (view as bug list)
Environment:
Last Closed: 2023-01-11 18:22:30 UTC
Target Upstream Version:


Attachments
run of oc mirroring a repository (63.17 KB, text/plain)
2021-02-02 23:08 UTC, John Hixson

Description zhou ying 2020-10-19 02:18:16 UTC
Description of problem:
For oc commands that require TLS verification, if the certificates do not set a Subject Alternative Name, verification does not fall back to the Common Name field and the command fails with the following error:
x509: certificate relies on legacy Common Name field, use SANs or temporarily enable Common Name matching with GODEBUG=x509ignoreCN=0
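
The behavior described above can be reproduced outside of oc. The following is a minimal sketch, assuming a Go 1.15 or newer toolchain; the host name api.example.com and the helpers selfSigned and hasSAN are illustrative and not part of oc or OpenShift. It builds two throwaway self-signed certificates, one with only a Common Name and one with a DNS SAN, and runs host name verification against both.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// selfSigned builds a throwaway self-signed certificate for the demo.
// A nil dnsNames slice yields a certificate without a SAN extension.
func selfSigned(commonName string, dnsNames []string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: commonName},
		DNSNames:     dnsNames,
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

// hasSAN reports whether the certificate carries at least one
// Subject Alternative Name entry (DNS name, IP, email, or URI).
func hasSAN(cert *x509.Certificate) bool {
	return len(cert.DNSNames)+len(cert.IPAddresses)+
		len(cert.EmailAddresses)+len(cert.URIs) > 0
}

func main() {
	const host = "api.example.com" // hypothetical host name
	// First a CN-only certificate, then one with a matching DNS SAN.
	for _, sans := range [][]string{nil, {host}} {
		cert, err := selfSigned(host, sans)
		if err != nil {
			panic(err)
		}
		fmt.Printf("hasSAN=%v verifyErr=%v\n", hasSAN(cert), cert.VerifyHostname(host))
	}
}
```

The CN-only certificate should fail VerifyHostname even though its Common Name matches the host, while the certificate with a SAN should pass. The exact error text depends on the Go version used to build the client.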

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
As a workaround, you can either use a certificate with a proper Subject Alternative Name set, or precede the oc command with GODEBUG=x509ignoreCN=0 to temporarily override this behavior.
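
To decide which certificates need to be reissued, it can help to list the SAN entries of a PEM file. The following is a minimal sketch, not an official tool: sanEntries and demoPEM are hypothetical helpers, and the host names are placeholders. It returns only DNS and IP entries, since those are the ones consulted for host name verification.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"fmt"
	"math/big"
	"time"
)

// sanEntries parses the first PEM certificate in pemBytes and returns its
// DNS and IP SAN entries. An empty result means host name verification
// has nothing but the legacy Common Name to go on.
func sanEntries(pemBytes []byte) ([]string, error) {
	block, _ := pem.Decode(pemBytes)
	if block == nil || block.Type != "CERTIFICATE" {
		return nil, errors.New("no PEM certificate block found")
	}
	cert, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		return nil, err
	}
	var out []string
	for _, name := range cert.DNSNames {
		out = append(out, "DNS:"+name)
	}
	for _, ip := range cert.IPAddresses {
		out = append(out, "IP:"+ip.String())
	}
	return out, nil
}

// demoPEM creates a throwaway self-signed certificate in PEM form so the
// sketch can run without any files on disk.
func demoPEM(commonName string, dnsNames []string) ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: commonName},
		DNSNames:     dnsNames,
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der}), nil
}

func main() {
	// Placeholder host name; in practice the PEM bytes would be read from a file.
	certPEM, err := demoPEM("api.example.com", []string{"api.example.com"})
	if err != nil {
		panic(err)
	}
	entries, err := sanEntries(certPEM)
	if err != nil {
		panic(err)
	}
	fmt.Println("SAN entries:", entries)
}
```

A certificate for which sanEntries returns nothing is a candidate for reissuing with a proper SAN, which is the durable option of the two workarounds.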

Comment 1 Maciej Szulik 2020-10-19 10:37:41 UTC
Sally, I'm sending this your way, since you were involved in tracking this.

Comment 2 Sally 2020-10-23 16:33:37 UTC
Workloads team is discussing what actions need to be taken from client-side, if any.  We'll update here during this sprint.

Comment 3 Sally 2020-11-12 17:36:46 UTC
Adding UpcomingSprint to keep tracking this. We are waiting for oc to be built with an updated golang that adds a new env var:
x509warnCN=1

When set in conjunction with
x509ignoreCN=0

it gives users both the working fallback and a warning that the fallback will break in the future. Without this new env var, users who set the golang var (x509ignoreCN=0) may not be aware that they depend on a certificate without a SAN. The extra warning should motivate users to update their certificates, since future golang releases could remove the ability to set x509ignoreCN=0.

Comment 4 Sally 2020-11-12 17:51:18 UTC
*** Bug 1895297 has been marked as a duplicate of this bug. ***

Comment 5 Feras Al Taher 2020-11-17 08:14:07 UTC
This is causing a production issue for my customer: they can't scale up the cluster because Ignition gets this error when it tries to communicate with api-int. From my understanding, we shouldn't update the api-int certificate, and since this is not an oc command I can't prefix it with GODEBUG=x509ignoreCN=0. What are my options?

Comment 6 Sally 2020-12-05 00:28:32 UTC
Looking into this during the current sprint. In particular, "ignition while trying to communicate with api-int is getting this error" might move this issue along to the installer group. Actively working on this, will report back.

Comment 7 Maciej Szulik 2020-12-07 12:17:23 UTC
Comment 5 indicates a problem in the installer code, not in oc, so moving accordingly.

Comment 9 John Hixson 2021-02-02 00:59:06 UTC
Can you provide some examples of how one can reproduce this?

Comment 11 yaoli 2021-02-02 01:07:55 UTC
I think the problem usually occurs when a customer upgrades from an older OCP 4 version.

Comment 12 John Hixson 2021-02-02 23:08:09 UTC
Created attachment 1754528
run of oc mirroring a repository

I am unable to reproduce the problem based on the provided instructions. oc version 4.6.15 was used for this test.

Comment 13 Brenton Leanhardt 2021-02-04 18:39:59 UTC
We're going to close this for now.  It doesn't seem like we have the full steps to reproduce.  Feel free to reopen if you can provide the steps or have more logs to share.

Comment 21 Stefan Schimanski 2021-04-23 12:22:32 UTC
Moving back to installer. These certs are created by the installer. If the installer allows (or allowed) customers to have certs that break after an upgrade, it is the installer's responsibility to guide the customer through the steps to fix the situation. As nobody owns or controls the LBs after installation, a doc solution is probably the only way to solve this.

Comment 28 Russell Teague 2021-08-02 19:20:31 UTC
Architecture discussions are in progress to determine how to handle these resources.

Comment 30 Russell Teague 2021-08-24 17:26:00 UTC
Given that this is related to upgrade issues, moving this to the OTA team.

We don't think this can be solved in installer code. Since the load balancers are not managed by the cluster, updating the certs would be a day-2 operation, possibly handled with a doc update.

Comment 33 Jack Ottofaro 2022-04-13 17:26:43 UTC
Will review this to get an updated view of the problem space and figure out what needs to be done to address it.

Comment 40 aygarg 2022-10-05 00:39:16 UTC
Hello Team,

Any updates on this?

Comment 43 Lalatendu Mohanty 2023-01-11 18:22:30 UTC
As per the 4.6 release notes [1]: "The behavior of falling back to the Common Name field on X.509 certificates as a host name when no Subject Alternative Names (SANs) are present is removed. Certificates must properly set the Subject Alternative Names field."

It seems all users have fixed their certificates and the issues have been resolved. We also have not heard about this issue in recent releases, i.e. 4.8, 4.9, 4.10, 4.11, etc., so closing this as not a bug.

Please re-open or file a new bug if you disagree.

[1] https://access.redhat.com/documentation/en-us/openshift_container_platform/4.6/html/release_notes/ocp-4-6-release-notes#ocp-4-6-tls-common-name

