Bug 1810420

Summary: [4.3] "You are attempting to import a cert with the same issuer/serial as an existing cert, but that is not the same cert" on FIPS enabled cluster after upgrade
Product: OpenShift Container Platform
Reporter: Maru Newby <mnewby>
Component: service-ca
Assignee: Maru Newby <mnewby>
Status: CLOSED ERRATA
QA Contact: scheng
Severity: urgent
Priority: urgent
Version: 4.3.0
CC: aos-bugs, juzhao, liyao, lmohanty, mfojtik, mnewby, slaznick, wking, wsun
Target Milestone: ---
Keywords: Regression, Upgrades
Target Release: 4.3.z
Hardware: Unspecified   
OS: Unspecified   
Clone Of: 1810418
Last Closed: 2020-03-24 14:34:26 UTC
Bug Depends On: 1810418    
Bug Blocks: 1810421    

Comment 1 Lalatendu Mohanty 2020-03-18 14:31:13 UTC
We're asking the following questions to evaluate whether or not this bug warrants blocking an upgrade edge from either the previous X.Y or X.Y.Z. The ultimate goal is to avoid delivering an update which introduces new risk or reduces cluster functionality in any way. Sample answers are provided to give more context. The UpgradeBlocker flag has been added to this bug; it will be removed if the assessment indicates that this should not block upgrade edges.

Who is impacted?
  Customers upgrading from 4.2.99 to 4.3.z running on GCP with thousands of namespaces, approximately 5% of the subscribed fleet
  All customers upgrading from 4.2.z to 4.3.z fail approximately 10% of the time
What is the impact?
  Up to 2 minute disruption in edge routing
  Up to 90 seconds of API downtime
  etcd loses quorum and you have to restore from backup
How involved is remediation?
  Issue resolves itself after five minutes
  Admin uses oc to fix things
  Admin must SSH to hosts, restore from backups, or perform other non-standard admin activities
Is this a regression?
  No, it’s always been like this; we just never noticed
  Yes, from 4.2.z and 4.3.1

Depending on the answers to the above questions, we can remove the UpgradeBlocker keyword.

Comment 2 Michal Fojtik 2020-03-18 15:33:05 UTC
Who is impacted?
  All customers upgrading from to 4.3.5 that run workloads which use service-ca for certs.
  All customers that install a fresh 4.3.5 cluster will be affected on rotation after 13 months if they don't upgrade.

What is the impact?
  All workloads that use non-golang SSL network clients (e.g. curl) which rely on service-ca to communicate with the platform or with each other.

How involved is remediation?
  Manual rotation of service-ca will fix the cluster for the next rotation (13 months).

Is this a regression?
  Yes, this was introduced as part of the automated service-ca rotation and released on March 10 in 4.3.5.
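
As the error message in the summary describes, the affected clients reject a certificate whose issuer and serial number match a certificate they already hold but whose contents differ. The following is a minimal sketch of that check, not the service-ca or client implementation. The `CertInfo` type, the `find_issuer_serial_conflicts` helper, and the sample issuer name and fingerprints are all hypothetical and used only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CertInfo:
    """Hypothetical, simplified stand-in for a parsed X.509 certificate."""
    issuer: str       # issuer distinguished name
    serial: int       # serial number assigned by the issuer
    fingerprint: str  # digest of the full encoded certificate


def find_issuer_serial_conflicts(certs):
    """Return (issuer, serial) pairs that map to more than one distinct cert."""
    seen = defaultdict(set)
    for cert in certs:
        # Identical certificates share a fingerprint and collapse in the set.
        seen[(cert.issuer, cert.serial)].add(cert.fingerprint)
    return {key: sorted(fps) for key, fps in seen.items() if len(fps) > 1}


# Illustrative bundle: the first two entries collide on issuer and serial.
bundle = [
    CertInfo("CN=example-service-ca", 1, "aa11"),
    CertInfo("CN=example-service-ca", 1, "bb22"),
    CertInfo("CN=example-service-ca", 2, "cc33"),
]
print(find_issuer_serial_conflicts(bundle))
```

A non-empty result means the bundle contains two different certificates that a client keying on issuer and serial would treat as a conflict.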

Comment 3 Michal Fojtik 2020-03-18 15:34:48 UTC
>  All customers upgrading from to 4.3.5 that run workloads which use service-ca for certs.

s/from//

Comment 4 Lalatendu Mohanty 2020-03-18 15:35:27 UTC
(In reply to Michal Fojtik from comment #2)

> How involved is remediation?
>   Manual rotation of service-ca will fix the cluster for the next rotation
> (13 months).

Can you please add the steps for the manual workaround? It would be useful for CEE.

Comment 5 Michal Fojtik 2020-03-18 15:38:26 UTC
> Can you please add the steps for the manual workaround? It would be useful for CEE.


https://docs.openshift.com/container-platform/4.3/authentication/certificates/service-serving-certificate.html#manually-rotate-service-ca_service-serving-certificate

Comment 9 errata-xmlrpc 2020-03-24 14:34:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0858

Comment 10 W. Trevor King 2021-04-05 17:46:35 UTC
Removing UpgradeBlocker from this older bug, to remove it from the suspect queue described in [1]. If you feel this bug still needs to be a suspect, please add the keyword again.

[1]: https://github.com/openshift/enhancements/pull/475