Bug 1810036
Summary: | "You are attempting to import a cert with the same issuer/serial as an existing cert, but that is not the same cert" after upgrade
---|---
Product: | OpenShift Container Platform
Reporter: | Junqi Zhao <juzhao>
Component: | service-ca
Assignee: | Maru Newby <mnewby>
Status: | CLOSED ERRATA
QA Contact: | scheng
Severity: | urgent
Priority: | urgent
Version: | 4.4
CC: | agawand, aos-bugs, ChetRHosey, dahernan, dmoessne, dpunia, hfukumot, mfojtik, mharri, mnewby, nbhatt, obockows, pamoedom, rheinzma, rhowe, scheng, sdodson, slaznick, sreber, wking
Keywords: | Regression
Target Release: | 4.5.0
Hardware: | Unspecified
OS: | Unspecified
Doc Type: | No Doc Update
: | 1810418
Last Closed: | 2020-07-13 17:17:56 UTC
Type: | Bug
Bug Blocks: | 1810418
Description
Junqi Zhao
2020-03-04 13:03:37 UTC
Created attachment 1667540: CA cert chain
Confirmed: this issue has nothing to do with FIPS. It is a regression after upgrade.

Bumping to urgent. This needs to be backported as far back as 4.2 ASAP to avoid potentially impacting anyone upgrading to a rotation-supporting z-stream release. On the bright side, this problem doesn't appear in golang code, so only non-golang code is likely to be impacted.

Copying the assessment from the downstream 4.3 bug 1810420, since this is the bug that's referenced in the blocked edges.

Who is impacted?
All customers upgrading to 4.3.5 that run workloads which use service-ca for certs. All customers that install a fresh 4.3.5 cluster will be affected on rotation after 13 months if they don't upgrade.

What is the impact?
All workloads that use non-golang SSL network clients (e.g. curl) which use service-ca to communicate with the platform or with each other.

How involved is remediation?
Manual rotation of service-ca will fix the cluster for the next rotation (13 months).

Is this a regression?
Yes, this was introduced as part of the automated service-ca rotation and released on March 10 in 4.3.5.

> Can you please add the step for manual workaround. It would be useful for CEE

https://docs.openshift.com/container-platform/4.3/authentication/certificates/service-serving-certificate.html#manually-rotate-service-ca_service-serving-certificate

Is this only an issue if you have FIPS enabled?
Can FIPS only be enabled if you install with 4.3 and enable it at install time?

So if you have FIPS enabled, the workaround is to rotate manually following these steps?:
https://docs.openshift.com/container-platform/4.3/authentication/certificates/service-serving-certificate.html#manually-rotate-service-ca_service-serving-certificate

Do we really want to point out a command that will delete every pod in the cluster...
(well, try; it will likely fail, causing the cluster to fall over for the time being)

(In reply to Ryan Howe from comment #9)
> Is this only an issue if you have FIPS enabled
> Can FIPS only be enabled if you install with 4.3 and enable it at install
> time?

This issue is not limited to a cluster with FIPS enabled. It affects any cluster that is upgraded to a release that enables automated service CA rotation without also ensuring unique CA serial numbers.

> So if you have FIPS enabled the workaround is to rotate manually following
> these steps?:
>
> https://docs.openshift.com/container-platform/4.3/authentication/certificates/service-serving-certificate.html#manually-rotate-service-ca_service-serving-certificate
>
> Do we really want to point out a command that will delete every pod in the
> cluster... (well try, it will fail likely causing the cluster to fall over
> for a time being)

The provided link is the documented procedure for manual CA rotation. Manual deletion of all pods will disrupt all services in the cluster, including the control plane, but the cluster will recover. It's similar to the node drain that occurs on every upgrade.

Expanding on the "Who is impacted?" from comment 8, so we know which update recommendations to pull and which releases to tombstone: the bugs were introduced by the bug 1774121 series, and fixed by the combination of this series and bug 1801573. Quick overview:

* 4.4: both rc.0 and rc.1 affected, so updates into rc.0 and tombstone rc.1 are impacted (and running either RC for 13+ months will also hit a broken CA rotation). Fixes have landed, so next 4.4 RC should be clean.
* 4.3: 4.3.5 introduced the breakage, so updates into 4.3.5 are impacted. No fix yet.
* 4.2: 4.2.22 introduced the breakage, so updates into 4.2.22, 4.2.23, and 4.2.24 are impacted. No fix yet.
* 4.1: not impacted yet. Bug 1774157 was backporting the breaking change, and is still ASSIGNED.
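To make the failure mode behind this overview concrete, here is a toy Python model of a client-side trust store that indexes certificates by (issuer, serial) and refuses a second, different certificate under the same pair. It is only a sketch of the behaviour the bug summary's error message describes, not the code of any real TLS library. The `Cert` and `TrustStore` types, the CA name, and the serial values are all hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    """Simplified stand-in for an X.509 certificate (hypothetical type)."""
    issuer: str        # issuer distinguished name
    serial: int        # serial number assigned by the issuer
    public_key: bytes  # placeholder for the real key material

    def fingerprint(self) -> str:
        # Hash over all fields; a real fingerprint hashes the DER encoding.
        data = f"{self.issuer}|{self.serial}|".encode() + self.public_key
        return hashlib.sha256(data).hexdigest()


class ReusedIssuerAndSerial(Exception):
    """Raised when a different cert arrives under a known issuer/serial pair."""


class TrustStore:
    """Toy store that indexes certs by (issuer, serial)."""

    def __init__(self) -> None:
        self._certs = {}

    def import_cert(self, cert: Cert) -> None:
        key = (cert.issuer, cert.serial)
        existing = self._certs.get(key)
        if existing is not None and existing.fingerprint() != cert.fingerprint():
            raise ReusedIssuerAndSerial(
                f"issuer={cert.issuer!r} serial={cert.serial} is already "
                "bound to a different certificate"
            )
        self._certs[key] = cert


# Hypothetical values: the name and the serial numbers are illustrative only.
old_ca = Cert("CN=example-service-ca", 1, b"key-before-rotation")
rotated_same_serial = Cert("CN=example-service-ca", 1, b"key-after-rotation")
rotated_unique_serial = Cert("CN=example-service-ca", 2, b"key-after-rotation")

store = TrustStore()
store.import_cert(old_ca)
store.import_cert(rotated_unique_serial)  # accepted: the serial differs
try:
    store.import_cert(rotated_same_serial)
except ReusedIssuerAndSerial as err:
    print("rejected:", err)
```

In this model a rotated CA that keeps the old name and serial is rejected, while one with a fresh serial is accepted, which is the property the reply above describes as "ensuring unique CA serial numbers".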
Reasoning behind the overview's claims:

* 4.5: Introduced by bug 1774121 (no linked PR, so not sure exactly when it was introduced). Fixed by bug 1810036, service-ca-operator 74b5ce2 [1], which included library-go d9c73bb [2]. Also fixed by bug 1801573, oauth-proxy 3d0621e [3], which landed before the 4.4/4.5 split.
* 4.4: Introduced by bug 1774121 (no linked PR, so not sure exactly when it was introduced). Fixed by bug 1810418, service-ca-operator e5a04d6 [4], which included library-go 3c25293 [5].

  $ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.4.0-rc.0-x86_64 | grep service-ca-operator
  service-ca-operator https://github.com/openshift/service-ca-operator 094a9ad02dbe3bcb57d5fbad301cfcfcd48bd2ed
  $ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.4.0-rc.1-x86_64 | grep service-ca-operator
  service-ca-operator https://github.com/openshift/service-ca-operator 094a9ad02dbe3bcb57d5fbad301cfcfcd48bd2ed
  $ git --no-pager log -2 --first-parent --oneline origin/release-4.4
  e5a04d6a (origin/release-4.4) Merge pull request #111 from marun/4.4-unique-ca-serial
  094a9ad0 Merge pull request #95 from vareti/signer-ca-metrics

  So both RCs are affected. Also fixed by bug 1801573, oauth-proxy 3d0621e [3], which landed before the 4.4/4.5 split.

  $ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.4.0-rc.0-x86_64 | grep oauth-proxy
  oauth-proxy https://github.com/openshift/oauth-proxy 3d0621eb72c9dd1c036505363032468a9016f381
  $ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.4.0-rc.1-x86_64 | grep oauth-proxy
  oauth-proxy https://github.com/openshift/oauth-proxy 3d0621eb72c9dd1c036505363032468a9016f381

  So both RCs have OAuth fix, but neither has the service-ca-operator fix.
* 4.3: Introduced by bug 1788179, service-ca-operator 8395d65 [6]. Fixed by bug 1810420, service-ca-operator dd7235b [7], which includes library-go 5844159 [8]. Fix has not been released yet.
  $ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.3.3-x86_64 | grep service-ca-operator
  service-ca-operator https://github.com/openshift/service-ca-operator 774c394da334dec446703545d4baaf89611ccb9d
  $ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.3.5-x86_64 | grep service-ca-operator
  service-ca-operator https://github.com/openshift/service-ca-operator 8395d65888b0a4249277989f18ee03f45383e409

  So this was introduced in 4.3.5 (there was no 4.3.4). Fix also requires the OAuth proxy fix in bug 1809253 and [9], which is still in flight.
* 4.2: Introduced by bug 1774156, service-ca-operator 0324055 [10], which includes library-go 2cf86bb [11] and API 8ce0047 [12]. Fix in flight with bug 1810421 and [13]. [14] has already landed with library-go d58edcb.

  $ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.2.21-x86_64 | grep service-ca-operator
  service-ca-operator https://github.com/openshift/service-ca-operator f6720573b9b63147436374e51e6fda44683b1e9f
  $ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.2.22-x86_64 | grep service-ca-operator
  service-ca-operator https://github.com/openshift/service-ca-operator 0324055c3bad3a857dcf3471c024bf42c20d549e

  So this was introduced in 4.2.22. Fix also requires the OAuth proxy fix from bug 1809258 and [15], which is still in flight.
* 4.1: Backport stream introducing the bug 1774157 is still ASSIGNED, so no 4.1 impact yet.
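The per-release check used in the reasoning above (take the commit a release image was built from and see whether it sits at or after the fix in the branch's first-parent log) can be sketched in Python. This is a minimal illustration using only the values quoted in this comment. The function names are hypothetical, and it assumes a linear, newest-first list of abbreviated hashes; in a real clone, `git merge-base --is-ancestor <fix> <release-commit>` answers the same question without that assumption.

```python
from typing import List


def parse_release_commit(line: str) -> str:
    """Return the commit from one 'name repo-url commit' line, the shape
    of the `oc adm release info --commits | grep ...` output quoted above."""
    parts = line.split()
    if len(parts) != 3:
        raise ValueError(f"unexpected line: {line!r}")
    return parts[2]


def same_commit(a: str, b: str) -> bool:
    # Hashes are quoted at different lengths, so compare by prefix.
    return a.startswith(b) or b.startswith(a)


def position(log: List[str], commit: str) -> int:
    """Index of commit in a newest-first log; ValueError if it is absent."""
    for index, entry in enumerate(log):
        if same_commit(entry, commit):
            return index
    raise ValueError(f"{commit} not found in log")


def contains_fix(log: List[str], release_commit: str, fix_commit: str) -> bool:
    """True if the release commit is the fix itself or newer than it."""
    return position(log, release_commit) <= position(log, fix_commit)


# Values copied from the 4.4 reasoning above.
release_4_4_log = ["e5a04d6a", "094a9ad0"]  # newest first
rc0_line = (
    "service-ca-operator https://github.com/openshift/service-ca-operator "
    "094a9ad02dbe3bcb57d5fbad301cfcfcd48bd2ed"
)
rc0_commit = parse_release_commit(rc0_line)
print(contains_fix(release_4_4_log, rc0_commit, "e5a04d6"))  # False
```

The `False` result matches the conclusion drawn above: the commit in 4.4.0-rc.0 is the parent of the fix merge, so that RC does not contain the service-ca-operator fix.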
[1]: https://github.com/openshift/service-ca-operator/pull/110#event-3111531432
[2]: https://github.com/openshift/library-go/pull/726#event-3106684443
[3]: https://github.com/openshift/oauth-proxy/pull/152#event-3029892031
[4]: https://github.com/openshift/service-ca-operator/pull/111#event-3132963585
[5]: https://github.com/openshift/library-go/pull/728#event-3129427368
[6]: https://github.com/openshift/service-ca-operator/pull/104#event-3053794085
[7]: https://github.com/openshift/service-ca-operator/pull/112#event-3142240318
[8]: https://github.com/openshift/library-go/pull/729#event-3139571599
[9]: https://github.com/openshift/oauth-proxy/pull/160
[10]: https://github.com/openshift/service-ca-operator/pull/105#event-3076020193
[11]: https://github.com/openshift/library-go/pull/684#event-3059339775
[12]: https://github.com/openshift/api/pull/577#event-3061441773
[13]: https://github.com/openshift/service-ca-operator/pull/113
[14]: https://github.com/openshift/library-go/pull/730#event-3141931034
[15]: https://github.com/openshift/oauth-proxy/pull/164

Given that 4.5 is not yet released, and release versions already have a fix, I'm assuming no docs are required.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409