Bug 2008119 - The serviceAccountIssuer field on Authentication CR is reset to “” during the installation process
Summary: The serviceAccountIssuer field on Authentication CR is reset to “” when ins...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.10.0
Assignee: Matthew Staebler
QA Contact: Mike Fiedler
URL:
Whiteboard:
Depends On:
Blocks: 2009342
Reported: 2021-09-27 10:34 UTC by wang lin
Modified: 2022-03-12 04:38 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: During bootstrapping, two components both try to write manifests to the k8s API server. The first is the cluster-bootstrap, which writes the manifests supplied by the installer. The second is the cluster-version-operator, which writes the manifests from the release image. If the installer supplies a manifest for a resource that also has a manifest in the release image, the two components race to determine which manifest is actually written.
Consequence: If the cluster-bootstrap loses the race to create the Authentication resource, the customizations added by the user are lost.
Fix: Explicitly block the cluster-version-operator from creating the resources that are created by installer manifests. All resources from installer manifests are temporarily added to the ClusterVersion resource as resources to ignore. After bootstrapping, those resources are removed from the ClusterVersion resource ignore list.
Result: Installations succeed, and user customizations to the Authentication resource are retained.
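
A minimal sketch of the fix described in the Doc Text, written in Python for illustration only. It assumes the "ignore list" corresponds to the `spec.overrides` list of the ClusterVersion resource, with entries marked `unmanaged: true`; the bug text itself does not name the field. The helper names `override_for`, `add_overrides`, and `remove_overrides` are hypothetical and are not taken from the installer.

```python
from typing import Dict, List


def override_for(manifest: Dict) -> Dict:
    """Build one ignore entry for an installer manifest (hypothetical helper)."""
    api_version = manifest["apiVersion"]
    # "config.openshift.io/v1" -> group "config.openshift.io"; core "v1" -> group "".
    group = api_version.split("/")[0] if "/" in api_version else ""
    meta = manifest["metadata"]
    return {
        "kind": manifest["kind"],
        "group": group,
        "namespace": meta.get("namespace", ""),
        "name": meta["name"],
        "unmanaged": True,  # assumption: this is how a resource is marked as ignored
    }


def add_overrides(cluster_version: Dict, manifests: List[Dict]) -> Dict:
    """During bootstrapping: ask the CVO to ignore every installer-provided resource."""
    overrides = cluster_version.setdefault("spec", {}).setdefault("overrides", [])
    for manifest in manifests:
        entry = override_for(manifest)
        if entry not in overrides:
            overrides.append(entry)
    return cluster_version


def remove_overrides(cluster_version: Dict, manifests: List[Dict]) -> Dict:
    """After bootstrapping: drop the temporary entries so the CVO manages them again."""
    drop = [override_for(m) for m in manifests]
    spec = cluster_version.setdefault("spec", {})
    spec["overrides"] = [o for o in spec.get("overrides", []) if o not in drop]
    return cluster_version


# Illustrative data only; the issuer URL is a placeholder.
auth = {
    "apiVersion": "config.openshift.io/v1",
    "kind": "Authentication",
    "metadata": {"name": "cluster"},
    "spec": {"serviceAccountIssuer": "https://example.invalid/issuer"},
}
cv = {"apiVersion": "config.openshift.io/v1", "kind": "ClusterVersion",
      "metadata": {"name": "version"}, "spec": {}}
```

Entries that were already present in the list before bootstrapping are left alone by `remove_overrides` unless they exactly match an installer manifest.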
Clone Of:
: 2009342 (view as bug list)
Environment:
Last Closed: 2022-03-12 04:38:40 UTC
Target Upstream Version:
Embargoed:


Links
| System | ID | Private | Priority | Status | Summary | Last Updated |
|---|---|---|---|---|---|---|
| Github | openshift installer pull 5258 | 0 | None | open | Bug 2008119: force cvo to ignore installer-provided resources | 2021-09-30 04:23:39 UTC |
| Red Hat Product Errata | RHSA-2022:0056 | 0 | None | None | None | 2022-03-12 04:38:55 UTC |

Comment 1 Standa Laznicka 2021-09-27 12:07:40 UTC
I am not aware of any changes to handling this field. Moving to installer. If you want to move it back to our component, please show us proof that our operator resets the field by providing the audit logs containing the event with the change.

Comment 2 Matthew Staebler 2021-09-27 14:10:48 UTC
Please attach the manifests that you are adding. Also, please attach the full install directory, including the log file and the state file.

Comment 3 Matthew Staebler 2021-09-27 14:28:57 UTC
Note that the install of the cluster was not successful. The openshift-apiserver is reporting APIServicesAvailable with 503 errors. I do not know enough about the authentication type to know if the APIServicesAvailable error is a result of the misconfigured authentication or vice versa.

Comment 6 wang lin 2021-09-28 02:18:29 UTC
Because this issue blocks all STS cluster installations, I am adding the TestBlocker keyword.

Comment 7 Mike Fiedler 2021-09-28 13:07:59 UTC
This is also believed to be a Regression

Comment 8 Scott Dodson 2021-09-28 13:14:33 UTC
If this is a regression it should also be marked as a blocker. Is there a reason you don't believe that to be the case?

Comment 9 Matthew Staebler 2021-09-28 17:28:03 UTC
(In reply to Scott Dodson from comment #8)
> If this is a regression it should also be marked as a blocker. Is there a
> reason you don't believe that to be the case?

This seems like a blocker to me.

Comment 10 Matthew Staebler 2021-09-28 17:29:44 UTC
Some other component is creating the authentication resource prior to the CVO laying down the manifest supplied to the installer.

Here are logs from the bootstrap of an install that I ran.
```
$ journalctl -u bootkube.service | grep authentication
Sep 28 17:20:56 ip-10-0-4-239 bootkube.sh[2233]: Writing asset: /assets/config-bootstrap/manifests/0000_10_config-operator_01_authentication.crd.yaml
Sep 28 17:21:51 ip-10-0-4-239 bootkube.sh[2233]: Created "0000_10_config-operator_01_authentication.crd.yaml" customresourcedefinitions.v1.apiextensions.k8s.io/authentications.config.openshift.io -n
Sep 28 17:22:32 ip-10-0-4-239 bootkube.sh[2233]: "cluster-authentication-02-config.yaml": unable to get REST mapping for "cluster-authentication-02-config.yaml": no matches for kind "Authentication" in version "config.openshift.io/v1"
Sep 28 17:22:39 ip-10-0-4-239 bootkube.sh[2233]: Skipped "cluster-authentication-02-config.yaml" authentications.v1.config.openshift.io/cluster -n  as it already exists
```
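
To spot this condition in other installs, the journal output can be scanned for manifests that the bootstrap skipped. The following is a rough sketch that assumes the `Skipped "<file>" <resource> -n <namespace> as it already exists` wording seen in the log above; the function name `skipped_manifests` is made up for this example.

```python
import re
from typing import Iterable, List, Tuple

# Pattern modeled on the single "Skipped" line in the log above (an assumption
# about the general format; the namespace is empty for cluster-scoped resources).
SKIPPED = re.compile(
    r'Skipped "(?P<file>[^"]+)" (?P<resource>\S+) -n (?P<namespace>\S*) as it already exists'
)


def skipped_manifests(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (manifest file, resource) pairs that were skipped as already existing."""
    found = []
    for line in lines:
        match = SKIPPED.search(line)
        if match:
            found.append((match.group("file"), match.group("resource")))
    return found


sample = [
    'Sep 28 17:21:51 ip-10-0-4-239 bootkube.sh[2233]: Created '
    '"0000_10_config-operator_01_authentication.crd.yaml" '
    'customresourcedefinitions.v1.apiextensions.k8s.io/authentications.config.openshift.io -n',
    'Sep 28 17:22:39 ip-10-0-4-239 bootkube.sh[2233]: Skipped '
    '"cluster-authentication-02-config.yaml" '
    'authentications.v1.config.openshift.io/cluster -n  as it already exists',
]
print(skipped_manifests(sample))
```

A skipped installer manifest is only a problem when the existing object came from another writer, which is the case here.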

Comment 11 Matthew Staebler 2021-09-28 17:40:19 UTC
The authentication CR is included as a manifest by the cluster-config-operator.

https://github.com/openshift/cluster-config-operator/blob/master/empty-resources/0000_05_config-operator_02_authentication.cr.yaml

Comment 12 Matthew Staebler 2021-09-28 18:34:17 UTC
I am moving this bug to kube-apiserver, as they own cluster-bootstrap. The cluster-bootstrap is failing to write the authentication resource if the cluster-version-operator writes the resource from cluster-config-operator first.

Comment 13 Stefan Schimanski 2021-09-29 07:38:05 UTC
This is working as designed. The cluster-config-operator creates the CRs as "create-only". It also lays down the CRDs. So there is a race between installer's use of cluster-bootstrap and cluster-config-operator. There is no mechanism in place, and it was never a requirement, that the config CR creation is replaced "by the installer". In other words, this is not a bug, but an RFE if we want to fix this.
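
For readers less familiar with "create-only" semantics, here is a toy Python model of the race (not real cluster code). `FakeAPIServer` and `final_issuer` are invented for illustration. The only behavior taken from this report is that a second create of an existing object is skipped rather than applied.

```python
from typing import Dict, List


class FakeAPIServer:
    """In-memory stand-in for the API server with create-only writes."""

    def __init__(self) -> None:
        self.objects: Dict[str, Dict] = {}

    def create(self, key: str, obj: Dict) -> bool:
        """Store obj only if key is absent; return False if it already exists."""
        if key in self.objects:
            return False
        self.objects[key] = obj
        return True


# The two competing manifests for the same resource.
WRITERS = {
    # Empty CR, as shipped in the release image.
    "cvo": {"spec": {}},
    # User-customized CR, as supplied to the installer (placeholder URL).
    "cluster-bootstrap": {"spec": {"serviceAccountIssuer": "https://example.invalid/issuer"}},
}


def final_issuer(order: List[str]) -> str:
    """Apply the writers in the given order and return the resulting issuer."""
    api = FakeAPIServer()
    for writer in order:
        api.create("authentications.config.openshift.io/cluster", WRITERS[writer])
    stored = api.objects["authentications.config.openshift.io/cluster"]
    return stored["spec"].get("serviceAccountIssuer", "")


print(repr(final_issuer(["cvo", "cluster-bootstrap"])))  # '' (customization lost)
print(repr(final_issuer(["cluster-bootstrap", "cvo"])))  # 'https://example.invalid/issuer'
```

Whichever writer arrives first decides the content of the resource, so the outcome depends purely on timing.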

Comment 17 Mike Fiedler 2021-10-01 14:49:54 UTC
It looks like the fix is not in 4.10.0-0.nightly-2021-10-01-013103. The problem still occurs there, and https://amd64.ocp.releases.ci.openshift.org/releasestream/4.10.0-0.nightly/release/4.10.0-0.nightly-2021-10-01-013103?from=4.10.0-0.nightly-2021-09-30-041351 does not list the fix in the Installer section. Moving to MODIFIED.

Comment 18 Mike Fiedler 2021-10-01 14:54:36 UTC
It looks like the fix will be in 4.10.0-0.nightly-2021-10-01-141332. Will test there. Back to ON_QA.

Comment 19 Mike Fiedler 2021-10-01 15:52:49 UTC
Verified on 4.10.0-0.nightly-2021-10-01-141332 - AWS STS install successful.

{
  "oauthMetadata": {
    "name": ""
  },
  "serviceAccountIssuer": "https://xxxxxxxx-oidc.s3.us-east-2.amazonaws.com",
  "type": "",
  "webhookTokenAuthenticator": {
    "kubeConfig": {
      "name": "webhook-authentication-integrated-oauth"
    }
  }
}
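
A check like this can be scripted. The sketch below assumes the JSON above is the spec of the Authentication resource saved as text; `issuer_is_set` is a hypothetical helper, and the issuer URL in the example is a placeholder.

```python
import json


def issuer_is_set(spec_json: str) -> bool:
    """Return True when serviceAccountIssuer is present and non-empty."""
    spec = json.loads(spec_json)
    issuer = spec.get("serviceAccountIssuer") or ""
    return bool(issuer.strip())


good = '{"serviceAccountIssuer": "https://example.invalid/issuer", "type": ""}'
bad = '{"serviceAccountIssuer": "", "type": ""}'
print(issuer_is_set(good), issuer_is_set(bad))  # True False
```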

Comment 23 errata-xmlrpc 2022-03-12 04:38:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

