Bug 1989335

Summary: Etcd is degraded after upgrading to 4.9 with message "configmap openshift-config-managed/csr-controller-ca field manager is not valid"
Product: OpenShift Container Platform Reporter: Yang Yang <yanyang>
Component: Etcd Assignee: Sam Batschelet <sbatsche>
Status: CLOSED ERRATA QA Contact: ge liu <geliu>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.9   
Target Milestone: ---   
Target Release: 4.9.0   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2021-10-18 17:43:52 UTC Type: Bug

Description Yang Yang 2021-08-03 02:10:38 UTC
Description of problem:

Failed to upgrade a UPI bare-metal cluster along the path 4.1.41-x86_64 -> 4.2.36-x86_64 -> 4.3.40-x86_64 -> 4.4.33-x86_64 -> 4.5.41-x86_64 -> 4.6.41-x86_64 -> 4.7.22-x86_64 -> 4.8.3-x86_64 -> 4.9.0-0.nightly-2021-08-01-102437. Etcd is degraded with the message: TargetConfigControllerDegraded: configmap openshift-config-managed/csr-controller-ca field manager is not valid

profile_name=05_UPI on Baremetal with RHCOS (FIPS off)

The configmap openshift-config-managed/csr-controller-ca is shown below:

- apiVersion: v1
  data:
    ca-bundle.crt: |
      -----BEGIN CERTIFICATE-----
      ...
      -----END CERTIFICATE-----
  kind: ConfigMap
  metadata:
    creationTimestamp: "2021-08-01T18:37:49Z"
    name: csr-controller-ca
    namespace: openshift-config-managed
    resourceVersion: "3018"
    uid: 92d90598-f2f7-11eb-8867-fa163e1d249f
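
The error message mentions a field manager, and this report does not state why it is rejected. One thing a reader could inspect is the object's metadata.managedFields list, which records the manager name for each write. The sketch below assumes the ConfigMap has been exported as JSON (for example with `oc get configmap csr-controller-ca -n openshift-config-managed -o json`; recent kubectl-based clients may need `--show-managed-fields` to include the list). The helper name `field_managers` and the manager name `example-operator` are hypothetical and used only for illustration.

```python
import json


def field_managers(obj):
    """Return the sorted, de-duplicated manager names recorded on a Kubernetes object.

    An object whose metadata carries no managedFields yields an empty list.
    """
    managed = obj.get("metadata", {}).get("managedFields") or []
    return sorted({entry.get("manager", "") for entry in managed})


# The ConfigMap as dumped in this report: no managedFields in the output.
REPORTED = json.loads("""
{
  "apiVersion": "v1",
  "kind": "ConfigMap",
  "metadata": {
    "creationTimestamp": "2021-08-01T18:37:49Z",
    "name": "csr-controller-ca",
    "namespace": "openshift-config-managed",
    "resourceVersion": "3018",
    "uid": "92d90598-f2f7-11eb-8867-fa163e1d249f"
  }
}
""")

# Hypothetical object for comparison; "example-operator" is an invented name.
WITH_MANAGER = {
    "metadata": {
        "name": "csr-controller-ca",
        "managedFields": [
            {"manager": "example-operator", "operation": "Update"},
        ],
    }
}

if __name__ == "__main__":
    print("reported object managers:", field_managers(REPORTED))
    print("comparison object managers:", field_managers(WITH_MANAGER))
```

An empty result for the reported object only reflects what the dump contains; it does not by itself establish the root cause.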

$ oc describe co/etcd

08-02 15:32:34.612  Name:         etcd
08-02 15:32:34.612  Namespace:    
08-02 15:32:34.612  Labels:       <none>
08-02 15:32:34.612  Annotations:  exclude.release.openshift.io/internal-openshift-hosted: true
08-02 15:32:34.612  API Version:  config.openshift.io/v1
08-02 15:32:34.612  Kind:         ClusterOperator
08-02 15:32:34.612  Metadata:
08-02 15:32:34.612    Creation Timestamp:  2021-08-01T21:04:35Z
08-02 15:32:34.612    Generation:          1
08-02 15:32:34.612    Resource Version:    539061
08-02 15:32:34.612    UID:                 7d18c429-3095-42b3-81ea-b60358696727
08-02 15:32:34.612  Spec:
08-02 15:32:34.612  Status:
08-02 15:32:34.612    Conditions:
08-02 15:32:34.612      Last Transition Time:  2021-08-02T04:31:51Z
08-02 15:32:34.612      Message:               TargetConfigControllerDegraded: configmap openshift-config-managed/csr-controller-ca field manager is not valid
08-02 15:32:34.612      Reason:                TargetConfigController_SynchronizationError
08-02 15:32:34.612      Status:                True
08-02 15:32:34.612      Type:                  Degraded
08-02 15:32:34.612      Last Transition Time:  2021-08-02T04:36:15Z
08-02 15:32:34.612      Message:               NodeInstallerProgressing: 3 nodes are at revision 8
08-02 15:32:34.612  EtcdMembersProgressing: No unstarted etcd members found
08-02 15:32:34.613      Reason:                AsExpected
08-02 15:32:34.613      Status:                False
08-02 15:32:34.613      Type:                  Progressing
08-02 15:32:34.613      Last Transition Time:  2021-08-01T21:07:08Z
08-02 15:32:34.613      Message:               StaticPodsAvailable: 3 nodes are active; 3 nodes are at revision 8
08-02 15:32:34.613  EtcdMembersAvailable: 3 members are available
08-02 15:32:34.613      Reason:                AsExpected
08-02 15:32:34.613      Status:                True
08-02 15:32:34.613      Type:                  Available
08-02 15:32:34.613      Last Transition Time:  2021-08-01T21:04:53Z
08-02 15:32:34.613      Message:               All is well
08-02 15:32:34.613      Reason:                AsExpected
08-02 15:32:34.613      Status:                True
08-02 15:32:34.613      Type:                  Upgradeable
08-02 15:32:34.613    Extension:               <nil>
08-02 15:32:34.613    Related Objects:
08-02 15:32:34.613      Group:     operator.openshift.io
08-02 15:32:34.613      Name:      cluster
08-02 15:32:34.613      Resource:  etcds
08-02 15:32:34.613      Group:     
08-02 15:32:34.613      Name:      openshift-config
08-02 15:32:34.613      Resource:  namespaces
08-02 15:32:34.613      Group:     
08-02 15:32:34.613      Name:      openshift-config-managed
08-02 15:32:34.613      Resource:  namespaces
08-02 15:32:34.613      Group:     
08-02 15:32:34.613      Name:      openshift-etcd-operator
08-02 15:32:34.613      Resource:  namespaces
08-02 15:32:34.613      Group:     
08-02 15:32:34.613      Name:      openshift-etcd
08-02 15:32:34.613      Resource:  namespaces
08-02 15:32:34.613    Versions:
08-02 15:32:34.613      Name:     operator
08-02 15:32:34.613      Version:  4.9.0-0.nightly-2021-08-01-102437
08-02 15:32:34.613      Name:     raw-internal
08-02 15:32:34.613      Version:  4.9.0-0.nightly-2021-08-01-102437
08-02 15:32:34.613      Name:     etcd
08-02 15:32:34.613      Version:  4.9.0-0.nightly-2021-08-01-102437
08-02 15:32:34.613  Events:       <none>
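
For triage, the same status can be read programmatically from the JSON form of the ClusterOperator (for example `oc get clusteroperator etcd -o json`). The following is a minimal sketch, not part of the original report: the helper name `degraded_conditions` is hypothetical, and the sample data is a hand-trimmed copy of the conditions shown above.

```python
import json


def degraded_conditions(cluster_operator):
    """Return the conditions of type Degraded whose status is "True"."""
    conditions = cluster_operator.get("status", {}).get("conditions", [])
    return [
        c for c in conditions
        if c.get("type") == "Degraded" and c.get("status") == "True"
    ]


# Trimmed copy of the conditions from the describe output in this report.
SAMPLE = json.loads("""
{
  "kind": "ClusterOperator",
  "metadata": {"name": "etcd"},
  "status": {
    "conditions": [
      {
        "type": "Degraded",
        "status": "True",
        "reason": "TargetConfigController_SynchronizationError",
        "message": "TargetConfigControllerDegraded: configmap openshift-config-managed/csr-controller-ca field manager is not valid",
        "lastTransitionTime": "2021-08-02T04:31:51Z"
      },
      {
        "type": "Progressing",
        "status": "False",
        "reason": "AsExpected",
        "lastTransitionTime": "2021-08-02T04:36:15Z"
      },
      {
        "type": "Available",
        "status": "True",
        "reason": "AsExpected",
        "lastTransitionTime": "2021-08-01T21:07:08Z"
      }
    ]
  }
}
""")

if __name__ == "__main__":
    for cond in degraded_conditions(SAMPLE):
        print(cond["reason"] + ": " + cond["message"])
```

Filtering on both type and status matters because an operator that is healthy still carries a Degraded condition with status "False".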

Version-Release number of selected component (if applicable):

4.9.0-0.nightly-2021-08-01-102437

How reproducible:
3/3

Steps to Reproduce:
1. Install a 4.1.41-x86_64 cluster with profile 05_UPI on Baremetal with RHCOS (FIPS off)
2. Upgrade the cluster through each intermediate release up to 4.9.0-0.nightly-2021-08-01-102437

Actual results:
The upgrade failed. Etcd is degraded.

Expected results:
Upgrade is successful

Additional info:

Upgrade ci job link: https://mastern-jenkins-csb-openshift-qe.apps.ocp4.prod.psi.redhat.com/job/upgrade_CI/16362/consoleFull

Comment 3 ge liu 2021-08-27 03:56:46 UTC
Verified with the upgrade path 4.1.41-x86_64 -> 4.2.36-x86_64 -> 4.3.40-x86_64 -> 4.4.33-x86_64 -> 4.5.41-x86_64 -> 4.6.43-x86_64 -> 4.7.26-x86_64 -> 4.8.7-x86_64 -> 4.9.0-0.nightly-2021-08-26-013855. The upgrade as a whole failed at the end with bug https://bugzilla.redhat.com/show_bug.cgi?id=1994857, but etcd was upgraded successfully.

Comment 6 errata-xmlrpc 2021-10-18 17:43:52 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759