Bug 1907812 - 4.7 to 4.6 downgrade stuck in clusteroperator storage
Summary: 4.7 to 4.6 downgrade stuck in clusteroperator storage
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 4.7.0
Assignee: Jan Safranek
QA Contact: Qin Ping
URL:
Whiteboard:
Depends On: 1916586
Blocks: 1912720
 
Reported: 2020-12-15 09:58 UTC by Xingxing Xia
Modified: 2021-02-24 15:44 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
: 1912720 (view as bug list)
Environment:
Last Closed: 2021-02-24 15:43:55 UTC
Target Upstream Version:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-storage-operator pull 118 | 0 | None | closed | Bug 1907812: Use separate RBAC objects for AWS CA bundle retrieval | 2021-02-15 10:24:12 UTC
Red Hat Product Errata | RHSA-2020:5633 | 0 | None | None | None | 2021-02-24 15:44:37 UTC

Description Xingxing Xia 2020-12-15 09:58:29 UTC
Description of problem:
A 4.6.8 cluster upgrades successfully to the latest 4.7.0-0.nightly-2020-12-15-042043. The subsequent downgrade to 4.6 then gets stuck with:
"Unable to apply 4.6.8: the cluster operator storage is degraded"

Adding TestBlocker because this blocks testing of epic issue MSTR-1055.

Version-Release number of selected component (if applicable):
4.6.8 upgrade to 4.7.0-0.nightly-2020-12-15-042043, then downgrade back to 4.6.8

How reproducible:
Tried once so far

Steps to Reproduce:
1. Successfully install 4.6.8 IPI AWS env
2. Successfully upgrade to 4.7.0-0.nightly-2020-12-15-042043
3. Then downgrade to 4.6.8

Actual results:
Step 3 fails with clusteroperator storage stuck:
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2020-12-15-042043   True        True          116m    Unable to apply 4.6.8: the cluster operator storage is degraded

$ oc describe co storage
Name:         storage
...
Spec:
Status:
  Conditions:
    Last Transition Time:  2020-12-15T07:04:10Z
    Message:               AWSEBSCSIDriverOperatorCRDegraded: ResourceSyncControllerDegraded: configmaps "kube-cloud-config" is forbidden: User "system:serviceaccount:openshift-cluster-csi-drivers:aws-ebs-csi-driver-operator" cannot get resource "configmaps" in API group "" in the namespace "openshift-config-managed"
    Reason:                AWSEBSCSIDriverOperatorCR_ResourceSyncController_Error
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2020-12-15T07:04:47Z
    Reason:                AsExpected
    Status:                False
    Type:                  Progressing
    Last Transition Time:  2020-12-15T06:05:22Z
    Reason:                AsExpected
    Status:                True
    Type:                  Available
    Last Transition Time:  2020-12-15T02:24:45Z
    Reason:                AsExpected
    Status:                True
    Type:                  Upgradeable
  Extension:               <nil>
  Related Objects:
    Group:
    Name:      openshift-cluster-storage-operator
    Resource:  namespaces
    Group:
    Name:      openshift-cluster-csi-drivers
    Resource:  namespaces
    Group:
    Name:      openshift-manila-csi-driver
    Resource:  namespaces
    Group:     operator.openshift.io
    Name:      cluster
    Resource:  storages
    Group:     operator.openshift.io
    Name:      ebs.csi.aws.com
    Resource:  clustercsidrivers
    Group:     operator.openshift.io
    Name:      csi.ovirt.org
    Resource:  clustercsidrivers
    Group:     operator.openshift.io
    Name:      manila.csi.openstack.org
    Resource:  clustercsidrivers
  Versions:
    Name:     operator
    Version:  4.6.8
    Name:     AWSEBSCSIDriverOperator
    Version:  4.6.8
Events:       <none>


Expected results:
Should downgrade successfully

Additional info:
In the past, 4.6 to 4.5 downgrade bugs were found in other clusteroperators: bug 1868376, bug 1885848, bug 1877316, and they were fixed. 4.7 to 4.6 downgrade should succeed too.

Comment 1 Jan Safranek 2020-12-15 17:20:57 UTC
Xingxing, please attach must-gather next time, it will speed up investigation a lot!

Working theory:

1. In 4.7, we introduced syncing of AWS CA bundle into the driver namespace, so the driver can talk to AWS API. https://github.com/openshift/aws-ebs-csi-driver-operator/pull/102

2. When downgrading to 4.6, 4.6 RBAC is applied first.

3. The 4.7 AWS operator is still running and tries to sync the CA bundle, but it no longer has the RBAC permissions to do so. The sync fails and the corresponding condition is raised:

      message: 'configmaps "kube-cloud-config" is forbidden: User "system:serviceaccount:openshift-cluster-csi-drivers:aws-ebs-csi-driver-operator"
        cannot get resource "configmaps" in API group "" in the namespace "openshift-config-managed"'
      reason: Error
      status: "True"
      type: ResourceSyncControllerDegraded

4. CSO Deployment is downgraded to 4.6, which downgrades AWS operator to 4.6 version.

5. The 4.6 AWS operator runs just fine, but it does not sync the CA bundle, nor does it run any other syncer, so *nothing clears the ResourceSyncControllerDegraded condition*
-> the operator is degraded forever.
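
A toy Python model of steps 1-5 may make the stuck state easier to see. It is only a sketch: the functions, the second condition name, and the "any sub-condition is True" aggregation are assumptions made for illustration, not the operators' real code. Only ResourceSyncControllerDegraded comes from the report.

```python
# Toy model of steps 1-5; not the operators' real code.

def sync_4_7(conditions, can_read_cloud_config):
    """4.7 operator: runs the CA bundle syncer and records the outcome."""
    conditions["ResourceSyncControllerDegraded"] = (
        "False" if can_read_cloud_config else "True"
    )

def sync_4_6(conditions):
    """4.6 operator: has no CA bundle syncer, so it never writes that condition."""
    # Hypothetical condition name standing in for whatever 4.6 does manage.
    conditions["SomeOther46ControllerDegraded"] = "False"

def is_degraded(conditions):
    """Assumed aggregation: degraded if any sub-condition is "True"."""
    return any(status == "True" for status in conditions.values())

conditions = {}
sync_4_7(conditions, can_read_cloud_config=True)   # step 1: healthy on 4.7
sync_4_7(conditions, can_read_cloud_config=False)  # steps 2-3: 4.6 RBAC applied, 4.7 operator still running
sync_4_6(conditions)                               # steps 4-5: operator downgraded to 4.6
stuck = is_degraded(conditions)
print(stuck)  # True
```

The last writer of ResourceSyncControllerDegraded was the 4.7 operator, and in this model nothing that runs afterwards ever touches it again.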

Workaround: oc delete clustercsidriver --all
CSO will re-create it and everything should re-sync.

Brainstorming some solutions:
I. The operator somehow clears all conditions it does not manage. But how does it know?
II. Deploy 4.7 RBAC to sync the CA bundle as a separate ClusterRole / ClusterRoleBinding. Downgrade won't remove it and 4.7 operator won't report Degraded. In other words, when adding anything to RBAC, always add it as a separate ClusterRole to prevent similar errors in the future.
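
Solution II can be sketched in Python as follows. The sketch assumes, as stated above, that a downgrade re-applies the older release's objects by name and does not delete objects it does not know about. The object names are the ones that appear in the verification output later in this bug; the permission strings and helper functions are hypothetical simplifications of real RBAC rules.

```python
# Sketch of solution II under the assumption that manifests are applied
# create-or-replace by name and unknown objects are never deleted.

CLUSTERROLE = "aws-ebs-csi-driver-operator-clusterrole"
CONFIG_ROLE = "aws-ebs-csi-driver-operator-aws-config-role"

# Permission strings are a simplification, not real RBAC syntax.
RELEASE_4_6 = {CLUSTERROLE: {"lock configmaps"}}
# Variant A: 4.7 adds the new rule to the existing ClusterRole.
RELEASE_4_7_SHARED = {CLUSTERROLE: {"lock configmaps", "get configmaps"}}
# Variant B: 4.7 ships the new rule as a separate object.
RELEASE_4_7_SEPARATE = {
    CLUSTERROLE: {"lock configmaps"},
    CONFIG_ROLE: {"get configmaps"},
}

def apply_manifests(cluster, manifests):
    """Replace same-named objects, leave everything else untouched."""
    updated = dict(cluster)
    updated.update(manifests)
    return updated

def allowed(cluster, permission):
    """True if any RBAC object in the cluster grants the permission."""
    return any(permission in rules for rules in cluster.values())

shared = apply_manifests(apply_manifests({}, RELEASE_4_7_SHARED), RELEASE_4_6)
separate = apply_manifests(apply_manifests({}, RELEASE_4_7_SEPARATE), RELEASE_4_6)

print(allowed(shared, "get configmaps"))    # False
print(allowed(separate, "get configmaps"))  # True
```

In variant A the 4.6 ClusterRole overwrites the 4.7 one and the permission disappears while the 4.7 operator is still running. In variant B the extra object survives the downgrade, so the 4.7 operator can keep syncing until it is replaced.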

Comment 2 Jan Safranek 2020-12-15 17:33:44 UTC
Xingxing, btw, there is more than storage degraded on the cluster:

NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.6.8                               False       True          True       38m
console                                    4.6.8                               True        False         True       41m
monitoring                                 4.6.8                               False       False         True       36m
network                                    4.7.0-0.nightly-2020-12-14-165231   True        True          True       104m

The cluster is so broken I can't get must-gather:

$ oc adm must-gather
[must-gather      ] OUT the server is currently unable to handle the request (get imagestreams.image.openshift.io must-gather)
[must-gather      ] OUT 
[must-gather      ] OUT Using must-gather plugin-in image: quay.io/openshift/origin-must-gather:latest
[must-gather      ] OUT namespace/openshift-must-gather-ttb77 created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-vx97s created
[must-gather      ] OUT pod for plug-in image quay.io/openshift/origin-must-gather:latest created
[must-gather-8hdps] OUT gather did not start: timed out waiting for the condition
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-vx97s deleted
[must-gather      ] OUT namespace/openshift-must-gather-ttb77 deleted
error: gather did not start for pod must-gather-8hdps: timed out waiting for the condition


I saw the gather pod scheduled, but kubelet did not act on it.

Comment 3 Xingxing Xia 2020-12-18 11:33:42 UTC
(In reply to Jan Safranek from comment #1)
> Workaround: oc delete clustercsidriver --all
> CSO will re-create it and everything should re-sync.

I tested this workaround while the downgrade was stuck on co/storage, but got more failures:
oc get clusterversion -w
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2020-12-18-031435   True        True          46m     Working towards 4.6.8: 80% complete, waiting on network
version   4.7.0-0.nightly-2020-12-18-031435   True        True          52m     Unable to apply 4.6.8: the control plane is reporting an internal error
...
version   4.7.0-0.nightly-2020-12-18-031435   True        True          80m     Unable to apply 4.6.8: the update could not be applied
version   4.7.0-0.nightly-2020-12-18-031435   True        True          82m     Working towards 4.6.8: 1% complete

Projects terminating:
openshift-multus                                   Terminating   4h54m
openshift-network-diagnostics                      Terminating   139m
openshift-sdn                                      Terminating   4h54m

Pods are abnormal:
openshift-marketplace                              certified-operators-f5lg6                                                  0/1   ContainerCreating   0     41m    <none>         ip-10-0-222-167.ap-northeast-2.compute.internal   <none>   <none>
openshift-marketplace                              community-operators-92sc4                                                  0/1   ContainerCreating   0     46m    <none>         ip-10-0-222-167.ap-northeast-2.compute.internal   <none>   <none>
openshift-marketplace                              qe-app-registry-rngv2                                                      0/1   ContainerCreating   0     47m    <none>         ip-10-0-222-167.ap-northeast-2.compute.internal   <none>   <none>
openshift-marketplace                              redhat-marketplace-2fnhz                                                   0/1   ContainerCreating   0     46m    <none>         ip-10-0-222-167.ap-northeast-2.compute.internal   <none>   <none>
openshift-marketplace                              redhat-operators-mstxq                                                     0/1   ContainerCreating   0     41m    <none>         ip-10-0-222-167.ap-northeast-2.compute.internal   <none>   <none>
openshift-multus                                   multus-admission-controller-66xfm                                          0/2   Terminating         0     138m   10.130.0.3     ip-10-0-215-209.ap-northeast-2.compute.internal   <none>   <none>
openshift-multus                                   multus-admission-controller-bg4bk                                          0/2   Terminating         0     137m   10.129.0.8     ip-10-0-166-233.ap-northeast-2.compute.internal   <none>   <none>
openshift-multus                                   multus-admission-controller-s2g8n                                          0/2   Terminating         0     139m   10.128.0.9     ip-10-0-152-224.ap-northeast-2.compute.internal   <none>   <none>
openshift-multus                                   network-metrics-daemon-7xrt2                                               0/2   Terminating         0     138m   10.129.0.3     ip-10-0-166-233.ap-northeast-2.compute.internal   <none>   <none>
openshift-multus                                   network-metrics-daemon-mnkmf                                               0/2   Terminating         0     138m   10.130.0.4     ip-10-0-215-209.ap-northeast-2.compute.internal   <none>   <none>
openshift-multus                                   network-metrics-daemon-nfn7s                                               0/2   Terminating         0     139m   10.129.2.4     ip-10-0-147-168.ap-northeast-2.compute.internal   <none>   <none>
openshift-multus                                   network-metrics-daemon-pfc82                                               0/2   Terminating         0     137m   10.131.0.3     ip-10-0-222-167.ap-northeast-2.compute.internal   <none>   <none>
openshift-multus                                   network-metrics-daemon-qr22f                                               0/2   Terminating         0     139m   10.128.0.3     ip-10-0-152-224.ap-northeast-2.compute.internal   <none>   <none>
openshift-multus                                   network-metrics-daemon-trffj                                               0/2   Terminating         0     138m   10.128.2.2     ip-10-0-172-3.ap-northeast-2.compute.internal     <none>   <none>
openshift-network-diagnostics                      network-check-source-788d6c944d-z64hw                                      0/1   Terminating         1     107m   10.129.2.9     ip-10-0-147-168.ap-northeast-2.compute.internal   <none>   <none>
openshift-network-diagnostics                      network-check-target-g5dfq                                                 0/1   Terminating         0     140m   10.131.0.2     ip-10-0-222-167.ap-northeast-2.compute.internal   <none>   <none>
openshift-network-diagnostics                      network-check-target-jfq7s                                                 0/1   Terminating         0     140m   10.130.0.5     ip-10-0-215-209.ap-northeast-2.compute.internal   <none>   <none>
openshift-network-diagnostics                      network-check-target-pwvj9                                                 0/1   Terminating         0     140m   10.128.0.7     ip-10-0-152-224.ap-northeast-2.compute.internal   <none>   <none>
openshift-network-diagnostics                      network-check-target-tmmfq                                                 0/1   Terminating         0     140m   10.129.2.2     ip-10-0-147-168.ap-northeast-2.compute.internal   <none>   <none>
openshift-network-diagnostics                      network-check-target-txbpt                                                 0/1   Terminating         0     140m   10.129.0.6     ip-10-0-166-233.ap-northeast-2.compute.internal   <none>   <none>
openshift-network-diagnostics                      network-check-target-vkzvv                                                 0/1   Terminating         0     140m   10.128.2.3     ip-10-0-172-3.ap-northeast-2.compute.internal     <none>   <none>

Operators are abnormal:
Clusteroperators which are not 4.6.8 True False False:
authentication                             4.6.8                               False   True    True    48m
baremetal                                  4.7.0-0.nightly-2020-12-18-031435   True    False   False   148m
console                                    4.6.8                               True    False   True    60m
dns                                        4.7.0-0.nightly-2020-12-18-031435   True    False   False   4h50m
machine-config                             4.7.0-0.nightly-2020-12-18-031435   True    False   False   97m
monitoring                                 4.6.8                               False   False   True    46m
network                                    4.7.0-0.nightly-2020-12-18-031435   True    True    True    127m
openshift-apiserver                        4.6.8                               False   False   False   47m

Check co/openshift-apiserver for example (I didn't check other COs):
oc describe co openshift-apiserver
  Conditions:
    Last Transition Time:  2020-12-18T10:19:34Z
    Reason:                AsExpected
    Status:                False
    Type:                  Degraded
    Last Transition Time:  2020-12-18T10:15:34Z
    Reason:                AsExpected
    Status:                False
    Type:                  Progressing
    Last Transition Time:  2020-12-18T10:35:08Z
    Message:               APIServicesAvailable: "apps.openshift.io.v1" is not ready: 503 (the server is currently unable to handle the request)
APIServicesAvailable: "image.openshift.io.v1" is not ready: 503 (the server is currently unable to handle the request)
APIServicesAvailable: "project.openshift.io.v1" is not ready: 503 (the server is currently unable to handle the request)
APIServicesAvailable: "template.openshift.io.v1" is not ready: 503 (the server is currently unable to handle the request)
    Reason:                APIServices_Error
    Status:                False
    Type:                  Available

Comment 5 Jan Safranek 2021-01-05 10:08:55 UTC
There must be two fixes:

* In 4.7: use separate RBAC objects for the kube-cloud-config ConfigMap, so they are not removed when downgrading to 4.6.
* In 4.6.z: remove the ResourceSyncControllerDegraded condition if it was set by the 4.7 version of the operator for any reason.
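
The 4.6.z half of the fix amounts to dropping one leftover entry from the operator's condition list. The actual patch is not shown in this bug, so the Python below is only a sketch of the idea: the function name and the second sample condition are hypothetical, while the first condition mirrors the one quoted in comment 1.

```python
# Sketch of the 4.6.z idea: remove a condition that only the 4.7 operator sets.

STALE_TYPE = "ResourceSyncControllerDegraded"

def drop_stale_condition(conditions, stale_type=STALE_TYPE):
    """Return a copy of the condition list without the leftover condition."""
    return [c for c in conditions if c.get("type") != stale_type]

conditions = [
    {"type": "ResourceSyncControllerDegraded", "status": "True", "reason": "Error"},
    # Hypothetical second condition, included only to show it is left alone.
    {"type": "ExampleControllerDegraded", "status": "False", "reason": "AsExpected"},
]

cleaned = drop_stale_condition(conditions)
print([c["type"] for c in cleaned])  # ['ExampleControllerDegraded']
```

Removing the condition regardless of its status matches the "for any reason" wording above: the 4.6 operator never manages this condition, so any value left behind is stale.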

Comment 6 Jan Safranek 2021-01-06 12:22:53 UTC
Xingxing, I don't think the other failures are related to storage - nothing in the cluster actually uses PVs/PVCs. I would suggest filing separate issues for them. BTW, I got similar results when testing my 4.6.z PR (i.e. downgrading from today's 4.7 nightly to a 4.6-ish release built from the current 4.6.z branch + my test AWS EBS CSI driver patches).

NAME                                       VERSION                                           AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.6.0-0.ci.test-2021-01-06-104349-ci-ln-5p6sqvk   False       True          True       45m
console                                    4.6.0-0.ci.test-2021-01-06-104349-ci-ln-5p6sqvk   True        False         True       50m
kube-apiserver                             4.6.0-0.ci.test-2021-01-06-104349-ci-ln-5p6sqvk   True        True          True       145m
monitoring                                 4.6.0-0.ci.test-2021-01-06-104349-ci-ln-5p6sqvk   False       False         True       41m
network                                    4.7.0-0.nightly-2021-01-06-055910                 True        True          True       147m
operator-lifecycle-manager-packageserver   4.6.0-0.ci.test-2021-01-06-104349-ci-ln-5p6sqvk   False       True          False      44m
openshift-apiserver                        4.6.0-0.ci.test-2021-01-06-104349-ci-ln-5p6sqvk   False       False         False      44m


In addition, you could even test downgrade e.g. on GCE - AWS CSI driver is not installed there.

Comment 7 Xingxing Xia 2021-01-07 03:55:44 UTC
Thanks, I should have thought of this idea. Let me try other clouds.

Comment 8 Xingxing Xia 2021-01-07 09:38:00 UTC
Indeed, I tried the downgrade in a GCP env, hit a networking bug, and filed it as https://bugzilla.redhat.com/show_bug.cgi?id=1913620

Comment 9 Yang Yang 2021-01-11 04:27:36 UTC
I can also reproduce it when upgrading AWS cluster from 4.6.9 -> 4.7.0-fc.1 -> 4.6.9.

Comment 11 To Hung Sze 2021-01-13 14:24:02 UTC
I also reproduced a similar issue going from 4.6.9 -> 4.7 fc2 -> 4.6.9 on GCP.
authentication                             4.6.9        False       True          True       8h
baremetal                                  4.7.0-fc.2   True        False         False      11h
console                                    4.6.9        True        False         True       8h
dns                                        4.7.0-fc.2   True        False         False      14h
image-registry                             4.6.9        True        True          False      13h
machine-config                             4.7.0-fc.2   True        False         False      11h
monitoring                                 4.6.9        False       False         True       8h
network                                    4.7.0-fc.2   True        True          True       11h
openshift-apiserver                        4.6.9        False       False         False      8h

As described above, cluster is so broken that must-gather fails:
$ ./oc adm must-gather
[must-gather      ] OUT the server is currently unable to handle the request (get imagestreams.image.openshift.io must-gather)
[must-gather      ] OUT 
[must-gather      ] OUT Using must-gather plug-in image: registry.redhat.io/openshift4/ose-must-gather:latest
[must-gather      ] OUT namespace/openshift-must-gather-jnfzp created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-k27wx created
[must-gather      ] OUT pod for plug-in image registry.redhat.io/openshift4/ose-must-gather:latest created
[must-gather-jptfg] OUT gather did not start: timed out waiting for the condition
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-k27wx deleted
[must-gather      ] OUT namespace/openshift-must-gather-jnfzp deleted
error: gather did not start for pod must-gather-jptfg: timed out waiting for the condition

Comment 12 To Hung Sze 2021-01-13 18:48:50 UTC
I tried with 4.7.0-0.nightly-2021-01-13-124141 and it looks as if it completed:
oc get clusterversion -o json|jq ".items[0].status.history"
[
  {
    "completionTime": "2021-01-13T18:41:17Z",
    "image": "quay.io/openshift-release-dev/ocp-release:4.6.10-x86_64",
    "startedTime": "2021-01-13T18:15:47Z",
    "state": "Completed",
    "verified": false,
    "version": "4.6.10"
  },
  {
    "completionTime": "2021-01-13T18:13:02Z",
    "image": "registry.ci.openshift.org/ocp/release:4.7.0-0.nightly-2021-01-13-124141",
    "startedTime": "2021-01-13T17:14:07Z",
    "state": "Completed",
    "verified": false,
    "version": "4.7.0-0.nightly-2021-01-13-124141"
  },
  {
    "completionTime": "2021-01-13T16:10:14Z",
    "image": "quay.io/openshift-release-dev/ocp-release@sha256:c72bcb1a8c735770caedbab5d6142c2c07209fb4fbda5f59eb92fc88db7fe8cc",
    "startedTime": "2021-01-13T15:28:32Z",
    "state": "Completed",
    "verified": false,
    "version": "4.6.10"
  }
]


But some operators are still at 4.7.
$ ./oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.6.10                              True        False         False      35m
baremetal                                  4.7.0-0.nightly-2021-01-13-124141   True        False         False      69m
cloud-credential                           4.7.0-0.nightly-2021-01-13-124141   True        False         False      3h16m
cluster-autoscaler                         4.7.0-0.nightly-2021-01-13-124141   True        False         False      3h11m
config-operator                            4.6.10                              True        False         False      3h12m
console                                    4.7.0-0.nightly-2021-01-13-124141   True        False         False      40m
csi-snapshot-controller                    4.7.0-0.nightly-2021-01-13-124141   True        False         False      53m
dns                                        4.7.0-0.nightly-2021-01-13-124141   True        False         False      3h10m
etcd                                       4.6.10                              True        False         False      3h11m
image-registry                             4.7.0-0.nightly-2021-01-13-124141   True        False         False      3h2m
ingress                                    4.7.0-0.nightly-2021-01-13-124141   True        False         False      3h2m
insights                                   4.7.0-0.nightly-2021-01-13-124141   True        False         False      3h12m
kube-apiserver                             4.6.10                              True        False         False      3h9m
kube-controller-manager                    4.6.10                              True        False         False      3h10m
kube-scheduler                             4.6.10                              True        False         False      3h9m
kube-storage-version-migrator              4.7.0-0.nightly-2021-01-13-124141   True        False         False      50m
machine-api                                4.7.0-0.nightly-2021-01-13-124141   True        False         False      3h8m
machine-approver                           4.7.0-0.nightly-2021-01-13-124141   True        False         False      3h11m
machine-config                             4.7.0-0.nightly-2021-01-13-124141   True        False         False      35m
marketplace                                4.7.0-0.nightly-2021-01-13-124141   True        False         False      46m
monitoring                                 4.7.0-0.nightly-2021-01-13-124141   True        False         False      178m
network                                    4.7.0-0.nightly-2021-01-13-124141   True        False         False      59m
node-tuning                                4.7.0-0.nightly-2021-01-13-124141   True        False         False      68m
openshift-apiserver                        4.6.10                              True        False         False      35m
openshift-controller-manager               4.6.10                              True        False         False      3h11m
openshift-samples                          4.7.0-0.nightly-2021-01-13-124141   True        False         False      68m
operator-lifecycle-manager                 4.7.0-0.nightly-2021-01-13-124141   True        False         False      3h11m
operator-lifecycle-manager-catalog         4.7.0-0.nightly-2021-01-13-124141   True        False         False      3h11m
operator-lifecycle-manager-packageserver   4.7.0-0.nightly-2021-01-13-124141   True        False         False      40m
service-ca                                 4.7.0-0.nightly-2021-01-13-124141   True        False         False      3h12m
storage                                    4.7.0-0.nightly-2021-01-13-124141   True        False         False      66m

Comment 13 To Hung Sze 2021-01-13 20:35:20 UTC
I have the must-gather if anyone is interested.

Comment 14 Jan Safranek 2021-01-15 08:57:26 UTC
In this BZ, I'm fixing storage ClusterOperator / AWS EBS CSI driver operator. Please file a separate BZ for other components that fail to downgrade - it's very likely they have a different root cause and different fix(es).

Comment 15 Qin Ping 2021-01-15 09:24:01 UTC
We created a new bug for the issue in comment #12: https://bugzilla.redhat.com/show_bug.cgi?id=1916586

Comment 16 Yang Yang 2021-01-26 03:41:03 UTC
I tried to upgrade from 4.6.13 -> 4.7.0-0.nightly-2021-01-21-090809 -> 4.6.13 on AWS but many operators are still not downgraded.

# oc get clusterversion -ojson | jq ".items[0].status.history"
[
  {
    "completionTime": "2021-01-25T09:43:22Z",
    "image": "quay.io/openshift-release-dev/ocp-release@sha256:8a9e40df2a19db4cc51dc8624d54163bef6e88b7d88cc0f577652ba25466e338",
    "startedTime": "2021-01-25T09:20:22Z",
    "state": "Completed",
    "verified": true,
    "version": "4.6.13"
  },
  {
    "completionTime": "2021-01-25T09:12:52Z",
    "image": "registry.ci.openshift.org/ocp/release@sha256:e4b26431d5eb4c994b07dc013855cb0c0a194bfff8c2eb6c79f500ef34bbc358",
    "startedTime": "2021-01-25T08:11:50Z",
    "state": "Completed",
    "verified": false,
    "version": "4.7.0-0.nightly-2021-01-21-090809"
  },
  {
    "completionTime": "2021-01-25T07:52:54Z",
    "image": "quay.io/openshift-release-dev/ocp-release@sha256:8a9e40df2a19db4cc51dc8624d54163bef6e88b7d88cc0f577652ba25466e338",
    "startedTime": "2021-01-25T07:19:24Z",
    "state": "Completed",
    "verified": false,
    "version": "4.6.13"
  }
]


# oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.6.13                              True        False         False      18h
baremetal                                  4.7.0-0.nightly-2021-01-21-090809   True        False         False      19h
cloud-credential                           4.7.0-0.nightly-2021-01-21-090809   True        False         False      20h
cluster-autoscaler                         4.7.0-0.nightly-2021-01-21-090809   True        False         False      20h
config-operator                            4.6.13                              True        False         False      20h
console                                    4.7.0-0.nightly-2021-01-21-090809   True        False         False      18h
csi-snapshot-controller                    4.7.0-0.nightly-2021-01-21-090809   True        False         False      18h
dns                                        4.7.0-0.nightly-2021-01-21-090809   True        False         False      20h
etcd                                       4.6.13                              True        False         False      20h
image-registry                             4.7.0-0.nightly-2021-01-21-090809   True        False         False      20h
ingress                                    4.7.0-0.nightly-2021-01-21-090809   True        False         False      20h
insights                                   4.7.0-0.nightly-2021-01-21-090809   True        False         False      20h
kube-apiserver                             4.6.13                              True        False         False      20h
kube-controller-manager                    4.6.13                              True        False         False      20h
kube-scheduler                             4.6.13                              True        False         False      20h
kube-storage-version-migrator              4.7.0-0.nightly-2021-01-21-090809   True        False         False      18h
machine-api                                4.7.0-0.nightly-2021-01-21-090809   True        False         False      20h
machine-approver                           4.7.0-0.nightly-2021-01-21-090809   True        False         False      20h
machine-config                             4.7.0-0.nightly-2021-01-21-090809   True        False         False      18h
marketplace                                4.7.0-0.nightly-2021-01-21-090809   True        False         False      18h
monitoring                                 4.7.0-0.nightly-2021-01-21-090809   True        False         False      18h
network                                    4.7.0-0.nightly-2021-01-21-090809   True        False         False      18h
node-tuning                                4.7.0-0.nightly-2021-01-21-090809   True        False         False      19h
openshift-apiserver                        4.6.13                              True        False         False      19h
openshift-controller-manager               4.6.13                              True        False         False      20h
openshift-samples                          4.7.0-0.nightly-2021-01-21-090809   True        False         False      19h
operator-lifecycle-manager                 4.7.0-0.nightly-2021-01-21-090809   True        False         False      20h
operator-lifecycle-manager-catalog         4.7.0-0.nightly-2021-01-21-090809   True        False         False      20h
operator-lifecycle-manager-packageserver   4.7.0-0.nightly-2021-01-21-090809   True        False         False      18h
service-ca                                 4.7.0-0.nightly-2021-01-21-090809   True        False         False      20h
storage                                    4.7.0-0.nightly-2021-01-21-090809   True        False         False      18h

Comment 17 Jan Safranek 2021-01-26 09:01:54 UTC
Yang, can you please provide a must-gather or any more information on why the storage operator is stuck at 4.7? It's odd that nothing is Progressing.

Comment 18 Yang Yang 2021-01-26 09:29:15 UTC
Jan, the issue was logged in bz1916586. There are must-gather logs there for your investigation.

Comment 20 Qin Ping 2021-02-05 05:45:03 UTC
Verified with: 4.7.0-fc.5

With 4.7.0-fc.5 the RBAC rules for configmaps are:
$ oc describe role aws-ebs-csi-driver-operator-aws-config-role -n openshift-config-managed
Name:         aws-ebs-csi-driver-operator-aws-config-role
Labels:       <none>
Annotations:  <none>
PolicyRule:
  Resources   Non-Resource URLs  Resource Names  Verbs
  ---------   -----------------  --------------  -----
  configmaps  []                 []              [get list watch]

$ oc describe rolebinding -n openshift-config-managed  aws-ebs-csi-driver-operator-aws-config-clusterrolebinding
Name:         aws-ebs-csi-driver-operator-aws-config-clusterrolebinding
Labels:       <none>
Annotations:  <none>
Role:
  Kind:  Role
  Name:  aws-ebs-csi-driver-operator-aws-config-role
Subjects:
  Kind            Name                         Namespace
  ----            ----                         ---------
  ServiceAccount  aws-ebs-csi-driver-operator  openshift-cluster-csi-drivers

$ oc describe clusterrole aws-ebs-csi-driver-operator-clusterrole|grep configmaps
  configmaps                                             []                 [aws-ebs-csi-driver-operator-lock]    [*]
  configmaps                                             []                 [extension-apiserver-authentication]  [*]


When downgrading to 4.7.0-fc.1, the RBAC rules are:
$ oc describe clusterrole aws-ebs-csi-driver-operator-clusterrole|grep configmaps
  configmaps                                             []                 [aws-ebs-csi-driver-operator-lock]    [*]
  configmaps                                             []                 [extension-apiserver-authentication]  [*]
  configmaps                                             []                 []                                    [get list watch]

$ oc describe role aws-ebs-csi-driver-operator-aws-config-role -n openshift-config-managed
Name:         aws-ebs-csi-driver-operator-aws-config-role
Labels:       <none>
Annotations:  <none>
PolicyRule:
  Resources   Non-Resource URLs  Resource Names  Verbs
  ---------   -----------------  --------------  -----
  configmaps  []                 []              [get list watch]

$ oc describe rolebinding -n openshift-config-managed  aws-ebs-csi-driver-operator-aws-config-clusterrolebinding
Name:         aws-ebs-csi-driver-operator-aws-config-clusterrolebinding
Labels:       <none>
Annotations:  <none>
Role:
  Kind:  Role
  Name:  aws-ebs-csi-driver-operator-aws-config-role
Subjects:
  Kind            Name                         Namespace
  ----            ----                         ---------
  ServiceAccount  aws-ebs-csi-driver-operator  openshift-cluster-csi-drivers

The RBAC rules for configmap openshift-config-managed/kube-cloud-config are not changed.

Comment 23 errata-xmlrpc 2021-02-24 15:43:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

