Bug 1807615 - [4.6] Upgrade OCP 4.3.3. to OCP 4.4.0 with alpha snapshot CRDs should print better error message
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.6.0
Assignee: Jan Safranek
QA Contact: Chao Yang
URL:
Whiteboard:
Depends On:
Blocks: 1845433
Reported: 2020-02-26 18:12 UTC by Guy Inger
Modified: 2023-10-06 19:17 UTC
CC List: 15 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-27 15:55:31 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-csi-snapshot-controller-operator pull 40 | 0 | None | closed | Bug 1807615: Add extra check for v1alpha CRD | 2021-02-18 06:21:38 UTC
Red Hat Knowledge Base (Solution) | 5069531 | 0 | None | None | None | 2020-06-02 14:44:51 UTC
Red Hat Product Errata | RHBA-2020:4196 | 0 | None | None | None | 2020-10-27 15:56:10 UTC

Description Guy Inger 2020-02-26 18:12:07 UTC
Description of problem:
I was trying to upgrade a 4.3.3 cluster to 4.4.0 (which is supported and works according to https://openshift-release.svc.ci.openshift.org/releasestream/4.4.0-0.nightly/release/4.4.0-0.nightly-2020-02-26-073836 ), but it failed.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Deploy a 4.3.3 cluster
2. Upgrade the cluster to the release at "https://openshift-release.svc.ci.openshift.org/releasestream/4.4.0-0.nightly/release/4.4.0-0.nightly-2020-02-26-073836"
3.

Actual results:
OCP is not being upgraded

Expected results:
OCP upgraded to 4.4.0

Additional info:
oc get clusterversion says "Unable to apply 4.4.0-0.nightly-2020-02-26-073836: the cluster operator csi-snapshot-controller has not yet successfully rolled out"
[cnv-qe-jenkins@cnv-executor-ginger4 ~]$ oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.4.0-0.nightly-2020-02-26-073836   True        False         False      9h
cloud-credential                           4.4.0-0.nightly-2020-02-26-073836   True        False         False      9h
cluster-autoscaler                         4.4.0-0.nightly-2020-02-26-073836   True        False         False      9h
console                                    4.4.0-0.nightly-2020-02-26-073836   True        False         False      36m
csi-snapshot-controller                                                        Unknown     Unknown       True       74m
dns                                        4.3.3                               True        False         False      9h
etcd                                       4.4.0-0.nightly-2020-02-26-073836   True        False         False      86m
image-registry                             4.4.0-0.nightly-2020-02-26-073836   True        False         False      9h
ingress                                    4.4.0-0.nightly-2020-02-26-073836   True        False         False      9h
insights                                   4.4.0-0.nightly-2020-02-26-073836   True        False         False      9h
kube-apiserver                             4.4.0-0.nightly-2020-02-26-073836   True        False         False      9h
kube-controller-manager                    4.4.0-0.nightly-2020-02-26-073836   True        False         False      82m
kube-scheduler                             4.4.0-0.nightly-2020-02-26-073836   True        False         False      82m
kube-storage-version-migrator              4.4.0-0.nightly-2020-02-26-073836   True        False         False      75m
machine-api                                4.4.0-0.nightly-2020-02-26-073836   True        False         False      9h
machine-config                             4.3.3                               True        False         False      9h
marketplace                                4.4.0-0.nightly-2020-02-26-073836   True        False         False      38m
monitoring                                 4.4.0-0.nightly-2020-02-26-073836   True        False         False      43m
network                                    4.3.3                               True        False         False      9h
node-tuning                                4.4.0-0.nightly-2020-02-26-073836   True        False         False      38m
openshift-apiserver                        4.4.0-0.nightly-2020-02-26-073836   True        False         False      79m
openshift-controller-manager               4.4.0-0.nightly-2020-02-26-073836   True        False         False      9h
openshift-samples                          4.4.0-0.nightly-2020-02-26-073836   True        False         False      28m
operator-lifecycle-manager                 4.4.0-0.nightly-2020-02-26-073836   True        False         False      9h
operator-lifecycle-manager-catalog         4.4.0-0.nightly-2020-02-26-073836   True        False         False      9h
operator-lifecycle-manager-packageserver   4.4.0-0.nightly-2020-02-26-073836   True        False         False      37m
service-ca                                 4.4.0-0.nightly-2020-02-26-073836   True        False         False      9h
service-catalog-apiserver                  4.4.0-0.nightly-2020-02-26-073836   True        False         False      9h
service-catalog-controller-manager         4.4.0-0.nightly-2020-02-26-073836   True        False         False      9h
storage                                    4.4.0-0.nightly-2020-02-26-073836   True        False         False      38m



If any additional info is needed please let me know.

Thanks
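
The `oc get co` table above can be scanned mechanically for the operators that block the upgrade. The following is only a sketch: the helper names (`parse_cluster_operators`, `unhealthy`, `not_at_version`) are hypothetical, and it assumes the plain-text column layout shown above, where the VERSION column may be empty (as it is for csi-snapshot-controller).

```python
from typing import List, NamedTuple


class OperatorStatus(NamedTuple):
    name: str
    version: str
    available: str
    progressing: str
    degraded: str
    since: str


def parse_cluster_operators(table: str) -> List[OperatorStatus]:
    """Parse the text table printed by `oc get co` (hypothetical helper)."""
    rows = []
    for line in table.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue  # not a data row
        # The last four columns are always filled in; VERSION may be blank,
        # so it is only read when all six columns are present.
        available, progressing, degraded, since = fields[-4:]
        version = fields[1] if len(fields) == 6 else ""
        rows.append(
            OperatorStatus(fields[0], version, available, progressing, degraded, since)
        )
    return rows


def unhealthy(rows: List[OperatorStatus]) -> List[str]:
    """Operators that are not Available=True or not Degraded=False."""
    return [r.name for r in rows if r.available != "True" or r.degraded != "False"]


def not_at_version(rows: List[OperatorStatus], target: str) -> List[str]:
    """Operators that do not yet report the target version."""
    return [r.name for r in rows if r.version != target]


# Three rows copied from the report, with the column padding shortened.
SAMPLE = """\
NAME                      VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
console                   4.4.0-0.nightly-2020-02-26-073836   True        False         False      36m
csi-snapshot-controller                                       Unknown     Unknown       True       74m
dns                       4.3.3                               True        False         False      9h
"""

if __name__ == "__main__":
    rows = parse_cluster_operators(SAMPLE)
    print("unhealthy:", unhealthy(rows))
    print("not upgraded:", not_at_version(rows, "4.4.0-0.nightly-2020-02-26-073836"))
```

Applied to the full table, this would single out csi-snapshot-controller as the only degraded operator. The operators still at 4.3.3 (dns, machine-config, network) are listed separately because they have not been updated yet.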

Comment 1 Guy Inger 2020-02-27 09:29:36 UTC
[cnv-qe-jenkins@cnv-executor-ginger4 ~]$ oc describe co csi-snapshot-controller
Name:         csi-snapshot-controller
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2020-02-26T16:55:17Z
  Generation:          1
  Resource Version:    331744
  Self Link:           /apis/config.openshift.io/v1/clusteroperators/csi-snapshot-controller
  UID:                 01b8335a-d68a-45b8-9c44-61723a887d40
Spec:
Status:
  Conditions:
    Last Transition Time:  2020-02-26T16:57:17Z
    Message:               Degraded: failed to sync CRDs: CustomResourceDefinition.apiextensions.k8s.io "volumesnapshots.snapshot.storage.k8s.io" is invalid: status.storedVersions[0]: Invalid value: "v1alpha1": must appear in spec.versions
    Reason:                _OperatorSync
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2020-02-26T16:55:17Z
    Reason:                NoData
    Status:                Unknown
    Type:                  Progressing
    Last Transition Time:  2020-02-26T16:55:17Z
    Reason:                NoData
    Status:                Unknown
    Type:                  Available
    Last Transition Time:  2020-02-26T16:55:17Z
    Reason:                NoData
    Status:                Unknown
    Type:                  Upgradeable
  Extension:               <nil>
  Related Objects:
    Group:     
    Name:      openshift-csi-snapshot-controller
    Resource:  namespaces
    Group:     
    Name:      openshift-csi-snapshot-controller-operator
    Resource:  namespaces
    Group:     operator.openshift.io
    Name:      cluster
    Resource:  csisnapshotcontrollers
Events:        <none>
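
The Degraded message above comes from an API server validation rule: every entry in a CRD's status.storedVersions must also be listed in spec.versions. A minimal sketch of that rule follows. The function name `missing_stored_versions` is hypothetical, and the example manifest is an assumption: it supposes the operator tried to apply a CRD that serves only v1beta1 while the cluster still records v1alpha1 as a stored version.

```python
from typing import List


def missing_stored_versions(crd: dict) -> List[str]:
    """Return stored versions that are absent from spec.versions.

    A non-empty result corresponds to the "must appear in spec.versions"
    validation error quoted in the operator status.
    """
    listed = {v["name"] for v in crd.get("spec", {}).get("versions", [])}
    stored = crd.get("status", {}).get("storedVersions", [])
    return [v for v in stored if v not in listed]


# Assumed shape of the object after the failed update: new spec, old status.
crd_after_update = {
    "metadata": {"name": "volumesnapshots.snapshot.storage.k8s.io"},
    "spec": {
        "group": "snapshot.storage.k8s.io",
        "versions": [{"name": "v1beta1", "served": True, "storage": True}],
    },
    "status": {"storedVersions": ["v1alpha1"]},
}

if __name__ == "__main__":
    for version in missing_stored_versions(crd_after_update):
        print(f"storedVersions entry {version!r} must appear in spec.versions")
```

The check is purely structural. It does not matter whether any v1alpha1 objects still exist, only that the version is recorded in status.storedVersions.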

Comment 4 Tareq Alayan 2020-03-03 09:01:55 UTC
Hi Adam,
Is the error

"volumesnapshots.snapshot.storage.k8s.io" is invalid: status.storedVersions[0]: Invalid value: "v1alpha1": must appear in spec.versions

something that CNV brings in, and can it then break the OCP upgrade?

Comment 5 Tareq Alayan 2020-03-03 09:44:11 UTC
@Jan Safranek, I think the v1alpha1 version of the volumesnapshots CRD should be changed to use the beta API version from k8s; this will be done by the CNV developer.
However, I think that OCP shouldn't fail because of that. What do you think?
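
The linked pull request is titled "Add extra check for v1alpha CRD". The sketch below illustrates what such a pre-check could look like. It is not the operator's actual implementation: the function name `alpha_snapshot_crd_message` and the wording of the message are hypothetical, and the CRDs are assumed to be available as plain dictionaries.

```python
from typing import Iterable, Optional

SNAPSHOT_GROUP = "snapshot.storage.k8s.io"
ALPHA_VERSION = "v1alpha1"


def alpha_snapshot_crd_message(crds: Iterable[dict]) -> Optional[str]:
    """Build a readable error if alpha snapshot CRDs are present.

    Returns None when no CRD in the snapshot API group still records
    v1alpha1 as a stored version.
    """
    offenders = sorted(
        crd["metadata"]["name"]
        for crd in crds
        if crd.get("spec", {}).get("group") == SNAPSHOT_GROUP
        and ALPHA_VERSION in crd.get("status", {}).get("storedVersions", [])
    )
    if not offenders:
        return None
    # Hypothetical wording; the point is to name the CRDs and the cause
    # instead of surfacing the raw API validation error.
    return (
        "cluster contains v1alpha1 snapshot CRDs that the operator cannot "
        "update: " + ", ".join(offenders)
    )


if __name__ == "__main__":
    existing = [
        {
            "metadata": {"name": "volumesnapshots.snapshot.storage.k8s.io"},
            "spec": {"group": SNAPSHOT_GROUP},
            "status": {"storedVersions": [ALPHA_VERSION]},
        },
    ]
    print(alpha_snapshot_crd_message(existing))
```

Running a check like this before the CRD update would let the operator report the cause directly in its Degraded condition.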

Comment 12 Nelly Credi 2020-03-10 12:24:04 UTC
@Guy, can you please try to first upgrade CNV2.2 to CNV2.3 and after that try to upgrade OCP4.3 to OCP4.4?

Comment 14 Guy Inger 2020-03-12 13:55:43 UTC
Hey Adam, I tried what you suggested and deployed a 4.3.0 cluster without any storage classes.
I first tried to upgrade it to 4.3.5 (oc adm upgrade --force=true  --allow-explicit-upgrade --to-image quay.io/openshift-release-dev/ocp-release:4.3.5-x86_64), which failed.
oc get clusterversion returns "Unable to apply 4.3.5: the cluster operator monitoring has not yet successfully rolled out".

More info:

[cnv-qe-jenkins@cnv-executor-ginger1 ~]$ oc describe co monitoring
Name:         monitoring
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2020-03-11T14:56:07Z
  Generation:          1
  Resource Version:    586313
  Self Link:           /apis/config.openshift.io/v1/clusteroperators/monitoring
  UID:                 323dbdde-fe28-496b-a704-f2a74c50d1fe
Spec:
Status:
  Conditions:
    Last Transition Time:  2020-03-12T11:14:59Z
    Message:               Failed to rollout the stack. Error: running task Updating node-exporter failed: reconciling node-exporter DaemonSet failed: updating DaemonSet object failed: waiting for DaemonSetRollout of node-exporter: daemonset node-exporter is not ready. status: (desired: 6, updated: 1, ready: 6, unavailable: 0)
    Reason:                UpdatingnodeExporterFailed
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2020-03-12T13:52:29Z
    Message:               Rollout of the monitoring stack is in progress. Please wait until it finishes.
    Reason:                RollOutInProgress
    Status:                True
    Type:                  Upgradeable
    Last Transition Time:  2020-03-12T11:14:59Z
    Status:                False
    Type:                  Available
    Last Transition Time:  2020-03-12T13:52:29Z
    Message:               Rolling out the stack.
    Reason:                RollOutInProgress
    Status:                True
    Type:                  Progressing
  Extension:               <nil>
  Related Objects:
    Group:     
    Name:      openshift-monitoring
    Resource:  namespaces
    Group:     
    Name:      openshift-monitoring
    Resource:  all
    Group:     monitoring.coreos.com
    Name:      
    Resource:  servicemonitors
    Group:     monitoring.coreos.com
    Name:      
    Resource:  prometheusrules
    Group:     monitoring.coreos.com
    Name:      
    Resource:  alertmanagers
    Group:     monitoring.coreos.com
    Name:      
    Resource:  prometheuses
  Versions:
    Name:     operator
    Version:  4.3.0-0.nightly-2020-03-09-200240
Events:       <none>
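
The counters in the Degraded message above (desired: 6, updated: 1, ready: 6, unavailable: 0) show why the node-exporter rollout is reported as unfinished: all pods are ready, but only one runs the new revision. The sketch below parses that status string and applies a simplified completion rule. The rule itself is an assumption about what the monitoring operator waits for, and the function names are hypothetical.

```python
import re
from typing import Dict, List, Optional

STATUS_RE = re.compile(
    r"desired: (\d+), updated: (\d+), ready: (\d+), unavailable: (\d+)"
)


def parse_rollout_status(message: str) -> Optional[Dict[str, int]]:
    """Extract the DaemonSet counters from the operator's status message."""
    match = STATUS_RE.search(message)
    if match is None:
        return None
    keys = ("desired", "updated", "ready", "unavailable")
    return dict(zip(keys, map(int, match.groups())))


def rollout_blockers(status: Dict[str, int]) -> List[str]:
    """List the counters that keep the rollout from being complete.

    Assumed rule: every desired pod must be updated and ready, and no
    pod may be unavailable.
    """
    blockers = []
    if status["updated"] < status["desired"]:
        blockers.append("updated")
    if status["ready"] < status["desired"]:
        blockers.append("ready")
    if status["unavailable"] > 0:
        blockers.append("unavailable")
    return blockers


MESSAGE = (
    "daemonset node-exporter is not ready. "
    "status: (desired: 6, updated: 1, ready: 6, unavailable: 0)"
)

if __name__ == "__main__":
    status = parse_rollout_status(MESSAGE)
    print(status, rollout_blockers(status))
```

For the message in this comment, the only blocker is the updated counter, which is consistent with a rollout that is still in progress and not a storage-related failure.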

Comment 15 Adam Litke 2020-03-12 14:07:17 UTC
Ok that is a completely different error not related to storage at all.  Let's see if we can get a successful upgrade without any storage classes and then we'll have this isolated a bit more.  In the meantime I would not promote this to a blocker since our official HCO installation does not configure the Snapshot alpha components which were blocking your upgrade tests.

Comment 16 Guy Inger 2020-03-17 15:00:41 UTC
I tried upgrading 4.3.5 to 4.4 which worked. There might still be an issue with upgrading from 4.3.3 to 4.4.

Comment 23 errata-xmlrpc 2020-10-27 15:55:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

