Bug 1807615
Summary: | [4.6] Upgrade OCP 4.3.3 to OCP 4.4.0 with alpha snapshot CRDs should print better error message | |
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Guy Inger <ginger> |
Component: | Storage | Assignee: | Jan Safranek <jsafrane> |
Storage sub component: | Operators | QA Contact: | Chao Yang <chaoyang> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | high | ||
Priority: | high | CC: | alitke, andcosta, aos-bugs, cnv-qe-bugs, danken, eparis, fdeutsch, ginger, jokerman, jsafrane, lxia, ncredi, ngavrilo, sreber, talayan |
Version: | 4.4 | Keywords: | Reopened, Upgrades |
Target Milestone: | --- | ||
Target Release: | 4.6.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | No Doc Update | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2020-10-27 15:55:31 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1845433 |
Description Guy Inger 2020-02-26 18:12:07 UTC
```
[cnv-qe-jenkins@cnv-executor-ginger4 ~]$ oc describe co csi-snapshot-controller
Name:         csi-snapshot-controller
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2020-02-26T16:55:17Z
  Generation:          1
  Resource Version:    331744
  Self Link:           /apis/config.openshift.io/v1/clusteroperators/csi-snapshot-controller
  UID:                 01b8335a-d68a-45b8-9c44-61723a887d40
Spec:
Status:
  Conditions:
    Last Transition Time:  2020-02-26T16:57:17Z
    Message:               Degraded: failed to sync CRDs: CustomResourceDefinition.apiextensions.k8s.io "volumesnapshots.snapshot.storage.k8s.io" is invalid: status.storedVersions[0]: Invalid value: "v1alpha1": must appear in spec.versions
    Reason:                _OperatorSync
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2020-02-26T16:55:17Z
    Reason:                NoData
    Status:                Unknown
    Type:                  Progressing
    Last Transition Time:  2020-02-26T16:55:17Z
    Reason:                NoData
    Status:                Unknown
    Type:                  Available
    Last Transition Time:  2020-02-26T16:55:17Z
    Reason:                NoData
    Status:                Unknown
    Type:                  Upgradeable
  Extension:  <nil>
  Related Objects:
    Group:
    Name:      openshift-csi-snapshot-controller
    Resource:  namespaces
    Group:
    Name:      openshift-csi-snapshot-controller-operator
    Resource:  namespaces
    Group:     operator.openshift.io
    Name:      cluster
    Resource:  csisnapshotcontrollers
Events:  <none>
```

Hi Adam, is the `volumesnapshots.snapshot.storage.k8s.io" is invalid: status.storedVersions[0]: Invalid value: "v1alpha1": must appear in spec.versions` error something that CNV brings in, and can it then break the OCP upgrade?

@Jan Safranek, I think the v1alpha1 version of the volumesnapshots CRD should be fixed to use the beta API version of Kubernetes; this will be done by the CNV developers. However, I think OCP shouldn't fail because of that. What do you think?

@Guy, can you please first try to upgrade CNV 2.2 to CNV 2.3, and after that try to upgrade OCP 4.3 to OCP 4.4?

Hey Adam, I tried what you suggested and deployed a 4.3.0 cluster without any storage classes.
I first tried to upgrade it to 4.3.5 (`oc adm upgrade --force=true --allow-explicit-upgrade --to-image quay.io/openshift-release-dev/ocp-release:4.3.5-x86_64`), which failed. `oc get clusterversion` returns "Unable to apply 4.3.5: the cluster operator monitoring has not yet successfully rolled out". More info:

```
[cnv-qe-jenkins@cnv-executor-ginger1 ~]$ oc describe co monitoring
Name:         monitoring
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2020-03-11T14:56:07Z
  Generation:          1
  Resource Version:    586313
  Self Link:           /apis/config.openshift.io/v1/clusteroperators/monitoring
  UID:                 323dbdde-fe28-496b-a704-f2a74c50d1fe
Spec:
Status:
  Conditions:
    Last Transition Time:  2020-03-12T11:14:59Z
    Message:               Failed to rollout the stack. Error: running task Updating node-exporter failed: reconciling node-exporter DaemonSet failed: updating DaemonSet object failed: waiting for DaemonSetRollout of node-exporter: daemonset node-exporter is not ready. status: (desired: 6, updated: 1, ready: 6, unavailable: 0)
    Reason:                UpdatingnodeExporterFailed
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2020-03-12T13:52:29Z
    Message:               Rollout of the monitoring stack is in progress. Please wait until it finishes.
    Reason:                RollOutInProgress
    Status:                True
    Type:                  Upgradeable
    Last Transition Time:  2020-03-12T11:14:59Z
    Status:                False
    Type:                  Available
    Last Transition Time:  2020-03-12T13:52:29Z
    Message:               Rolling out the stack.
    Reason:                RollOutInProgress
    Status:                True
    Type:                  Progressing
  Extension:  <nil>
  Related Objects:
    Group:
    Name:      openshift-monitoring
    Resource:  namespaces
    Group:
    Name:      openshift-monitoring
    Resource:  all
    Group:     monitoring.coreos.com
    Name:
    Resource:  servicemonitors
    Group:     monitoring.coreos.com
    Name:
    Resource:  prometheusrules
    Group:     monitoring.coreos.com
    Name:
    Resource:  alertmanagers
    Group:     monitoring.coreos.com
    Name:
    Resource:  prometheuses
  Versions:
    Name:     operator
    Version:  4.3.0-0.nightly-2020-03-09-200240
Events:  <none>
```

OK, that is a completely different error, not related to storage at all. Let's see if we can get a successful upgrade without any storage classes; then we'll have this isolated a bit more. In the meantime I would not promote this to a blocker, since our official HCO installation does not configure the snapshot alpha components that were blocking your upgrade tests.

I tried upgrading 4.3.5 to 4.4, which worked. There might still be an issue with upgrading from 4.3.3 to 4.4.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196
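The Degraded message in this bug comes from a Kubernetes apiextensions validation rule: every entry in a CRD's `status.storedVersions` must also appear in `spec.versions`, because the apiserver may still hold stored objects in those versions. The sketch below is an illustrative reproduction of that rule, not the actual apiserver code; the function name and the data shapes are assumptions chosen to mirror the CRD fields.

```python
# Illustrative sketch (not OpenShift/Kubernetes source) of the apiextensions
# rule behind the error in this bug: every version recorded in a CRD's
# status.storedVersions must also be present in spec.versions, otherwise
# the CRD update is rejected.

def check_stored_versions(spec_versions, stored_versions):
    """Return a list of error strings, one per stored version missing
    from spec.versions; an empty list means the CRD would be accepted."""
    declared = {v["name"] for v in spec_versions}
    errors = []
    for i, stored in enumerate(stored_versions):
        if stored not in declared:
            errors.append(
                f'status.storedVersions[{i}]: Invalid value: "{stored}": '
                "must appear in spec.versions"
            )
    return errors

# The 4.4 snapshot CRD declares only v1beta1, but v1alpha1 objects from the
# old CNV-installed CRD are still recorded as stored -- this reproduces the
# message seen in the Degraded condition above:
print(check_stored_versions(
    spec_versions=[{"name": "v1beta1"}],
    stored_versions=["v1alpha1"],
))
```

This also suggests why upgrading CNV first (so the CRD gains the beta version before the old one is dropped) was proposed as a workaround in the thread: once `spec.versions` contains every stored version, the check passes.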