Description of problem:
I was trying to upgrade a 4.3.3 cluster to 4.4.0 (which is supported and works according to https://openshift-release.svc.ci.openshift.org/releasestream/4.4.0-0.nightly/release/4.4.0-0.nightly-2020-02-26-073836 ), but it failed.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Deploy a 4.3.3 cluster
2. Execute "https://openshift-release.svc.ci.openshift.org/releasestream/4.4.0-0.nightly/release/4.4.0-0.nightly-2020-02-26-073836"
3.

Actual results:
OCP is not being upgraded

Expected results:
OCP upgraded to 4.4.0

Additional info:
oc get clusterversion says "Unable to apply 4.4.0-0.nightly-2020-02-26-073836: the cluster operator csi-snapshot-controller has not yet successfully rolled out"

[cnv-qe-jenkins@cnv-executor-ginger4 ~]$ oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.4.0-0.nightly-2020-02-26-073836   True        False         False      9h
cloud-credential                           4.4.0-0.nightly-2020-02-26-073836   True        False         False      9h
cluster-autoscaler                         4.4.0-0.nightly-2020-02-26-073836   True        False         False      9h
console                                    4.4.0-0.nightly-2020-02-26-073836   True        False         False      36m
csi-snapshot-controller                                                        Unknown     Unknown       True       74m
dns                                        4.3.3                               True        False         False      9h
etcd                                       4.4.0-0.nightly-2020-02-26-073836   True        False         False      86m
image-registry                             4.4.0-0.nightly-2020-02-26-073836   True        False         False      9h
ingress                                    4.4.0-0.nightly-2020-02-26-073836   True        False         False      9h
insights                                   4.4.0-0.nightly-2020-02-26-073836   True        False         False      9h
kube-apiserver                             4.4.0-0.nightly-2020-02-26-073836   True        False         False      9h
kube-controller-manager                    4.4.0-0.nightly-2020-02-26-073836   True        False         False      82m
kube-scheduler                             4.4.0-0.nightly-2020-02-26-073836   True        False         False      82m
kube-storage-version-migrator              4.4.0-0.nightly-2020-02-26-073836   True        False         False      75m
machine-api                                4.4.0-0.nightly-2020-02-26-073836   True        False         False      9h
machine-config                             4.3.3                               True        False         False      9h
marketplace                                4.4.0-0.nightly-2020-02-26-073836   True        False         False      38m
monitoring                                 4.4.0-0.nightly-2020-02-26-073836   True        False         False      43m
network                                    4.3.3                               True        False         False      9h
node-tuning                                4.4.0-0.nightly-2020-02-26-073836   True        False         False      38m
openshift-apiserver                        4.4.0-0.nightly-2020-02-26-073836   True        False         False      79m
openshift-controller-manager               4.4.0-0.nightly-2020-02-26-073836   True        False         False      9h
openshift-samples                          4.4.0-0.nightly-2020-02-26-073836   True        False         False      28m
operator-lifecycle-manager                 4.4.0-0.nightly-2020-02-26-073836   True        False         False      9h
operator-lifecycle-manager-catalog         4.4.0-0.nightly-2020-02-26-073836   True        False         False      9h
operator-lifecycle-manager-packageserver   4.4.0-0.nightly-2020-02-26-073836   True        False         False      37m
service-ca                                 4.4.0-0.nightly-2020-02-26-073836   True        False         False      9h
service-catalog-apiserver                  4.4.0-0.nightly-2020-02-26-073836   True        False         False      9h
service-catalog-controller-manager         4.4.0-0.nightly-2020-02-26-073836   True        False         False      9h
storage                                    4.4.0-0.nightly-2020-02-26-073836   True        False         False      38m

If any additional info is needed, please let me know. Thanks
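For anyone triaging a similar stuck upgrade, the operator listing above can be scanned for operators that are not yet on the target version or are not healthy. The following is only a minimal sketch: it assumes the `oc get co` output has been saved as text with one operator per line, and the function names `parse_cluster_operators` and `pending_operators` are hypothetical helpers, not part of any OpenShift tooling.

```python
def parse_cluster_operators(text):
    """Parse whitespace-separated `oc get co` output into a list of dicts.

    Assumes one operator per line and a header row on the first line.
    A row with only five fields is treated as having a blank VERSION
    column (as csi-snapshot-controller has in this report).
    """
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) == 5:
            fields.insert(1, "")  # VERSION column is empty
        if len(fields) != 6:
            continue  # not a row this sketch understands
        name, version, available, progressing, degraded, since = fields
        rows.append({
            "name": name,
            "version": version,
            "available": available,
            "progressing": progressing,
            "degraded": degraded,
            "since": since,
        })
    return rows


def pending_operators(rows, target_version):
    """Return names of operators not on the target version or not healthy."""
    return [
        r["name"]
        for r in rows
        if r["version"] != target_version
        or r["available"] != "True"
        or r["degraded"] != "False"
    ]
```

Run against the listing in this report with the target `4.4.0-0.nightly-2020-02-26-073836`, such a check would flag csi-snapshot-controller (degraded, no version) along with dns, machine-config and network (still on 4.3.3).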
[cnv-qe-jenkins@cnv-executor-ginger4 ~]$ oc describe co csi-snapshot-controller
Name:         csi-snapshot-controller
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2020-02-26T16:55:17Z
  Generation:          1
  Resource Version:    331744
  Self Link:           /apis/config.openshift.io/v1/clusteroperators/csi-snapshot-controller
  UID:                 01b8335a-d68a-45b8-9c44-61723a887d40
Spec:
Status:
  Conditions:
    Last Transition Time:  2020-02-26T16:57:17Z
    Message:               Degraded: failed to sync CRDs: CustomResourceDefinition.apiextensions.k8s.io "volumesnapshots.snapshot.storage.k8s.io" is invalid: status.storedVersions[0]: Invalid value: "v1alpha1": must appear in spec.versions
    Reason:                _OperatorSync
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2020-02-26T16:55:17Z
    Reason:                NoData
    Status:                Unknown
    Type:                  Progressing
    Last Transition Time:  2020-02-26T16:55:17Z
    Reason:                NoData
    Status:                Unknown
    Type:                  Available
    Last Transition Time:  2020-02-26T16:55:17Z
    Reason:                NoData
    Status:                Unknown
    Type:                  Upgradeable
  Extension:               <nil>
  Related Objects:
    Group:
    Name:      openshift-csi-snapshot-controller
    Resource:  namespaces
    Group:
    Name:      openshift-csi-snapshot-controller-operator
    Resource:  namespaces
    Group:     operator.openshift.io
    Name:      cluster
    Resource:  csisnapshotcontrollers
Events:        <none>
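The Degraded message above says that a version recorded in the CRD's status.storedVersions ("v1alpha1") is missing from spec.versions. A rough way to spot this condition before an upgrade is sketched below. It assumes the CRD has been loaded as a Python dict (for example from `oc get crd volumesnapshots.snapshot.storage.k8s.io -o json`); the helper name `stale_stored_versions` and the sample data are illustrative assumptions, not taken from the cluster in this report.

```python
def stale_stored_versions(crd):
    """Return stored versions that are not listed in spec.versions.

    `crd` is a CustomResourceDefinition as a plain dict. A non-empty
    result corresponds to the "must appear in spec.versions" error.
    """
    declared = {v["name"] for v in crd.get("spec", {}).get("versions", [])}
    stored = crd.get("status", {}).get("storedVersions", [])
    return [v for v in stored if v not in declared]


# Hypothetical example: an existing alpha CRD's status combined with a
# new spec that only declares v1beta1 (assumed, for illustration).
example_crd = {
    "spec": {"versions": [{"name": "v1beta1", "served": True, "storage": True}]},
    "status": {"storedVersions": ["v1alpha1"]},
}
print(stale_stored_versions(example_crd))
```

If the returned list is non-empty, updating the CRD to a spec that drops those versions would be expected to be rejected in the way shown in the operator status.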
Hi Adam, is the error '"volumesnapshots.snapshot.storage.k8s.io" is invalid: status.storedVersions[0]: Invalid value: "v1alpha1": must appear in spec.versions' something that CNV brings in? And can it then break the OCP upgrade?
@Jan Safranek, I think the v1alpha1 version of the volumesnapshots CRD should be changed to use the beta API version from Kubernetes; this will be done by the CNV developers. However, I think OCP shouldn't fail because of that. What do you think?
@Guy, can you please first upgrade CNV 2.2 to CNV 2.3, and after that try to upgrade OCP 4.3 to OCP 4.4?
Hey Adam, I tried what you suggested and deployed a 4.3.0 cluster without any storage classes. I first tried to upgrade it to 4.3.5 (oc adm upgrade --force=true --allow-explicit-upgrade --to-image quay.io/openshift-release-dev/ocp-release:4.3.5-x86_64), which failed. oc get clusterversion returns "Unable to apply 4.3.5: the cluster operator monitoring has not yet successfully rolled out".

More info:

[cnv-qe-jenkins@cnv-executor-ginger1 ~]$ oc describe co monitoring
Name:         monitoring
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2020-03-11T14:56:07Z
  Generation:          1
  Resource Version:    586313
  Self Link:           /apis/config.openshift.io/v1/clusteroperators/monitoring
  UID:                 323dbdde-fe28-496b-a704-f2a74c50d1fe
Spec:
Status:
  Conditions:
    Last Transition Time:  2020-03-12T11:14:59Z
    Message:               Failed to rollout the stack. Error: running task Updating node-exporter failed: reconciling node-exporter DaemonSet failed: updating DaemonSet object failed: waiting for DaemonSetRollout of node-exporter: daemonset node-exporter is not ready. status: (desired: 6, updated: 1, ready: 6, unavailable: 0)
    Reason:                UpdatingnodeExporterFailed
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2020-03-12T13:52:29Z
    Message:               Rollout of the monitoring stack is in progress. Please wait until it finishes.
    Reason:                RollOutInProgress
    Status:                True
    Type:                  Upgradeable
    Last Transition Time:  2020-03-12T11:14:59Z
    Status:                False
    Type:                  Available
    Last Transition Time:  2020-03-12T13:52:29Z
    Message:               Rolling out the stack.
    Reason:                RollOutInProgress
    Status:                True
    Type:                  Progressing
  Extension:               <nil>
  Related Objects:
    Group:
    Name:      openshift-monitoring
    Resource:  namespaces
    Group:
    Name:      openshift-monitoring
    Resource:  all
    Group:     monitoring.coreos.com
    Name:
    Resource:  servicemonitors
    Group:     monitoring.coreos.com
    Name:
    Resource:  prometheusrules
    Group:     monitoring.coreos.com
    Name:
    Resource:  alertmanagers
    Group:     monitoring.coreos.com
    Name:
    Resource:  prometheuses
  Versions:
    Name:     operator
    Version:  4.3.0-0.nightly-2020-03-09-200240
Events:       <none>
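The Degraded message above embeds the node-exporter DaemonSet counters (desired: 6, updated: 1, ready: 6, unavailable: 0), which suggests that all pods were ready but only one had been updated to the new revision. The sketch below pulls those counters out of such a message. The rule used in `rollout_finished` (updated equals desired and unavailable is 0) is an assumption made for illustration, not taken from the monitoring operator's source, and both function names are hypothetical.

```python
import re

# Matches the counter block as it appears in the message in this report.
_STATUS_RE = re.compile(
    r"desired: (\d+), updated: (\d+), ready: (\d+), unavailable: (\d+)"
)


def parse_rollout_status(message):
    """Extract the DaemonSet counters from a status message, or None."""
    match = _STATUS_RE.search(message)
    if match is None:
        return None
    desired, updated, ready, unavailable = (int(g) for g in match.groups())
    return {
        "desired": desired,
        "updated": updated,
        "ready": ready,
        "unavailable": unavailable,
    }


def rollout_finished(status):
    """Assumed rule: every desired pod is updated and none is unavailable."""
    return status["updated"] == status["desired"] and status["unavailable"] == 0
```

Under that assumed rule, the counters in this comment would be reported as an unfinished rollout because only 1 of 6 pods had been updated.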
OK, that is a completely different error, not related to storage at all. Let's see if we can get a successful upgrade without any storage classes, and then we'll have this isolated a bit more. In the meantime, I would not promote this to a blocker, since our official HCO installation does not configure the Snapshot alpha components that were blocking your upgrade tests.
I tried upgrading 4.3.5 to 4.4, which worked. There might still be an issue with upgrading from 4.3.3 to 4.4.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196