Description of problem (please be as detailed as possible and provide log snippets):
Upgrade failed from ODF 4.13.1 to 4.14.0-61.

Version of all relevant components (if applicable):
OpenShift installer: 4.14.0-0.nightly-2023-07-11-092038
ODF upgrade from 4.13.1 to 4.14.0-61

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes

Is there any workaround available to the best of your knowledge?
NA

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1

Can this issue be reproduced?
2/2

Can this issue be reproduced from the UI?
Not tried

If this is a regression, please provide more details to justify this:
Yes

Steps to Reproduce:
1. Install ODF 4.13.1 and upgrade to 4.14.0-61 using ocs-ci.
2. Verify all CSVs.

Actual results:
$ oc get csv
NAME                                        DISPLAY                       VERSION            REPLACES                                PHASE
mcg-operator.v4.14.0-61.stable              NooBaa Operator               4.14.0-61.stable   mcg-operator.v4.13.1-rhodf              Succeeded
ocs-operator.v4.13.1-rhodf                  OpenShift Container Storage   4.13.1-rhodf       ocs-operator.v4.13.0-rhodf              Replacing
ocs-operator.v4.14.0-61.stable              OpenShift Container Storage   4.14.0-61.stable   ocs-operator.v4.13.1-rhodf              Failed
odf-csi-addons-operator.v4.14.0-61.stable   CSI Addons                    4.14.0-61.stable   odf-csi-addons-operator.v4.13.1-rhodf   Succeeded
odf-operator.v4.14.0-61.stable              OpenShift Data Foundation     4.14.0-61.stable   odf-operator.v4.13.1-rhodf              Succeeded

Expected results:
All CSVs should be in Succeeded state.

Additional info:
$ oc describe csv ocs-operator.v4.14.0-61.stable
Events:
  Type     Reason               Age                     From                        Message
  ----     ------               ----                    ----                        -------
  Normal   RequirementsUnknown  12m                     operator-lifecycle-manager  requirements not yet checked
  Normal   RequirementsNotMet   11m                     operator-lifecycle-manager  one or more requirements couldn't be found
  Normal   InstallWaiting       11m                     operator-lifecycle-manager  installing: waiting for deployment ocs-operator to become ready: deployment "ocs-operator" not available: Deployment does not have minimum availability.
  Warning  InstallCheckFailed   6m34s (x2 over 6m34s)   operator-lifecycle-manager  install timeout
  Normal   NeedsReinstall       6m33s (x2 over 6m34s)   operator-lifecycle-manager  installing: waiting for deployment rook-ceph-operator to become ready: deployment "rook-ceph-operator" not available: Deployment does not have minimum availability.
  Normal   AllRequirementsMet   6m30s (x4 over 11m)     operator-lifecycle-manager  all requirements found, attempting install
  Normal   InstallSucceeded     6m30s (x2 over 11m)     operator-lifecycle-manager  waiting for install components to report healthy
  Normal   InstallWaiting       6m29s (x3 over 11m)     operator-lifecycle-manager  installing: waiting for deployment rook-ceph-operator to become ready: deployment "rook-ceph-operator" not available: Deployment does not have minimum availability.
  Warning  InstallCheckFailed   92s (x2 over 92s)       operator-lifecycle-manager  install failed: deployment rook-ceph-operator not ready before timeout: deployment "rook-ceph-operator" exceeded its progress deadline

> Pods not in Running state:

$ oc get pods | egrep -v "Running|Completed"
NAME                                  READY   STATUS                       RESTARTS      AGE
noobaa-core-0                         0/1     CrashLoopBackOff             7 (25s ago)   11m
rook-ceph-operator-6fd47df694-gwtqz   0/1     CreateContainerConfigError   0             12m

> $ oc describe pod rook-ceph-operator-6fd47df694-gwtqz
Events:
  Type     Reason          Age                   From               Message
  ----     ------          ----                  ----               -------
  Normal   Scheduled       17m                   default-scheduler  Successfully assigned openshift-storage/rook-ceph-operator-6fd47df694-gwtqz to compute-1
  Normal   AddedInterface  17m                   multus             Add eth0 [10.131.0.47/23] from ovn-kubernetes
  Normal   Pulling         17m                   kubelet            Pulling image "registry.redhat.io/odf4/rook-ceph-rhel9-operator@sha256:d19bd025bd17d8db3f918ed8ef65188a4a1d58f7756bb15d7e0504ee5fcf26cb"
  Normal   Pulled          16m                   kubelet            Successfully pulled image "registry.redhat.io/odf4/rook-ceph-rhel9-operator@sha256:d19bd025bd17d8db3f918ed8ef65188a4a1d58f7756bb15d7e0504ee5fcf26cb" in 15.950938092s (15.950952025s including waiting)
  Warning  Failed          14m (x12 over 16m)    kubelet            Error: couldn't find key CSI_ENABLE_TOPOLOGY in ConfigMap openshift-storage/ocs-operator-config
  Normal   Pulled          112s (x71 over 16m)   kubelet            Container image "registry.redhat.io/odf4/rook-ceph-rhel9-operator@sha256:d19bd025bd17d8db3f918ed8ef65188a4a1d58f7756bb15d7e0504ee5fcf26cb" already present on machine

> $ oc describe pod noobaa-core-0
Name:             noobaa-core-0
Namespace:        openshift-storage
Priority:         0
Service Account:  noobaa
Node:             compute-2/10.1.112.178
Start Time:       Wed, 12 Jul 2023 12:42:39 +0530
Labels:           app=noobaa
                  controller-revision-hash=noobaa-core-5656895cf5
                  noobaa-core=noobaa
                  noobaa-mgmt=noobaa
                  statefulset.kubernetes.io/pod-name=noobaa-core-0
Events:
  Type     Reason          Age                   From               Message
  ----     ------          ----                  ----               -------
  Normal   Scheduled       16m                   default-scheduler  Successfully assigned openshift-storage/noobaa-core-0 to compute-2
  Normal   AddedInterface  16m                   multus             Add eth0 [10.128.2.35/23] from ovn-kubernetes
  Normal   Pulled          15m (x5 over 16m)     kubelet            Container image "registry.redhat.io/odf4/mcg-core-rhel9@sha256:b51a63bc588431acc0306703a99562e7c2c35266bf90d16b69146944911728cd" already present on machine
  Normal   Created         15m (x5 over 16m)     kubelet            Created container core
  Normal   Started         15m (x5 over 16m)     kubelet            Started container core
  Warning  BackOff         93s (x70 over 16m)    kubelet            Back-off restarting failed container core in pod noobaa-core-0_openshift-storage(48b662a5-ce7c-4552-b7c9-7c197b852268)

The job is still running; will update once must-gather is collected. A kubeconfig has been provided to dev for live debugging.
Job: https://url.corp.redhat.com/c447398
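Note on the rook-ceph-operator failure: CreateContainerConfigError here means the container spec references a ConfigMap key (CSI_ENABLE_TOPOLOGY, per the event above) that is not present in openshift-storage/ocs-operator-config. A quick diagnostic sketch to confirm this on a live cluster (the patch command at the end is only an assumption to unblock the container, not a verified workaround, and the "false" value is a guess):

# Show which keys the ConfigMap actually contains
$ oc get configmap ocs-operator-config -n openshift-storage -o jsonpath='{.data}'

# Show which ConfigMap keys the rook-ceph-operator deployment expects
$ oc get deployment rook-ceph-operator -n openshift-storage -o yaml | grep -B2 -A4 CSI_ENABLE_TOPOLOGY

# Speculative: manually adding the missing key may let the container start,
# but ocs-operator owns this ConfigMap and may need to reconcile it itself
$ oc patch configmap ocs-operator-config -n openshift-storage --type merge \
    -p '{"data":{"CSI_ENABLE_TOPOLOGY":"false"}}'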
> noobaa-core-0 pod log

Jul-12 7:28:57.433 [Upgrade/20] [ERROR] core.server.system_services.system_store:: SystemStore: load failed Error: NON_EXISTING_ROOT_KEY
    at MasterKeysManager.load_root_key (/root/node_modules/noobaa-core/src/server/system_services/master_key_manager.js:64:40)
    at /root/node_modules/noobaa-core/src/server/system_services/system_store.js:414:41
    at Semaphore.surround (/root/node_modules/noobaa-core/src/util/semaphore.js:71:90)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
Jul-12 7:28:57.433 [Upgrade/20] [ERROR] core.server.system_services.system_store:: SystemStore: load failed Error: NON_EXISTING_ROOT_KEY
    at MasterKeysManager.load_root_key (/root/node_modules/noobaa-core/src/server/system_services/master_key_manager.js:64:40)
    at /root/node_modules/noobaa-core/src/server/system_services/system_store.js:414:41
    at Semaphore.surround (/root/node_modules/noobaa-core/src/util/semaphore.js:71:90)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
Jul-12 7:28:57.434 [Upgrade/20] [ERROR] UPGRADE:: failed to load system store!! Error: NON_EXISTING_ROOT_KEY
    at MasterKeysManager.load_root_key (/root/node_modules/noobaa-core/src/server/system_services/master_key_manager.js:64:40)
    at /root/node_modules/noobaa-core/src/server/system_services/system_store.js:414:41
    at Semaphore.surround (/root/node_modules/noobaa-core/src/util/semaphore.js:71:90)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
Jul-12 7:28:57.434 [Upgrade/20] [ERROR] UPGRADE:: failed to init upgrade process!!
upgrade_manager failed with exit code 1
noobaa_init.sh finished
noobaa_init failed with exit code 1. aborting
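The NON_EXISTING_ROOT_KEY error means the noobaa-core upgrade step could not locate the NooBaa root master key when loading the system store. A diagnostic sketch for whoever picks this up (the secret/mount names below are assumptions based on upstream NooBaa naming conventions and may differ between 4.13 and 4.14; they are not taken from this cluster):

# Check whether root-master-key material exists in the namespace
$ oc get secrets -n openshift-storage | grep -i noobaa-root-master

# Check how the noobaa-core StatefulSet is told where to find the root key
# (env vars / volume mounts referencing the key)
$ oc get statefulset noobaa-core -n openshift-storage -o yaml | grep -i -B2 -A4 "root.*key"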
We looked at the setup and found the version mismatch error "Storage cluster version (4.13.1) is higher than the OCS Operator version (4.13.0)". This is caused by https://github.com/red-hat-storage/ocs-operator/pull/2089, merged a few days ago: the downstream Dockerfile needs to be updated to use the package's new path when setting the version via ldflags. Boris has already been notified and will fix the downstream Dockerfile; that should resolve the rook-ceph-operator issue.

The noobaa-core issue needs someone from the NooBaa team to take a look, so I am moving this bug to the noobaa component.
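For context on the ldflags point above: Go's linker -X flag writes a value into a package-level variable at build time, but it matches the variable by its full import path. If the package holding the version variable is moved (as in the PR above) and the build command keeps the old path, the flag silently stops matching anything and the binary keeps its compiled-in default version, which is consistent with the operator reporting 4.13.0 while the storage cluster is at 4.13.1. The paths below are hypothetical illustrations of the mechanism, not the actual downstream Dockerfile content:

# Old path: after the package move this -X flag matches nothing, so the
# default version baked into the source stays in the binary
$ go build -ldflags "-X github.com/red-hat-storage/ocs-operator/pkg/version.Version=4.13.1" ./...

# Updated to the variable's new import path (path shown is an assumption)
$ go build -ldflags "-X github.com/red-hat-storage/ocs-operator/v4/version.Version=4.13.1" ./...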