Description of problem (please be as detailed as possible and provide log snippets):

On upgrading ODF from 4.14 to 4.15, the noobaa-operator panics and goes into CrashLoopBackOff (CLBO):

~~~
noobaa-operator-55954db6bf-4lrwv   0/1   Running   231   21h
~~~

The operator crashed while upgrading the DB image and running /init/dumpdb.sh.

less namespaces/openshift-storage/pods/noobaa-operator-55954db6bf-4lrwv/noobaa-operator/noobaa-operator/logs/current.log
~~~
2024-05-16T10:12:41.338813083Z time="2024-05-16T10:12:41Z" level=info msg="UpgradePostgresDB: current phase is Preparing" sys=openshift-storage/noobaa
2024-05-16T10:12:41.338813083Z time="2024-05-16T10:12:41Z" level=info msg="SetEndpointsDeploymentReplicas:: setting endpoints replica count to 0" sys=openshift-storage/noobaa
2024-05-16T10:12:41.339969703Z time="2024-05-16T10:12:41Z" level=info msg="ReconcileObject: Done - unchanged Deployment noobaa-endpoint " sys=openshift-storage/noobaa
2024-05-16T10:12:41.339969703Z time="2024-05-16T10:12:41Z" level=info msg="ReconcileSetDbImageAndInitCode:: changing DB image: registry.redhat.io/rhel8/postgresql-12@sha256:16a0cb66818ab8acb68abf40ac075eadd10a94612067769e055222dd412f0a16 and init contatiners script: /init/dumpdb.sh" sys=openshift-storage/noobaa
2024-05-16T10:12:41.343294392Z panic: runtime error: index out of range [0] with length 0 [recovered]
2024-05-16T10:12:41.343294392Z 	panic: runtime error: index out of range [0] with length 0
2024-05-16T10:12:41.343294392Z
2024-05-16T10:12:41.343294392Z goroutine 3003 [running]:
2024-05-16T10:12:41.343294392Z sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
2024-05-16T10:12:41.343321837Z 	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.3/pkg/internal/controller/controller.go:116 +0x1fa
2024-05-16T10:12:41.343321837Z panic({0x25b28c0, 0xc001d03728})
2024-05-16T10:12:41.343321837Z 	/usr/lib/golang/src/runtime/panic.go:884 +0x213
2024-05-16T10:12:41.343321837Z github.com/noobaa/noobaa-operator/v5/pkg/system.(*Reconciler).ReconcileSetDbImageAndInitCode.func1()
2024-05-16T10:12:41.343334234Z 	/remote-source/app/pkg/system/phase2_creating.go:1519 +0x336
2024-05-16T10:12:41.343334234Z github.com/noobaa/noobaa-operator/v5/pkg/system.(*Reconciler).reconcileObjectAndGetResult.func1()
2024-05-16T10:12:41.343334234Z 	/remote-source/app/pkg/system/reconciler.go:639 +0x22
2024-05-16T10:12:41.343357633Z sigs.k8s.io/controller-runtime/pkg/controller/controllerutil.mutate(0xc000c9f900?, {{0xc0014cc168?, 0x0?}, {0xc0015119b0?, 0x2d7bb40?}}, {0x2d9a500, 0xc000c9f900})
2024-05-16T10:12:41.343378459Z 	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.3/pkg/controller/controllerutil/controllerutil.go:340 +0x4f
2024-05-16T10:12:41.343388804Z sigs.k8s.io/controller-runtime/pkg/controller/controllerutil.CreateOrUpdate({0x2d7bb40, 0xc000056058}, {0x2d89e40, 0xc000d0ed80}, {0x2d9a500?, 0xc000c9f900}, 0xc0001e4b10?)
2024-05-16T10:12:41.343410134Z 	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.3/pkg/controller/controllerutil/controllerutil.go:212 +0x274
2024-05-16T10:12:41.343421242Z github.com/noobaa/noobaa-operator/v5/pkg/system.(*Reconciler).reconcileObjectAndGetResult(0xc00180a780, {0x2d9a500, 0xc000c9f900}, 0xc001af3550, 0x0)
2024-05-16T10:12:41.343431520Z 	/remote-source/app/pkg/system/reconciler.go:636 +0x169
2024-05-16T10:12:41.343442147Z github.com/noobaa/noobaa-operator/v5/pkg/system.(*Reconciler).reconcileObject(...)
2024-05-16T10:12:41.343442147Z 	/remote-source/app/pkg/system/reconciler.go:627
2024-05-16T10:12:41.343442147Z github.com/noobaa/noobaa-operator/v5/pkg/system.(*Reconciler).ReconcileObject(...)
2024-05-16T10:12:41.343442147Z 	/remote-source/app/pkg/system/reconciler.go:618
2024-05-16T10:12:41.343452838Z github.com/noobaa/noobaa-operator/v5/pkg/system.(*Reconciler).ReconcileSetDbImageAndInitCode(0xc00180a780, {0xc001394150, 0x6e}, {0x27627dd, 0xf}, 0x1)
2024-05-16T10:12:41.343470896Z 	/remote-source/app/pkg/system/phase2_creating.go:1515 +0x186
2024-05-16T10:12:41.343481167Z github.com/noobaa/noobaa-operator/v5/pkg/system.(*Reconciler).UpgradePostgresDB(0xc00180a780)
2024-05-16T10:12:41.343481167Z 	/remote-source/app/pkg/system/phase2_creating.go:1642 +0xe51
2024-05-16T10:12:41.343491532Z github.com/noobaa/noobaa-operator/v5/pkg/system.(*Reconciler).ReconcilePhaseCreatingForMainClusters(0xc00180a780)
2024-05-16T10:12:41.343491532Z 	/remote-source/app/pkg/system/phase2_creating.go:138 +0x465
2024-05-16T10:12:41.343503080Z github.com/noobaa/noobaa-operator/v5/pkg/system.(*Reconciler).ReconcilePhaseCreating(0xc00180a780)
2024-05-16T10:12:41.343503080Z 	/remote-source/app/pkg/system/phase2_creating.go:66 +0x1e5
2024-05-16T10:12:41.343513196Z github.com/noobaa/noobaa-operator/v5/pkg/system.(*Reconciler).ReconcilePhases(0x27f95ed?)
2024-05-16T10:12:41.343513196Z 	/remote-source/app/pkg/system/reconciler.go:541 +0x47
2024-05-16T10:12:41.343523620Z github.com/noobaa/noobaa-operator/v5/pkg/system.(*Reconciler).Reconcile(0xc00180a780)
2024-05-16T10:12:41.343523620Z 	/remote-source/app/pkg/system/reconciler.go:422 +0x33b
2024-05-16T10:12:41.343534342Z github.com/noobaa/noobaa-operator/v5/pkg/controller/noobaa.Add.func1({0xc001323ce0?, 0x40dd8a?}, {{{0xc0014cc168?, 0x30?}, {0xc001f2a316?, 0x2595ae0?}}})
2024-05-16T10:12:41.343564901Z 	/remote-source/app/pkg/controller/noobaa/noobaa_controller.go:53 +0xe5
~~~

The operator is running on the 4.15 image; however, postgres is still not on the updated image.
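The panic is an `index out of range [0] with length 0` raised from ReconcileSetDbImageAndInitCode while it changes the DB image and init-container script, which suggests the reconciler expects at least one init container in the noobaa-db pod template. Assuming a live cluster and the standard `oc` client, a quick check for whether the StatefulSet declares any init containers is:

```shell
# Print the names of any init containers declared on the noobaa-db-pg
# StatefulSet pod template; empty output means the list is absent/empty.
oc -n openshift-storage get statefulset noobaa-db-pg \
  -o jsonpath='{.spec.template.spec.initContainers[*].name}'
```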
The noobaa-db pod is running on the image below, while the noobaa-operator is trying to update the DB image to postgresql-12@sha256:16a0cb66818ab8acb68abf40ac075eadd10a94612067769e055222dd412f0a16:

~~~
image: registry.redhat.io/rhel8/postgresql-12@sha256:b96be9d3e8512046bae7d5a3e04fa151043eca051416305629b3ffd547370453
~~~

The MCG CSV is stuck in the Installing phase:

cat namespaces/openshift-storage/oc_output/csv
~~~
NAME                                    DISPLAY                       VERSION        REPLACES                                PHASE
gitlab-runner-operator.v1.21.0          GitLab Runner                 1.21.0         gitlab-runner-operator.v1.18.1          Succeeded
mcg-operator.v4.15.2-rhodf              NooBaa Operator               4.15.2-rhodf   mcg-operator.v4.14.6-rhodf              Installing
ocs-operator.v4.15.2-rhodf              OpenShift Container Storage   4.15.2-rhodf   ocs-operator.v4.14.6-rhodf              Succeeded
odf-csi-addons-operator.v4.15.2-rhodf   CSI Addons                    4.15.2-rhodf   odf-csi-addons-operator.v4.14.6-rhodf   Succeeded
odf-operator.v4.15.2-rhodf              OpenShift Data Foundation     4.15.2-rhodf   odf-operator.v4.14.6-rhodf              Succeeded
~~~

less namespaces/openshift-storage/operators.coreos.com/clusterserviceversions/mcg-operator.v4.15.2-rhodf.yaml
~~~
  - lastTransitionTime: "2024-05-16T10:06:28Z"
    lastUpdateTime: "2024-05-16T10:06:28Z"
    message: 'installing: waiting for deployment noobaa-operator to become ready: deployment
      "noobaa-operator" not available: Deployment does not have minimum availability.'
    phase: Installing
    reason: InstallWaiting
~~~

The init container is missing from the noobaa-db pod:

~~~
$ omg get pod noobaa-db-pg-0 -o yaml | grep -i initContainers
$
~~~

From dumpdb.sh, I see it checks whether the used space of /var/lib/pgsql/data is greater than THRESHOLD (33%):

~~~
cat /init/dumpdb.sh
set -e
sed -i -e 's/^\(postgres:[^:]\):[0-9]*:[0-9]*:/\1:10001:0:/' /etc/passwd
su postgres -c "bash -x /usr/bin/run-postgresql" &
THRESHOLD=33
USE=$(df -h --output=pcent "/$HOME/data" | tail -n 1 | tr -d '[:space:]%')
# Check if the used space is more than the threshold
if [ "$USE" -gt "$THRESHOLD" ]; then
    echo "Warning: Free space $USE% is above $THRESHOLD% threshold. Can't start upgrade!"
    exit 1
fi
echo "Info: Free space $USE% is below $THRESHOLD% threshold. Starting upgrade!"
until pg_isready; do sleep 1; done;
pg_dumpall -U postgres > /$HOME/data/dump.sql
exit 0
~~~

The /var/lib/pgsql volume is only at 2% use, which is below the threshold:

~~~
$ oc rsh noobaa-db-pg-0
sh-4.4$ df -h /var/lib/pgsql/data
Filesystem      Size  Used Avail Use% Mounted on
/dev/rbd0        49G  776M   49G   2% /var/lib/pgsql
~~~

Version of all relevant components (if applicable):
ODF 4.15.2

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
This causes the noobaa service to be down. It also prevents the customer from upgrading ODF to 4.15 in other clusters.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
3

Is this issue reproducible?
Yes, in the customer environment

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:
The upgrade from 4.14 to 4.15 worked fine in another cluster. The customer suspects this is a regression bug that only comes into play on "old" ODF clusters; this cluster was first installed with version 4.4.

Steps to Reproduce:
1. Install ODF < 4.14
2. Upgrade ODF from 4.14 to 4.15

Actual results:
noobaa-operator panics on upgrade

Expected results:
The ODF upgrade should go smoothly

Additional info:
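For reference, the script's space gate can be reproduced standalone. This is a minimal sketch, not the product script: `check_space`, its messages, and the argument handling are illustrative, but the `df`/`tr` parsing mirrors dumpdb.sh exactly.

```shell
#!/bin/sh
# Minimal standalone sketch of dumpdb.sh's space check (NOT the product script).
# check_space DIR THRESHOLD -> returns 0 when used% <= threshold, 1 otherwise.
check_space() {
    dir=$1
    threshold=$2
    # df --output=pcent prints e.g. "  2%"; strip whitespace and the % sign,
    # exactly as dumpdb.sh does.
    use=$(df -h --output=pcent "$dir" | tail -n 1 | tr -d '[:space:]%')
    if [ "$use" -gt "$threshold" ]; then
        echo "used ${use}% exceeds ${threshold}% threshold"
        return 1
    fi
    echo "used ${use}% is within ${threshold}% threshold"
    return 0
}

# With the 2% usage observed on /dev/rbd0, the upgrade gate passes.
check_space / 100
```

Note in passing that the quoted script's messages say "Free space $USE%" even though $USE is the *used* percentage; with only 2% used here, the check passes either way, so the threshold gate is not what blocked this upgrade.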
The customer ran the steps in c#7 and resolved their issue:

~~~~~~
We just added the initContainer to the noobaa-db-pg StatefulSet and that made the noobaa-operator happy again. The upgrade of noobaa then started again and finished successfully.
~~~~~~
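The exact init-container spec from c#7 is not quoted in this bug. As an illustration only, a patch of roughly this shape would add the missing init container; the image and script path are taken from the operator log above, while the container name ("init") and command are assumptions, not the verified fix:

```shell
# HYPOTHETICAL sketch of the kind of workaround described in c#7 -- not the
# verified spec. It adds an initContainers entry to the noobaa-db-pg
# StatefulSet so the operator's reconcile finds the element it indexes at [0].
oc -n openshift-storage patch statefulset noobaa-db-pg --type=json -p '[
  {"op": "add", "path": "/spec/template/spec/initContainers", "value": [
    {"name": "init",
     "image": "registry.redhat.io/rhel8/postgresql-12@sha256:16a0cb66818ab8acb68abf40ac075eadd10a94612067769e055222dd412f0a16",
     "command": ["/init/dumpdb.sh"]}
  ]}
]'
```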
*** Bug 2281604 has been marked as a duplicate of this bug. ***
Hi Nimrod,

The customer is waiting for the fix. Can you please prioritize this bug?

Regards,
Sonal Arora
Please backport the fix to ODF-4.15 and update the RDT flag/text appropriately.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenShift Data Foundation 4.15.6 Bug Fix Update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2024:6397
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days