Description of problem:
After performing 4.14 to 4.16 EUS to EUS upgrade procedure of ODF and OCP, and performing functional level MCG related operations (tier1 tests), noobaa-db-pg-0 got in CrashLoopBackOff

Version of all relevant components (if applicable):
Initial versions installed: ODF 4.14 + OCP 4.14
Upgraded versions:
OCP 4.16.0-0.nightly-2024-05-15-001800
odf-operator.v4.16.0-101.stable

Is there any workaround available to the best of your knowledge?
Not aware

Can this issue be reproducible?
Tried this procedure with tier1 test execution only once so far

Can this issue be reproduced from the UI?
N/A

If this is a regression, please provide more details to justify this:
Hard to say if it's a regression at this point, as this is the first time we are trying EUS to EUS upgrade

Steps to Reproduce:
On IBM Cloud VPC:
1. Install OCP 4.14 (IPI), ODF 4.14
2. Upgrade ODF to 4.14, sequentially (4.14 to 4.15, 4.15 to 4.16)
3. Perform OCP EUS to EUS upgrade procedure:
# oc patch clusterversions/version -p '{"spec":{"channel":"stable-4.16"}}' --type=merge
# oc patch mcp/worker --type merge --patch '{"spec":{"paused":true}}'
# oc adm upgrade --to-image=registry.ci.openshift.org/ocp/release:4.16.0-0.nightly-2024-05-15-001800 --allow-explicit-upgrade --force
# oc patch mcp/worker --type merge --patch '{"spec":{"paused":false}}'
Perform functional level operations of NooBaa (tier1 tests of ocs-ci)

Actual results:
As part of the setup phase of test_bucket_creation_deletion.py::TestBucketCreationAndDeletion::test_bucket_creation_deletion[3-S3-DEFAULT-BACKINGSTORE], where the below resources have been created, noobaa-db-pg-0 started crashing.

2024-05-16 14:43:24 tests/functional/object/mcg/test_bucket_creation_deletion.py::TestBucketCreationAndDeletion::test_bucket_creation_deletion[3-S3-DEFAULT-BACKINGSTORE]
2024-05-16 14:43:24 -------------------------------- live log setup --------------------------------

AWS CLI configMap, s3cli StatefulSet, and AWS, Azure, GCP and IBM Cloud COS secrets.

Started seeing connection to the DB getting broken:

2024-05-16 14:55:52 07:55:40 - ThreadPoolExecutor-30_0 - ocs_ci.utility.utils - INFO - Executing command: oc --kubeconfig /home/jenkins/current-cluster-dir/openshift-cluster-dir/auth/kubeconfig -n openshift-storage rsh noobaa-db-pg-0 bash -c "pg_dump nbcore | gzip > /tmp/nbcore.gz"
2024-05-16 14:55:52 07:55:41 - ThreadPoolExecutor-30_0 - ocs_ci.utility.utils - WARNING - Command stderr: error: unable to upgrade connection: container not found ("db")
2024-05-16 14:55:52
2024-05-16 14:55:52 07:55:41 - ThreadPoolExecutor-30_0 - ocs_ci.ocs.utils - ERROR - Failed to dump noobaa DB! Error: Error during execution of command: oc --kubeconfig /home/jenkins/current-cluster-dir/openshift-cluster-dir/auth/kubeconfig -n openshift-storage rsh noobaa-db-pg-0 bash -c "pg_dump nbcore | gzip > /tmp/nbcore.gz".
2024-05-16 14:55:52 Error is error: unable to upgrade connection: container not found ("db")

% oc get pod noobaa-db-pg-0
NAME             READY   STATUS             RESTARTS          AGE
noobaa-db-pg-0   0/1     CrashLoopBackOff   206 (4m58s ago)   17h

% oc describe pod noobaa-db-pg-0
Events:
  Type     Reason   Age                    From     Message
  ----     ------   ----                   ----     -------
  Normal   Pulled   157m (x177 over 17h)   kubelet  Container image "registry.redhat.io/rhel9/postgresql-15@sha256:1aeac23901c0147e4c6e9a1b8bb5f41dd6f95532b0d96adac55d609d0eed32fe" already present on machine
  Warning  BackOff  2m49s (x4742 over 17h)  kubelet  Back-off restarting failed container db in pod noobaa-db-pg-0_openshift-storage(c7d42c55-f3de-45ca-ad7a-408fe7419820)

Noticing this error while trying to get the noobaa-db-pg-0 logs:

Incompatible data directory. This container image provides PostgreSQL '15', but data directory is of version '12'.
This image supports automatic data directory upgrade from '13', please _carefully_ consult image documentation about how to use the '$POSTGRESQL_UPGRADE' startup option.

Additional info:
ODF Must Gather - https://url.corp.redhat.com/dc96623
Live cluster details - https://url.corp.redhat.com/331727d
Correction: Steps to Reproduce: 2. Upgrade ODF to **4.16**, sequentially (4.14 to 4.15, 4.15 to 4.16)
In the DB logs, it appears that the DB data directory is still in Postgres 12 format and not Postgres 15, a change that should have happened in 4.15. Was the upgrade to 4.15 successful before proceeding to 4.16?
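The version mismatch reported in the DB log can be pulled out mechanically when triaging similar failures. The following is a minimal sketch in Python (the language of ocs-ci), not part of ocs-ci or NooBaa: the function name `parse_pg_mismatch` and the variable names are hypothetical, and the only input is the error message quoted in this report. It assumes, taking the message at face value, that the image can upgrade the data directory automatically only from the single version it names.

```python
import re

# Error message copied from the noobaa-db-pg-0 log quoted in this report.
MSG = (
    "Incompatible data directory. This container image provides PostgreSQL '15', "
    "but data directory is of version '12'. This image supports automatic data "
    "directory upgrade from '13', please _carefully_ consult image documentation "
    "about how to use the '$POSTGRESQL_UPGRADE' startup option."
)


def parse_pg_mismatch(message):
    """Extract the three PostgreSQL major versions named in the message.

    Hypothetical helper for illustration; returns None if the message
    does not match the expected wording.
    """
    pattern = re.compile(
        r"provides PostgreSQL '(?P<image>\d+)'.*?"
        r"data directory is of version '(?P<data>\d+)'.*?"
        r"upgrade from '(?P<upgrade_from>\d+)'",
        re.DOTALL,
    )
    match = pattern.search(message)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


info = parse_pg_mismatch(MSG)

# Assumption: automatic upgrade is possible only when the data directory
# is at exactly the version the image says it can upgrade from.
can_auto_upgrade = info is not None and info["data"] == info["upgrade_from"]

print("versions:", info)
print("automatic data directory upgrade possible:", can_auto_upgrade)
```

For the message in this report, the data directory version (12) differs from the version the image can upgrade from (13), which is consistent with the container failing to start on every restart.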
Please update the RDT flag/text appropriately.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.16.0 security, enhancement & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:4591