Description of problem (please be as detailed as possible and provide log snippets):

- The noobaa db pod is stuck in CrashLoopBackOff (CLBO); all other pods in the openshift-storage namespace are up and running.

noobaa-db-pg-0 0/1 CrashLoopBackOff 219 (97s ago) 18h 10.126.12.20 oscinfra-ldc65-storage-mhbxg <none> <none>

97s Normal Pulled pod/noobaa-db-pg-0 Container image "registry.redhat.io/rhel8/postgresql-12@sha256:aa65868b9684f7715214f5f3fac3139245c212019cc17742f237965a7508222d" already present on machine
6m34s Warning BackOff pod/noobaa-db-pg-0 Back-off restarting failed container

- Checking the pod logs, we see the error "ERROR: tuple already updated by self":

[amenon@supportshell-1 logs]$ cat current.log
2023-06-13T13:02:53.407141058Z pg_ctl: another server might be running; trying to start server anyway
2023-06-13T13:02:53.417167652Z waiting for server to start....2023-06-13 13:02:53.434 UTC [22] LOG: starting PostgreSQL 12.12 on x86_64-redhat-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), 64-bit
2023-06-13T13:02:53.434720093Z 2023-06-13 13:02:53.434 UTC [22] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-06-13T13:02:53.439280827Z 2023-06-13 13:02:53.439 UTC [22] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2023-06-13T13:02:53.521852556Z 2023-06-13 13:02:53.521 UTC [22] LOG: redirecting log output to logging collector process
2023-06-13T13:02:53.521852556Z 2023-06-13 13:02:53.521 UTC [22] HINT: Future log output will appear in directory "log".
2023-06-13T13:02:53.717865176Z done
2023-06-13T13:02:53.717865176Z server started
2023-06-13T13:02:53.726249227Z /var/run/postgresql:5432 - accepting connections
2023-06-13T13:02:53.730107559Z => sourcing /usr/share/container-scripts/postgresql/start/set_passwords.sh ...
2023-06-13T13:02:53.737096718Z ERROR: tuple already updated by self

[amenon@supportshell-1 logs]$ cat current.log
2023-06-12T18:47:52.329814485Z + export PGDATA=/var/lib/pgsql/data/userdata
2023-06-12T18:47:52.329814485Z + PGDATA=/var/lib/pgsql/data/userdata
2023-06-12T18:47:52.329892997Z postgresql.conf file is found
2023-06-12T18:47:52.329900468Z + '[' -f /var/lib/pgsql/data/userdata/postgresql.conf ']'
2023-06-12T18:47:52.329900468Z + echo postgresql.conf file is found
2023-06-12T18:47:52.329900468Z + exit 0

Version of all relevant components (if applicable):

- ODF version 4.11.8
- Cluster version is 4.12.19
- This is similar to the issue addressed in Bug https://bugzilla.redhat.com/show_bug.cgi?id=2010702
- We already applied KCS https://access.redhat.com/solutions/7011877, but it didn't help.
- When applying the above KCS, the attempt to stop Postgres fails with:

sh-4.4$ pg_ctl stop -D /var/lib/pgsql/data/userdata
pg_ctl: could not send stop signal (PID: 22): No such process

- We also tried the steps below, which add an extra step 2 that runs run-postgresql. This should create those files so that the remaining steps can run, but it did not help either.

1. Start a debug session using oc debug pod/noobaa-db-pg-0
2. From the command line of the debug session, run run-postgresql
3. Run pg_ctl stop -D /var/lib/pgsql/data/userdata to cleanly shut down Postgres.
4. Run pg_ctl start -D /var/lib/pgsql/data/userdata to start Postgres. You should see the output as mentioned in [1] and it should wait there indefinitely (no errors):
5. Press enter.
6. Run pg_ctl stop -D /var/lib/pgsql/data/userdata and wait for Postgres to shut down cleanly.
7. Exit the debug session

- After trying the above KCS, the error we see in the noobaa-db-pg-0 pod logs is:

2023-06-16T12:34:30.425331231Z pg_ctl: another server might be running; trying to start server anyway
2023-06-16T12:34:30.434503981Z waiting for server to start....2023-06-16 12:34:30.497 UTC [22] LOG: starting PostgreSQL 12.12 on x86_64-redhat-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), 64-bit
2023-06-16T12:34:30.498224687Z 2023-06-16 12:34:30.498 UTC [22] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-06-16T12:34:30.502843334Z 2023-06-16 12:34:30.502 UTC [22] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2023-06-16T12:34:30.560324086Z 2023-06-16 12:34:30.560 UTC [22] LOG: redirecting log output to logging collector process
2023-06-16T12:34:30.560324086Z 2023-06-16 12:34:30.560 UTC [22] HINT: Future log output will appear in directory "log".
2023-06-16T12:34:30.735134682Z done
2023-06-16T12:34:30.735134682Z server started
2023-06-16T12:34:30.744075842Z /var/run/postgresql:5432 - accepting connections
2023-06-16T12:34:30.748291698Z => sourcing /usr/share/container-scripts/postgresql/start/set_passwords.sh ...
2023-06-16T12:34:30.754964819Z ERROR: tuple concurrently updated

- We need help from engineering on how to proceed.

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

This is an infrastructure cluster used for testing changes and upgrades. The issue is preventing the customer from doing any testing.

Is there any workaround available to the best of your knowledge?

No

Additional info:

- All must-gathers are attached to supportshell under ~/03536312
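
- For triage across the attached logs, here is a minimal shell sketch that checks a saved container log (such as current.log) for the two startup errors quoted in this report. It is an illustration only and is not part of the KCS or of any ODF/NooBaa tooling; the function name classify_noobaa_db_log and its output labels are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration, not shipped with ODF):
# report which of the two errors seen in this bug appears in a saved
# noobaa-db-pg-0 container log.
classify_noobaa_db_log() {
    log_file=$1

    # Bail out early if the log cannot be read.
    if [ ! -r "$log_file" ]; then
        echo "unreadable"
        return 2
    fi

    # Error seen before the KCS steps were attempted.
    if grep -q 'ERROR: *tuple already updated by self' "$log_file"; then
        echo "tuple-already-updated-by-self"
    # Error seen after the KCS steps were attempted.
    elif grep -q 'ERROR: *tuple concurrently updated' "$log_file"; then
        echo "tuple-concurrently-updated"
    else
        echo "no-known-error"
    fi
}
```

Example use from the logs directory: classify_noobaa_db_log current.log. The check only matches the literal messages quoted in this report, so any other failure is reported as no-known-error and still needs manual review.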