Bug 2149261

Summary: noobaa-db-pg-0 stuck in init after upgrade to 4.12 getaddrinfo ENOTFOUND noobaa-db-pg-0.noobaa-db-pg
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Petr Balogh <pbalogh>
Component: Multi-Cloud Object Gateway
Assignee: Alexander Indenbaum <aindenba>
Status: CLOSED ERRATA
QA Contact: Shivam Durgbuns <sdurgbun>
Severity: urgent
Priority: unspecified
Version: 4.12
CC: aindenba, dzaken, ebenahar, etamir, muagarwa, ocs-bugs, odf-bz-bot, rperiyas, vavuthu
Keywords: Automation, Regression
Target Release: ODF 4.12.0
Hardware: Unspecified
OS: Unspecified
Doc Type: No Doc Update
Last Closed: 2023-01-31 00:20:12 UTC
Type: Bug

Description Petr Balogh 2022-11-29 11:24:40 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

On the VSPHERE UPI KMS VAULT V1 1AZ RHCOS VSAN 3M 3W environment, we see the following issue during the upgrade.

Logs from the NooBaa core pod:
2022-11-29T07:23:53.807081630Z Nov-29 7:23:53.807 [Upgrade/20]   [LOG] CONSOLE:: init_rand_seed: done
2022-11-29T07:23:53.808574857Z Nov-29 7:23:53.808 [Upgrade/20] [ERROR] core.util.postgres_client:: apply_sql_functions execute error Error: getaddrinfo ENOTFOUND noobaa-db-pg-0.noobaa-db-pg
2022-11-29T07:23:53.808574857Z     at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:71:26) {
2022-11-29T07:23:53.808574857Z   errno: -3008,
2022-11-29T07:23:53.808574857Z   code: 'ENOTFOUND',
2022-11-29T07:23:53.808574857Z   syscall: 'getaddrinfo',
2022-11-29T07:23:53.808574857Z   hostname: 'noobaa-db-pg-0.noobaa-db-pg'
2022-11-29T07:23:53.808574857Z }
2022-11-29T07:23:53.808683389Z Nov-29 7:23:53.808 [Upgrade/20] [ERROR] core.util.postgres_client:: _connect: initial connect failed, will retry getaddrinfo ENOTFOUND noobaa-db-pg-0.noobaa-db-pg
2022-11-29T07:23:56.809466695Z Nov-29 7:23:56.809 [Upgrade/20]    [L0] core.util.postgres_client:: _connect: called with { max: 10, host: 'noobaa-db-pg-0.noobaa-db-pg', user: 'noobaa', password: 'Sf/zlZcJ2CE+CA==', database: 'nbcore', port: 5432 }
2022-11-29T07:23:56.811782821Z Nov-29 7:23:56.811 [Upgrade/20] [ERROR] core.util.postgres_client:: apply_sql_functions execute error Error: getaddrinfo ENOTFOUND noobaa-db-pg-0.noobaa-db-pg
2022-11-29T07:23:56.811782821Z     at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:71:26) {
2022-11-29T07:23:56.811782821Z   errno: -3008,
2022-11-29T07:23:56.811782821Z   code: 'ENOTFOUND',
2022-11-29T07:23:56.811782821Z   syscall: 'getaddrinfo',
2022-11-29T07:23:56.811782821Z   hostname:
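The log shows postgres_client failing to resolve noobaa-db-pg-0.noobaa-db-pg and retrying about 3 seconds later (07:23:53.808, then 07:23:56.809). As an illustration of that resolve-and-retry pattern, here is a minimal Python sketch. It is not NooBaa's code (which is Node.js), and the function name resolve_with_retry and its defaults are hypothetical. Python's socket.getaddrinfo raises socket.gaierror in the situation where Node.js reports ENOTFOUND. The resolver and sleep functions are injectable so the loop can be exercised without a cluster.

```python
import socket
import time
from typing import Callable, List, Optional


def resolve_with_retry(
    host: str,
    port: int = 5432,
    attempts: int = 5,
    delay_s: float = 3.0,
    resolver: Callable = socket.getaddrinfo,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Hypothetical helper: resolve host, retrying on resolver errors.

    Returns the sorted, de-duplicated IP addresses, or raises RuntimeError
    after `attempts` failed lookups (the 3 s delay mirrors the log timing).
    """
    last_err: Optional[socket.gaierror] = None
    for attempt in range(1, attempts + 1):
        try:
            infos = resolver(host, port, proto=socket.IPPROTO_TCP)
            # Each entry is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] is the address string for both IPv4 and IPv6.
            return sorted({info[4][0] for info in infos})
        except socket.gaierror as err:
            # Where Node.js reports ENOTFOUND (errno -3008), Python raises
            # gaierror, typically with errno socket.EAI_NONAME.
            last_err = err
            print(f"attempt {attempt}/{attempts}: cannot resolve {host}: {err}")
            if attempt < attempts:
                sleep(delay_s)
    raise RuntimeError(f"{host} did not resolve after {attempts} attempts") from last_err
```

Run from a pod in the same namespace, resolve_with_retry("noobaa-db-pg-0.noobaa-db-pg") would attempt the same lookup as the log. Outside the cluster the short name will normally not resolve at all, because it depends on the pod's DNS search domains.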


Version of all relevant components (if applicable):
Upgrade from ODF 4.11 to ODF 4.12
Image: quay.io/rhceph-dev/ocs-registry:4.12.0-120


Does this issue impact your ability to continue to work with the product
(please explain in detail what the user impact is)?

The upgrade does not finish, and the storage cluster is stuck in the Progressing state.


Is there any workaround available to the best of your knowledge?
No


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?
Haven't tried yet.


Can this issue be reproduced from the UI?
Haven't tried yet.


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Install the GA releases of OCP and ODF 4.11 on a cluster configured with KMS Vault.
2. Upgrade OCP and ODF to 4.12
3. During the ODF upgrade, the issue described above occurs.


Actual results:
The storage cluster is stuck in the Progressing state because the noobaa-db-pg-0 pod is stuck in the Init state.
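The actual result hinges on the DB pod never leaving its Init phase. A minimal sketch for checking that condition from a pod's JSON (as returned by `oc get pod <name> -o json`) follows, using the standard Kubernetes Pod status fields status.phase and status.initContainerStatuses. It is not part of ODF or the test automation; the function names are hypothetical, and the openshift-storage namespace in the usage note is an assumption about a default ODF install.

```python
from typing import Any, Dict, List


def pending_init_containers(pod: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: names of init containers not yet finished.

    An init container counts as finished only when its current state is
    'terminated' with exit code 0; running, waiting, or failed ones are pending.
    """
    pending = []
    for cs in pod.get("status", {}).get("initContainerStatuses", []):
        terminated = cs.get("state", {}).get("terminated")
        if not terminated or terminated.get("exitCode") != 0:
            pending.append(cs.get("name", "<unnamed>"))
    return pending


def is_stuck_in_init(pod: Dict[str, Any]) -> bool:
    """Hypothetical helper: True if the pod is Pending with unfinished init containers."""
    phase = pod.get("status", {}).get("phase")
    return phase == "Pending" and bool(pending_init_containers(pod))
```

For example, save the output of `oc get pod noobaa-db-pg-0 -n openshift-storage -o json` to a file, load it with json.load, and pass the resulting dict to is_stuck_in_init. Because only exit code 0 counts as finished, an init container in CrashLoopBackOff is also reported as pending.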


Expected results:
The NooBaa DB pod starts and the upgrade completes.


Additional info:

Jenkins job:
https://url.corp.redhat.com/3d5e3a5

Must gather:
https://url.corp.redhat.com/13792a1

Comment 17 errata-xmlrpc 2023-01-31 00:20:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Data Foundation 4.12.0 enhancement and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:0551