2149261 – noobaa-db-pg-0 stuck in init after upgrade to 4.12 getaddrinfo ENOTFOUND noobaa-db-pg-0.noobaa-db-pg

Summary: noobaa-db-pg-0 stuck in init after upgrade to 4.12 getaddrinfo ENOTFOUND noo...

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenShift Data Foundation
Classification:	Red Hat Storage
Component:	Multi-Cloud Object Gateway
Sub Component:
Version:	4.12
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	urgent
Target Milestone:	---
Target Release:	ODF 4.12.0
Assignee:	Alexander Indenbaum
QA Contact:	Shivam Durgbuns
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2022-11-29 11:24 UTC by Petr Balogh
Modified:	2023-08-09 16:49 UTC
CC List:	9 users
Fixed In Version:
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2023-01-31 00:20:12 UTC
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Github	noobaa noobaa-operator pull 1001	None	open	[Backport to 5.12] RBAC for endpoint is handled by OLM	2022-12-04 14:51:57 UTC
Github	noobaa noobaa-operator pull 999	None	open	RBAC for endpoint is handled by OLM	2022-12-04 14:03:25 UTC
Red Hat Product Errata	RHBA-2023:0551	None	None	None	2023-01-31 00:20:17 UTC

Description Petr Balogh 2022-11-29 11:24:40 UTC

Description of problem (please be as detailed as possible and provide log
snippets):

On a VSPHERE UPI KMS VAULT V1 1AZ RHCOS VSAN 3M 3W environment, we see the following issue during the upgrade.

Logs from noobaa core pod:
2022-11-29T07:23:53.807081630Z Nov-29 7:23:53.807 [Upgrade/20]   [LOG] CONSOLE:: init_rand_seed: done
2022-11-29T07:23:53.808574857Z Nov-29 7:23:53.808 [Upgrade/20] [ERROR] core.util.postgres_client:: apply_sql_functions execute error Error: getaddrinfo ENOTFOUND noobaa-db-pg-0.noobaa-db-pg
2022-11-29T07:23:53.808574857Z     at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:71:26) {
2022-11-29T07:23:53.808574857Z   errno: -3008,
2022-11-29T07:23:53.808574857Z   code: 'ENOTFOUND',
2022-11-29T07:23:53.808574857Z   syscall: 'getaddrinfo',
2022-11-29T07:23:53.808574857Z   hostname: 'noobaa-db-pg-0.noobaa-db-pg'
2022-11-29T07:23:53.808574857Z }
2022-11-29T07:23:53.808683389Z Nov-29 7:23:53.808 [Upgrade/20] [ERROR] core.util.postgres_client:: _connect: initial connect failed, will retry getaddrinfo ENOTFOUND noobaa-db-pg-0.noobaa-db-pg
2022-11-29T07:23:56.809466695Z Nov-29 7:23:56.809 [Upgrade/20]    [L0] core.util.postgres_client:: _connect: called with { max: 10, host: 'noobaa-db-pg-0.noobaa-db-pg', user: 'noobaa', password: 'Sf/zlZcJ2CE+CA==', database: 'nbcore', port: 5432 }
2022-11-29T07:23:56.811782821Z Nov-29 7:23:56.811 [Upgrade/20] [ERROR] core.util.postgres_client:: apply_sql_functions execute error Error: getaddrinfo ENOTFOUND noobaa-db-pg-0.noobaa-db-pg
2022-11-29T07:23:56.811782821Z     at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:71:26) {
2022-11-29T07:23:56.811782821Z   errno: -3008,
2022-11-29T07:23:56.811782821Z   code: 'ENOTFOUND',
2022-11-29T07:23:56.811782821Z   syscall: 'getaddrinfo',
2022-11-29T07:23:56.811782821Z   hostname:
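
For anyone triaging a similar failure, the lookup that fails in the log above can be checked independently of noobaa. The following is only a minimal sketch, not the code noobaa itself runs: the function name wait_for_dns and its parameters are hypothetical, the 3-second delay simply mirrors the gap between the two attempts in the log, and the script is assumed to run from a pod in the same namespace as the noobaa-db-pg service so that the short name can resolve.

```python
import socket
import time


def wait_for_dns(host, port=5432, attempts=5, delay=3.0,
                 resolver=socket.getaddrinfo, sleep=time.sleep):
    """Try to resolve `host` up to `attempts` times.

    Returns a sorted list of resolved addresses, or None if every
    attempt fails. `resolver` and `sleep` are injectable so the
    function can be exercised without a cluster (hypothetical helper).
    """
    for attempt in range(1, attempts + 1):
        try:
            infos = resolver(host, port, type=socket.SOCK_STREAM)
            # Each entry is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] is the IP address.
            return sorted({info[4][0] for info in infos})
        except socket.gaierror as err:
            # Same class of failure that Node.js reports as ENOTFOUND.
            print(f"attempt {attempt}/{attempts}: cannot resolve {host}: {err}")
            if attempt < attempts:
                sleep(delay)
    return None
```

Calling wait_for_dns("noobaa-db-pg-0.noobaa-db-pg") returns the resolved addresses once the DB pod's DNS record exists, and None if the name never resolves within the configured attempts.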


Version of all relevant components (if applicable):
Upgrade of ODF 4.11 to ODF 4.12 
Image: quay.io/rhceph-dev/ocs-registry:4.12.0-120


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?

The upgrade does not finish and the storage cluster is stuck in the Progressing state.


Is there any workaround available to the best of your knowledge?
No


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?
Haven't tried yet.


Can this issue reproduce from the UI?
Haven't tried yet.


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Install the GA versions of OCP and ODF 4.11 on a cluster using KMS Vault
2. Upgrade OCP and ODF to 4.12
3. During the ODF upgrade, the issue described above occurs


Actual results:
The storage cluster is stuck in the Progressing state because the noobaa DB pod is stuck in the Init state.
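
To confirm which init container is holding the pod back, the pod's status can be inspected directly. The sketch below is an illustration only: the helper pending_init_containers is hypothetical, and it assumes the pod JSON was obtained with something like `oc get pod noobaa-db-pg-0 -n openshift-storage -o json`, where the namespace openshift-storage is an assumption not stated in this report. It reads the standard status.initContainerStatuses field of a Kubernetes pod.

```python
from typing import Any, Dict, List


def pending_init_containers(pod: Dict[str, Any]) -> List[str]:
    """List init containers that have not completed successfully.

    `pod` is the parsed JSON of a Kubernetes pod object
    (hypothetical helper, not part of any ODF tooling).
    """
    pending = []
    statuses = pod.get("status", {}).get("initContainerStatuses", [])
    for status in statuses:
        name = status.get("name", "<unnamed>")
        state = status.get("state", {})
        if "terminated" in state:
            exit_code = state["terminated"].get("exitCode")
            if exit_code != 0:
                pending.append(f"{name}: terminated with exit code {exit_code}")
        elif "waiting" in state:
            reason = state["waiting"].get("reason", "unknown")
            pending.append(f"{name}: waiting ({reason})")
        elif "running" in state:
            pending.append(f"{name}: still running")
        else:
            pending.append(f"{name}: state not reported")
    return pending
```

An empty list means every reported init container exited with code 0; any entry names a container that is still waiting, still running, or failed.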


Expected results:
The noobaa DB pod starts and the upgrade finishes.


Additional info:

Jenkins job:
https://url.corp.redhat.com/3d5e3a5

Must gather:
https://url.corp.redhat.com/13792a1

Comment 17 errata-xmlrpc 2023-01-31 00:20:12 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Data Foundation 4.12.0 enhancement and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:0551
