Bug 2051249
Summary: | [GSS]noobaa-db-pg-0 Pod stuck CrashLoopBackOff state | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | khover |
Component: | Multi-Cloud Object Gateway | Assignee: | Alexander Indenbaum <aindenba> |
Status: | CLOSED ERRATA | QA Contact: | Ben Eli <belimele> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 4.8 | CC: | aindenba, belimele, etamir, mhackett, mmuench, muagarwa, nbecker, ocs-bugs, odf-bz-bot, tdesala |
Target Milestone: | --- | ||
Target Release: | ODF 4.10.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | 4.10.0-168 | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2022-04-13 18:52:48 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
khover
2022-02-06 18:29:17 UTC
Had the customer run the following:

    $ oc rsh noobaa-operator-5746b8bf88-jqnjk
    sh-4.4$ curl -kv https://noobaa-mgmt.openshift-storage.svc:443
    * Rebuilt URL to: https://noobaa-mgmt.openshift-storage.svc:443/
    * Trying 172.30.88.33...
    * TCP_NODELAY set
    * Connected to noobaa-mgmt.openshift-storage.svc (172.30.88.33) port 443 (#0)
    * ALPN, offering h2
    * ALPN, offering http/1.1
    * successfully set certificate verify locations:
    * CAfile: /etc/pki/tls/certs/ca-bundle.crt
      CApath: none
    * TLSv1.3 (OUT), TLS handshake, Client hello (1):
    * TLSv1.3 (IN), TLS handshake, Server hello (2):
    * TLSv1.2 (IN), TLS handshake, Certificate (11):
    * TLSv1.2 (IN), TLS handshake, Server key exchange (12):
    * TLSv1.2 (IN), TLS handshake, Server finished (14):
    * TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
    * TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
    * TLSv1.2 (OUT), TLS handshake, Finished (20):
    * TLSv1.2 (IN), TLS handshake, Finished (20):
    * SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384
    * ALPN, server accepted to use http/1.1
    * Server certificate:
    * subject: CN=noobaa-mgmt.openshift-storage.svc
    * start date: Feb 4 19:22:39 2022 GMT
    * expire date: Feb 4 19:22:40 2024 GMT
    * issuer: CN=openshift-service-serving-signer@1638811499
    * SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway.
    > GET / HTTP/1.1
    > Host: noobaa-mgmt.openshift-storage.svc
    > User-Agent: curl/7.61.1
    > Accept: */*
    >
    < HTTP/1.1 302 Found
    < Location: /fe/
    < Vary: Accept, Accept-Encoding
    < Content-Type: text/plain; charset=utf-8
    < Content-Length: 26
    < Date: Mon, 07 Feb 2022 15:25:49 GMT
    < Connection: keep-alive
    < Keep-Alive: timeout=5
    <
    * Connection #0 to host noobaa-mgmt.openshift-storage.svc left intact
    Found. Redirecting to /fe/sh-4.4$

Hello,

Any updates or workaround we can share with the customer?
The customer is losing patience and faith in the product.

Hello,

So if the workaround includes disabling huge pages, what is the process/steps for that?

Hello,

According to the ODF must-gather, the Postgres container image is
registry.redhat.io/rhel8/postgresql-12@sha256:623bdaa1c6ae047db7f62d82526220fac099837afd8770ccc6acfac4c7cff100
i.e. this image uses RHEL8 as a base:

    > bash-4.4$ cat /etc/redhat-release
    > Red Hat Enterprise Linux release 8.5 (Ootpa)

Previously, Postgres container images used RHEL7 as a base. For instance, the upstream uses:

    > bash-4.2$ cat /etc/redhat-release
    > CentOS Linux release 7.8.2003 (Core)

Is the Postgres container image base OS change expected?

@khover the posted workaround PR would support the RHEL8 fs layout and run Postgres with huge pages disabled.

Best regards!

My customer already has hugepages enabled in the cluster.
As per my understanding so far, the workaround is: uninstall OCS, disable hugepages, reinstall OCS.
How do I disable huge pages temporarily during OCS installation?
The customer temp is high, so I just want to be sure that what we try next succeeds.

@khover any input about why a RHEL8-based Postgres container is used?

As a workaround, see https://github.com/noobaa/noobaa-operator/pull/853/files; it is a tiny change. To get through DB initialization, run "oc edit cm noobaa-postgres-initdb-sh" and add the if block below, marked by plus signs. This procedure does not require reinstalling OCS.

    # Wrap the postgres binary, force huge_pages=off for initdb
    # see https://bugzilla.redhat.com/show_bug.cgi?id=1946792
    p=/opt/rh/rh-postgresql12/root/usr/bin/postgres
    +
    + # Latest RH images moved the postgres binary
    + # from /opt/rh/rh-postgresql12/root/usr/bin/postgres to /usr/bin/postgres
    + # see https://bugzilla.redhat.com/show_bug.cgi?id=2051249
    + if [ ! -x $p ]; then
    +   p=/usr/bin/postgres
    + fi
    +
    mv $p $p.orig
    echo exec $p.orig \"\$@\" -c huge_pages=off > $p
    chmod 755 $p

Alternatively:
- you could disable huge pages during OCS/NooBaa installation and then re-enable huge pages
- use a RHEL7-based Postgres container

Let me know if you need any additional help.

Best regards!

Hi Alex,

Thanks for all your help on this.

Re: "@khover any input about why RHEL8 based Postgres container is used?"

I honestly don't know; this is an install of OCS ocs-operator.v4.8.6.
If there is some way to check, or info needed, I'd be happy to help you collect it.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.10.0 enhancement, security & bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1372

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days
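
For anyone who wants to see what the wrapper from the workaround comment does before editing the config map, here is a minimal sketch that exercises the same fallback logic against a throwaway directory rather than a real Postgres image. The function name wrap_postgres, the demo_root directory, and the stub "postgres" script are assumptions made up for this illustration; only the two binary paths and the huge_pages=off wrapper line come from the thread. It is not the operator's actual initdb script.

```shell
#!/bin/sh
# Sketch of the workaround's wrapper logic, run against a temporary
# directory. Everything under $demo_root is a hypothetical stand-in.
set -eu

# wrap_postgres ROOT (hypothetical helper): pick whichever postgres path
# exists under ROOT, move it aside, and replace it with a wrapper that
# appends "-c huge_pages=off" to every invocation.
wrap_postgres() {
    root=$1
    p="$root/opt/rh/rh-postgresql12/root/usr/bin/postgres"  # older image layout
    if [ ! -x "$p" ]; then
        p="$root/usr/bin/postgres"                           # newer image layout
    fi
    mv "$p" "$p.orig"
    echo exec "$p.orig" \"\$@\" -c huge_pages=off > "$p"
    chmod 755 "$p"
}

# Demo: fake the newer layout with a stub that just prints its arguments.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/usr/bin"
printf '#!/bin/sh\necho "postgres args: $*"\n' > "$demo_root/usr/bin/postgres"
chmod 755 "$demo_root/usr/bin/postgres"

wrap_postgres "$demo_root"
demo_output=$("$demo_root/usr/bin/postgres" -D /tmp/data)
echo "$demo_output"
rm -rf "$demo_root"
```

Running the sketch prints the arguments the stub received, with "-c huge_pages=off" appended after the caller's own arguments, which is the effect the wrapper is meant to have on the real binary during DB initialization.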