Bug 1941859 - [assisted operator] assisted pod deploy first time in error state
Summary: [assisted operator] assisted pod deploy first time in error state
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: assisted-installer
Version: 4.8
Hardware: All
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.8.0
Assignee: Richard Su
QA Contact: bjacot
Docs Contact: jfrye
URL:
Whiteboard: AI-Team-Platform
Depends On:
Blocks:
Reported: 2021-03-22 22:43 UTC by bjacot
Modified: 2021-07-27 22:55 UTC (History)

Fixed In Version: OCP-Metal-v1.0.20.1
Doc Type: Bug Fix
Doc Text:
Release Note text: Previously, the `assisted-service` container did not wait for `postgres` to start up and be ready to accept connections. The `assisted-service` container attempted to establish a database connection, failed, and then restarted. This issue has been fixed: the `assisted-service` container now retries the database connection for up to 10 seconds. If `postgres` starts and is ready to accept connections within 10 seconds, the `assisted-service` container connects without going into an error state. If the `assisted-service` container is unable to connect to `postgres` within 10 seconds, it goes into an error state, restarts, and tries again.
-------
Cause: Assisted-service does not wait for postgres to start up and be ready to accept connections.
Consequence: Assisted-service attempts to establish a database connection but fails. The assisted-service container fails and restarts.
Fix: Assisted-service now retries connecting to the database for up to 10 seconds.
Result: If postgres starts up and is ready for connections within 10 seconds, assisted-service connects to it without going into an error state. If it is unable to connect to postgres within 10 seconds, it goes into an error state, restarts, and tries again.
Clone Of:
Environment:
Last Closed: 2021-07-27 22:55:03 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift assisted-service pull 1673 | 0 | None | closed | OCPBUGSM-26820 Add database connection retry | 2021-05-12 07:13:05 UTC
Red Hat Product Errata | RHSA-2021:2438 | 0 | None | None | None | 2021-07-27 22:55:16 UTC

Comment 1 Richard Su 2021-04-28 10:29:16 UTC
Hi Brad,

Does this problem occur specifically in disconnected environments? Does it not exist in connected environments?
I wasn't able to reproduce the issue in connected environments. I will need to find a way to reproduce it if this is solely a disconnected issue.

Thanks.

Comment 2 bjacot 2021-04-29 16:55:45 UTC
Hey @rwsu, this is being noticed in a disconnected environment. I am happy to share my environment with you. If the issue can be identified, I think it will make the customer experience better.

Comment 3 bjacot 2021-05-05 14:35:20 UTC
Closing for now, as I am no longer noticing this issue.

Comment 4 bjacot 2021-05-05 15:16:03 UTC
I am reopening, as Richard mentioned assisted-service starting up before postgres.

Comment 6 Richard Su 2021-05-06 15:02:36 UTC
Added https://github.com/openshift/assisted-service/pull/1673 to retry the db connection for up to 10 seconds.
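
For illustration only, the retry behavior described above (keep trying the database connection for up to 10 seconds, then give up so the container can fail and restart) might be sketched as follows. This is not the code from the pull request. The function name `connect_with_retry`, its parameters, and the 1-second interval between attempts are all assumptions made for this sketch; only the 10-second limit comes from this bug.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    timeout_seconds: float = 10.0,   # limit taken from this bug's fix description
    interval_seconds: float = 1.0,   # assumed pause between attempts
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds or timeout_seconds have elapsed.

    `connect` is a hypothetical callable that opens the database connection
    and raises ConnectionError while the database is not yet accepting
    connections. Once the deadline has passed, the last error is re-raised,
    which lets the caller exit and be restarted.
    """
    deadline = clock() + timeout_seconds
    while True:
        try:
            return connect()
        except ConnectionError:
            remaining = deadline - clock()
            if remaining <= 0:
                # Out of time: propagate the failure to the caller.
                raise
            # Wait before the next attempt, without overshooting the deadline.
            sleep(min(interval_seconds, remaining))
```

A caller would pass in whatever function opens its database connection, for example `db = connect_with_retry(open_db)`, where `open_db` is a placeholder. The `clock` and `sleep` parameters exist only so the timing can be replaced in tests.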

Comment 7 Richard Su 2021-05-12 07:25:36 UTC
Hi Brad. 

The db connection retry PR has merged. 

Because we haven't been able to reproduce the original issue, I think we can close this BZ.

Comment 9 Trey West 2021-06-25 15:52:27 UTC
Verified

IPv6 disconnected jobs have been passing in our CI, so I don't think this issue is present any longer.

Comment 11 errata-xmlrpc 2021-07-27 22:55:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

