
Bug 1941859

Summary: [assisted operator] assisted pod deploy first time in error state
Product: OpenShift Container Platform
Reporter: bjacot
Component: assisted-installer
Assignee: Richard Su <rwsu>
assisted-installer sub component: stand-alone
QA Contact: bjacot
Status: CLOSED ERRATA
Docs Contact: jfrye
Severity: medium
Priority: medium
CC: alazar, aos-bugs, asegurap, ccrum, jfrye, ohochman, rwsu, trwest
Version: 4.8
Keywords: Reopened
Target Milestone: ---
Target Release: 4.8.0
Hardware: All
OS: Unspecified
Whiteboard: AI-Team-Platform
Fixed In Version: OCP-Metal-v1.0.20.1
Doc Type: Bug Fix
Doc Text:
Release Note text: Previously, the `assisted-service` container did not wait for `postgres` to start up and be ready to accept connections. When the `assisted-service` container attempted to establish a database connection, the attempt failed, and the container failed and restarted. This issue has been fixed: the `assisted-service` container now attempts to connect to the database for up to 10 seconds. If `postgres` starts and is ready to accept connections within 10 seconds, the `assisted-service` container connects without going into an error state. If the `assisted-service` container is unable to connect to `postgres` within 10 seconds, it goes into an error state, restarts, and tries again.
-------
Cause: Assisted-service does not wait for postgres to start up and be ready to accept connections.
Consequence: Assisted-service attempts to establish a database connection but fails. The assisted-service container fails and restarts.
Fix: Assisted-service now retries connecting to the database for up to 10 seconds.
Result: If postgres starts up and is ready for connections within 10 seconds, assisted-service connects to it without going into an error state. If it is unable to connect to postgres within 10 seconds, it goes into an error state, restarts, and retries.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-07-27 22:55:03 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Comment 1 Richard Su 2021-04-28 10:29:16 UTC
Hi Brad,

Does this problem occur specifically in disconnected environments? Does it not exist in connected environments?
I wasn't able to reproduce the issue in connected environments. I will need to find a way to reproduce it if this is solely a disconnected issue.

Thanks.

Comment 2 bjacot 2021-04-29 16:55:45 UTC
Hey @rwsu, this is being noticed in a disconnected environment. I am happy to share my environment with you. If the issue can be identified, I think it will make the customer experience better.

Comment 3 bjacot 2021-05-05 14:35:20 UTC
Closing for now, as I am no longer noticing this issue.

Comment 4 bjacot 2021-05-05 15:16:03 UTC
I am reopening, as Richard mentioned that assisted-service starts up before postgres.

Comment 6 Richard Su 2021-05-06 15:02:36 UTC
Added https://github.com/openshift/assisted-service/pull/1673 to retry the db connection for up to 10 seconds.

Comment 7 Richard Su 2021-05-12 07:25:36 UTC
Hi Brad. 

The db connection retry PR has merged. 

Because we haven't been able to reproduce the original issue, I think we can close this BZ.

Comment 9 Trey West 2021-06-25 15:52:27 UTC
Verified

IPv6 disconnected jobs have been passing in our CI, so I don't think this issue is present any longer.

Comment 11 errata-xmlrpc 2021-07-27 22:55:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438