Bug 1941859
| Summary: | [assisted operator] assisted pod deploy first time in error state | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | bjacot |
| Component: | assisted-installer | Assignee: | Richard Su <rwsu> |
| assisted-installer sub component: | stand-alone | QA Contact: | bjacot |
| Status: | CLOSED ERRATA | Docs Contact: | jfrye |
| Severity: | medium | ||
| Priority: | medium | CC: | alazar, aos-bugs, asegurap, ccrum, jfrye, ohochman, rwsu, trwest |
| Version: | 4.8 | Keywords: | Reopened |
| Target Milestone: | --- | ||
| Target Release: | 4.8.0 | ||
| Hardware: | All | ||
| OS: | Unspecified | ||
| Whiteboard: | AI-Team-Platform | ||
| Fixed In Version: | OCP-Metal-v1.0.20.1 | Doc Type: | Bug Fix |
| Doc Text: |
Release Note text:
Previously, the `assisted-service` container did not wait for `postgres` to start up and be ready to accept connections. The `assisted-service` container attempted to establish a database connection, failed, and restarted. With this fix, the `assisted-service` container retries the database connection for up to 10 seconds. If `postgres` starts and is ready to accept connections within 10 seconds, the `assisted-service` container connects without going into an error state. If the `assisted-service` container is unable to connect to `postgres` within 10 seconds, it goes into an error state, restarts, and tries again.
-------
Cause: Assisted-service does not wait for postgres to start up and be ready to accept connections.
Consequence: Assisted-service attempts to establish a database connection but fails. The assisted-service container then fails and restarts.
Fix: Assisted-service now retries connecting to the database for up to 10 seconds.
Result: If postgres starts up and is ready to accept connections within 10 seconds, assisted-service connects to it without going into an error state. If it is unable to connect to postgres within 10 seconds, it goes into an error state, restarts, and retries.
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2021-07-27 22:55:03 UTC | Type: | Bug |
|
Comment 1
Richard Su
2021-04-28 10:29:16 UTC
Hey @rwsu, this is being noticed in a disconnected environment. I am happy to share my environment with you. If the issue can be identified, I think it will make the customer experience better.

Closing for now, as I am not noticing this issue anymore.

I am reopening, as Richard mentioned assisted-service starting up before postgres.

Added https://github.com/openshift/assisted-service/pull/1673 to retry the db connection for up to 10 seconds.

Hi Brad. The db connection retry PR has merged. Because we haven't been able to reproduce the original issue, I think we can close this BZ.

Verified IPv6 disconnected jobs have been passing in our CI, so I don't think this issue is present any longer.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2021:2438
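
For illustration only, the behavior described in the Doc Text (retry the database connection for up to 10 seconds, then fail so the container restarts) could be sketched roughly as below. This is not the code from the linked PR. The function name `connect_with_retry`, the one-second retry interval, the use of `ConnectionError`, and the injectable `clock` and `sleep` parameters are all assumptions made for the sketch.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    timeout_s: float = 10.0,   # "up to 10 seconds", per the bug report
    interval_s: float = 1.0,   # assumed pause between attempts
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds or timeout_s seconds have elapsed.

    connect is a hypothetical callable that opens the database connection
    and raises ConnectionError while the database is not yet ready.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            return connect()
        except ConnectionError:
            # Out of time: re-raise so the process exits with an error
            # and the container runtime restarts it.
            if clock() >= deadline:
                raise
            sleep(interval_s)
```

A caller would pass in a function that opens the real connection. If the database becomes ready within the window, the service continues normally; otherwise the exception propagates and the container restarts and tries again, matching the Result described above.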