Bug 1569482
Summary: Postgresql pod fails to recover automatically after OpenShift master failure

| Field | Value | Field | Value |
|---|---|---|---|
| Product: | Red Hat Software Collections | Reporter: | Pili Guerra <pguerra> |
| Component: | rh-postgresql96-container | Assignee: | Petr Kubat <pkubat> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Lukáš Zachar <lzachar> |
| Severity: | unspecified | Priority: | unspecified |
| Version: | rh-postgresql96 | Target Release: | 3.6 |
| Hardware: | Unspecified | OS: | Unspecified |
| Type: | Bug | Last Closed: | 2020-08-05 08:55:32 UTC |
| Doc Type: | If docs needed, set a value | CC: | aos-bugs, hhorak, jokerman, marc.jadoul, mmccomas, pkubat, praiskup |
| Attachments: | Postgresql logs from original incident (attachment 1427086) | | |
Description (Pili Guerra, 2018-04-19 11:29:38 UTC)
Submitted too soon...

Steps to Reproduce:
1. To simulate a pod crash, run `docker kill` against the postgresql pod's container; ideally there should be some activity on the database at the time. You should then see the following messages in the logs:

    LOG:  database system was shut down at 2018-04-07 00:57:23 UTC
    LOG:  invalid resource manager ID 45 at 1/47830788
    LOG:  invalid primary checkpoint record
    LOG:  invalid resource manager ID in secondary checkpoint record
    PANIC: could not locate a valid checkpoint record
    LOG:  startup process (PID 23) was terminated by signal 6: Aborted
    LOG:  aborting startup due to startup process failure

Created attachment 1427086 [details]
Postgresql logs from original incident
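
For reference, a minimal sketch of the kill step, assuming an OpenShift 3.x node using the docker runtime and a container whose name contains "postgresql" (the name filter is an assumption; adjust to your environment):

```sh
# Illustrative only: the name filter is an assumption, not taken from this bug.
CONTAINER=$(docker ps --filter name=postgresql --format '{{.ID}}' | head -n 1)

# SIGKILL the container so PostgreSQL gets no chance to shut down cleanly.
docker kill "$CONTAINER"

# Watch the replacement pod; a failed WAL recovery shows up as CrashLoopBackOff.
oc get pods -w
```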
Comment (Marc Jadoul)

Hello,

We have seen recovery happen after the pod had been in CrashLoopBackOff for a long time, without knowing what made the recovery suddenly successful. We are wondering whether recovery fails because the pod is not given enough time for it to finish. So I am wondering if I could increase the timeout in case of an ongoing recovery...

Marc

(Automated notice) The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.
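
If the restart loop is indeed cutting recovery short, one approach would be to relax the pod's liveness probe so OpenShift waits longer before killing a container that is still replaying WAL. A minimal sketch, assuming the database is managed by a DeploymentConfig named `postgresql` (the name and the timing values are illustrative, not taken from this bug):

```sh
# Give the container up to ~5 minutes before the first liveness check,
# and tolerate several slow checks before a restart is triggered.
oc set probe dc/postgresql --liveness \
  --initial-delay-seconds=300 \
  --timeout-seconds=10 \
  --period-seconds=20 \
  --failure-threshold=6
```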